Unit II 2.2 ML Kernel Machines SVM
(20BT60501)
COURSE DESCRIPTION:
Concept learning, General to specific ordering, Decision tree
learning, Support vector machine, Artificial neural networks,
Multilayer neural networks, Bayesian learning, Instance based
learning, Reinforcement learning.
Subject: MACHINE LEARNING (20BT60501)
Prepared By:
Dr.J.Avanija
Professor
Dept. of CSE
Sree Vidyanikethan Engineering College
Tirupati.
Unit II – DECISION TREE LEARNING AND KERNEL MACHINES
Kernel Methods in Machine Learning
Kernel methods map the input data into a higher-dimensional feature space in which a linear separator can be found, without ever computing that mapping explicitly (the kernel trick); a kernel function K(X₁, X₂) returns the inner product of the two points in that feature space.
Polynomial Kernel
Following is the formula for the polynomial kernel:
K(X₁, X₂) = (X₁ᵗX₂ + c)ᵈ, where d is the degree of the polynomial and c ≥ 0 is a free parameter.
Kernels in SVM
Sigmoid Kernel
K(X₁, X₂) = tanh(γ·X₁ᵗX₂ + r). It can be used as a proxy for neural networks, since it corresponds to the activation of a two-layer perceptron.
Kernels in SVM
RBF (Radial Basis Function) Kernel
K(X₁, X₂) = exp(−γ·||X₁ − X₂||²), where γ > 0 controls the width of the Gaussian.
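A minimal sketch of the three kernels above in Python with NumPy; the parameter names (degree, coef0, gamma) and values follow common convention and are illustrative, not taken from the slides.

```python
import numpy as np

def polynomial_kernel(x1, x2, degree=2, coef0=1.0):
    # K(x1, x2) = (x1 . x2 + c)^d
    return (np.dot(x1, x2) + coef0) ** degree

def sigmoid_kernel(x1, x2, gamma=0.1, coef0=0.0):
    # K(x1, x2) = tanh(gamma * x1 . x2 + r)
    return np.tanh(gamma * np.dot(x1, x2) + coef0)

def rbf_kernel(x1, x2, gamma=0.5):
    # K(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

x1 = np.array([1.0, 2.0])
x2 = np.array([2.0, 0.5])
print(polynomial_kernel(x1, x2), sigmoid_kernel(x1, x2), rbf_kernel(x1, x2))
```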
Support Vector Machines
A support vector machine (SVM) is a machine learning algorithm that analyzes data for classification and regression analysis.
SVM is a supervised learning method that looks at data and sorts it into one of two categories.
It is trained with a series of data already classified into two categories, building the model as it is trained. The task of an SVM algorithm is to determine which category a new data point belongs to.
In its basic form, an SVM is a binary linear classifier.
Support Vector Machines
Important Terminologies
Hyperplane
Support Vectors
Marginal Distance
Linearly Separable Data
Non-linearly Separable Data
Support Vector Machines
[Figure: hyperplane, marginal distance, and support vectors]
Support Vector Machines
Applications of SVM
Text and hypertext classification
Image classification
Recognizing handwritten characters
Biological sciences, including protein classification
Support Vector Machines
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
The data points or vectors that are closest to the hyperplane and that affect the position of the hyperplane are termed support vectors.
The two different categories are separated using a decision boundary or hyperplane:
Support Vector Machines
The red coloured dashed line is the optimal hyperplane. The green coloured dashed lines define the boundary for each class, and the data points with a green coloured thick outline that lie on the boundary of the class are called support vectors. Hence the name Support Vector Machine.
Support vectors are used to determine the optimal hyperplane.
Support Vector Machines
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Support Vector Machines
Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
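A short, hedged scikit-learn example of the two types (assuming scikit-learn is available; the toy datasets and parameters are illustrative, not from the slides): a linear SVM for linearly separable data and an RBF-kernel SVM for non-linearly separable data.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Linearly separable data -> Linear SVM classifier
X_lin, y_lin = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)

# Non-linearly separable data (concentric circles) -> Non-linear (kernel) SVM
X_circ, y_circ = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
rbf_svm = SVC(kernel="rbf").fit(X_circ, y_circ)

print(linear_svm.score(X_lin, y_lin), rbf_svm.score(X_circ, y_circ))
```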
Support Vector Machines
Mathematical Modeling:
Given a training set {(Xᵢ, Yᵢ), i = 1, 2, 3, …, n}, with Xᵢ ∈ ℜᵐ and Yᵢ ∈ {+1, −1}.
Here, Xᵢ is the feature vector of the iᵗʰ data point and Yᵢ is its label. The label is ‘+1’ for the positive class or ‘−1’ for the negative class; the value ‘1’ is taken for mathematical convenience.
Let W be a vector perpendicular to the decision boundary (the optimal hyperplane) and Xᵢ be an unknown vector. Then the projection of Xᵢ on the unit vector of W determines whether that unknown point belongs to the positive class or the negative class.
Support Vector Machines
Mathematical Modeling:
Y = WᵗXᵢ + b  (equation of the hyperplane)
The dot product between two vectors W and X is the same as the matrix multiplication between Wᵗ and X.
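To illustrate the hyperplane equation, a tiny NumPy sketch (the weight vector W and bias b below are made-up values, not from the slides): the sign of WᵗX + b decides the predicted class.

```python
import numpy as np

W = np.array([0.4, -0.7])  # hypothetical weight vector, normal to the hyperplane
b = 0.1                    # hypothetical bias term

def predict(x):
    # Y = W^T x + b; the sign gives the class (+1 or -1)
    return 1 if np.dot(W, x) + b >= 0 else -1

print(predict(np.array([2.0, 0.5])), predict(np.array([-1.0, 3.0])))
```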
Support Vector Machines
Mathematical Modeling:
Let X⁺ be a support vector in the positive class and X⁻ be a support vector in the negative class. Then, WX⁺ + b = 1 ⇒ WX⁺ = 1 − b. Similarly, WX⁻ + b = −1 ⇒ WX⁻ = −1 − b. The projection of the vector (X⁺ − X⁻) on the unit vector of W gives the width of the separation gap, i.e. the margin between the support vectors of the two classes. The width of the margin is given by:
width = W·(X⁺ − X⁻)/||W|| = ((1 − b) − (−1 − b))/||W|| = 2/||W||
Support Vector Machines
Mathematical Modeling:
The objective of SVM is to maximise the width of the separation gap, that is, to maximise 2/||W||, which is the same as minimising ||W||, which is the same as minimising ||W||², which is the same as minimising (1/2)||W||², and the same thing can be written as (1/2)WᵗW.
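For reference, the resulting hard-margin optimisation problem can be written in its standard form (stated here for completeness, not reproduced from the slides):

```latex
\min_{W,\,b}\ \tfrac{1}{2}\,W^{T}W
\quad \text{subject to} \quad
Y_i\,(W^{T}X_i + b) \ge 1, \qquad i = 1,\dots,n
```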
Support Vector Machines
Hard Margin and Soft Margin
Hard-margin SVM is used for linearly separable data.
Soft-margin SVM is used for non-linearly separable data.
Support Vector Machines
Soft Margin Constraints
Yᵢ(WᵗXᵢ + b) ≥ 1 − ξᵢ, with ξᵢ ≥ 0 for every training point.
Support Vector Machines
Mathematical Modeling:
In real scenarios, the data are not strictly linearly separable. The problem is therefore modified by introducing slack variables ‘ξ’ and a penalty term ‘C’.
Here, ‘C’ is a kind of regularisation parameter.
The slack variable ξᵢ measures how far a data point lies on the wrong side of its class margin.
If ξᵢ for a data point is small, the mistake is less severe and C·ξᵢ will be small.
If ξᵢ for a data point is large, the mistake is more severe and C·ξᵢ will be large.
Support Vector Machines
Mathematical Modeling:
The soft-margin objective becomes: minimise (1/2)WᵗW + C·Σᵢ ξᵢ, subject to Yᵢ(WᵗXᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0.
SVM for Classification
Hinge Loss
The hinge loss is a specific type of cost function that incorporates a margin or
distance from the classification boundary into the cost calculation.
The hinge loss increases linearly.
Associated with soft-margin support vector machines.
The distance from the hyperplane can be regarded as a measure of confidence.
SVM for Classification
Hinge Loss
Given input features “X” and target “y”, the goal of the SVM algorithm is to predict a value (‘predicted y’) close to the target (‘actual y’) for each observation.
The equation that calculates ‘predicted y’ depends on a weighted combination of the input X. It can be written as:
f(X) = WᵗX + b
[Figure: decision boundary classifying positive and negative points; points marked in red are misclassified]
SVM for Classification
Hinge Loss
Plot y·f(x) against the loss function. For points with y·f(x) > 0 (correctly classified), assign a loss of ‘0’; for points with y·f(x) < 0 (misclassified), assign a loss of ‘1’.
SVM for Classification
X is a positive sample. Penalty = 1 − t·y, where t is the actual output and y is the output predicted by the SVM.
[Figure: positive and negative planes, annotated with Penalty = 0 and Penalty = 0 to 1]
SVM for Classification
Hinge Loss
Hinge loss = max(0, 1 − y·f(x)).
For y·f(x) ≥ 1, the hinge loss is ‘0’.
For y·f(x) < 1, the hinge loss increases linearly as y·f(x) decreases.
The further a misclassified point lies on the wrong side of the boundary (the more negative y·f(x)), the larger the hinge loss 1 − y·f(x) becomes.
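A small NumPy sketch comparing the 0/1 loss described above with the hinge loss (the labels and scores are made-up illustrative values):

```python
import numpy as np

def zero_one_loss(y, f):
    # 0 if correctly classified (y*f(x) > 0), 1 otherwise
    return np.where(y * f > 0, 0.0, 1.0)

def hinge_loss(y, f):
    # max(0, 1 - y*f(x)): 0 once y*f(x) >= 1, grows linearly below that
    return np.maximum(0.0, 1.0 - y * f)

y = np.array([+1, +1, -1, -1])        # true labels
f = np.array([2.0, 0.3, -0.5, 1.2])   # raw SVM scores f(x) = w.x + b
print(zero_one_loss(y, f))  # [0. 0. 0. 1.]
print(hinge_loss(y, f))     # [0.  0.7 0.5 2.2]
```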
SVM for Classification
Large Margin Principle
The margin is the distance between the two boundaries. The support vectors are the instances at the boundaries (where WᵗX = 1 or −1), or within the boundaries if the data are not linearly separable.
The goal of SVMs is to learn the boundaries that make the margin as large as possible (Large Margin Classification).
The size of the margin is 2 / ||w||, where ||w|| is the L2 norm of the weight vector.
Learning goal:
Maximize 2 / ||w||, subject to the constraints that all instances are correctly classified.
Turn it into a minimization problem by taking the inverse: minimize ||w|| / 2.
Can also square the L2 norm (which makes the calculus easier), just like with L2 regularization: (1/2)||w||².
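A hedged scikit-learn sketch computing the margin size 2 / ||w|| from a fitted linear SVM (toy data and the large-C setting are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=1)
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)             # size of the margin = 2 / ||w||
print("||w|| =", np.linalg.norm(w), "margin =", margin)
print("number of support vectors:", len(clf.support_vectors_))
```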
SVM for Classification
Choosing C
C is chosen by cross-validation.
A Support Vector Machine always looks for:
-setting a larger margin
-lowering the misclassification rate
An increase in the margin tends to lead to a higher misclassification rate, while a decrease in the margin tends to lead to a lower misclassification rate.
The priority should be getting a lower misclassification rate, and this trade-off is controlled by the parameter C.
SVM for Classification
Cross-Validation: k-fold CV
In k-fold cross-validation, the training data are split into k folds; each fold is used once as the validation set while the remaining k − 1 folds are used for training, and the results are averaged.
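A minimal sketch of choosing C by k-fold cross-validation with scikit-learn's GridSearchCV (the dataset and the candidate C values are arbitrary, illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5-fold cross-validation over a grid of C values
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best C:", search.best_params_["C"], "CV accuracy:", search.best_score_)
```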
SVM for Classification
SVMs for Multiclass Classification
• One-versus-the-rest (one-vs-all): we train C binary classifiers, fc(x), where the data from class c is treated as positive, and the data from all the other classes is treated as negative.
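A brief sketch of the one-versus-the-rest scheme described above, using scikit-learn's OneVsRestClassifier (the iris dataset and LinearSVC settings are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)   # 3 classes -> 3 binary classifiers

# One binary LinearSVC per class: that class is positive, all others negative
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print(len(ovr.estimators_), "binary classifiers; accuracy:", ovr.score(X, y))
```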
SVM for Regression
Regression analysis consists of a set of machine learning methods
that allow us to predict a continuous outcome variable (y) based
on the value of one or multiple predictor variables (x).
The goal of a regression model is to build a mathematical equation that defines y as a function of the x variables.
This equation can be used to predict the outcome (y) on the basis of new values of the predictor variables (x).
It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.
SVM for Regression
• SVM regression (SVR) uses an ε-insensitive loss: any point lying inside an ε-tube around the prediction is not penalized.
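A minimal scikit-learn sketch of support vector regression with an ε-tube (the synthetic data and the C and epsilon values are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# Points whose residual lies within +/- epsilon of the prediction incur no loss
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(svr.support_vectors_), "R^2:", svr.score(X, y))
```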
SVM for Regression
• Mean Square Error
• Huber Loss
SVMs for Regression
Huber Loss:
Lᵟ(r) = (1/2)r² if |r| ≤ δ, and δ|r| − (1/2)δ² otherwise, where r is the residual and δ is the threshold between the quadratic and the linear regions.
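A small NumPy sketch of the Huber loss as defined above (the residual values and the default δ are illustrative):

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    # Quadratic for small residuals, linear for large ones
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * r - 0.5 * delta ** 2)

print(huber_loss(np.array([0.2, 1.0, 3.0])))  # [0.02 0.5  2.5 ]
```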
SVMs for Regression
• The corresponding objective function is J = (1/2)||W||² + C·Σᵢ L(yᵢ, ŷᵢ), where L is the chosen loss (for example the ε-insensitive loss) and C controls the trade-off between the flatness of the model and the tolerance for errors.
SVMs Pros and Cons
Pros:
– It works really well with a clear margin of separation.
– It is effective in high-dimensional spaces.
– It is effective in cases where the number of dimensions is greater than the number of samples.
– It uses a subset of the training points in the decision function (the support vectors), so it is also memory efficient.
Cons:
– The classifier is heavily reliant on the support vectors and changes as the support vectors change. As a result, SVMs tend to overfit; hence kernel functions and regularization are important.
– It does not provide probability estimates.
– It doesn't perform well with large datasets, because the required training time is higher.
– It also doesn't perform very well when the dataset has more noise, i.e. the target classes are overlapping.