Support Vector Machine (SVM) Algorithm
SVM algorithms are effective because they find the maximum-margin separating
hyperplane between the different classes in the target feature.
Consider two independent variables, x1 and x2, and one dependent variable
that is either a blue circle or a red circle.
Linearly Separable Data points
From the figure above it is clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input features,
x1 and x2) that segregate the data points, i.e., classify the red and blue
circles. So how do we choose the best line, or in general the best
hyperplane, that segregates our data points?
One reasonable choice as the best hyperplane is the one that represents the
largest separation or margin between the two classes.
Here we have one blue ball inside the region of the red balls. So how does
SVM classify the data? The blue ball among the red ones is an outlier of the
blue class. The SVM algorithm ignores such outliers and finds the best
hyperplane that maximizes the margin, which makes SVM robust to outliers.
For this type of data, SVM finds the maximum margin as with the previous data
sets, and in addition it adds a penalty each time a point crosses the margin.
The margins in such cases are called soft margins. With a soft margin, SVM
tries to minimize (1/margin) + λ(∑ penalty). Hinge loss is a commonly used
penalty: if there are no violations, the hinge loss is zero; if there are
violations, the hinge loss is proportional to the distance of the violation.
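As a concrete illustration of the hinge loss, here is a minimal sketch for a linear decision function w^T x + b; the particular vectors and labels are made-up values for this example.

Python

import numpy as np

def hinge_loss(w, b, X, y):
    # Average hinge loss of a linear classifier w^T x + b.
    # X: (n_samples, n_features), y: labels in {-1, +1}.
    # A point with y_i * (w^T x_i + b) >= 1 contributes zero loss;
    # otherwise the loss grows linearly with the size of the violation.
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

# Toy data: the third point violates the margin, so the loss is positive.
X = np.array([[2.0, 1.0], [-1.5, -0.5], [0.2, 0.1]])
y = np.array([1, -1, 1])
w = np.array([1.0, 1.0])
b = 0.0
print(hinge_loss(w, b, X, y))  # ~0.233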
Till now, we have been talking about linearly separable data (the groups of
blue balls and red balls are separable by a straight line). What should we do
if the data are not linearly separable?
Say our data is as shown in the figure above. SVM solves this by creating a
new variable using a kernel. For a point xi on the line, we create a new
variable yi as a function of its distance from the origin o. If we plot this,
we get something like what is shown below.
In this case, the new variable y is created as a function of distance from the
origin. A non-linear function that creates such a new variable is referred to
as a kernel.
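Here is a minimal sketch of that idea, assuming 1D data where one class surrounds the other around the origin (the numbers are made up for illustration): adding y_i = |x_i|, the distance from the origin, makes the two classes separable by a horizontal line in the new (x, y) plane.

Python

import numpy as np

# 1D points: the red class sits near the origin, the blue class lies
# farther out on both sides, so no single threshold on x separates them.
x_red = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
x_blue = np.array([-4.0, -3.0, -2.5, 2.5, 3.0, 4.0])

def feature_map(x):
    # New variable y_i = distance from the origin.
    return np.column_stack([x, np.abs(x)])

# In the (x, y) plane, the line y = 1.75 now separates the two classes.
print(feature_map(x_red))
print(feature_map(x_blue))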
The equation of the linear hyperplane can be written as:

w^T x + b = 0
The vector w represents the normal vector to the hyperplane, i.e., the
direction perpendicular to the hyperplane. The parameter b represents the
offset, or the distance of the hyperplane from the origin along the normal
vector w.
The distance between a data point x_i and the decision boundary can be
calculated as:
d_i = (w^T x_i + b) / ||w||
where ||w|| represents the Euclidean norm of the normal vector w.
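For example, here is a short sketch of this distance computation; the hyperplane parameters w and b below are made-up illustrative values.

Python

import numpy as np

# Made-up hyperplane parameters for illustration.
w = np.array([2.0, -1.0])
b = 0.5

def distance_to_hyperplane(x, w, b):
    # Signed distance d_i = (w^T x_i + b) / ||w||.
    return (w @ x + b) / np.linalg.norm(w)

x_i = np.array([1.0, 3.0])
print(distance_to_hyperplane(x_i, w, b))  # negative: x_i lies on the -w side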
Optimization:
For the hard margin linear SVM classifier:

minimize_{w, b} (1/2) w^T w = minimize_{w, b} (1/2) ||w||^2
subject to: t_i (w^T x_i + b) ≥ 1 for i = 1, 2, 3, …, m
The target variable or label for the i-th training instance is denoted by
t_i. Here t_i = -1 for negative instances (when y_i = 0) and t_i = 1 for
positive instances (when y_i = 1). This is because we require a decision
boundary that satisfies the constraint: t_i (w^T x_i + b) ≥ 1.
For the soft margin linear SVM classifier, slack variables ζ_i measure how far
each point violates the margin:

minimize_{w, b, ζ} (1/2) w^T w + C ∑_{i=1}^{m} ζ_i
subject to: t_i (w^T x_i + b) ≥ 1 − ζ_i and ζ_i ≥ 0 for i = 1, 2, 3, …, m
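A brief sketch of how the slack penalty weight C appears in practice, using scikit-learn's SVC; the dataset is synthetic and the particular C values are arbitrary.

Python

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, slightly overlapping blobs, so some slack is unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

# A small C tolerates more margin violations (softer, wider margin);
# a large C penalizes violations heavily (harder, narrower margin).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors")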
The dual problem of the soft margin SVM is obtained by introducing Lagrange
multipliers α_i, one per training sample:

maximize_α: ∑_{i=1}^{m} α_i − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} α_i α_j t_i t_j K(x_i, x_j)
subject to: 0 ≤ α_i ≤ C and ∑_{i=1}^{m} α_i t_i = 0

where K(x_i, x_j) is the kernel function. Once the optimal α_i are found, the
decision function for a new point x is

f(x) = ∑_{i=1}^{m} α_i t_i K(x_i, x) + b

and the offset b can be recovered from any support vector lying exactly on the
margin, for which t_i (w^T x_i + b) = 1, i.e., b = t_i − w^T x_i (since
t_i = ±1).
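To make the dual formulation concrete, the sketch below recovers f(x) = ∑ α_i t_i K(x_i, x) + b from a fitted scikit-learn SVC, whose dual_coef_ attribute stores the products α_i t_i for the support vectors; the dataset and the γ value are arbitrary choices for illustration.

Python

import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic two-class data that is not linearly separable.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def rbf_kernel(A, B, gamma):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# f(x) = sum over support vectors of (alpha_i * t_i) * K(x_i, x) + b.
K = rbf_kernel(clf.support_vectors_, X, gamma)
f_manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()

# Matches scikit-learn's own decision_function up to numerical error.
print(np.allclose(f_manual, clf.decision_function(X)))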
Linear SVM: Linear SVMs use a linear decision boundary to separate the
data points of different classes. When the data can be precisely linearly
separated, linear SVMs are very suitable. This means that a single straight
line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the
data points into their respective classes. A hyperplane that maximizes the
margin between the classes is the decision boundary.
Non-Linear SVM: Non-Linear SVM can be used to classify data when it
cannot be separated into two classes by a straight line (in the case of 2D).
By using kernel functions, nonlinear SVMs can handle nonlinearly
separable data. The original input data is transformed by these kernel
functions into a higher-dimensional feature space, where the data points
can be linearly separated. A linear decision boundary found in this
transformed space corresponds to a nonlinear decision boundary in the
original input space.
The SVM kernel is a function that takes a low-dimensional input space and
transforms it into a higher-dimensional space, i.e., it converts
non-separable problems into separable problems. It is mostly useful in
non-linear separation problems. Simply put, the kernel performs complex data
transformations and then finds the boundary that separates the data based on
the labels or outputs defined.
Linear: K(x_i, x_j) = x_i^T x_j
Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d
Gaussian RBF: K(x_i, x_j) = exp(−γ ||x_i − x_j||^2)
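As a sanity check of these formulas, the sketch below evaluates each kernel by hand and compares the result with scikit-learn's pairwise kernel helpers; the sample points and the γ, r, d values are arbitrary.

Python

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# Two arbitrary sample points and arbitrary kernel parameters.
X = np.array([[1.0, 2.0], [3.0, -1.0]])
gamma, r, d = 0.5, 1.0, 3

K_lin = X @ X.T                                # x_i^T x_j
K_poly = (gamma * (X @ X.T) + r) ** d          # (gamma * x_i^T x_j + r)^d
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)              # exp(-gamma * ||x_i - x_j||^2)

print(np.allclose(K_lin, linear_kernel(X)))
print(np.allclose(K_poly, polynomial_kernel(X, gamma=gamma, coef0=r, degree=d)))
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))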
Advantages of SVM