Lecture 7 - Kernels and Support Vector Machines
2DV516/2DT916
Jonas Nordqvist
jonas.nordqvist@lnu.se
Department of Mathematics
Agenda
Example: an ill-defined problem
Find the separating hyperplane
Hyperplanes
A hyperplane always divides the space into two disjoint parts, for example the regions above and below a line.
Given $p$ points in general position in the ambient space, the hyperplane through these points is unique up to a constant factor, e.g. $2Y + 6X + 4 = 0$ and $Y + 3X + 2 = 0$ describe the same hyperplane.
Hyperplanes
There are essentially three scenarios for a point $x^0 = (x_1^0, x_2^0, \dots, x_p^0)$: either
▶ $x^0$ lies in the hyperplane and $\beta_0 + \beta_1 x_1^0 + \cdots + \beta_p x_p^0 = 0$,
▶ $x^0$ lies on one side of the hyperplane and $\beta_0 + \beta_1 x_1^0 + \cdots + \beta_p x_p^0 > 0$, or
▶ $x^0$ lies on the other side and $\beta_0 + \beta_1 x_1^0 + \cdots + \beta_p x_p^0 < 0$.
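As a small illustration (the coefficients and points are made up, using the earlier example hyperplane $Y + 3X + 2 = 0$), the three cases can be checked numerically by looking at the sign of $\beta_0 + \beta^\top x^0$:

import numpy as np

# Hyperplane Y + 3X + 2 = 0, i.e. beta0 = 2 and beta = (3, 1) for the point (X, Y)
beta0, beta = 2.0, np.array([3.0, 1.0])

points = np.array([[0.0, -2.0],   # lies in the hyperplane
                   [1.0,  1.0],   # lies on one side
                   [-2.0, 0.0]])  # lies on the other side

values = beta0 + points @ beta    # beta0 + beta1*x1 + beta2*x2 for each point
print(values)                     # [ 0.  6. -4.]
print(np.sign(values))            # [ 0.  1. -1.]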
Geometric motivation
Consider the hyperplane
$\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p = 0,$
with normal vector $\beta = (\beta_1, \dots, \beta_p)^\top$.
Assume that $x = (x_1, \dots, x_p)$ lies ‘above’ the hyperplane; then we will show that
$\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p > 0.$
For any point $x^*$ in the hyperplane we have $\beta_0 + \beta^\top x^* = 0$, and hence
$\beta_0 + \beta^\top x = (x - x^*)^\top \beta = \|x - x^*\| \, \|\beta\| \cos(\theta),$
where $\theta$ is the angle between $x - x^*$ and $\beta$. The sign of this quantity is determined completely by $\cos(\theta)$. Thus, $\beta_0 + \beta^\top x > 0$ if and only if $\theta \in (-\pi/2, \pi/2)$.
[Figure: the point $x$ and the angle to the hyperplane, with $\pm\pi/2$ marked.]
Back to the problem
We want to find a hyperplane that separates our data.
Is this always possible given any data? No!
Why is the problem ill-defined? Because there is no unique solution!
Maximal margin classifier
We want to choose the hyperplane which separates the data and which has the largest margin (or cushion, or slab) separating the two classes.
Note that only three points contribute to the computation of the slab, namely the ones lying on the margin. These points are called support vectors.
Formulating the (hard) problem
Remark
We will consider the binary classification case. For this problem it is convenient to use $+1$ for positive and $-1$ for negative labels in $y$, as this implies
$y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \geq 0,$
for all $1 \leq i \leq n$.
Denote by $M$ the distance from the hyperplane to the two classes.¹
The main objective is the following:
$\max_{\beta_0, \beta_1, \dots, \beta_p} M \quad (2)$
subject to
$\sum_{i=1}^{p} \beta_i^2 = \|\beta\|^2 = 1 \quad (3)$
$y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \geq M. \quad (4)$
¹ By distance to a class we mean the shortest distance from any point in the class to the hyperplane.
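This hard-margin problem is rarely solved in exactly this form in practice; a common sketch is to approximate it with a soft-margin linear SVC and a very large C (the synthetic data and the value C = 1e10 below are assumptions for illustration only):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable point clouds (synthetic data)
X = np.vstack([rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A huge C makes margin violations prohibitively expensive, so the fit
# approximates the maximal margin classifier.
clf = SVC(kernel="linear", C=1e10).fit(X, y)

print(clf.coef_, clf.intercept_)      # the estimated beta and beta_0
print(clf.support_vectors_)           # the points lying on the margin
print(2 / np.linalg.norm(clf.coef_))  # width of the slab, 2M = 2 / ||beta||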
Distance formula
Lemma
The distance between the point $x_i$ and the hyperplane is given by
$\frac{1}{\|\beta\|} \, y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}).$
The hyperplane
$\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p = 0, \quad (5)$
has normal $\beta = (\beta_1, \dots, \beta_p)^\top$.
Hence, the shortest path from a point $x_i$ to the hyperplane goes along the line described by $\beta$ and the point $x_i$. So, the equation of the line is given by $x_i + t\beta$, $t \in \mathbb{R}$. Inserting the line into the hyperplane equation gives
$\beta_0 + \beta_1(x_{i1} + t\beta_1) + \cdots + \beta_p(x_{ip} + t\beta_p) = 0.$
Distance formula
Solving for $t$ yields
$t = -\frac{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}{\beta_1^2 + \cdots + \beta_p^2} = -\frac{1}{\|\beta\|^2}\,(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}).$
Denote by $x$ the point in the hyperplane which is the intersection between the hyperplane and the line. Then the smallest distance between $x_i$ and $x$, and thus to the plane, is given by
$\|x - x_i\| = \|t\beta\| = |t| \, \|\beta\| = \frac{1}{\|\beta\|}\,|\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}|,$
and we obtain the formula of the lemma, since for a correctly classified point multiplying by the label $y_i \in \{-1, +1\}$ gives exactly this absolute value.
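The lemma is easy to sanity-check numerically: below, the distance from the formula is compared with the distance to the explicit projection $x_i + t\beta$ (all numbers are made up):

import numpy as np

beta0, beta = 2.0, np.array([3.0, 1.0])
xi, yi = np.array([1.0, 1.0]), 1  # a point labelled +1 on the positive side

# Distance according to the lemma
d_lemma = yi * (beta0 + beta @ xi) / np.linalg.norm(beta)

# Distance via the projection onto the hyperplane: x = xi + t*beta
t = -(beta0 + beta @ xi) / (beta @ beta)
x_proj = xi + t * beta
d_proj = np.linalg.norm(x_proj - xi)

print(d_lemma, d_proj)  # both are approximately 1.897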
Reformulating the problem
Denote by $\beta := (\beta_1, \dots, \beta_p)$, and
$\|\beta\| = \sqrt{\beta_1^2 + \cdots + \beta_p^2}.$
The distance from the hyperplane to any point $x_i$ is given by $y_i(x_i^\top\beta + \beta_0)$, by (3), and in particular if $\|\beta\|$ is no longer necessarily equal to 1 we have the distance
$\frac{1}{\|\beta\|}\, y_i(x_i^\top\beta + \beta_0).$
Put $M = 1/\|\beta\|$, and hence maximizing the margin $M$ implies minimizing $\|\beta\|$. This is further equivalent to minimizing $\|\beta\|^2$. So, our problem can instead be formulated as
$\min_{\beta_0, \beta} \; \frac{1}{2}\|\beta\|^2 \quad \text{subject to } y_i(\beta_0 + x_i^\top\beta) \geq 1, \quad i = 1, \dots, n.$
Maximal margin classifier is non-robust
The maximal margin classifier performs well on very special problems, but it is very non-robust. Here this means: minor changes in the input data may yield major changes in the decision boundary.
Decision boundary examples
Soften the margin
Formulating the support vector classifier
To soften the margin, we introduce a slack variable $\varepsilon_i \geq 0$ for each instance.
The slack $\varepsilon_i$ measures how much $x_i$ may violate the margin. Our two objectives are now
▶ make the slack variables as small as possible
▶ minimize $\|\beta\|$
$\min_{\beta_0, \beta, \varepsilon} \; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n}\varepsilon_i$
subject to $y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \geq (1 - \varepsilon_i)$ and $\varepsilon_i \geq 0.$
Note that if
▶ $\varepsilon_i = 0$, $x_i$ is on the correct side of the margin
▶ $\varepsilon_i > 0$, $x_i$ has violated the margin
▶ $\varepsilon_i > 1$, $x_i$ is on the wrong side of the hyperplane
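As a sketch (synthetic data, an arbitrary C), the slack values of a fitted soft-margin classifier can be recovered as $\varepsilon_i = \max(0,\, 1 - y_i f(x_i))$, which makes the three cases above easy to count:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-1, -1], 1.0, (30, 2)),
               rng.normal([1, 1], 1.0, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

f = clf.decision_function(X)      # f(x_i) = beta_0 + beta^T x_i
eps = np.maximum(0.0, 1 - y * f)  # slack variables

print((eps == 0).sum())                # on the correct side of the margin
print(((eps > 0) & (eps <= 1)).sum())  # violating the margin, still correctly classified
print((eps > 1).sum())                 # on the wrong side of the hyperplane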
Formulating the support vector classifier
The parameter $C$ can be seen as a regularization parameter.
$\min_{\beta_0, \beta, \varepsilon} \; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n}\varepsilon_i$
subject to $y_i(\beta_0 + x_i^\top\beta) \geq (1 - \varepsilon_i)$ and $\varepsilon_i \geq 0.$
$C$ will serve as a trade-off between
▶ making the margin large
▶ making sure that most examples have margin at least $1/\|\beta\|$
If $y_i(\beta_0 + x_i^\top\beta) \geq 1$ then there is no cost of misclassification, but if not, then we pay a cost that grows with the violation.
This perhaps looks like a familiar setting for ML and we’ll revisit it in a couple of slides.
Formal explanation
We may consider the Lagrangian relaxation of the problem and obtain the following Lagrangian primal function
$L_P = \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n}\varepsilon_i + \sum_{i=1}^{n}\alpha_i\big((1 - \varepsilon_i) - y_i(\beta_0 + x_i^\top\beta)\big) - \sum_{i=1}^{n}\mu_i\varepsilon_i.$
Setting the derivatives of $L_P$ with respect to $\beta$, $\beta_0$ and $\varepsilon_i$ to zero yields
$\frac{\partial L_P}{\partial \beta} = 0 \iff \beta = \sum_{i=1}^{n}\alpha_i y_i x_i$
$\frac{\partial L_P}{\partial \beta_0} = 0 \iff 0 = \sum_{i=1}^{n}\alpha_i y_i$
$\frac{\partial L_P}{\partial \varepsilon_i} = 0 \iff \alpha_i = C - \mu_i, \quad \forall i$
Support vector classifier
Using the $L_P$-formulation and the partial derivatives of the previous slide yields the dual formulation
$L_D = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j x_i^\top x_j.$
The solution for $\beta$ takes the form
$\hat{\beta} = \sum_{i=1}^{n}\alpha_i y_i x_i, \quad \text{implying} \quad f(x) = \beta_0 + x^\top\beta = \beta_0 + \sum_{i=1}^{n}\alpha_i y_i \, x^\top x_i.$
Some of the $\alpha_i$ are zero, and for the ones that are not, the corresponding $x_i$ is a support vector. Hence, classification is done by computing the dot product between the test point $x$ and all the support vectors $x_i$.
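A sketch of this in sklearn (synthetic data assumed): a fitted SVC stores $\alpha_i y_i$ for the support vectors in dual_coef_, so $f(x)$ can be reproduced using dot products with the support vectors only:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([-1, -1], 1.0, (30, 2)),
               rng.normal([1, 1], 1.0, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([0.3, -0.2])
# dual_coef_[0, k] holds alpha_k * y_k for the k-th support vector
f_manual = clf.intercept_[0] + clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new)
f_sklearn = clf.decision_function(x_new.reshape(1, -1))[0]

print(f_manual, f_sklearn)  # agree up to floating point noise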
Support vector classifier
The function which we fit (again) is the hypothesis
$f(x) = \beta_0 + x^\top\beta = \beta_0 + \sum_{i=1}^{n}\alpha_i y_i \, x^\top x_i.$
SVM in the cost + regularization paradigm
$\min_{\beta_0, \beta_1, \dots, \beta_p} \; \sum_{i=1}^{n}\max[0,\, 1 - y_i f(x_i)] + \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^2.$
The cost function here is known as the hinge loss. It is convex but not (everywhere) differentiable, thus requiring some extra work in order to perform gradient descent.
In this manner we see that the support vector classifier behaves quite similarly to logistic regression with ridge regularization.
Note that
hinge + quadratic = convex.
Hinge loss and binomial deviance
Presented below are the cost functions for support vector classifiers and logistic regression, respectively.
Convex functions
A function $f$ is said to be convex if for every $x, y$ in its domain and every $\lambda$ in the unit interval we have
$f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda) f(y).$
Hinge + quadratic
Theorem
Any local minimum of a convex function is a global minimum.
▶ Good news, since this implies that gradient descent yields the minimum.
▶ The bad news is, as previously announced, that the loss function is not everywhere differentiable.
▶ A solution is to use sub-gradients.
▶ This, however, is out of scope for this course and we only introduce it for the sake of completeness.
Gradient descent version of the problem
$J(\beta) = \sum_{i=1}^{n}\max[0,\, 1 - y_i f(x_i)] + \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^2.$
Thus we have
$\nabla J(\beta) = \lambda\beta + \sum_{i=1}^{n} s(x_i, y_i),$
where
$s(x, y) = \begin{cases} -yx, & \text{if } y f(x) < 1 \\ 0, & \text{otherwise.} \end{cases}$
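A minimal (sub)gradient descent sketch for this objective, leaving out the intercept $\beta_0$ for brevity (the synthetic data, step size and $\lambda$ are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([-1, -1], 1.0, (30, 2)),
               rng.normal([1, 1], 1.0, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

lam, lr, n_iter = 0.1, 0.01, 500
beta = np.zeros(X.shape[1])

for _ in range(n_iter):
    f = X @ beta                  # f(x_i) = beta^T x_i (no intercept here)
    viol = y * f < 1              # points with non-zero hinge loss
    # subgradient: lambda * beta plus s(x_i, y_i) = -y_i x_i over the violations
    grad = lam * beta - (y[viol, None] * X[viol]).sum(axis=0)
    beta -= lr * grad

print(beta)
print(np.mean(np.sign(X @ beta) == y))  # training accuracy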
Extending further – solving non-linear problems
Perform a transformation of the data in order to make it linearly separable. Here is an example where $x \mapsto (x, x^2)$.
Implicit enlargement of the feature space
We’ve already seen how to solve non-linear problems, for instance by enlarging the feature space. However, there are some problems associated with this task. What if there are 100 features and we want to find all polynomial combinations of these of degree less than or equal to 5? Then we would extend our feature space with more than 79 million(!) features $\Rightarrow$ computational issues.
Our aim now is to find some other suitable way to implicitly enlarge the feature space.
The main idea is to enlarge the feature space to allow for non-linear decision boundaries, without having to add pre-defined new features.
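A quick check of that count, assuming it refers to all products of up to five distinct features out of the 100 (the exact counting convention is my assumption):

from math import comb

p, d = 100, 5
n_features = sum(comb(p, k) for k in range(1, d + 1))
print(n_features)  # 79375495, i.e. more than 79 million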
Kernels
The feature space can implicitly be extended by the use of kernels. The support vector classifier above uses the linear kernel, i.e. the ordinary inner product
$K(x_i, x_j) = \langle x_i, x_j \rangle.$
▶ polynomial kernel:
$K(x_i, x_j) = (1 + \langle x_i, x_j \rangle)^d$
▶ Gaussian (RBF) kernel:
$K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^2) = e^{-\gamma \sum_{k=1}^{p}(x_{ki} - x_{kj})^2}$
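As a sketch, these kernels can be computed directly or with sklearn.metrics.pairwise; the data, the degree d and $\gamma$ below are arbitrary:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
d, gamma = 2, 0.5

K_poly = (1 + X @ X.T) ** d  # (1 + <x_i, x_j>)^d
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)  # exp(-gamma * ||x_i - x_j||^2)

# The same kernel matrices via sklearn
print(np.allclose(K_poly, polynomial_kernel(X, degree=d, gamma=1.0, coef0=1.0)))
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))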
Gaussian kernel is a kernel
Gaussian kernel is very flexible
The strength of kernels
Consider the polynomial kernel $K(x, y) = (1 + \langle x, y \rangle)^2$. Let $x = (x_1, x_2)$ and $y = (y_1, y_2)$. Then
$K(x, y) = (1 + \langle x, y \rangle)^2 = (1 + x_1 y_1 + x_2 y_2)^2 = 1 + (x_1 y_1)^2 + (x_2 y_2)^2 + 2x_1 y_1 + 2x_2 y_2 + 2x_1 y_1 x_2 y_2.$
However, note that
$K(x, y) = \big\langle (1, x_1^2, x_2^2, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1 x_2),\; (1, y_1^2, y_2^2, \sqrt{2}y_1, \sqrt{2}y_2, \sqrt{2}y_1 y_2) \big\rangle.$
Hence, evaluating the kernel amounts to taking an inner product in a 6-dimensional feature space, without ever forming the transformed features explicitly.
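The identity is easy to verify numerically; the function phi below is the explicit 6-dimensional feature map written out above (the name phi and the test points are mine):

import numpy as np

def phi(v):
    # Explicit feature map for K(x, y) = (1 + <x, y>)^2 in two dimensions
    v1, v2 = v
    return np.array([1, v1**2, v2**2,
                     np.sqrt(2) * v1, np.sqrt(2) * v2, np.sqrt(2) * v1 * v2])

x, y = np.array([0.7, -1.2]), np.array([2.0, 0.5])

lhs = (1 + x @ y) ** 2  # kernel evaluated directly, O(p) work
rhs = phi(x) @ phi(y)   # inner product in the enlarged feature space
print(lhs, rhs, np.isclose(lhs, rhs))  # both approximately 3.24, True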
Kernel Ridge Regression
Linear (ridge) regression is given by solving
$\hat{\theta} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n}(\theta^\top x_i - y_i)^2 + \lambda\|\theta\|^2,$
which has the closed-form solution
$\hat{\theta} = (X^\top X + n\lambda I)^{-1} X^\top y.$
With a feature map $\Phi$ the corresponding problem is
$\hat{\theta} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n}(\theta^\top \Phi(x_i) - y_i)^2 + \lambda\|\theta\|^2,$
solved by
$\hat{\theta} = (\Phi(X)^\top \Phi(X) + n\lambda I)^{-1} \Phi(X)^\top y.$
If $\Phi(x) = (1, x, x^2, x^3, \dots, x^d)$ then the above is equivalent to polynomial regression. This problem simplifies due to the kernel trick.
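A sketch of the kernel trick for ridge regression (the data and hyperparameters are made up): sklearn's KernelRidge works entirely from the kernel matrix, and its prediction coincides with the hand-computed dual solution $(K + \alpha I)^{-1} y$:

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(-3, 3, size=80))
y = np.sin(x) + 0.1 * rng.normal(size=80)
X = x.reshape(-1, 1)
X_new = np.array([[0.0], [1.5]])

model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
print(model.predict(X_new))

# Equivalent dual/kernel form computed by hand
K = rbf_kernel(X, X, gamma=0.5)
dual = np.linalg.solve(K + 0.1 * np.eye(len(X)), y)  # (K + alpha*I)^{-1} y
print(rbf_kernel(X_new, X, gamma=0.5) @ dual)        # matches model.predict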
Support vector machines
When combining support vector classifiers with kernels we obtain what is usually called the support vector machine (SVM).
SVM hyperparameters
There are several hyperparameters of the support vector machine which have to be tuned in order to obtain good performance.
General advice: hyperparameter tuning may be very time-consuming, so if you have many training points, use only a subset of them for tuning the hyperparameters; this can still give good estimates.
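A sketch of that advice using GridSearchCV on a random subset of a synthetic data set (the grid values and the subset size of 1000 are arbitrary):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# Tune on a random subset to keep the grid search cheap
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=1000, replace=False)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X[idx], y[idx])
print(search.best_params_)

# Refit on the full training set with the chosen hyperparameters
final = SVC(kernel="rbf", **search.best_params_).fit(X, y)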
Binarization techniques
There are two very common and widely used binarization techniques, i.e. methods for handling multiclass classification.
Previously you’ve seen the one-vs-all technique (Lecture 3). In sklearn, the built-in technique for multi-class SVM is one-vs-one.
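For reference, a small sketch of both strategies in sklearn (using the iris data set as an example): SVC applies one-vs-one internally, while one-vs-all can be added explicitly with a wrapper:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # three classes

ovo = SVC(kernel="linear").fit(X, y)                       # one-vs-one, built into SVC
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # explicit one-vs-all wrapper

print(ovo.predict(X[:5]))
print(ova.predict(X[:5]))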
OVA vs OVO
Consider a three-class classification task with only three training examples, as seen below. Suppose that a maximal margin classifier is used and trained on the mentioned training examples, together with either (i) OVO binarization or (ii) OVA binarization.
a) In each of the cases (i) and (ii), how many classifiers need to be trained?
b) Are the same decision boundaries produced by (i) and (ii)?
SVM in sklearn
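A typical sklearn SVM workflow, as a sketch (the data set and all parameter values are placeholders):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs since the RBF kernel is distance based
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))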
Assignment summary