
Working with Support Vector Machines (Lecture 15)

Machine Learning for Real-World Applications

Date 20-Sep-2021

Copyright © 2021 Tata Consultancy Services Limited

Binary Classification

• Consider an input space 𝒳 ⊆ ℝN with N ≥ 1, and the output or target space 𝒴 = {−1, +1}.

• Let f : 𝒳 → 𝒴 be the target function.

• Given a hypothesis set H of functions mapping 𝒳 to 𝒴.

• The learner uses a training set and selects a hypothesis from H.

(Figure: Training set → Learner → selects a hypothesis from the hypothesis set H)
Input to the binary classification task

• Consider an input space 𝒳 ⊆ ℝN with N ≥ 1, and the output or target space 𝒴 = {−1, +1}.

• Let f : 𝒳 → 𝒴 be the target function.

• The learner receives a training set S of m examples drawn i.i.d. from 𝒳 according to some unknown distribution D:

  S = {(x1, y1), (x2, y2), …, (xm, ym)} ∈ (𝒳 × 𝒴)m,  with yi = f(xi) for all i ∈ [1, m].
Choosing a hypothesis for binary classification

• The learner chooses a hypothesis h ∈ H, a binary classifier, with small generalisation error.

• The generalisation error is formulated as the error rate under the actual data-generating distribution D (which is unknown):

  P_{x∼D}[h(x) ≠ f(x)]
Linear Classification

• Different hypothesis sets H can be selected for this task.


• The simplest hypothesis class is that of linear classifiers, or hyperplanes
H = {x ↦ sign(w ⋅ x + b) : w ∈ ℝN , b ∈ ℝ}

• The hypothesis h(x) ≡ sign(w ⋅ x + b) labels the points falling on one side of the hyperplane w ⋅ x + b = 0 as positive and the points on the other side as negative (a minimal sketch of this decision rule follows below).
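As an illustration (not part of the slides), the decision rule of such a linear classifier takes only a few lines of NumPy; the function name predict and the toy numbers are assumptions.

```python
import numpy as np

def predict(X, w, b):
    """Label each row of X as +1 or -1 according to sign(w . x + b)."""
    scores = X @ w + b              # proportional to the signed distance to the hyperplane
    return np.where(scores >= 0.0, 1, -1)

# Example: a hyperplane in R^2
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 1.0], [0.0, 3.0]])
print(predict(X, w, b))             # [ 1 -1]
```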

Linear Classification

(Figure: a separating hyperplane w ⋅ x + b = 0, with positively labelled points on one side and negatively labelled points on the other.)
SVMs − The separable case

SVMs − The separable case

• The general equation of a hyperplane in ℝN is

  w ⋅ x + b = 0,  where w ∈ ℝN, x ∈ ℝN and b ∈ ℝ is a scalar.

• Multiplying this equation by a non-zero scalar does not change the hyperplane.

• So we can scale w and b appropriately such that  min_{(x,y)∈S} |w ⋅ x + b| = 1.

• We call this representation (w, b) of the hyperplane the canonical hyperplane. For a canonical hyperplane, |w ⋅ xi + b| ≥ 1 for all i ∈ [1, m].
SVMs − The separable case

• The distance of any point x0 ∈ ℝN to the hyperplane is given by

  |w ⋅ x0 + b| / ∥w∥

• Thus, for a canonical hyperplane, the margin ρ is given by

  ρ = min_{(x,y)∈S} |w ⋅ x + b| / ∥w∥ = 1 / ∥w∥
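A quick numeric check of the distance formula above (the numbers are illustrative, not from the slides):

```python
import numpy as np

def distance_to_hyperplane(x0, w, b):
    """|w . x0 + b| / ||w||: distance of x0 to the hyperplane w . x + b = 0."""
    return abs(np.dot(w, x0) + b) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -5.0
print(distance_to_hyperplane(np.array([0.0, 0.0]), w, b))   # |-5| / 5 = 1.0
```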
SVMs − The separable case

• When a training point xi is correctly classified by the hyperplane defined by (w, b), then (w ⋅ xi + b) has the same sign as yi.

• Combined with the canonical-hyperplane condition |w ⋅ xi + b| ≥ 1, correct classification of every training point can be written as yi(w ⋅ xi + b) ≥ 1.

• Maximizing the margin ρ = 1/∥w∥ of a canonical hyperplane is therefore equivalent to minimizing ∥w∥, or equivalently (1/2)∥w∥².
SVMs − The separable case

• The SVM solution is the hyperplane that maximises the margin while correctly classifying all the training points:

  min_{w,b}  (1/2)∥w∥²  subject to  yi(w ⋅ xi + b) ≥ 1, ∀i ∈ [1, m]

• The objective function F : w ↦ (1/2)∥w∥² is infinitely differentiable.

• We have ∇F(w) = w and ∇²F(w) = I.
SVMs − The separable case

• Since the identity matrix is positive definite, its eigenvalues are strictly positive and therefore the function F is strictly convex.
• The constraints are affine functions: gi(w, b) ≤ 0, where gi ≡ 1 − yi(w ⋅ xi + b).
• Therefore, the optimisation problem admits a unique solution.
• The optimisation problem is a specific instance of quadratic programming (QP).
• Special QP solvers, such as the block coordinate descent algorithm, can be used; a toy sanity check with a general-purpose solver is sketched below.
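The lecture points to dedicated QP solvers; purely as an assumed sanity check, the primal problem can also be handed to a general-purpose constrained optimiser. The toy data, variable names and solver choice (SciPy's SLSQP) below are illustrative, not the method the lecture describes.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(z):
    # z = (w, b); minimise (1/2) * ||w||^2
    w = z[:-1]
    return 0.5 * np.dot(w, w)

# One inequality constraint y_i (w . x_i + b) - 1 >= 0 per training point.
constraints = [
    {"type": "ineq", "fun": lambda z, i=i: y[i] * (X[i] @ z[:-1] + z[-1]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(X.shape[1] + 1),
               method="SLSQP", constraints=constraints)
w, b = res.x[:-1], res.x[-1]
print(w, b)   # maximum-margin hyperplane parameters for the toy data
```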

SVMs − The Lagrangian

• Since the constraints are convex and differentiable, we can introduce Lagrange variables αi ≥ 0, i ∈ [1, m], one for each of the m constraints, and denote by α the vector α = (α1, α2, …, αm)⊤.

• The Lagrangian can then be defined, for all w ∈ ℝN, b ∈ ℝ, and α ∈ ℝm with αi ≥ 0, by

  ℒ(w, b, α) = (1/2)∥w∥² − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1]
SVMs − The Support Vectors

• The KKT (Karush-Kuhn-Tucker) conditions apply at the optimum point.

• The KKT conditions are obtained by


- setting the gradient of the Lagrangian with respect to the
primal variables w and b to zero, and
- by writing the complementarity conditions:

SVMs − The Support Vectors

  ∇w ℒ = w − Σ_{i=1}^{m} αi yi xi = 0   ⟹   w = Σ_{i=1}^{m} αi yi xi

  ∇b ℒ = − Σ_{i=1}^{m} αi yi = 0   ⟹   Σ_{i=1}^{m} αi yi = 0

  ∀i,  αi [yi(w ⋅ xi + b) − 1] = 0   ⟹   αi = 0  ∨  yi(w ⋅ xi + b) = 1
SVMs − The Support Vectors

• The weight vector w solving the SVM problem is a linear combination of the training set vectors x1, …, xm:

  w = Σ_{i=1}^{m} αi yi xi

• A vector xi appears in that expansion iff αi ≠ 0. Such vectors are called support vectors.

• By the complementarity condition, if αi ≠ 0, then yi(w ⋅ xi + b) = 1. Thus, support vectors lie on the marginal hyperplanes w ⋅ xi + b = ±1.
SVMs − The Dual Formulation

• Recall the Lagrangian, defined for all w ∈ ℝN, b ∈ ℝ, and α ∈ ℝm with αi ≥ 0 (to be minimized over w and b):

  ℒ(w, b, α) = (1/2)∥w∥² − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1]

• Substituting w = Σ_{i=1}^{m} αi yi xi and rearranging, we get

  max_α  Σ_{i=1}^{m} αi − (1/2) Σ_{i,j} yi yj αi αj ⟨xi, xj⟩

• This is the dual formulation of the SVM.
SVMs − The Dual Formulation

  max_α  Σ_{i=1}^{m} αi − (1/2) Σ_{i,j} yi yj αi αj ⟨xi, xj⟩

• The dual formulation of the SVM is parameterised by the unknown vector α.

• Also note that the inputs occur only through the inner products ⟨xi, xj⟩.

• The optimization can be expressed as a standard quadratic programming problem. Let H be the matrix with entries Hij = yi yj ⟨xi, xj⟩.
SVMs − The Dual Formulation

  max_α  Σ_{i=1}^{m} αi − (1/2) Σ_{i,j} yi yj αi αj ⟨xi, xj⟩

• With H defined by Hij = yi yj ⟨xi, xj⟩, the optimization becomes

  max_α  Σ_{i=1}^{m} αi − (1/2) α⊤ H α

  such that  αi ≥ 0 ∀i  and  Σ_i αi yi = 0
SVMs − The Dual Formulation
• Converting the dual maximisation into a minimisation puts it in the standard QP form:

  SVM dual:                          Equivalent minimisation:           Standard QP form:
  max_α  Σ_i αi − (1/2) α⊤ H α       min_α  (1/2) α⊤ H α − 1⊤α          min_x  (1/2) x⊤ P x + q⊤x
  such that                          such that                          such that
    αi ≥ 0  ∀i                         −α ≤ 0  (i.e. −αi ≤ 0 ∀i)          G x ≤ h
    Σ_i αi yi = 0                      y⊤α = 0                            A x = b
SVMs − The Dual Formulation

  SVM dual (minimisation form):      Standard QP form:
  min_α  (1/2) α⊤ H α − 1⊤α          min_x  (1/2) x⊤ P x + q⊤x
  such that                          such that
    −α ≤ 0  (i.e. −αi ≤ 0 ∀i)          G x ≤ h
    y⊤α = 0                            A x = b

  Mapping to the standard form:
    P ≡ H            (size m × m)
    q ≡ −1           (vector of −1s, size m × 1)
    G ≡ −diag[1]     (diagonal matrix of −1s, size m × m)
    h ≡ 0            (size m × 1)
    A ≡ y⊤           (size 1 × m)
    b ≡ 0            (scalar)
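As a concrete illustration (not part of the slides), the mapping above can be handed directly to a generic QP solver. The sketch below uses the cvxopt package; the toy data and variable names are assumptions made for the example.

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy, linearly separable training set (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# H_ij = y_i y_j <x_i, x_j>
H = (y[:, None] * X) @ (y[:, None] * X).T

# Standard QP form: min (1/2) x^T P x + q^T x  s.t.  G x <= h, A x = b
P = matrix(H)
q = matrix(-np.ones(m))
G = matrix(-np.eye(m))            # encodes -alpha_i <= 0
h = matrix(np.zeros(m))
A = matrix(y.reshape(1, -1))      # encodes y^T alpha = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
sol = solvers.qp(P, q, G, h, A, b)
alpha = np.ravel(sol["x"])
print(alpha)                      # nonzero entries mark the support vectors
```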
SVMs − The Dual Formulation

• The quadratic programming solver gives us the value of the unknown vector α.

• Using α we compute the hyperplane parameters w and b:

  w = Σ_{i=1}^{m} αi yi xi

  yi = ⟨w, xi⟩ + b  (any support vector xi satisfies this equation, since it lies on a marginal hyperplane)

  therefore  b = yi − ⟨w, xi⟩
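Continuing the illustrative cvxopt sketch above (the 1e-6 tolerance for picking support vectors is an assumption):

```python
# Support vectors: training points with alpha_i significantly greater than zero.
sv = alpha > 1e-6
w = ((alpha * y)[:, None] * X).sum(axis=0)
# Each support vector lies on a marginal hyperplane: y_i = <w, x_i> + b.
b = np.mean(y[sv] - X[sv] @ w)     # averaging improves numerical stability
print(w, b)
```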
Mapping from Input space to Feature space

(Figure: a mapping ϕ from the input space to the feature space.)
Kernel Function

• A kernel function computes an inner product in the feature space, i.e., the similarity of points after they have been mapped to the feature space.

• Different kernel functions induce different notions of similarity in the feature space.

• Example kernels: linear, polynomial, RBF (Radial Basis Function), etc. (see the sketch below).
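A minimal sketch of the three kernels named above, using plain NumPy; the parameterisations (degree, c, gamma) follow common conventions and are not taken from the slides.

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, c=1.0):
    # (<x, z> + c) ** degree
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
```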
Non-Separable Case for Binary Classification
SVMs − Non-separable Case

• In most practical settings, the training data is not linearly separable: for any hyperplane w ⋅ x + b = 0, there exists xi ∈ S such that

  yi(w ⋅ xi + b) ≱ 1

• Thus, the constraints imposed in the linearly separable case, i.e. yi(w ⋅ xi + b) ≥ 1, do not all hold.

• However, a relaxed version of these constraints can still hold:

  yi(w ⋅ xi + b) ≥ 1 − ξi  for each i ∈ [1, m], with ξi ≥ 0
SVMs − The Non-Separable Case

(Figure: a separating hyperplane with two outliers at distances ξi and ξj beyond their marginal hyperplanes.)
SVMs − Non-separable case

• The variables ξi are known as slack variables.

• They measure the distance by which the vector xi violates the desired inequality.

• That is, the formulation allows certain outlier points, those with ξi > 0; these are the points placed on the wrong side of their marginal hyperplane (a small numeric sketch follows below).
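As a small assumed illustration, the slack required by each training point for a given (w, b) is exactly max(0, 1 − yi(w ⋅ xi + b)):

```python
import numpy as np

def slacks(X, y, w, b):
    """Slack xi_i = max(0, 1 - y_i (w . x_i + b)) for each training point."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Points with zero slack satisfy the hard-margin constraint;
# points with positive slack are the outliers described above.
```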

SVMs − Non-separable case
• A vector that is correctly classified by the separating hyperplane can still be an outlier if it lies on the wrong side of its marginal hyperplane.

• For the separable case, we say that the training data is separated by a hard margin; for the non-separable case, we say that it is separated by a soft margin.

(Figure: outliers at distances ξi and ξj from their marginal hyperplanes.)
SVMs − Optimization in the non-separable case

• How should we select the hyperplane in the non-separable case?

• Objective 1: We seek to limit the total amount of slack due to outliers, which can be measured by Σ_{i=1}^{m} ξi.

• Objective 2: We seek a hyperplane with a large margin, though a larger margin can lead to more outliers and thus a larger amount of slack.

• These are two conflicting objectives.
SVMs − Formulation of Optimization
• The objective function:

  min_{w,b,ξ}  (1/2)∥w∥² + C Σ_{i=1}^{m} (ξi)^p

  subject to  yi(w ⋅ xi + b) ≥ 1 − ξi  ∧  ξi ≥ 0,  i ∈ [1, m]

• The parameter C is typically determined by n-fold cross-validation.

• This is a convex optimisation problem, since the constraints are affine (and thus convex) and the objective function is convex for any p ≥ 1: the sum Σ_{i=1}^{m} (ξi)^p = ∥ξ∥_p^p is convex in view of the convexity of the norm ∥ ⋅ ∥_p.
SVMs − The Support Vectors

• The loss function corresponding to p = 1 is called the hinge loss.

• The loss function corresponding to p = 2 is called the quadratic hinge loss.

• Both hinge losses are convex upper bounds on the zero-one loss, thus making them well suited for optimisation (a small numerical sketch follows below).
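A minimal sketch (illustrative, not from the slides) of the three losses shown in the figure on the next slide, as functions of the score yi(w ⋅ xi + b):

```python
import numpy as np

def zero_one_loss(margin):
    return np.where(margin <= 0.0, 1.0, 0.0)

def hinge_loss(margin):                   # p = 1
    return np.maximum(0.0, 1.0 - margin)

def quadratic_hinge_loss(margin):         # p = 2
    return np.maximum(0.0, 1.0 - margin) ** 2

margins = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print(hinge_loss(margins))                # [2.  1.  0.5 0.  0. ]
```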

SVMs − The Support Vectors

(Figure: the 0/1 loss, the hinge loss ξ (p = 1), and the quadratic hinge loss ξ² (p = 2), plotted against w ⋅ x + b.)
SVMs − The Non-Separable Case

• The objective function as well as the affine constraints are convex and differentiable.

• Thus, the KKT conditions apply at the optimum.

• We introduce Lagrange variables αi ≥ 0, i ∈ [1, m], associated with the first m constraints, and βi ≥ 0, i ∈ [1, m], associated with the non-negativity constraints on the slack variables.

• The Lagrangian can be defined as:

  ℒ(w, b, ξ, α, β) = (1/2)∥w∥² + C Σ_{i=1}^{m} ξi − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1 + ξi] − Σ_{i=1}^{m} βi ξi
SVMs − The Support Vectors

  ℒ(w, b, ξ, α, β) = (1/2)∥w∥² + C Σ_{i=1}^{m} ξi − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1 + ξi] − Σ_{i=1}^{m} βi ξi

• The KKT conditions are obtained by setting the gradient of the Lagrangian with respect to the primal variables w, b and ξi to zero, and by writing the complementarity conditions:
SVMs − The Support Vectors

  ∇w ℒ = w − Σ_{i=1}^{m} αi yi xi = 0   ⟹   w = Σ_{i=1}^{m} αi yi xi

  ∇b ℒ = − Σ_{i=1}^{m} αi yi = 0   ⟹   Σ_{i=1}^{m} αi yi = 0

  ∇ξi ℒ = C − αi − βi = 0   ⟹   αi + βi = C

  ∀i,  αi [yi(w ⋅ xi + b) − 1 + ξi] = 0   ⟹   αi = 0  ∨  yi(w ⋅ xi + b) = 1 − ξi

  ∀i,  βi ξi = 0   ⟹   βi = 0  ∨  ξi = 0
SVMs − The Support Vectors

• Thus, the weight vector w solving the SVM problem is a linear combination of the training set vectors x1, …, xm:

  w = Σ_{i=1}^{m} αi yi xi

• A vector xi appears in that expansion iff αi ≠ 0. Such vectors are called support vectors.
SVMs − The Support Vectors

• Here there are two types of support vectors.

• By the complementarity conditions, if αi ≠ 0, then yi(w ⋅ xi + b) = 1 − ξi:

  if ξi = 0, then yi(w ⋅ xi + b) = 1 and xi lies on a marginal hyperplane;

  otherwise ξi ≠ 0, so βi = 0 (and hence αi = C), and xi is an outlier.
SVMs − The Support Vectors

• Thus, the support vectors are either outliers, in which case αi = C, or vectors
lying on the marginal hyperplanes.
• The solution vector w is unique, while the support vectors are not unique.

(Figure: outliers at distances ξi and ξj from their marginal hyperplanes.)
SVMs − Simplification of Lagrangian

  ℒ(w, b, ξ, α, β) = (1/2)∥w∥² + C Σ_{i=1}^{m} ξi − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1 + ξi] − Σ_{i=1}^{m} βi ξi

       = (1/2)∥w∥² + Σ_{i=1}^{m} ξi (C − αi − βi) − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1]

  Applying the condition αi + βi = C for all i (which, together with αi, βi ≥ 0, yields the box constraints 0 ≤ αi ≤ C), the middle term vanishes:

       = (1/2)∥w∥² − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1]
SVMs − Simplification of Lagrangian

  Substituting w = Σ_{i=1}^{m} αi yi xi into (1/2)∥w∥² − Σ_{i=1}^{m} αi [yi(w ⋅ xi + b) − 1]

  gives the equivalent maximisation

  max_α  Σ_{i=1}^{m} αi − (1/2) α⊤ H α

  subject to  0 ≤ αi ≤ C  ∀i  and  Σ_i αi yi = 0
SVMs − The Dual Formulation solved using a standard quadratic program

  SVM dual (minimisation form):      Standard QP form:
  min_α  (1/2) α⊤ H α − 1⊤α          min_x  (1/2) x⊤ P x + q⊤x
  such that                          such that
    −α ≤ 0  and  αi ≤ C ∀i             G x ≤ h
    y⊤α = 0                            A x = b

  Mapping to the standard form:
    P ≡ H                                                (size m × m)
    q ≡ −1                                               (vector of −1s, size m × 1)
    G ≡ −diag[1] stacked vertically on diag[1]           (size 2m × m)
    h ≡ 0 (m zeros) stacked vertically on C (m copies)   (size 2m × 1)
    A ≡ y⊤                                               (size 1 × m)
    b ≡ 0                                                (scalar)
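Again as an assumed illustration, the soft-margin mapping plugs into the same generic QP solver as before; relative to the hard-margin sketch, only G and h change. The toy data, names and the value of C are illustrative.

```python
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [0.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])    # last point sits close to the boundary
m, C = len(y), 1.0

H = (y[:, None] * X) @ (y[:, None] * X).T    # H_ij = y_i y_j <x_i, x_j>

P = matrix(H)
q = matrix(-np.ones(m))
G = matrix(np.vstack([-np.eye(m), np.eye(m)]))         # -alpha <= 0 and alpha <= C
h = matrix(np.hstack([np.zeros(m), C * np.ones(m)]))
A = matrix(y.reshape(1, -1))                           # y^T alpha = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
print(alpha)                                           # box-constrained: 0 <= alpha_i <= C
```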
SVMs − The Dual Formulation

• The quadratic programming solver gives us the value of the unknown vector α.

• Using α we compute the hyperplane parameters w and b:

  w = Σ_{i=1}^{m} αi yi xi

  yi = ⟨w, xi⟩ + b  (any support vector with 0 < αi < C satisfies this, since it lies on a marginal hyperplane with ξi = 0)

  therefore  b = yi − ⟨w, xi⟩
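Continuing the illustrative soft-margin sketch, b is recovered only from the margin support vectors, those with 0 < αi < C; the tolerance used to pick them is an assumption.

```python
# Margin support vectors: 0 < alpha_i < C (they lie exactly on a marginal hyperplane).
eps = 1e-6
on_margin = (alpha > eps) & (alpha < C - eps)

w = ((alpha * y)[:, None] * X).sum(axis=0)
b = np.mean(y[on_margin] - X[on_margin] @ w)
print(w, b)
```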
Mapping from Input space to Feature space

(Figure: a mapping ϕ from the input space to the feature space.)
Thank You
