Support Vector Machines

MIT 15.097 Course Notes

Cynthia Rudin
Credit: Ng, Hastie, Tibshirani, Friedman
Thanks: Şeyda Ertekin

Let’s start with some intuition about margins.

The margin of an example $x_i$ = “distance” from the example to the decision boundary

$$= y_i f(x_i).$$

The margin is positive if the example is on the correct side of the decision boundary; otherwise it’s negative.

Here’s the intuition for SVM’s:

• We want all examples to have large margins; we want them to be as far from the decision boundary as possible.
• That way, the decision boundary is more “stable,” and we are confident in all decisions.

Most other algorithms (logistic regression, decision trees, perceptron) don’t generally produce large margins. (AdaBoost generally produces large margins.)

As in logistic regression and AdaBoost, the function $f$ is linear,

$$f(x) = \sum_{j=1}^{n} \lambda^{(j)} x^{(j)} + \lambda_0.$$

(Here $n$ is the number of features; $m$ will denote the number of training examples.)

Note that the intercept term can get swept into $x$ by adding a 1 as the last component of each $x$. Then $f(x)$ would be just $\lambda^T x$, but for this lecture we’ll keep the intercept term separate because SVM handles that term differently than if you put the intercept in as a separate feature. We classify $x$ using $\text{sign}(f(x))$.

If $x_i$ has a large margin, we are confident that we classified it correctly. So we’re essentially suggesting to use the margin $y_i f(x_i)$ to measure the confidence in our prediction.

But there is a problem with using $y_i f(x_i)$ to measure confidence in prediction. There is some arbitrariness about it.

How? What should we do about it?
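One way to see the arbitrariness is to scale $\lambda$ and $\lambda_0$ by a positive constant: the classifier $\text{sign}(f(x))$ does not change, but the margin $y_i f(x_i)$ does. The sketch below illustrates this with made-up numbers (the weights, the example, and the factor 10 are assumptions chosen for illustration, not values from the notes).

```python
def f(lam, lam0, x):
    """Linear score f(x) = lam^T x + lam0."""
    return sum(l * xj for l, xj in zip(lam, x)) + lam0


def sign(z):
    return 1 if z >= 0 else -1


# Hypothetical classifier and one labeled example.
lam, lam0 = [2.0, -1.0], 0.5
x, y = [1.0, 1.0], 1

# Same decision boundary, coefficients multiplied by 10.
scaled_lam, scaled_lam0 = [10 * l for l in lam], 10 * lam0

margin = y * f(lam, lam0, x)                       # 1.5
scaled_margin = y * f(scaled_lam, scaled_lam0, x)  # 15.0
same_prediction = sign(f(lam, lam0, x)) == sign(f(scaled_lam, scaled_lam0, x))

print(margin, scaled_margin, same_prediction)
```

The prediction is identical, yet the "confidence" grew tenfold, which is why the notes normalize by $\|\lambda\|_2$ in what follows.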

SVM’s maximize the distance from the decision boundary to the nearest training example – they maximize the minimum margin. There is a geometric perspective too. I set the intercept to zero for this picture (so the decision boundary passes through the origin):

The decision boundary is the set of $x$’s where $\lambda^T x = 0$. That means the unit vector for $\lambda$ must be perpendicular to those $x$’s that lie on the decision boundary.

Now that you have the intuition, we’ll put the intercept back, and we have to translate the decision boundary, so it’s really the set of $x$’s where $\lambda^T x = -\lambda_0$.

The margin of example $i$ is denoted $\gamma_i$:

$B$ is the point on the decision boundary closest to the positive example $x_i$. $B$ is

$$B = x_i - \gamma_i \frac{\lambda}{\|\lambda\|_2}$$

since we moved $-\gamma_i$ units along the unit vector to get from the example to $B$.

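Since $B$ lies on the decision boundary, it obeys $\lambda^T x + \lambda_0 = 0$, where $x$ is $B$. (I wrote the intercept there explicitly.) So,

$$\lambda^T\left(x_i - \gamma_i \frac{\lambda}{\|\lambda\|_2}\right) + \lambda_0 = 0$$

$$\lambda^T x_i - \gamma_i \frac{\|\lambda\|_2^2}{\|\lambda\|_2} + \lambda_0 = 0.$$

Simplifying,

$$\gamma_i = \frac{\lambda^T x_i + \lambda_0}{\|\lambda\|_2} =: \tilde f(x_i) \quad \text{(this is the normalized version of } f\text{)}$$

$$= y_i \tilde f(x_i) \quad \text{since } y_i = 1.$$

Note that here we normalized so we wouldn’t have the arbitrariness in the meaning of the margin.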

If the example is negative, the same calculation works, with a few sign flips (we’d need to move $\gamma_i$ units rather than $-\gamma_i$ units).

So the “geometric” margin from the picture is the same as the “functional” margin $y_i \tilde f(x_i)$.
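The equivalence can be checked numerically. The sketch below computes the normalized margin $y_i \tilde f(x_i)$, builds the point $B$ by moving from $x_i$ along the unit vector of $\lambda$, and confirms that $B$ lies on the decision boundary at distance $\gamma_i$ from $x_i$. The hyperplane and the two examples are hypothetical values picked so the arithmetic is easy to follow.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def geometric_margin(lam, lam0, x, y):
    """Normalized margin y * (lam^T x + lam0) / ||lam||_2."""
    return y * (dot(lam, x) + lam0) / math.sqrt(dot(lam, lam))


def closest_boundary_point(lam, lam0, x, y):
    """B = x - y * gamma * lam / ||lam||_2 (the sign flips for negative examples)."""
    norm = math.sqrt(dot(lam, lam))
    gamma = geometric_margin(lam, lam0, x, y)
    return [xj - y * gamma * lj / norm for xj, lj in zip(x, lam)]


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0.
lam, lam0 = [3.0, 4.0], -5.0

# A positive example.
x, y = [3.0, 4.0], 1
gamma = geometric_margin(lam, lam0, x, y)
B = closest_boundary_point(lam, lam0, x, y)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, B)))

# A negative example on the other side.
x_neg, y_neg = [0.0, 0.0], -1
gamma_neg = geometric_margin(lam, lam0, x_neg, y_neg)
B_neg = closest_boundary_point(lam, lam0, x_neg, y_neg)

print(gamma, B, distance)
print(gamma_neg, B_neg)
```

In both cases the Euclidean distance from the example to $B$ matches the normalized margin, and $\lambda^T B + \lambda_0$ is zero up to rounding.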

Maximize the minimum margin

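Support vector machines maximize the minimum margin. They would like to have all examples be far from the decision boundary. So they’ll choose $f$ this way:

$$\max_{f}\max_{\gamma}\ \gamma \quad \text{s.t. } y_i \tilde f(x_i) \ge \gamma, \quad i = 1 \ldots m$$

$$\max_{\gamma,\lambda,\lambda_0} \gamma \quad \text{s.t. } y_i \frac{\lambda^T x_i + \lambda_0}{\|\lambda\|_2} \ge \gamma, \quad i = 1 \ldots m$$

$$\max_{\gamma,\lambda,\lambda_0} \gamma \quad \text{s.t. } y_i(\lambda^T x_i + \lambda_0) \ge \gamma \|\lambda\|_2, \quad i = 1 \ldots m.$$

For any $\lambda$ and $\lambda_0$ that satisfy these constraints, any positively scaled multiple satisfies them too, so we can arbitrarily set $\|\lambda\|_2 = 1/\gamma$ so that the right side is 1.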
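The rescaling step can be sketched as follows: given any separating $(\lambda, \lambda_0)$, divide both by the smallest unnormalized margin so that the closest example has $y_i(\lambda^T x_i + \lambda_0) = 1$. The geometric margin is unchanged and equals $1/\|\lambda\|_2$ after rescaling. The data set and the starting coefficients are assumptions for illustration, and `rescale_to_unit_margin` is a hypothetical helper name.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rescale_to_unit_margin(lam, lam0, X, Y):
    """Scale (lam, lam0) by a positive constant so that min_i y_i f(x_i) = 1."""
    min_margin = min(y * (dot(lam, x) + lam0) for x, y in zip(X, Y))
    if min_margin <= 0:
        raise ValueError("(lam, lam0) does not separate the data")
    return [l / min_margin for l in lam], lam0 / min_margin


# Hypothetical separable data and a separating hyperplane.
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-4.0, 1.0]]
Y = [1, 1, -1, -1]
lam, lam0 = [5.0, 0.0], 0.0

# Geometric minimum margin before rescaling.
norm_before = math.sqrt(dot(lam, lam))
gamma_before = min(y * (dot(lam, x) + lam0) for x, y in zip(X, Y)) / norm_before

lam_s, lam0_s = rescale_to_unit_margin(lam, lam0, X, Y)
gamma_after = 1.0 / math.sqrt(dot(lam_s, lam_s))

print(gamma_before, gamma_after)
```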

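Now when we maximize $\gamma$, we’re maximizing $\gamma = 1/\|\lambda\|_2$. So we have

$$\max_{\lambda,\lambda_0} \frac{1}{\|\lambda\|_2} \quad \text{s.t. } y_i(\lambda^T x_i + \lambda_0) \ge 1, \quad i = 1 \ldots m.$$

Equivalently,

$$\min_{\lambda,\lambda_0} \frac{1}{2}\|\lambda\|_2^2 \quad \text{s.t. } y_i(\lambda^T x_i + \lambda_0) - 1 \ge 0, \quad i = 1 \ldots m \qquad (1)$$

(the 1/2 and square are just for convenience) which is the same as:

$$\min_{\lambda,\lambda_0} \frac{1}{2}\|\lambda\|_2^2 \quad \text{s.t. } -y_i(\lambda^T x_i + \lambda_0) + 1 \le 0, \quad i = 1 \ldots m$$

leading to the Lagrangian

$$L([\lambda,\lambda_0],\alpha) = \frac{1}{2}\sum_{j=1}^{n} \lambda^{(j)2} + \sum_{i=1}^{m} \alpha_i\left[-y_i(\lambda^T x_i + \lambda_0) + 1\right].$$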

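Writing the KKT conditions, starting with Lagrangian stationarity, where we need to find the gradient wrt $\lambda$ and the derivative wrt $\lambda_0$:

$$\nabla_\lambda L([\lambda,\lambda_0],\alpha) = \lambda - \sum_{i=1}^{m}\alpha_i y_i x_i = 0 \implies \lambda = \sum_{i=1}^{m}\alpha_i y_i x_i.$$

$$\frac{\partial}{\partial\lambda_0} L([\lambda,\lambda_0],\alpha) = -\sum_{i=1}^{m}\alpha_i y_i = 0 \implies \sum_{i=1}^{m}\alpha_i y_i = 0.$$

$$\alpha_i \ge 0 \quad \forall i \qquad \text{(dual feasibility)}$$

$$\alpha_i\left[-y_i(\lambda^T x_i + \lambda_0) + 1\right] = 0 \quad \forall i \qquad \text{(complementary slackness)}$$

$$-y_i(\lambda^T x_i + \lambda_0) + 1 \le 0. \qquad \text{(primal feasibility)}$$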

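Using the KKT conditions, we can simplify the Lagrangian.

$$L([\lambda,\lambda_0],\alpha) = \frac{1}{2}\|\lambda\|_2^2 + \sum_{i=1}^{m}\lambda^T(-\alpha_i y_i x_i) + \sum_{i=1}^{m}(-\alpha_i y_i \lambda_0) + \sum_{i=1}^{m}\alpha_i$$

(We just expanded terms. Now we’ll plug in the first KKT condition.)

$$= \frac{1}{2}\|\lambda\|_2^2 - \|\lambda\|_2^2 - \lambda_0\sum_{i=1}^{m}\alpha_i y_i + \sum_{i=1}^{m}\alpha_i$$

(Plug in the second KKT condition.)

$$= -\frac{1}{2}\sum_{j=1}^{n}\lambda^{(j)2} + 0 + \sum_{i=1}^{m}\alpha_i \qquad (2)$$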

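Again using the first KKT condition, we can rewrite the first term.

$$-\frac{1}{2}\sum_{j=1}^{n}\lambda^{(j)2} = -\frac{1}{2}\sum_{j=1}^{n}\left(\sum_{i=1}^{m}\alpha_i y_i x_i^{(j)}\right)^2$$

$$= -\frac{1}{2}\sum_{j=1}^{n}\sum_{i=1}^{m}\sum_{k=1}^{m}\alpha_i\alpha_k y_i y_k x_i^{(j)} x_k^{(j)}$$

$$= -\frac{1}{2}\sum_{i=1}^{m}\sum_{k=1}^{m}\alpha_i\alpha_k y_i y_k x_i^T x_k.$$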
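As a sanity check on this expansion, the snippet below compares $\|\lambda\|_2^2$, with $\lambda = \sum_i \alpha_i y_i x_i$, against the double sum over inner products. The data and the $\alpha$ values are arbitrary numbers chosen for illustration; they need not be feasible for the identity to hold.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Arbitrary illustrative values (m = 3 examples, n = 2 features).
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
Y = [1, -1, 1]
alpha = [0.5, 1.0, 0.25]
m, n = len(X), len(X[0])

# lam = sum_i alpha_i y_i x_i, computed one coordinate at a time.
lam = [sum(a * y * x[j] for a, y, x in zip(alpha, Y, X)) for j in range(n)]

lhs = dot(lam, lam)  # sum_j (lam^(j))^2
rhs = sum(alpha[i] * alpha[k] * Y[i] * Y[k] * dot(X[i], X[k])
          for i in range(m) for k in range(m))

print(lhs, rhs)
```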

Plugging back into the Lagrangian (2), which now only depends on $\alpha$, and putting in the second and third KKT conditions gives us the dual problem:

$$\max_{\alpha} L(\alpha)$$

where

$$L(\alpha) = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i,k}\alpha_i\alpha_k y_i y_k x_i^T x_k \quad \text{s.t.} \quad \begin{cases} \alpha_i \ge 0, & i = 1 \ldots m \\ \sum_{i=1}^{m}\alpha_i y_i = 0 \end{cases} \qquad (3)$$

We’ll use the last two KKT conditions in what follows, for instance to get conditions on $\lambda_0$, but what we’ve already done is enough to define the dual problem for $\alpha$.
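To make the dual concrete, here is a minimal sketch that evaluates $L(\alpha)$ and checks the constraints in (3). The two-point data set is a hypothetical toy example: with $x_1 = (1, 0)$, $y_1 = +1$ and $x_2 = (-1, 0)$, $y_2 = -1$, feasibility forces $\alpha_1 = \alpha_2 = a$, and the objective reduces to $2a - 2a^2$, which peaks at $a = 1/2$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, Y):
    """L(alpha) = sum_i alpha_i - 1/2 sum_{i,k} alpha_i alpha_k y_i y_k x_i^T x_k."""
    m = len(alpha)
    quad = sum(alpha[i] * alpha[k] * Y[i] * Y[k] * dot(X[i], X[k])
               for i in range(m) for k in range(m))
    return sum(alpha) - 0.5 * quad


def is_dual_feasible(alpha, Y, tol=1e-9):
    """Check alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    nonnegative = all(a >= -tol for a in alpha)
    balanced = abs(sum(a * y for a, y in zip(alpha, Y))) <= tol
    return nonnegative and balanced


# Hypothetical toy data: one positive and one negative example.
X = [[1.0, 0.0], [-1.0, 0.0]]
Y = [1, -1]

values = {a: dual_objective([a, a], X, Y) for a in (0.25, 0.5, 0.75)}
print(values)
print(is_dual_feasible([0.5, 0.5], Y), is_dual_feasible([0.5, 0.2], Y))
```

At $a = 1/2$ the dual value is $1/2$, which equals the primal objective $\frac{1}{2}\|\lambda\|_2^2$ for $\lambda = (1, 0)$.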

We can solve this dual problem. Either (i) we’d use a generic quadratic programming solver, or (ii) use another algorithm, like SMO, which I will discuss later. For now, assume we solved it. So we have $\alpha_1^*, \ldots, \alpha_m^*$. We can use the solution of the dual problem to get the solution of the primal problem. We can plug $\alpha^*$ into the first KKT condition to get

$$\lambda^* = \sum_{i=1}^{m}\alpha_i^* y_i x_i. \qquad (4)$$

We still need to get $\lambda_0^*$, but we can see something cool in the process.

Support Vectors
Look at the complementary slackness KKT condition and the primal and dual feasibility conditions:

$$\alpha_i^*\left[-y_i(\lambda^{*T}x_i + \lambda_0^*) + 1\right] = 0 \implies \begin{cases} \alpha_i^* > 0 \Rightarrow y_i(\lambda^{*T}x_i + \lambda_0^*) = 1 \\ \alpha_i^* < 0 \quad \text{(Can't happen)} \\ -y_i(\lambda^{*T}x_i + \lambda_0^*) + 1 < 0 \Rightarrow \alpha_i^* = 0 \\ -y_i(\lambda^{*T}x_i + \lambda_0^*) + 1 > 0 \quad \text{(Can't happen)} \end{cases}$$

Define the optimal (scaled) scoring function: $f^*(x_i) = \lambda^{*T}x_i + \lambda_0^*$, then

$$\alpha_i^* > 0 \Rightarrow y_i f^*(x_i) = \text{scaled margin}_i = 1$$

$$1 < y_i f^*(x_i) \Rightarrow \alpha_i^* = 0$$

The examples in the first category, for which the scaled margin is 1 and the constraints are active, are called support vectors. They are the closest to the decision boundary.
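The two categories can be read off a solution directly. The sketch below takes a hand-worked toy solution (the data, $\alpha^*$, $\lambda^*$ and $\lambda_0^*$ are assumptions chosen so that all KKT conditions hold), lists the support vectors as the examples with $\alpha_i^* > 0$, and checks complementary slackness.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scaled_margins(lam, lam0, X, Y):
    """y_i * f(x_i) for every example."""
    return [y * (dot(lam, x) + lam0) for x, y in zip(X, Y)]


def support_vector_indices(alpha, tol=1e-9):
    """Indices with alpha_i > 0."""
    return [i for i, a in enumerate(alpha) if a > tol]


def complementary_slackness_holds(alpha, margins, tol=1e-9):
    """alpha_i * (1 - y_i f(x_i)) = 0 for every i."""
    return all(abs(a * (1 - mg)) <= tol for a, mg in zip(alpha, margins))


# Hypothetical toy problem with a hand-worked optimal solution.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 2.0]]
Y = [1, -1, 1]
alpha_star = [0.5, 0.5, 0.0]
lam_star, lam0_star = [1.0, 0.0], 0.0

margins = scaled_margins(lam_star, lam0_star, X, Y)
support = support_vector_indices(alpha_star)

print(margins)   # the third example sits strictly outside the margin
print(support)
print(complementary_slackness_holds(alpha_star, margins))
```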

[Figure: support vectors. Image by MIT OpenCourseWare.]

Finish What We Were Doing Earlier

To get $\lambda_0^*$, use the complementarity condition for any of the support vectors (in other words, use the fact that the unnormalized margin of the support vectors is one):

$$1 = y_i(\lambda^{*T}x_i + \lambda_0^*).$$

If you take a positive support vector, $y_i = 1$, then

$$\lambda_0^* = 1 - \lambda^{*T}x_i.$$

Written another way, since the support vectors have the smallest margins,

$$\lambda_0^* = 1 - \min_{i:\,y_i = 1}\lambda^{*T}x_i.$$

So that’s the solution! Just to recap, to get the scoring function $f^*$ for SVM, you’d compute $\alpha^*$ from the dual problem (3), plug it into (4) to get $\lambda^*$, plug that into the equation above to get $\lambda_0^*$, and that’s the solution to the primal problem, and the coefficients for $f^*$.

Because of the form of the solution:

$$\lambda^* = \sum_{i=1}^{m}\alpha_i^* y_i x_i,$$

it is possible that $\lambda^*$ is very fast to calculate.

Why is that? Think support vectors.
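The recap can be sketched in a few lines: given $\alpha^*$, form $\lambda^*$ from (4) and then $\lambda_0^*$ from the positive examples. Since $\alpha_i^* = 0$ for every example that is not a support vector, the sum only needs to visit the support vectors. The data and the $\alpha^*$ below are a hand-worked toy solution, and `recover_primal` is a hypothetical helper name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_primal(alpha, X, Y, tol=1e-9):
    """Return (lam, lam0) from a dual solution, summing over support vectors only."""
    n = len(X[0])
    lam = [sum(a * y * x[j] for a, y, x in zip(alpha, Y, X) if a > tol)
           for j in range(n)]
    # lam0 = 1 - min over positive examples of lam^T x_i.
    lam0 = 1 - min(dot(lam, x) for x, y in zip(X, Y) if y == 1)
    return lam, lam0


# Hypothetical toy data; alpha_star was worked out by hand for this example.
X = [[2.0, 0.0], [0.0, 0.0], [4.0, 2.0]]
Y = [1, -1, 1]
alpha_star = [0.5, 0.5, 0.0]

lam_star, lam0_star = recover_primal(alpha_star, X, Y)
margins = [y * (dot(lam_star, x) + lam0_star) for x, y in zip(X, Y)]

print(lam_star, lam0_star)
print(margins)
```

Every scaled margin is at least 1, and the two support vectors attain exactly 1, as the complementary slackness condition requires.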

The Nonseparable Case

If there is no separating hyperplane, there is no feasible solution to the problem we wrote above. Most real problems are nonseparable.

Let’s fix our SVM so it can accommodate the nonseparable case. The new formulation will penalize mistakes more the farther they are from the decision boundary. So we are allowed to make mistakes now, but we pay a price.

Let’s change our primal problem (1) to this new primal problem:

$$\min_{\lambda,\lambda_0,\xi} \frac{1}{2}\|\lambda\|_2^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \quad \begin{cases} y_i(\lambda^T x_i + \lambda_0) \ge 1 - \xi_i \\ \xi_i \ge 0 \end{cases} \qquad (5)$$

So the constraints allow some slack of size $\xi_i$, but we pay a price for it in the objective. That is, if $y_i f(x_i) \ge 1$ then $\xi_i$ gets set to 0, and the penalty is 0. Otherwise, if $y_i f(x_i) = 1 - \xi_i$, we pay price $\xi_i$.

Parameter $C$ trades off between the twin goals of making $\|\lambda\|_2^2$ small (making what-was-the-minimum-margin $1/\|\lambda\|_2$ large) and ensuring that most examples have margin at least $1/\|\lambda\|_2$.

Going on a Little Tangent

Rewrite the penalty another way:
If $y_i f(x_i) \ge 1$, zero penalty. Else, pay price $\xi_i = 1 - y_i f(x_i)$.

Third time’s the charm:

Pay price $\xi_i = \lfloor 1 - y_i f(x_i)\rfloor_+$
where this notation $\lfloor z\rfloor_+$ means take the maximum of $z$ and 0.

Equation (5) becomes:

$$\min_{\lambda,\lambda_0} \frac{1}{2}\|\lambda\|_2^2 + C\sum_{i=1}^{m}\lfloor 1 - y_i f(x_i)\rfloor_+$$

Does that look familiar?
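The penalty $\lfloor 1 - y_i f(x_i)\rfloor_+$ is commonly called the hinge loss, so the objective is a regularized loss. A minimal sketch of evaluating it follows; the coefficients, the four examples, and $C = 1$ are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(z):
    """The notation [z]_+ : the maximum of z and 0."""
    return max(0.0, z)


def soft_margin_objective(lam, lam0, X, Y, C):
    """Return (1/2 ||lam||^2 + C * sum_i slack_i, slacks)."""
    slacks = [hinge(1 - y * (dot(lam, x) + lam0)) for x, y in zip(X, Y)]
    return 0.5 * dot(lam, lam) + C * sum(slacks), slacks


# Hypothetical nonseparable data on a line.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 0.0]]
Y = [1, 1, -1, -1]
lam, lam0 = [1.0, 0.0], 0.0

objective, slacks = soft_margin_objective(lam, lam0, X, Y, C=1.0)
print(slacks)     # [0.0, 0.5, 0.0, 1.5]
print(objective)  # 2.5
```

The second example is correctly classified but inside the margin (slack 0.5), while the fourth is misclassified and pays a slack greater than 1.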

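The Dual for the Nonseparable Case
Form the Lagrangian of (5):

$$L(\lambda,\lambda_0,\xi,\alpha,r) = \frac{1}{2}\|\lambda\|_2^2 + C\sum_{i=1}^{m}\xi_i - \sum_{i=1}^{m}\alpha_i\left[y_i(\lambda^T x_i + \lambda_0) - 1 + \xi_i\right] - \sum_{i=1}^{m} r_i\xi_i$$

where $\alpha_i$’s and $r_i$’s are Lagrange multipliers (constrained to be $\ge 0$). The dual turns out to be (after some work)

$$\max_{\alpha} \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i,k=1}^{m}\alpha_i\alpha_k y_i y_k x_i^T x_k \quad \text{s.t.} \quad \begin{cases} 0 \le \alpha_i \le C, & i = 1 \ldots m \\ \sum_{i=1}^{m}\alpha_i y_i = 0 \end{cases} \qquad (6)$$

So the only difference from the original problem’s dual (3) is that $0 \le \alpha_i$ was changed to $0 \le \alpha_i \le C$. Neat!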

Solving the dual problem with SMO

SMO (Sequential Minimal Optimization) is a type of coordinate ascent algorithm, but adapted to SVM so that the solution always stays within the feasible region.

Start with (6). Let’s say you want to hold $\alpha_2, \ldots, \alpha_m$ fixed and take a coordinate step in the first direction. That is, change $\alpha_1$ to maximize the objective in (6). Can we make any progress? Can we get a better feasible solution by doing this?

Turns out, no. Look at the constraint in (6), $\sum_{i=1}^{m}\alpha_i y_i = 0$. This means:

$$\alpha_1 y_1 = -\sum_{i=2}^{m}\alpha_i y_i, \quad \text{or multiplying by } y_1,$$

$$\alpha_1 = -y_1\sum_{i=2}^{m}\alpha_i y_i.$$

So, since $\alpha_2, \ldots, \alpha_m$ are fixed, $\alpha_1$ is also fixed.
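A tiny numerical illustration of why a single coordinate cannot move: for a feasible $\alpha$, the value of $\alpha_1$ recomputed from the equality constraint is exactly the value it already has. The vector below is a hypothetical feasible point.

```python
def alpha1_from_constraint(alpha, Y):
    """alpha_1 = -y_1 * sum_{i>=2} alpha_i y_i, forced by sum_i alpha_i y_i = 0."""
    return -Y[0] * sum(a * y for a, y in zip(alpha[1:], Y[1:]))


# Hypothetical feasible point: 0.3*(+1) + 0.2*(+1) + 0.5*(-1) = 0.
alpha = [0.3, 0.2, 0.5]
Y = [1, 1, -1]

forced = alpha1_from_constraint(alpha, Y)
print(forced, alpha[0])  # the constraint pins alpha_1 to its current value
```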

So, if we want to update any of the $\alpha_i$’s, we need to update at least 2 of them simultaneously to keep the solution feasible (i.e., to keep the constraints satisfied).

Start with a feasible vector $\alpha$. Let’s update $\alpha_1$ and $\alpha_2$, holding $\alpha_3, \ldots, \alpha_m$ fixed.

What values of $\alpha_1$ and $\alpha_2$ are we allowed to choose?

Again, the constraint is: $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^{m}\alpha_i y_i =: \zeta$ (fixed constant).

We are only allowed to choose $\alpha_1, \alpha_2$ on the line, so when we pick $\alpha_2$, we get $\alpha_1$ automatically, from

$$\alpha_1 = \frac{1}{y_1}(\zeta - \alpha_2 y_2) = y_1(\zeta - \alpha_2 y_2) \quad (y_1 = 1/y_1 \text{ since } y_1 \in \{+1, -1\}).$$

Also, the other constraints in (6) say $0 \le \alpha_1, \alpha_2 \le C$. So, $\alpha_2$ needs to be within $[L,H]$ on the figure (in order for $\alpha_1$ to stay within $[0, C]$), where we will always have $0 \le L, H \le C$. To do the coordinate ascent step, we will optimize the objective over $\alpha_2$, keeping it within $[L,H]$. Intuitively, (6) becomes:

$$\max_{\alpha_2 \in [L,H]} \left[\alpha_1 + \alpha_2 + \text{constants} - \frac{1}{2}\sum_{i,k}\alpha_i\alpha_k y_i y_k x_i^T x_k\right] \quad \text{where } \alpha_1 = y_1(\zeta - \alpha_2 y_2). \qquad (7)$$

The objective is quadratic in $\alpha_2$. This means we can just set its derivative to 0 to optimize it and get $\alpha_2$ for the next iteration of SMO. If the optimal value is outside of $[L,H]$, just choose $\alpha_2$ to be either $L$ or $H$ for the next iteration.

For instance, if this is a plot of (7)’s objective (sometimes it doesn’t look like this, sometimes it’s upside-down), then we’ll choose:

Note: there are heuristics for choosing the order in which the $\alpha_i$’s are updated.
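A single two-variable step can be sketched as below. The notes do not spell out the closed-form maximizer or the endpoints $L$ and $H$; the expressions used here are the ones commonly given for SMO (following Platt's formulation) and should be read as an assumption, as should the toy data and the pair chosen for the update. The function holds every other $\alpha$ fixed, maximizes the quadratic in the second variable, clips it to $[L,H]$, and then recovers the first variable from the equality constraint.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, Y):
    m = len(alpha)
    quad = sum(alpha[i] * alpha[k] * Y[i] * Y[k] * dot(X[i], X[k])
               for i in range(m) for k in range(m))
    return sum(alpha) - 0.5 * quad


def smo_pair_step(alpha, X, Y, C, i, k):
    """One SMO step on (alpha_i, alpha_k); returns a new feasible alpha."""
    # g(x) = sum_j alpha_j y_j x_j^T x. The intercept cancels in E_i - E_k.
    def g(x):
        return sum(a * y * dot(xj, x) for a, y, xj in zip(alpha, Y, X))

    E_i, E_k = g(X[i]) - Y[i], g(X[k]) - Y[k]
    # Curvature of the one-dimensional quadratic along the constraint line.
    eta = dot(X[i], X[i]) + dot(X[k], X[k]) - 2 * dot(X[i], X[k])
    if eta <= 0:
        return list(alpha)  # degenerate pair: skip in this sketch

    # Interval [L, H] for alpha_k that keeps both variables inside [0, C].
    if Y[i] != Y[k]:
        L = max(0.0, alpha[k] - alpha[i])
        H = min(C, C + alpha[k] - alpha[i])
    else:
        L = max(0.0, alpha[i] + alpha[k] - C)
        H = min(C, alpha[i] + alpha[k])

    a_k = alpha[k] + Y[k] * (E_i - E_k) / eta  # unconstrained maximizer
    a_k = min(H, max(L, a_k))                  # clip to [L, H]
    a_i = alpha[i] + Y[i] * Y[k] * (alpha[k] - a_k)  # keeps sum alpha*y fixed

    new_alpha = list(alpha)
    new_alpha[i], new_alpha[k] = a_i, a_k
    return new_alpha


# Hypothetical toy data: one positive and one negative example.
X = [[1.0, 0.0], [-1.0, 0.0]]
Y = [1, -1]
alpha0 = [0.0, 0.0]

alpha_free = smo_pair_step(alpha0, X, Y, C=10.0, i=0, k=1)
alpha_clipped = smo_pair_step(alpha0, X, Y, C=0.3, i=0, k=1)

print(alpha_free, dual_objective(alpha_free, X, Y))
print(alpha_clipped, dual_objective(alpha_clipped, X, Y))
```

With a large $C$ the step lands on the unconstrained maximizer; with $C = 0.3$ the maximizer falls outside $[L,H]$ and the update stops at $H$. A full solver would repeat this step over pairs chosen by one of the heuristics mentioned above.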

MIT OpenCourseWare
http://ocw.mit.edu

15.097 Prediction: Machine Learning and Statistics

Spring 2012

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
