SVM Problems
Assignment 2 Solutions
1. (a) Construct a support vector machine that computes the XOR function. Use values of +1 and −1
(instead of 1 and 0) for both inputs and outputs, so that an example looks like ([−1, 1], 1) or
([−1, −1], −1). Map the input [x1 , x2 ] into a space consisting of x1 and x1 x2 . Draw the four
input points in this space, and the maximal margin separator. What is the margin? Now draw the
separating line back in the original Euclidean input space.
The examples map from [x1 , x2 ] to [x1 , x1 x2 ] coordinates as follows:
[−1, −1] (negative) maps to [−1, +1]
[−1, +1] (positive) maps to [−1, −1]
[+1, −1] (positive) maps to [+1, −1]
[+1, +1] (negative) maps to [+1, +1]
Thus, the positive examples have x1 x2 = − 1 and the negative examples have x1 x2 = + 1. The
maximum margin separator is the line x1 x2 = 0, with a margin of 1. The separator corresponds to
the x1 = 0 and x2 = 0 axes in the original space—this can be thought of as the limit of a hyperbolic
separator with two branches.
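As a quick numerical sanity check, here is a minimal sketch (assuming NumPy is available; the weight vector w = (0, −1) below is just one canonical choice representing the separator x1 x2 = 0):

```python
import numpy as np

# The four XOR examples and their +/-1 labels.
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]])
y = np.array([-1, +1, +1, -1])

# Feature map [x1, x2] -> [x1, x1*x2].
F = np.column_stack([X[:, 0], X[:, 0] * X[:, 1]])
print(F)

# The separator x1*x2 = 0 corresponds to w = (0, -1), b = 0 in the mapped
# space (sign chosen so positive examples get positive scores).
w, b = np.array([0.0, -1.0]), 0.0
print(y * (F @ w + b))   # all entries are 1, so the margin is 1/||w|| = 1
```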
(b) Recall that the equation of the circle in the 2-dimensional plane is (x1 − a)^2 + (x2 − b)^2 − r^2 = 0.
Expand out the formula and show that every circular region is linearly separable from the rest of
the plane in the feature space (x1, x2, x1^2, x2^2).
The circle equation expands into five terms,

x1^2 + x2^2 − 2a x1 − 2b x2 + (a^2 + b^2 − r^2) = 0,

which is a linear equation in the feature space (x1, x2, x1^2, x2^2) with weights w = (−2a, −2b, 1, 1) and intercept a^2 + b^2 − r^2. This shows that any circular boundary is linear in this feature space, so every circular region is linearly separable from the rest of the plane.
In fact, the three features x1, x2, x1^2 + x2^2 suffice.
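A small sketch of this expansion (assuming NumPy; the circle parameters a, b, r below are arbitrary), confirming that the sign of the linear function in (x1, x2, x1^2, x2^2) with the weights above agrees with the inside/outside test for the circle:

```python
import numpy as np

a, b, r = 1.0, -2.0, 1.5                       # an arbitrary circle

def phi(x):                                    # feature map (x1, x2, x1^2, x2^2)
    return np.array([x[0], x[1], x[0]**2, x[1]**2])

w = np.array([-2 * a, -2 * b, 1.0, 1.0])       # weights from the expansion
c = a**2 + b**2 - r**2                         # intercept

rng = np.random.default_rng(0)
for x in rng.uniform(-4, 4, size=(5, 2)):
    inside = (x[0] - a)**2 + (x[1] - b)**2 < r**2
    print(inside, w @ phi(x) + c < 0)          # the two booleans always match
```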
(c) Recall that the equation of an ellipse in the 2-dimensional plane is c(x1 − a)^2 + d(x2 − b)^2 − 1 = 0.
Show that an SVM using the polynomial kernel of degree 2, K(u, v) = (1 + u · v)^2, is equivalent
to a linear SVM in the feature space (1, x1, x2, x1^2, x2^2, x1 x2), and hence that SVMs with this kernel
can separate any elliptic region from the rest of the plane.
The (axis-aligned) ellipse equation expands into six terms,

c x1^2 + d x2^2 − 2ac x1 − 2bd x2 + 0 · x1 x2 + (ca^2 + db^2 − 1) = 0,

which is linear in the feature space (1, x1, x2, x1^2, x2^2, x1 x2). Since the degree-2 polynomial kernel K(u, v) = (1 + u · v)^2 is the inner product of exactly this set of features (up to constant scalings of individual features, which do not affect separability), an SVM with this kernel is a linear SVM in that feature space and can therefore separate any elliptic region from the rest of the plane.
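The kernel/feature-space equivalence can also be checked numerically. Below is a sketch (assuming NumPy) using the standard factorization (1 + u · v)^2 = φ(u) · φ(v) with φ(x) = (1, √2 x1, √2 x2, x1^2, x2^2, √2 x1 x2); the √2 scalings are not part of the feature list in the problem but do not change which regions are linearly separable:

```python
import numpy as np

def K(u, v):
    return (1 + u @ v) ** 2                    # degree-2 polynomial kernel

def phi(x):                                    # explicit feature map for 2-D inputs
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(1)
u, v = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(K(u, v), phi(u) @ phi(v)))    # True
```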
2. (12 pts) Logistic regression is a method of fitting a probabilistic classifier that gives soft linear thresholds.
(See Russell & Norvig, Section 18.6.4.) It is common to use logistic regression with an objective function
consisting of the negative log probability of the data plus an L2 regularizer:
L(w) = −Σ_{i=1}^{N} log [ 1 / (1 + e^{−y_i (w^T x_i + b)}) ] + λ||w||_2^2
(a) Find the partial derivatives ∂L/∂w_j.

First define the sigmoid function g(z) = 1/(1 + e^{−z}). Note that ∂g(z)/∂z = g(z)(1 − g(z)) and also ∂ log g(z)/∂z = (1/g(z)) · g(z)(1 − g(z)) = 1 − g(z). Then we get

∂L/∂w_j = −Σ_{i=1}^{N} y_i x_{ij} (1 − g(y_i (w^T x_i + b))) + 2λw_j
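A finite-difference check of this derivative; a minimal sketch assuming NumPy and the sigmoid convention g(z) = 1/(1 + e^{−z}) above, with random data used only for testing (the names loss and grad_w are illustrative):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))            # the sigmoid used above

def loss(w, b, X, y, lam):
    # L(w) = -sum_i log g(y_i (w^T x_i + b)) + lam * ||w||^2
    return -np.sum(np.log(g(y * (X @ w + b)))) + lam * (w @ w)

def grad_w(w, b, X, y, lam):
    # dL/dw_j = -sum_i y_i x_ij (1 - g(y_i (w^T x_i + b))) + 2 lam w_j
    return -(X.T @ (y * (1 - g(y * (X @ w + b))))) + 2 * lam * w

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.choice([-1, 1], size=20)
w, b, lam, eps = rng.normal(size=3), 0.3, 0.1, 1e-6

numeric = np.array([(loss(w + eps * e, b, X, y, lam) -
                     loss(w - eps * e, b, X, y, lam)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad_w(w, b, X, y, lam), atol=1e-5))   # True
```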
(b) Find the second partial derivatives ∂²L/∂w_j∂w_k.

∂²L/∂w_j∂w_k = Σ_{i=1}^{N} y_i^2 x_{ij} x_{ik} g(y_i (w^T x_i + b)) (1 − g(y_i (w^T x_i + b))) + 2λδ_{jk}

where δ_{jk} = 1 if j = k and 0 otherwise.
(c) From these results, show that L(w) is a convex function.
Hint: A function L is convex if its Hessian (the matrix H of second derivatives with elements H_{jk} = ∂²L/∂w_j∂w_k) is positive semi-definite (PSD). A matrix H is PSD if and only if

a^T H a ≡ Σ_{j,k} a_j a_k H_{jk} ≥ 0 for every vector a.

Define P_i = g(y_i (w^T x_i + b)) (1 − g(y_i (w^T x_i + b))) and ρ_{ij} = y_i x_{ij} √P_i. Then

a^T H a = Σ_{i,j,k} a_j a_k x_{ij} x_{ik} y_i^2 P_i + 2λ Σ_j a_j^2 = Σ_i a^T ρ_i ρ_i^T a + 2λ Σ_j a_j^2 = Σ_i (a^T ρ_i)^2 + 2λ Σ_j a_j^2 ≥ 0

for λ ≥ 0.
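The same conclusion can be spot-checked numerically. A sketch (assuming NumPy and random test data) that builds the Hessian above and confirms its smallest eigenvalue is non-negative:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(w, b, X, y, lam):
    # H_jk = sum_i y_i^2 x_ij x_ik g_i (1 - g_i) + 2 lam delta_jk
    P = (y ** 2) * g(y * (X @ w + b)) * (1 - g(y * (X @ w + b)))
    return X.T @ (P[:, None] * X) + 2 * lam * np.eye(X.shape[1])

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 4)), rng.choice([-1, 1], size=50)
H = hessian(rng.normal(size=4), 0.0, X, y, lam=0.1)
print(np.linalg.eigvalsh(H).min() >= 0)        # True: H is PSD (here even PD)
```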
3. Consider the following training data:

class   x1   x2
  +      1    1
  +      2    2
  +      2    0
  −      0    0
  −      1    0
  −      0    1
(a) Plot these six training points. Are the classes {+, −} linearly separable?
As seen in the plot (Figure 1), the classes are linearly separable.
Figure 1: Question 3
(b) Construct the weight vector of the maximum margin hyperplane by inspection and identify the
support vectors.
The maximum margin hyperplane has slope −1 and passes through the point (x1, x2) = (3/2, 0).
Therefore its equation is x1 + x2 = 3/2, and the weight vector is (1, 1)^T (up to scaling). The support vectors are (1, 1) and (2, 0) from the positive class and (1, 0) and (0, 1) from the negative class.
(c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the
same, or increase? In this specific dataset the optimal margin increases when we remove the
support vector (1, 0) or (1, 1), and stays the same when we remove either of the other two, (2, 0) or (0, 1).
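Parts (b) and (c) can be confirmed numerically. The sketch below assumes scikit-learn is installed and uses a very large C to approximate the hard-margin SVM; it is only a check, not part of the required solution:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [0, 0], [1, 0], [0, 1]])
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)        # ~ (2, 2) and -3, i.e. x1 + x2 = 3/2
print(clf.support_vectors_)             # (1,0), (0,1), (1,1), (2,0)
print(1 / np.linalg.norm(clf.coef_))    # original margin 1/(2*sqrt(2)) ~ 0.354

# Part (c): drop one support vector at a time and refit.
for sv in [(1, 0), (1, 1), (2, 0), (0, 1)]:
    keep = [i for i, x in enumerate(X) if tuple(x) != sv]
    m = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])
    print(sv, 1 / np.linalg.norm(m.coef_))   # 0.5 for (1,0) and (1,1), ~0.354 otherwise
```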
(d) (Extra Credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give
a short proof.
When we drop some constraints in a constrained maximization problem, we get an optimal value
that is at least as good as the previous one. This is because the set of candidates satisfying the original
(larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker)
set of constraints. So, under the weaker constraints, the old optimal solution is still available and
there may be additional solutions that are even better. In mathematical form: if S′ ⊆ S are two
constraint sets with feasible regions F(S) ⊆ F(S′), then max_{x ∈ F(S′)} f(x) ≥ max_{x ∈ F(S)} f(x).
Finally, note that in SVM problems we are maximizing the margin subject to the constraints
given by the training points. When we drop any of the constraints, the margin can increase or stay
the same, depending on the dataset. In general problems with realistic datasets, the margin is
expected to increase when we drop a support vector. The data in this problem is constructed
to demonstrate that, when removing some constraints, the margin can stay the same or increase
depending on the geometry.
Figure 2: Question 4
4. Consider the following one-dimensional training data:

class    x
  +      0
  −     −1
  −     +1
(c) Using the method of Lagrange multipliers, show that the solution is ŵ = (0, 0, −2)^T, b = 1 and the
margin is 1/||ŵ||_2.
For optimization problems with inequality constraints such as the above, we should apply the KKT
conditions, which generalize the method of Lagrange multipliers. However, this problem can be solved
more easily by noting that we have three vectors in the 3-dimensional feature space and all of them are support
vectors. Hence all 3 constraints hold with equality. Therefore we can apply the method of
Lagrange multipliers to
min_{w,b}  (1/2)||w||_2^2    (5)
s.t.  y_i (w^T φ(x_i) + b) = 1,  i = 1, 2, 3.    (6)
We have 3 constraints, and should have 3 Lagrange multipliers. We first form the Lagrangian
function L(w, λ) where λ = (λ1 , λ2 , λ3 ) as follows
L(w, λ) = (1/2)||w||_2^2 + Σ_{i=1}^{3} λ_i (y_i (w^T φ(x_i) + b) − 1)    (7)
and differentiate with respect to the optimization variables w and b and equate to zero:

∂L(w, λ)/∂w = w + Σ_{i=1}^{3} λ_i y_i φ(x_i) = 0    (8)

∂L(w, λ)/∂b = Σ_{i=1}^{3} λ_i y_i = 0 .    (9)
Using the data points φ(x_i), which for the quadratic feature map φ(x) = (1, √2 x, x^2)^T are φ(0) = (1, 0, 0)^T, φ(−1) = (1, −√2, 1)^T and φ(+1) = (1, √2, 1)^T, we get the following equations from the above lines:
w_1 + λ_1 − λ_2 − λ_3 = 0    (10)
w_2 + √2 λ_2 − √2 λ_3 = 0    (11)
w_3 − λ_2 − λ_3 = 0    (12)
λ_1 − λ_2 − λ_3 = 0    (13)
Using (10) and (13) we get w_1 = 0. Plugging this into the equality constraints of the optimization
problem, we get
b = 1    (15)
−√2 w_2 + w_3 + b = −1    (16)
+√2 w_2 + w_3 + b = −1    (17)
(16) and (17) imply that w_2 = 0 and w_3 = −2. Therefore the optimal weights are ŵ = (0, 0, −2)^T
and b = 1, and the margin is 1/||ŵ||_2 = 1/2.
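As a sanity check, here is a small sketch (assuming NumPy; the feature map φ(x) = (1, √2 x, x^2) used below is not stated in this excerpt but is the one implied by equations (10)-(12)):

```python
import numpy as np

def phi(x):
    # Quadratic feature map implied by equations (10)-(12): (1, sqrt(2) x, x^2).
    return np.array([1.0, np.sqrt(2) * x, x ** 2])

X, y = np.array([0.0, -1.0, 1.0]), np.array([+1, -1, -1])
w_hat, b_hat = np.array([0.0, 0.0, -2.0]), 1.0

scores = np.array([w_hat @ phi(x) + b_hat for x in X])
print(y * scores)                     # [1. 1. 1.]: all constraints are tight,
                                      # so every point is a support vector
print(1 / np.linalg.norm(w_hat))      # margin 1/||w_hat||_2 = 0.5
```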
(d) Show that the solution remains the same if the constraints are changed to
y_i (w^T φ(x_i) + b) ≥ ρ,   i = 1, 2, 3
for any ρ ≥ 1.
Note that changing the constraints in the solution of part (c) only changes equations (15)-(17), and we
get b = ρ and ŵ = (0, 0, −2ρ)^T. However, the hyperplane described by the equation ŵ^T x + b = 0
remains the same as before: {x : −2ρ x_3 + ρ = 0} ≡ {x : −2x_3 + 1 = 0}. Hence we have the same
classifier in both cases: assign class label + if ŵ^T x + b ≥ 0 and assign class − otherwise.
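A quick numerical illustration of this on the same dataset, a sketch assuming NumPy and reusing the same hypothetical feature map φ(x) = (1, √2 x, x^2): the scaled solution ŵ = (0, 0, −2ρ), b = ρ gives identical predictions for every ρ ≥ 1.

```python
import numpy as np

def phi(x):
    return np.array([1.0, np.sqrt(2) * x, x ** 2])   # same hypothetical feature map

X = np.array([0.0, -1.0, 1.0])
for rho in [1.0, 2.5, 10.0]:
    w, b = np.array([0.0, 0.0, -2.0 * rho]), rho
    print(rho, np.sign([w @ phi(x) + b for x in X]))  # always [ 1. -1. -1.]
```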
(e) (Extra Credit) Is your answer to (d) also true for any dataset and ρ ≥ 1? Provide a counterexample
or give a short proof.
This is true for any dataset and any ρ ≥ 1, and it follows from the homogeneity of the optimization problem. For
constraints y_i (w^T φ(x_i) + b) ≥ ρ, we can define new variables w̃ = w/ρ and b̃ = b/ρ, so
that the constraints in the new variables are y_i (w̃^T φ(x_i) + b̃) ≥ 1, and equivalently optimize the
following:

min_{w̃,b̃}  (1/2) ρ^2 ||w̃||_2^2   s.t.   y_i (w̃^T φ(x_i) + b̃) ≥ 1,  i = 1, . . . , N    (19)

Since ρ^2 is a positive constant, this problem has the same minimizer (w̃, b̃) as the standard problem with ρ = 1, and w = ρw̃, b = ρb̃ define the same separating hyperplane and hence the same classifier.