NNLS1 2019 HW4 Solutions
Prayag
Neural networks and learning systems-I
April 23, 2019
Problem 6.1.
Solution. The given data x is linearly separable and the separating hyperplane is given by w^T x + b = 0, where w denotes the weight vector and b denotes the bias. The hyperplane is said to correspond to a canonical pair (w, b) if, for the set of input patterns {x_i}_{i=1}^N, it satisfies
\[
\min_{i=1,\ldots,N} \left| w^T x_i + b \right| = 1. \qquad (1)
\]
Let g(x_i) = w^T x_i + b, where g(x_i) gives an algebraic measure of the distance of the input data x_i from the separating hyperplane. We know that any point x_i can be decomposed into two components as given below:
\[
x_i = x_p + r\, \frac{w}{\|w\|},
\]
where x_p is the normal projection of the point x_i onto the hyperplane and r is the distance of the data point from the hyperplane.
\[
\begin{aligned}
g(x_i) &= w^T \left( x_p + r\, \frac{w}{\|w\|} \right) + b \\
&= w^T x_p + r\, \frac{w^T w}{\|w\|} + b \\
&= w^T x_p + b + r \|w\| \\
&= g(x_p) + r \|w\| \\
&= r \|w\| \qquad \text{since } g(x_p) = 0 \\
\implies \quad r &= \frac{g(x_i)}{\|w\|}.
\end{aligned}
\]
From (1), we know that there exists at least one x_i such that w^T x_i + b = 1 or w^T x_i + b = −1, so that g(x_i) = ±1 for those points. Therefore
\[
r =
\begin{cases}
\dfrac{1}{\|w\|} & \text{if Class 1,} \\[2mm]
-\dfrac{1}{\|w\|} & \text{if Class } -1.
\end{cases}
\]
The optimal separation between the two classes is given by
\[
2r = \frac{2}{\|w\|}.
\]
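This result can be checked numerically. The sketch below (assuming a toy separable data set and scikit-learn's SVC with a large C to emulate the hard margin; none of these values come from the question) compares 2/||w|| with twice the distance of the closest training point to the learned hyperplane.

import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters (assumed toy data, not from the question)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 0.3, size=(20, 2)),
               rng.normal([+2, +2], 0.3, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates the hard margin
w, b = clf.coef_.ravel(), clf.intercept_[0]

# signed distance r = g(x) / ||w|| for every point; the closest points sit at |r| = 1/||w||
r = (X @ w + b) / np.linalg.norm(w)
print("2 / ||w||       =", 2 / np.linalg.norm(w))
print("2 * min_i |r_i| =", 2 * np.abs(r).min())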
Problem 6.3.
Solution. Given problem:
\[
\begin{aligned}
\min_{w,\,b,\,\zeta} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i \\
\text{subject to} \quad & \zeta_i \ge 0 \qquad \forall i = 1, 2, \ldots, N, \\
& d_i (w^T x_i + b) \ge 1 - \zeta_i \qquad \forall i = 1, 2, \ldots, N.
\end{aligned}
\]
Writing this in the standard form needed to set up the Lagrangian, we get
\[
\begin{aligned}
\min_{w,\,b,\,\zeta} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i \\
\text{subject to} \quad & \zeta_i \ge 0 \qquad \forall i = 1, 2, \ldots, N, \\
& d_i (w^T x_i + b) - 1 + \zeta_i \ge 0 \qquad \forall i = 1, 2, \ldots, N.
\end{aligned}
\]
The Lagrangian can now be written as follows, using the Lagrange multipliers λ_i and α_i:
\[
L = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i - \sum_{i=1}^{N} \lambda_i \zeta_i - \sum_{i=1}^{N} \alpha_i \left[ d_i (w^T x_i + b) - 1 + \zeta_i \right]
\]
\[
L = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i - \sum_{i=1}^{N} \lambda_i \zeta_i - \sum_{i=1}^{N} \alpha_i d_i w^T x_i - \sum_{i=1}^{N} \alpha_i d_i b + \sum_{i=1}^{N} \alpha_i - \sum_{i=1}^{N} \alpha_i \zeta_i.
\]
Setting the derivatives of L with respect to w, b and ζ_i to zero gives w = Σ_{i=1}^{N} α_i d_i x_i, Σ_{i=1}^{N} α_i d_i = 0 and C − λ_i − α_i = 0. Substituting these back,
\[
\begin{aligned}
L &= -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, x_i^T x_j
   + \sum_{i=1}^{N} \underbrace{(C - \lambda_i - \alpha_i)}_{=\,0} \zeta_i
   - b \underbrace{\sum_{i=1}^{N} \alpha_i d_i}_{=\,0}
   + \sum_{i=1}^{N} \alpha_i \\
  &= -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, x_i^T x_j + \sum_{i=1}^{N} \alpha_i.
\end{aligned}
\]
From this, the dual can be written as follows:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, x_i^T x_j \\
\text{subject to} \quad & \sum_{i=1}^{N} \alpha_i d_i = 0, \\
& C - \lambda_i - \alpha_i = 0 \qquad \forall i = 1, 2, \ldots, N, \\
& \alpha_i \ge 0, \\
& \lambda_i \ge 0.
\end{aligned}
\]
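The dual constraints can be illustrated numerically. The sketch below (assuming toy blob data and scikit-learn's SVC; dual_coef_ stores d_i α_i for the support vectors, while α_i = 0 elsewhere) checks that Σ_i α_i d_i = 0 and that every α_i lies in [0, C].

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# overlapping classes so that the slack variables actually matter (assumed toy data)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
d = np.where(y == 0, -1, 1)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, d)

# dual_coef_ holds d_i * alpha_i for the support vectors (alpha_i = 0 elsewhere)
alpha_d = clf.dual_coef_.ravel()
print("sum_i alpha_i d_i =", alpha_d.sum())                        # ~ 0
print("all alpha_i <= C  :", bool(np.all(np.abs(alpha_d) <= C + 1e-9)))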
Problem 6.11.
Solution. It is given that a joint probability density function p_{X1,X2}(x1, x2) over an H-by-H product space is said to be a P-matrix provided it satisfies the finitely positive semidefinite property. The matrix P is positive semidefinite if z^T P z ≥ 0 for every non-zero column vector z.
Let us consider the simple case of a two-element set X = [X1, X2] of random variables.
Case 1: Are all P -kernels joint distributions?
From a given P-kernel P(x, y), we can generate a normalized kernel, the P̂-kernel, provided P satisfies
\[
\sum_{x \in X} \sum_{y \in X} P(x, y) = C,
\]
where C is some constant such that C < ∞. We define the P̂-kernel as P̂(x, y) = (1/C) P(x, y). This definition still satisfies the properties of a P-matrix, since we have only scaled the elements. Since P̂(x, y) is also a joint distribution, we can say that every P-kernel is, up to this normalization, a joint distribution.
Case 2: Are all joint distributions P -kernels?
Considering the two-element case, let us create a joint distribution and verify whether it satisfies the properties of a P-kernel. The joint probability matrix for the two-element case would be given by
\[
P_{X,Y} =
\begin{bmatrix}
p(x_1, x_1) & p(x_1, x_2) \\
p(x_2, x_1) & p(x_2, x_2)
\end{bmatrix}.
\]
Considering a particular case where p(x_1, x_1) = 0, p(x_1, x_2) = 0.5, p(x_2, x_1) = 0.5, p(x_2, x_2) = 0, we get
\[
P_{X,Y} =
\begin{bmatrix}
0 & 0.5 \\
0.5 & 0
\end{bmatrix}.
\]
Solving for the eigenvalues, we get λ = ±0.5. From the given definition, the P-matrix must
be positive semidefinite, but an eigenvalue in the above case is negative. Therefore not all
joint distributions are P -kernels.
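A quick numerical check of this counterexample (a sketch using numpy; the test vector z is an arbitrary choice):

import numpy as np

# joint distribution from the counterexample above
P = np.array([[0.0, 0.5],
              [0.5, 0.0]])

print("eigenvalues:", np.linalg.eigvalsh(P))   # [-0.5, 0.5] -> not positive semidefinite

# equivalently, z^T P z < 0 for some z
z = np.array([1.0, -1.0])
print("z^T P z =", z @ P @ z)                  # -1.0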
Problem 6.21.
Solution. Given that k(x_i, ·) and k(x_j, ·) denote a pair of kernels, where i, j = 1, 2, . . . , N and the vectors have the same dimensionality, we need to show that
\[
\langle k(x_i, \cdot),\, k(x_j, \cdot) \rangle = k(x_i, x_j). \qquad (2)
\]
Let f(·) and g(·) be two functions defined over a vector space F such that
\[
f(\cdot) = \sum_{i=1}^{N} a_i\, k(x_i, \cdot), \qquad (3)
\]
\[
g(\cdot) = \sum_{j=1}^{N} b_j\, k(x_j, \cdot). \qquad (4)
\]
Evaluating (3) at x_j and (4) at x_i gives
\[
f(x_j) = \sum_{i=1}^{N} a_i\, k(x_i, x_j), \qquad (5)
\]
\[
g(x_i) = \sum_{j=1}^{N} b_j\, k(x_j, x_i). \qquad (6)
\]
We know that
\[
k(x_i, x_j) = \phi^T(x_i)\, \phi(x_j). \qquad (7)
\]
Using (7) in (5) and (6), we get
\[
f(x_j) = \sum_{i=1}^{N} a_i\, \phi^T(x_i)\, \phi(x_j), \qquad (8)
\]
\[
g(x_i) = \sum_{j=1}^{N} b_j\, \phi^T(x_i)\, \phi(x_j). \qquad (9)
\]
In other words, f and g can be represented in the feature space as
\[
f = \sum_{i=1}^{N} a_i\, \phi(x_i), \qquad (10)
\]
\[
g = \sum_{j=1}^{N} b_j\, \phi(x_j). \qquad (11)
\]
Taking the inner product using (10) and (11), we get
\[
\begin{aligned}
\langle f, g \rangle &= \left( \sum_{i=1}^{N} a_i\, \phi(x_i) \right)^{T} \left( \sum_{j=1}^{N} b_j\, \phi(x_j) \right) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} a_i b_j\, \phi^T(x_i)\, \phi(x_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} a_i b_j\, k(x_i, x_j). \qquad (12)
\end{aligned}
\]
On the other hand, using the definitions (3) and (4) directly,
\[
\begin{aligned}
\langle f, g \rangle &= \left\langle \sum_{i=1}^{N} a_i\, k(x_i, \cdot),\; \sum_{j=1}^{N} b_j\, k(x_j, \cdot) \right\rangle \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} a_i b_j\, \langle k(x_i, \cdot),\, k(x_j, \cdot) \rangle. \qquad (13)
\end{aligned}
\]
Comparing (12) and (13), we get ⟨k(x_i, ·), k(x_j, ·)⟩ = k(x_i, x_j), which is (2).
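Equation (7) is the step that ties the abstract inner product back to a concrete computation, so it is worth a numerical check. The sketch below assumes the homogeneous quadratic kernel k(x, y) = (x^T y)^2 in two dimensions, whose explicit feature map φ(x) = (x_1^2, √2 x_1 x_2, x_2^2) is known, and verifies k(x_i, x_j) = φ^T(x_i) φ(x_j).

import numpy as np

# explicit feature map of the homogeneous quadratic kernel k(x, y) = (x^T y)^2 in 2-D
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    return (x @ y) ** 2

rng = np.random.default_rng(1)
xi, xj = rng.standard_normal(2), rng.standard_normal(2)

print("k(xi, xj)         =", k(xi, xj))
print("phi(xi)^T phi(xj) =", phi(xi) @ phi(xj))   # identical up to floating-point error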
Problem 6.25.
Solution. (a) A data set of three concentric circles was generated with the radii mentioned in the question. The generated data is shown below.
(b) The support vector machine was trained with C = 500 and the decision boundary obtained is as given below.
(c) The network was tested and an accuracy of 68% was obtained. We could argue that
the value of C might play a role in the accuracy of the SVM.
(d) The network was trained with C = 100 and C = 2500. The decision boundaries
obtained are as given below.
It is observed that for the case of C = 100, the network accuracy was 62% and for
C = 2500, the network accuracy was 64%.
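A minimal sketch of how this experiment can be reproduced, assuming radii of 1, 2 and 3 for the three circles, Gaussian noise on the radius, and an RBF-kernel SVC (the exact radii, noise level and kernel used in the runs above are not restated here, so the accuracies will not match those numbers exactly):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def ring(radius, n=300, noise=0.1):
    """Sample n points around a circle of the given radius."""
    theta = rng.uniform(0, 2 * np.pi, n)
    r = radius + noise * rng.standard_normal(n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# three concentric circles, one class per circle (radii are assumed values)
X = np.vstack([ring(1.0), ring(2.0), ring(3.0)])
y = np.repeat([0, 1, 2], 300)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
for C in (100, 500, 2500):
    acc = SVC(kernel="rbf", C=C).fit(Xtr, ytr).score(Xte, yte)
    print(f"C = {C:5d}: test accuracy = {acc:.2f}")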
Problem 2.
Therefore, for the kernel to be valid, the condition to be satisfied is β1 > −β0 x^T x. However, we cannot comment on whether it satisfies Mercer's theorem or not.
We know that polynomial kernels of the type (x^T x + 1)^p always satisfy Mercer's theorem. Let us consider the Maclaurin series for the tanh function:
\[
\tanh(x) = x - \frac{x^3}{3} + \frac{2x^5}{15} - \frac{17x^7}{315} + \cdots
\]
Using the above, we get
\[
\tanh(\beta_0 x^T x + \beta_1) = (\beta_0 x^T x + \beta_1) - \frac{(\beta_0 x^T x + \beta_1)^3}{3} + \frac{2(\beta_0 x^T x + \beta_1)^5}{15} - \cdots
\]
Assuming β0 x^T x + β1 is small, we take the first-order approximation of the function to get
\[
\tanh(\beta_0 x^T x + \beta_1) \approx \beta_0 x^T x + \beta_1.
\]
Comparing this with the polynomial kernel, we see that for the values β0 = 1, β1 = 1 and p = 1, the kernel satisfies Mercer's theorem:
\[
\tanh(x^T x + 1) \approx x^T x + 1.
\]
Using the above idea, for positive values of β0, we can define a new variable x̃ = √β0 x. Taking the inner product, we see that
\[
\tilde{x}^T \tilde{x} = (\sqrt{\beta_0}\, x)^T (\sqrt{\beta_0}\, x) = \sqrt{\beta_0}\, \sqrt{\beta_0}\, x^T x = \beta_0\, x^T x.
\]
Therefore, we can absorb any positive β0 into the data using the above method. To see how this works, let us consider an example where x is a two-element vector:
\[
\beta_0\, x^T x = \beta_0 \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \beta_0 (x_1^2 + x_2^2),
\]
\[
\tilde{x}^T \tilde{x} = \begin{bmatrix} \sqrt{\beta_0}\, x_1 & \sqrt{\beta_0}\, x_2 \end{bmatrix} \begin{bmatrix} \sqrt{\beta_0}\, x_1 \\ \sqrt{\beta_0}\, x_2 \end{bmatrix} = \beta_0 x_1^2 + \beta_0 x_2^2 = \beta_0 (x_1^2 + x_2^2),
\]
\[
\tanh(\beta_0 x^T x + 1) \approx \tilde{x}^T \tilde{x} + 1.
\]
The above solution relies on the assumption that β0 x^T x + β1 is small. Therefore, for β0 > 0 and β1 < 0, it gives a better approximation of the kernel than the case of β0 > 0 and β1 > 0. The observations are summarized in the table below.
These observations are in line with the theoretical results in the paper "A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods" by Lin, H.-T. and Lin, C.-J. Their results are given in the second table below.
Therefore, we see that the tanh(·) kernel satisfies Mercer's theorem better when β0 > 0 and β1 < 0.
β0   β1   Observations
+    −    A good approximation of the Mercer kernel, since β0 x^T x + β1 is small
+    +    Not as good an approximation, since β0 x^T x + β1 is larger than in the case above
−    +    Valid kernel only when β1 > −β0 x^T x, otherwise invalid
−    −    Not a valid kernel
β0   β1   Results
+    −    Kernel is conditionally positive semidefinite for small β1, and is similar to the RBF kernel for small β0
+    +    In general not as good as the (+, −) case
−    +    The objective value becomes −∞ once β1 is large
−    −    The objective value easily becomes −∞
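These sign combinations can also be probed empirically. The sketch below (random data and assumed values of β0 and β1, chosen only for illustration) builds the Gram matrix K_ij = tanh(β0 x_i^T x_j + β1) and reports its smallest eigenvalue, which indicates how far each case is from being positive semidefinite.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))          # random inputs (assumed data)

for beta0, beta1 in [(0.1, -1.0), (0.1, 1.0), (-0.1, 1.0), (-0.1, -1.0)]:
    K = np.tanh(beta0 * (X @ X.T) + beta1)      # Gram matrix of the tanh kernel
    min_eig = np.linalg.eigvalsh(K).min()       # < 0 means K is not PSD
    print(f"beta0 = {beta0:+.1f}, beta1 = {beta1:+.1f}: "
          f"smallest eigenvalue = {min_eig:+.4f}")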
Problem 3.
Substituting y_i = w^T φ(x_i) and expressing the ε-insensitive loss L(d_i, y_i) through the slack variables ζ_i and ζ_i′, the problem in the standard form needed to set up the Lagrangian becomes
\[
\begin{aligned}
\min_{w,\,\zeta,\,\zeta'} \quad & \frac{1}{N} \sum_{i=0}^{N-1} \left( \zeta_i + \zeta_i' \right) \\
\text{subject to} \quad & c_0 - \|w\|^2 \ge 0, \\
& \epsilon + \zeta_i - d_i + w^T \phi(x_i) \ge 0, \\
& \epsilon + \zeta_i' - w^T \phi(x_i) + d_i \ge 0, \\
& \zeta_i \ge 0, \\
& \zeta_i' \ge 0, \qquad \forall i = 0, 1, 2, \ldots, N-1.
\end{aligned}
\]
The Lagrangian for the primal can now be set up as
\[
\begin{aligned}
L = {} & \frac{1}{N} \sum_{i=0}^{N-1} \left( \zeta_i + \zeta_i' \right) - \alpha \left( c_0 - \|w\|^2 \right) - \sum_{i=0}^{N-1} \beta_i \left( \epsilon + \zeta_i - d_i + w^T \phi(x_i) \right) \\
& - \sum_{i=0}^{N-1} \beta_i' \left( \epsilon + \zeta_i' - w^T \phi(x_i) + d_i \right) - \sum_{i=0}^{N-1} \gamma_i \zeta_i - \sum_{i=0}^{N-1} \gamma_i' \zeta_i', \qquad (14)
\end{aligned}
\]
where α, β_i, β_i′, γ_i, and γ_i′ are the Lagrange multipliers. Differentiating the Lagrangian and equating to zero, we get
\[
\frac{\partial L}{\partial w} = 0 \implies 2\alpha w - \sum_{i=0}^{N-1} \beta_i \phi(x_i) + \sum_{i=0}^{N-1} \beta_i' \phi(x_i) = 0 \implies w = \frac{1}{2\alpha} \sum_{i=0}^{N-1} \left( \beta_i - \beta_i' \right) \phi(x_i),
\]
\[
\frac{\partial L}{\partial \zeta_i} = 0 \implies \frac{1}{N} - \beta_i - \gamma_i = 0 \implies \beta_i + \gamma_i = \frac{1}{N},
\]
\[
\frac{\partial L}{\partial \zeta_i'} = 0 \implies \frac{1}{N} - \beta_i' - \gamma_i' = 0 \implies \beta_i' + \gamma_i' = \frac{1}{N}.
\]
Substituting the above values, we get
\[
\begin{aligned}
L = {} & \sum_{i=0}^{N-1} \left( \beta_i + \gamma_i - \beta_i - \gamma_i \right) \zeta_i + \sum_{i=0}^{N-1} \left( \beta_i' + \gamma_i' - \beta_i' - \gamma_i' \right) \zeta_i' - \alpha c_0 \\
& + \frac{\alpha}{4\alpha^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( \beta_i - \beta_i' \right) \left( \beta_j - \beta_j' \right) \phi(x_i)^T \phi(x_j)
  - \frac{1}{2\alpha} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( \beta_i - \beta_i' \right) \left( \beta_j - \beta_j' \right) \phi(x_i)^T \phi(x_j) \\
& - \epsilon \sum_{i=0}^{N-1} \left( \beta_i + \beta_i' \right) + \sum_{i=0}^{N-1} \left( \beta_i - \beta_i' \right) d_i \\
= {} & \underbrace{\sum_{i=0}^{N-1} \left( \beta_i + \gamma_i - \beta_i - \gamma_i \right) \zeta_i}_{=\,0} + \underbrace{\sum_{i=0}^{N-1} \left( \beta_i' + \gamma_i' - \beta_i' - \gamma_i' \right) \zeta_i'}_{=\,0} - \alpha c_0 \\
& + \left( \frac{1}{4\alpha} - \frac{1}{2\alpha} \right) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( \beta_i - \beta_i' \right) \left( \beta_j - \beta_j' \right) \phi(x_i)^T \phi(x_j)
  - \epsilon \sum_{i=0}^{N-1} \left( \beta_i + \beta_i' \right) + \sum_{i=0}^{N-1} \left( \beta_i - \beta_i' \right) d_i \\
= {} & -\alpha c_0 - \frac{1}{4\alpha} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( \beta_i - \beta_i' \right) \left( \beta_j - \beta_j' \right) \phi(x_i)^T \phi(x_j)
  - \epsilon \sum_{i=0}^{N-1} \left( \beta_i + \beta_i' \right) + \sum_{i=0}^{N-1} \left( \beta_i - \beta_i' \right) d_i.
\end{aligned}
\]
We observe that the Lagrange multipliers γ_i and γ_i′ appear only in the constraints β_i + γ_i = 1/N and β_i′ + γ_i′ = 1/N respectively. For γ_i ≥ 0 to hold we need 1/N − β_i ≥ 0, i.e., β_i ≤ 1/N, and for γ_i′ ≥ 0 to hold we need 1/N − β_i′ ≥ 0, i.e., β_i′ ≤ 1/N. Combining this with the constraints β_i ≥ 0 and β_i′ ≥ 0, we get 0 ≤ β_i ≤ 1/N and 0 ≤ β_i′ ≤ 1/N respectively. The dual problem can now be written as
\[
\begin{aligned}
\max_{\alpha,\,\beta,\,\beta'} \quad & -\alpha c_0 - \frac{1}{4\alpha} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( \beta_i - \beta_i' \right) \left( \beta_j - \beta_j' \right) \phi(x_i)^T \phi(x_j)
 - \epsilon \sum_{i=0}^{N-1} \left( \beta_i + \beta_i' \right) + \sum_{i=0}^{N-1} \left( \beta_i - \beta_i' \right) d_i \\
\text{subject to} \quad & \alpha \ge 0, \\
& 0 \le \beta_i \le \frac{1}{N} \qquad \forall i = 0, 1, 2, \ldots, N-1, \\
& 0 \le \beta_i' \le \frac{1}{N}.
\end{aligned}
\]
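The key substitution step above can be verified numerically. The sketch below uses arbitrary random values (Phi, beta, beta' and alpha are not solutions of the dual, just placeholders) and checks that, with w = (1/2α) Σ_i (β_i − β_i') φ(x_i), the terms α||w||² − w^T Σ_i (β_i − β_i') φ(x_i) collapse to −(1/4α) Σ_{i,j} (β_i − β_i')(β_j − β_j') φ(x_i)^T φ(x_j).

import numpy as np

# numerical check of the substitution step with arbitrary values (assumed, not optimal)
rng = np.random.default_rng(0)
N, dim = 6, 3
Phi = rng.standard_normal((N, dim))        # rows play the role of phi(x_i)
beta = rng.uniform(0, 1 / N, N)
betap = rng.uniform(0, 1 / N, N)
alpha = 0.7                                # any alpha > 0
delta = beta - betap

w = (Phi.T @ delta) / (2 * alpha)          # w = (1/2a) sum_i (b_i - b_i') phi(x_i)
lhs = alpha * (w @ w) - w @ (Phi.T @ delta)
rhs = -(delta @ (Phi @ Phi.T) @ delta) / (4 * alpha)
print(lhs, rhs)                            # identical up to floating-point error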