AIML Lec-14
Reference
1. “Introduction to Statistical Learning”, Hastie, Tibshirani
2. “Machine Learning”, Peter Flach
3. Course on “Artificial Intelligence” by Patrick Winston, MIT (Lecture 16)
Maximum Margin
• Binary classification problem
• Need to draw a decision boundary
• Draw the decision boundary such that the separation between ‘+’ and ‘-’ samples is maximum
• Training an SVM is again an optimization problem
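As a concrete illustration, here is a minimal sketch assuming scikit-learn is available; the toy data and the large-C hard-margin approximation are choices made for this example, not part of the lecture:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: '+' samples labelled +1, '-' samples labelled -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin (maximum-margin) formulation.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)  # the learned w and b
```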
SVM for linear decision boundary
• Assume a vector 𝑤 perpendicular to the decision boundary (red dashed line)
• Consider an unknown data sample represented by vector 𝑢
• Project 𝑢 onto 𝑤 using the dot product 𝑤 ⋅ 𝑢; classify 𝑢 as ‘+’ if 𝑤 ⋅ 𝑢 + 𝑏 ≥ 0
• Label convention: 𝑦𝑖 = +1 if sample 𝑖 ∈ ‘+’, and 𝑦𝑖 = −1 if sample 𝑖 ∈ ‘−’
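A minimal sketch of this decision rule; the values of 𝑤, 𝑏, and the test points are hypothetical, chosen only for illustration:

```python
import numpy as np

w = np.array([1.0, 1.0])  # hypothetical normal vector to the boundary
b = -1.0                  # hypothetical offset

def classify(u):
    # Project u onto w and threshold: '+' if w.u + b >= 0, else '-'.
    return +1 if np.dot(w, u) + b >= 0 else -1

print(classify(np.array([2.0, 2.0])))    # +1
print(classify(np.array([-1.0, -1.0])))  # -1
```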
SVM for linear decision boundary
Combining the constraint for ‘+’ samples, 𝑤 ⋅ 𝑥₊ + 𝑏 ≥ 1, with the label 𝑦𝑖 = +1 (and, symmetrically, 𝑤 ⋅ 𝑥₋ + 𝑏 ≤ −1 with 𝑦𝑖 = −1), we get a single condition:
\[ y_i (w \cdot x_i + b) \ge 1 \]
Now the distance between the margin lines (the width of the margin) is the projection of (𝑥₊ − 𝑥₋) onto the unit vector 𝑤/‖𝑤‖. For samples on the margin, 𝑤 ⋅ 𝑥₊ = 1 − 𝑏 and 𝑤 ⋅ 𝑥₋ = −1 − 𝑏, so
\[ \text{width} = (x_+ - x_-) \cdot \frac{w}{\|w\|} = \frac{(1 - b) + (1 + b)}{\|w\|} = \frac{2}{\|w\|} \]
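A quick numerical check of the width formula; the values of 𝑤, 𝑏, and the gutter points below are assumptions chosen so that 𝑤 ⋅ 𝑥± + 𝑏 = ±1 holds:

```python
import numpy as np

w = np.array([1.0, 1.0]); b = 0.0
x_plus  = np.array([1.0, 0.0])   # satisfies w.x+ + b = +1
x_minus = np.array([-1.0, 0.0])  # satisfies w.x- + b = -1

width_by_formula    = 2 / np.linalg.norm(w)
width_by_projection = np.dot(x_plus - x_minus, w / np.linalg.norm(w))
print(width_by_formula, width_by_projection)  # both sqrt(2) = 1.4142...
```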
SVM for linear decision boundary
• Objective: Maximize the width of the margin between the separating lines of ‘+’ and ‘-’ samples.
• Maximize \( \frac{2}{\|w\|} \) ≡ minimize \( \|w\| \) ≡ minimize \( \frac{1}{2}\|w\|^2 \)
• Constraint: \( y_i (w \cdot x_i + b) - 1 = 0 \) (equality holds for the samples lying on the margin)
• To solve this constrained optimization problem we use the Lagrangian method, and define the objective function as:
\[ L = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \]
where \( \alpha_i \) are the Lagrange multipliers.
SVM for linear decision boundary
Minimize
\[ L = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] \]
Setting the partial derivatives to zero:
\[ \frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \qquad \text{…(2)} \]
\[ \frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0 \qquad \text{…(3)} \]
Substituting (2) and (3) back into \( L \):
\[ L = \tfrac{1}{2}\Big(\sum_i \alpha_i y_i x_i\Big) \cdot \Big(\sum_j \alpha_j y_j x_j\Big) - \Big(\sum_i \alpha_i y_i x_i\Big) \cdot \Big(\sum_j \alpha_j y_j x_j\Big) - \sum_i \alpha_i y_i b + \sum_i \alpha_i \]
\[ = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j) \]
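The dual above is a quadratic program, so it can be handed to a generic QP solver. A sketch using cvxpy (the choice of solver and the toy data are assumptions for this example, not part of the lecture):

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

# Q_ij = y_i y_j (x_i . x_j); a tiny jitter keeps the solver's PSD check happy.
G = (y[:, None] * X) @ (y[:, None] * X).T
Q = G + 1e-9 * np.eye(len(y))

alpha = cp.Variable(len(y))
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q))
constraints = [alpha >= 0, y @ alpha == 0]  # eq. (3) plus non-negativity
cp.Problem(objective, constraints).solve()

w = (alpha.value * y) @ X  # recover w via eq. (2): w = sum_i alpha_i y_i x_i
print(alpha.value, w)
```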
SVM for non-linear decision boundary
• Decision rule (using \( w = \sum_i \alpha_i y_i x_i \) from (2)): classify as ‘+’ if \( \sum_i \alpha_i y_i \,(x_i \cdot \vec{x}) + b \ge 0 \)
Kernel
• Apply a kernel function \( \phi(x_i, \vec{x}) \) to project data points to higher dimensions.
• Objective function:
\[ L = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \,\phi(x_i, x_j) \]
• Some kernel functions:
• Dot product (linear): \( \phi(x_i, \vec{x}) = x_i \cdot \vec{x} \)
• Radial Basis Function (RBF): \( \phi(x_i, \vec{x}) = e^{-\|x_i - \vec{x}\|^2 / \sigma^2} \)
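A one-line version of the RBF kernel above; σ is the width hyperparameter, and the sample points are assumed for illustration:

```python
import numpy as np

def rbf(x_i, x, sigma=1.0):
    # exp(-||x_i - x||^2 / sigma^2): near 1 for close points, near 0 for distant ones.
    return np.exp(-np.linalg.norm(x_i - x) ** 2 / sigma ** 2)

print(rbf(np.array([0.0, 0.0]), np.array([0.1, 0.0])))  # ~0.990
print(rbf(np.array([0.0, 0.0]), np.array([3.0, 0.0])))  # ~0.000123
```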