Lab 4 - Support Vector Machines: Part B
Part B
1. Perform the steps listed below and include screenshots.
2. Show the output of each command.
3. Also answer the questions.
margins. For example, a small cost creates a large margin (a soft margin) and
allows more misclassifications.
• Keeping γ = 1, increase the cost to 1000, and then to 100000. Show the plot and
report what differences you have observed. What did you observe about the
effect of cost?
Ans.
[Plot: cost = 1000]
[Plot: cost = 100000]
When we increase the cost from 1 to 1000, the penalty for every misclassified
point becomes 1000 times larger. The number of misclassified points
Shahima Khan I071 25.07.20
• Also, what difference did you observe in the number of support vectors?
Ans. When the cost is 1000, the number of support vectors is 15 (7 and 8 per class).
When the cost is 100000, the number of support vectors is 16 (7 and 9 per class).
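The cost comparison above can be reproduced with a short sketch like the one below. It assumes the e1071 package; the simulated data set and the names dat, fits and cost_results are illustrative assumptions, not the lab's own data, so the counts it prints will not necessarily match the 15 and 16 reported above.

```r
library(e1071)  # provides svm()

# Hypothetical two-class data with a non-linear boundary (NOT the lab's data set).
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
y <- c(rep(1, 150), rep(2, 50))
dat <- data.frame(x = x, y = as.factor(y))

# Fit a radial-kernel SVM at each cost, keeping gamma fixed at 1.
costs <- c(1, 1000, 100000)
fits <- lapply(costs, function(cst) {
  svm(y ~ ., data = dat, kernel = "radial", gamma = 1, cost = cst)
})

# Collect the total number of support vectors and the training errors per fit.
cost_results <- data.frame(
  cost = costs,
  n_sv = sapply(fits, function(f) f$tot.nSV),
  train_errors = sapply(fits, function(f) sum(predict(f, dat) != dat$y))
)
print(cost_results)

# plot(fits[[2]], dat) would draw the decision regions for cost = 1000.
```

Comparing the train_errors column across rows gives a quick check of whether a larger cost reduces misclassification on the training data, at the price of a more irregular boundary.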
• Keeping cost=1, change gamma to 0.5 and then 10. Show the plot and report
what differences you have observed. What did you observe about the effect of
gamma?
Ans. When gamma is large (for example, 10), each training point influences only
its immediate neighbourhood, so a single misclassified point can completely
change the decision boundary, and the model won't generalise well to new points.
Just a few points are then enough to remove the large-margin behaviour that an
SVM normally has. When gamma is small, a few points won't make much difference
to the decision boundary.
[Plot: gamma = 0.5]
[Plot: gamma = 10]
• Also, what difference did you observe in the number of support vectors?
Ans. When gamma is 0.5, the number of support vectors is 30 (15 and 15 per class).
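One way to check the generalisation claim about gamma is to compare training error with error on held-out points. The sketch below assumes the e1071 package; the simulated data, the 100/100 train/test split and the name gamma_results are illustrative assumptions rather than the lab's setup.

```r
library(e1071)  # provides svm()

# Hypothetical two-class data (NOT the lab's data set), split into train and test.
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
y <- c(rep(1, 150), rep(2, 50))
dat <- data.frame(x = x, y = as.factor(y))
train <- sample(200, 100)

# Fit one radial-kernel SVM per gamma, keeping cost fixed at 1.
gammas <- c(0.5, 1, 10)
gamma_results <- do.call(rbind, lapply(gammas, function(g) {
  fit <- svm(y ~ ., data = dat[train, ], kernel = "radial", gamma = g, cost = 1)
  data.frame(
    gamma = g,
    n_sv = fit$tot.nSV,  # total number of support vectors
    train_error = mean(predict(fit, dat[train, ]) != dat$y[train]),
    test_error = mean(predict(fit, dat[-train, ]) != dat$y[-train])
  )
}))
print(gamma_results)
```

A gap between train_error and test_error that widens as gamma grows would be consistent with the overfitting described above.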
• Perform cross-validation using tune() to select the best choice of γ and cost for an
SVM with a radial kernel.
• Study the tune() function. What is the purpose of the parameters method,
train.x, train.y, and ranges?
Ans.
tune() function: This generic function tunes hyperparameters of statistical
methods using a grid search over supplied parameter ranges.
method: Either the function to be tuned, or a character string naming such a
function.
train.x: Either a formula or a matrix of predictors.
train.y: The response variable if train.x is a predictor matrix. Ignored if train.x is
a formula.
ranges: A named list of parameter vectors spanning the sampling space. The
vectors will usually be created by seq.
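The parameters described above fit together roughly as in the following sketch. It assumes the e1071 package; the simulated data and the candidate grids for cost and gamma are illustrative assumptions, not the values used in the lab. The function to tune is passed as the first positional argument, and because train.x is a formula here, train.y is not supplied.

```r
library(e1071)  # provides svm() and tune()

# Hypothetical two-class data (NOT the lab's data set).
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
y <- c(rep(1, 150), rep(2, 50))
dat <- data.frame(x = x, y = as.factor(y))

tune_out <- tune(
  svm,                # method: the function to be tuned
  y ~ .,              # train.x: a formula, so train.y is ignored
  data = dat,         # data frame in which the formula is evaluated
  kernel = "radial",  # extra arguments are passed through to svm()
  ranges = list(      # grid of candidate values (assumed for illustration)
    cost = c(0.1, 1, 10, 100, 1000),
    gamma = c(0.5, 1, 2, 3, 4)
  )
)

print(tune_out$sampling)          # sampling method used for error estimation
print(tune_out$best.parameters)   # cost/gamma pair with the lowest error
print(tune_out$best.performance)  # cross-validated error of that pair
```

Calling summary(tune_out) prints the same information together with the error for every combination in the grid.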
• Report the sampling method and the values of the best parameters.
Ans.
Sampling Method: k-fold cross-validation (tune() uses 10 folds by default)
Best Parameters: Cost=1; Gamma=0.5
STEP 4: Predict