Lab 4 - Support Vector Machines: Part B

This document describes steps to perform support vector machine (SVM) classification on sample data using R. It trains an SVM model with a radial kernel on subsets of the data, experimenting with different cost and gamma parameters. Cross-validation is used to select the best parameters, which are reported. The trained model is then used to predict classes for the test subset, and the confusion matrix shows 87% of classes were predicted correctly. Key effects of cost and gamma are noted: higher cost reduces errors but risks overfitting, while higher gamma causes the decision boundary to change drastically for small differences in misclassified points.


Shahima Khan I071 25.07.20
1. Perform the steps listed below and include screenshots.
2. Show outputs of each of the commands
3. Also answer the questions.

STEP 1: Library and data


library(e1071)
set.seed(1)
# Generate 200 two-dimensional points, then shift the first 100 up and the
# next 50 down so that class 1 forms two clusters around class 2, giving a
# class boundary that is not linear
x = matrix(rnorm(200*2), ncol = 2)
x[1:100, ] = x[1:100, ] + 2
x[101:150, ] = x[101:150, ] - 2
y = c(rep(1, 150), rep(2, 50))
dat = data.frame(x = x, y = as.factor(y))

STEP 2: Train SVM and experiment with parameters


# Randomly select 100 of the 200 observations as the training set
train = sample(200, 100)
# Fit a radial-kernel SVM on the training half
svmfit = svm(y ~ ., data = dat[train, ], kernel = "radial", gamma = 1, cost = 1)
plot(svmfit, dat[train, ])
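
A quick way to inspect the fitted model (not shown in the original) is summary(), which reports the support vector counts cited in the answers below:

summary(svmfit)   # prints the kernel, cost, gamma and number of support vectors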

• Check what is the purpose of cost in the function svm().


Ans. The cost parameter in svm() controls the trade-off between training
errors and margin width. For example, a small cost creates a large margin (a
soft margin) and allows more misclassifications; a large cost narrows the
margin and penalises misclassifications more heavily.
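
As an illustrative sketch (not part of the original lab; the object name svmfit.soft is ours), a very small cost produces a soft margin with many support vectors:

svmfit.soft = svm(y ~ ., data = dat[train, ], kernel = "radial",
                  gamma = 1, cost = 0.01)
summary(svmfit.soft)   # expect many support vectors: the wide margin
                       # tolerates points inside or across it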

• Keeping γ = 1, increase the cost to 1000 and then 100000. Show the plot and
report what differences you observed. What did you observe about the effect
of cost?
Ans.

[Plot: SVM decision boundary with cost = 1000]

[Plot: SVM decision boundary with cost = 100000]

When we increase the cost from 1 to 1000, the algorithm is penalised 1000
times more for every misclassified point, so the number of misclassified
training points decreases. Similarly, at cost = 100000 the algorithm
classifies almost all points in the training set correctly, but this is a
clear case of overfitting, which is why it may not work well on new data
points.

• Also, what difference did you observe in the number of support vectors?
Ans. When cost is 1000, the number of support vectors is 15 (7 8 by class).
When cost is 100000, the number of support vectors is 16 (7 9 by class).
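
A minimal sketch of how these runs can be reproduced (the object names are ours; the counts in the comments are the ones reported above):

svmfit.1000 = svm(y ~ ., data = dat[train, ], kernel = "radial",
                  gamma = 1, cost = 1000)
plot(svmfit.1000, dat[train, ])
summary(svmfit.1000)   # Number of Support Vectors: 15 ( 7 8 )

svmfit.1e5 = svm(y ~ ., data = dat[train, ], kernel = "radial",
                 gamma = 1, cost = 100000)
plot(svmfit.1e5, dat[train, ])
summary(svmfit.1e5)    # Number of Support Vectors: 16 ( 7 9 )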

• Keeping cost=1, change gamma to 0.5 and then 10. Show the plot and report
what differences you have observed. What did you observe about the effect of
gamma?
Ans. When gamma is large (e.g. 10), the decision boundary can change
completely in response to a single misclassified point, so the model will not
generalise well to new points. Because each point's influence becomes very
local, a few points can undo the large-margin behaviour that gives the SVM
its robustness. When gamma is small, a few points make little difference to
the decision boundary.

[Plot: SVM decision boundary with gamma = 0.5]

[Plot: SVM decision boundary with gamma = 10]

• Also, what difference did you observe in the number of support vectors?
Ans. When gamma is 0.5, the number of support vectors is 30 (15 15 by class).
When gamma is 10, the number of support vectors is 68 (42 26 by class).
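
A matching sketch for the gamma experiment (object names are ours; the counts in the comments are the ones reported above):

svmfit.g05 = svm(y ~ ., data = dat[train, ], kernel = "radial",
                 gamma = 0.5, cost = 1)
plot(svmfit.g05, dat[train, ])
summary(svmfit.g05)    # Number of Support Vectors: 30 ( 15 15 )

svmfit.g10 = svm(y ~ ., data = dat[train, ], kernel = "radial",
                 gamma = 10, cost = 1)
plot(svmfit.g10, dat[train, ])
summary(svmfit.g10)    # Number of Support Vectors: 68 ( 42 26 )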

STEP 3: Tune SVM and experiment with parameters

• Perform cross-validation using tune() to select the best choice of γ and
cost for an SVM with a radial kernel.

• Study the tune() function. What is the purpose of the parameters method,
train.x, train.y, and ranges?
Ans.
tune(): This generic function tunes hyperparameters of statistical methods
using a grid search over the supplied parameter ranges.
method: either the function to be tuned, or a character string naming such a
function.
train.x: either a formula or a matrix of predictors.
train.y: the response variable if train.x is a predictor matrix; ignored if
train.x is a formula.
ranges: a named list of parameter vectors spanning the sampling space. The
vectors will usually be created by seq.
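
A sketch of the cross-validation call; the parameter grid below is an assumption, not taken from the original output:

tune.out = tune(svm, y ~ ., data = dat[train, ], kernel = "radial",
                ranges = list(cost  = c(0.1, 1, 10, 100, 1000),
                              gamma = c(0.5, 1, 2, 3, 4)))
summary(tune.out)   # reports the sampling method, the best parameters and
                    # the cross-validated error for each grid point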

• Report the sampling method and the values of the best parameters.
Ans.
Sampling method: 10-fold cross validation (the default for tune()).
Best parameters: cost = 1; gamma = 0.5
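
The best parameters and the corresponding fitted model can also be read programmatically (a sketch; bestmod is our name):

tune.out$best.parameters   # cost = 1, gamma = 0.5
bestmod = tune.out$best.model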

STEP 4: Predict

• Obtain test set predictions for this model.

• Subset the data frame dat using -train as an index set (a sketch follows
the answer below).

• By looking at the confusion matrix, report what percentage of classes are
predicted correctly.
Ans.
Precision = TP / (TP + FP), with TP = 67 and FP = 10,
so precision = 67 / 77 = 0.8701,
i.e. about 87% of classes are predicted correctly. (Strictly, TP / (TP + FP)
is the precision for one class; the overall accuracy is the sum of the
diagonal of the confusion matrix divided by the number of test points.)
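
A sketch of the prediction step, using the model selected by tune() (bestmod is defined in the STEP 3 sketch above):

# Predict classes for the held-out half and tabulate against the truth
pred = predict(bestmod, newdata = dat[-train, ])
table(true = dat[-train, "y"], pred = pred)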

Conclusion: Understood and successfully implemented SVM commands in RStudio.
