CS 620 / DASC 600 Introduction To Data Science & Analytics: Lecture 8 - Performance Evaluation
Underfitting vs. Overfitting
• Underfitting: high bias, low variance. Remedy: increase the number of features or the complexity of the model.
• Overfitting: low bias, high variance. Remedy: get more training data, or reduce the number of features or the complexity of the model.
Evaluation on “LARGE” data
[Figure: the data is split into a training set and a testing set]
Model Evaluation Step 2: Build a model on the training set
[Figure: the past data, whose results (labels) are known, is split into a training set and a testing set; the model builder learns from the training set only]
Model Evaluation Step 3: Evaluate on the test set
[Figure: the model's predictions (Y/N) on the testing set are compared against the known results to evaluate the model]
A note on parameter tuning
• It is important that the test data is not used in any way to build the
model
• Some learning schemes operate in two stages:
• Stage 1: builds the basic structure
• Stage 2: optimizes parameter settings
• The test data can’t be used for parameter tuning!
• Proper procedure uses three sets: training data, validation data, and test data
• Validation data is used to optimize parameters (see the sketch below)
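A minimal sketch of the three-set procedure, assuming scikit-learn; the toy dataset, the split proportions, and the k-NN parameter being tuned are illustrative assumptions, not from the lecture:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)  # toy data (assumption)

# Hold out 20% as the test set; it is never touched during tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Stage 1 builds the model on the training set; Stage 2 picks the
# parameter value (here, k of k-NN) that scores best on the validation set.
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# One final evaluation on the untouched test set.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))
```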
Evaluation on “small” data, 1
• The holdout method reserves a certain amount for testing and uses
the remainder for training
• Usually: one third for testing, the rest for training
• For “unbalanced” datasets, samples might not be representative
• Few or no instances of some classes
• Stratified sampling: a more advanced way of balancing the data
• Make sure that each class is represented with approximately equal proportions in both subsets (see the sketch below)
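A possible stratified-holdout sketch, assuming scikit-learn; the toy unbalanced dataset is an illustrative assumption, while the one-third test fraction follows the slide:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy unbalanced data: roughly 90% class 0, 10% class 1 (assumption).
X, y = make_classification(n_samples=300, weights=[0.9], random_state=0)

# stratify=y keeps the class proportions approximately equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, stratify=y, random_state=0)

print("train proportions:", np.bincount(y_train) / len(y_train))
print("test proportions: ", np.bincount(y_test) / len(y_test))
```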
Evaluation on “small” data, 2
• Cross-validation: split the data into several groups (folds)
• Hold aside one group for testing and use the rest to build the model
• Repeat, so that each group serves as the test set once (see the sketch below)
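A minimal cross-validation sketch, assuming scikit-learn; the classifier and the choice of 10 folds are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # toy data

# Each fold is held aside once for testing while the remaining folds
# are used to build the model; the per-fold accuracies are then averaged.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```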
More on cross-validation
Precision and recall are defined from the retrieval contingency table:

                  Relevant    Not relevant
Retrieved            tp            fp
Not retrieved        fn            tn

Precision = tp / (tp + fp), Recall = tp / (tp + fn)
Precision/Recall: Example

                      Actual: Positive    Actual: Negative
Predicted: Positive          1                    1
Predicted: Negative          8                   90

Precision = 1 / (1 + 1) = 0.5, Recall = 1 / (1 + 8) ≈ 0.11
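The same computation in plain Python, using the counts from the table above:

```python
tp, fp, fn, tn = 1, 1, 8, 90  # counts from the example above

precision = tp / (tp + fp)  # 1/2 = 0.50
recall = tp / (tp + fn)     # 1/9 ≈ 0.11
print(f"precision={precision:.2f}, recall={recall:.2f}")
```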
Recall-Precision Graph
[Figure: recall-precision graph; there can be multiple precision values at some recall levels]
Interpolation
• Interpolated precision at recall level r: the highest precision observed at any recall level ≥ r

Recall                  0.0  0.1  0.2  0.3   0.4   0.5  0.6  0.7  0.8  0.9  1.0
Interpolated Precision  1.0  1.0  1.0  0.67  0.67  0.5  0.5  0.5  0.5  0.5  0.5
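A small sketch of this interpolation rule; the (recall, precision) points are chosen to reproduce the table above and are otherwise illustrative:

```python
# Observed (recall, precision) points (illustrative assumption).
points = [(0.2, 1.0), (0.4, 0.67), (1.0, 0.5)]

def interpolated_precision(r, points):
    """Highest precision among points whose recall is >= r (0 if none)."""
    return max((p for rec, p in points if rec >= r), default=0.0)

for i in range(11):
    r = i / 10
    print(f"recall {r:.1f}: interpolated precision {interpolated_precision(r, points):.2f}")
```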
Recap: Confusion matrix

                      Actual: Positive    Actual: Negative
Predicted: Positive         tp                  fp
Predicted: Negative         fn                  tn
Model 1 (rows = actual, columns = predicted):

             Predicted P   Predicted N
Actual P      150 (TP)       40 (FN)
Actual N       60 (FP)      250 (TN)

Accuracy: 80%
Cost: 150×(-1) + 40×100 + 60×1 = 3910

Model 2:

             Predicted P   Predicted N
Actual P      250 (TP)       45 (FN)
Actual N        5 (FP)      200 (TN)

Accuracy: 90%
Cost: 250×(-1) + 45×100 + 5×1 = 4255

Cost matrix:

             Predicted P   Predicted N
Actual P         -1            100
Actual N          1              0
• If we focus on accuracy, we choose Model 2 (compromising on cost); if we focus on cost, we choose Model 1 (compromising on accuracy). See the sketch below.
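A sketch reproducing the accuracy and cost computations above, assuming NumPy; the orientation (rows = actual, columns = predicted) follows the matrices as laid out here:

```python
import numpy as np

cost = np.array([[-1, 100],   # actual P: cost of TP, cost of FN
                 [ 1,   0]])  # actual N: cost of FP, cost of TN
model1 = np.array([[150,  40],
                   [ 60, 250]])
model2 = np.array([[250,  45],
                   [  5, 200]])

for name, cm in [("Model 1", model1), ("Model 2", model2)]:
    accuracy = np.trace(cm) / cm.sum()  # (TP + TN) / total
    total_cost = (cm * cost).sum()      # element-wise cost weighting
    print(f"{name}: accuracy={accuracy:.0%}, cost={total_cost}")
```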
Significance Testing