Lecture 10
Evaluation - II
[Evaluation Metrics]
Arpit Rana
12th August 2024
Experimental Evaluation of Learning Algorithms
(Diagram: a learner 𝚪: S → h searches the hypothesis space 𝓗 and outputs a final hypothesis or model h, which is then assessed with evaluation metrics.)
Common Measures
● Error
● Accuracy
● Precision/Recall
Measures for Regression Problems
Considerations when choosing a regression measure:
● Non-differentiability (the absolute value in MAE is not differentiable at zero)
● Robustness (sensitivity to outliers: squaring the errors penalizes large deviations heavily)
● Units (MSE is expressed in the squared units of the target variable)
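For reference, the standard definitions of these measures, assuming the usual notation with $y_i$ the true target value, $\hat{y}_i$ the model's prediction, and $n$ the number of test examples:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\, y_i - \hat{y}_i \,\right|

The absolute value in MAE is the source of the non-differentiability noted above, while the squaring in MSE is what makes it sensitive to outliers and expresses it in squared units.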
Measures for Classification Problems
Confusion Matrix                            True Class (Actual)
                                  Positive                Negative                Total
Hypothesized Class   Positive     True Positive (TP)      False Positive (FP)     P’
(Predicted)          Negative     False Negative (FN)     True Negative (TN)      N’
                     Total        P                       N                       P + N
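From these counts, the common measures listed earlier have the standard definitions below (using the totals from the table, where P = TP + FN and N = FP + TN):

\mathrm{Accuracy} = \frac{TP + TN}{P + N}, \qquad
\mathrm{Error} = \frac{FP + FN}{P + N} = 1 - \mathrm{Accuracy}

\mathrm{Precision} = \frac{TP}{TP + FP} = \frac{TP}{P'}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN} = \frac{TP}{P}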
The weighted F measure combines precision and recall with parameters ⍺ ∈ [0, 1] and 𝛽 ∈ [0, ∞]. For ⍺ = ½ (equivalently 𝛽 = 1), the F measure is balanced and is known as the F1 measure.
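In terms of precision and recall, the usual parameterisation, with $\beta^2 = (1-\alpha)/\alpha$ (an assumption consistent with the statement above that ⍺ = ½ corresponds to 𝛽 = 1), is:

F = \frac{1}{\alpha\,\frac{1}{\mathrm{Precision}} + (1-\alpha)\,\frac{1}{\mathrm{Recall}}}
  = \frac{(1+\beta^2)\,\mathrm{Precision}\cdot\mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}

F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}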
Measures for Classification Problems
What metric would you use to measure the performance of the following classifiers?
Precision/Recall Trade-off
● Images are ranked by the score of the classifier (which predicts whether or not an image is a 5).
● The higher the threshold, the lower the recall, but (in general) the higher the precision, as illustrated in the sketch below.
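A minimal sketch of this trade-off, assuming scikit-learn is available; the toy data, the SGDClassifier, and the variable names are illustrative stand-ins for the 5-vs-not-5 example, not the lecture's own code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Toy, imbalanced data standing in for the "is this image a 5?" labels.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

clf = SGDClassifier(random_state=42)
# Cross-validated decision scores: each example is scored by a model
# that did not see it during training.
y_scores = cross_val_predict(clf, X, y, cv=3, method="decision_function")

precisions, recalls, thresholds = precision_recall_curve(y, y_scores)

# Raising the threshold generally raises precision and lowers recall.
for t in np.percentile(thresholds, [10, 50, 90]):
    pred = y_scores >= t
    tp = np.sum(pred & (y == 1))
    print(f"threshold={t:+.2f}  "
          f"precision={tp / max(pred.sum(), 1):.2f}  "
          f"recall={tp / (y == 1).sum():.2f}")
```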
The ROC Curve
● The receiver operating characteristic (ROC) curve is another common tool used with binary classifiers.
○ The ROC curve plots the true positive rate (TPR, another name for recall or sensitivity) against the false positive rate (FPR, i.e. 1 − specificity).
● One way to compare classifiers is to measure the area under the curve (AUC).
● A perfect classifier will have a ROC AUC equal to 1, whereas a purely random classifier
will have a ROC AUC equal to 0.5.
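A minimal sketch of computing and plotting these quantities with scikit-learn, reusing the y and y_scores from the precision/recall example above (an assumption) and matplotlib for the plot:

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# FPR and TPR at every score threshold, plus the overall area under the curve.
fpr, tpr, thresholds = roc_curve(y, y_scores)
print("ROC AUC:", roc_auc_score(y, y_scores))  # 1.0 = perfect, 0.5 = random

plt.plot(fpr, tpr, label="classifier")
plt.plot([0, 1], [0, 1], "k--", label="purely random (AUC = 0.5)")
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (recall / sensitivity)")
plt.legend()
plt.show()
```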
Note: As a rule of thumb, you should prefer the PR (precision-recall) curve whenever the
positive class is rare or when you care more about the false positives than the false negatives,
and the ROC curve otherwise.
Next lecture: Loss Functions
13th August 2024