Evaluation Metrics:: Confusion Matrix
Confusion Matrix:
Accuracy: the proportion of the total number of predictions that were correct.
True Positive: positive class correctly predicted as positive.
False Negative: positive class incorrectly predicted as negative.
False Positive: negative class incorrectly predicted as positive.
True Negative: negative class correctly predicted as negative.
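As a quick illustration, here is a minimal sketch in Python with scikit-learn (the label arrays y_true and y_pred are made up for the example) that reads the four cells off a confusion matrix and computes accuracy from them:

from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# confusion_matrix rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)                   # 3 1 1 3

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print((tp + tn) / (tp + tn + fp + fn))  # 0.75
print(accuracy_score(y_true, y_pred))   # same value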
A Type I error occurs when the null hypothesis is actually true but is erroneously rejected; a Type II error occurs when the null hypothesis is actually false but erroneously fails to be rejected.
Scenario/Problem Statement: medical trials for a drug which is a cure for cancer.
In this case, a Type I error is not a big issue, as it could be corrected later with more trials. A Type II error is more serious: the drug could be discarded as no cure, and a cure could save millions of lives.
The risk of committing a Type I error is represented by your alpha level (the p-value below which you reject the null hypothesis). To control these types of errors, the alpha level is adjusted. Increasing the sample size can also reduce the risk of, and shift the trade-off between, these types of errors.
P-Value:
When you perform a hypothesis test in statistics, a p-value can help you determine the strength of your results. A p-value is a number between 0 and 1, and its value denotes the strength of the evidence against the claim on trial, which is called the Null Hypothesis.
* A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, which means we can reject the null hypothesis.
* A high p-value (> 0.05) indicates weak evidence against the null hypothesis, which means we fail to reject it.
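For instance, here is a minimal sketch using SciPy's one-sample t-test (the sample data and the hypothesised mean of 50 are invented for illustration):

from scipy import stats

# Null hypothesis: the population mean is 50
sample = [51.2, 49.8, 52.5, 50.9, 48.7, 53.1, 51.8, 50.4]
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

alpha = 0.05  # the Type I error risk we are willing to accept
if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject the null hypothesis")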
F1 Score is the harmonic mean of precision and recall:

F1 = 2 · (Precision · Recall) / (Precision + Recall)

The range for F1 Score is [0, 1]. It tells you how precise your classifier is (how many of the instances it flags as positive really are positive), as well as how robust it is (whether it misses a significant number of positive instances). A model with high precision but low recall is extremely accurate on what it does flag, but misses a large number of instances that are difficult to classify. The greater the F1 Score, the better the performance of our model.
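A minimal sketch with scikit-learn (labels invented for the example), checking the harmonic-mean formula against the library's f1_score:

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2/4
print(2 * p * r / (p + r))           # harmonic mean: 0.571...
print(f1_score(y_true, y_pred))      # same value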
Classification Accuracy:
It is the ratio of the number of correct predictions to the total number of input samples. It works well only if there is an equal number of samples belonging to each class (a balanced dataset).
Logarithmic Loss:
Log Loss works well for multi-class classification. When working with Log Loss, the classifier must assign a probability to each class for all the samples. Suppose there are N samples belonging to M classes; then Log Loss is calculated as:

Log Loss = -(1/N) · Σᵢ Σⱼ y_ij · log(p_ij)

where y_ij is 1 if sample i belongs to class j and 0 otherwise, and p_ij is the probability the classifier assigns to sample i belonging to class j.
Log Loss has no upper bound and exists on the range [0, ∞). A Log Loss nearer to 0 indicates higher accuracy, whereas a Log Loss far from 0 indicates lower accuracy. In general, minimising Log Loss gives greater accuracy for the classifier.
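A minimal sketch (true classes and class probabilities invented for the example), computing the formula by hand and checking it against scikit-learn's log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Hypothetical true classes (0, 1 or 2) and predicted class probabilities
y_true = [0, 2, 1, 2]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.3, 0.5, 0.2],
                   [0.1, 0.2, 0.7]])

# Only the true class has y_ij = 1, so the double sum reduces to
# the log-probability assigned to each sample's true class
manual = -np.mean(np.log(y_prob[np.arange(len(y_true)), y_true]))
print(manual)
print(log_loss(y_true, y_prob))  # same value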
Gain and Lift Charts:
Step 1: Calculate the predicted probability for each observation.
Step 2: Rank these probabilities in decreasing order.
Step 3: Build deciles, with each group holding almost 10% of the observations.
Step 4: Calculate the response rate at each decile for Good (responders), Bad (non-responders) and total.
Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set.
Lift measures how much better one can expect to do with the predictive model compared to not using a model. It is the ratio of the gain % to the random expectation % at a given decile level; the random expectation at the xth decile is x%.
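As a rough sketch of the computation (scores and responses are simulated purely for illustration):

import numpy as np
import pandas as pd

# Simulated scores and responses: higher score -> more likely to respond
rng = np.random.default_rng(0)
score = rng.random(1000)
target = (rng.random(1000) < score).astype(int)

df = pd.DataFrame({"score": score, "target": target})
# Rank by score (highest first) and cut into 10 equal-sized deciles
df["decile"] = pd.qcut(df["score"].rank(method="first", ascending=False),
                       10, labels=list(range(1, 11)))

responders = df.groupby("decile", observed=True)["target"].sum()
gain = 100 * responders.cumsum() / df["target"].sum()  # cumulative % of responders captured
lift = gain / (np.arange(1, 11) * 10)                  # random expectation at decile x is x%
print(gain.round(1))
print(lift.round(2))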
Area Under Curve (AUC) is one of the most widely used metrics for evaluation. It is used for binary classification problems. The AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example.
AUC has a range of [0, 1]. The greater the value, the better the performance of our model.
The ROC curve is the plot of sensitivity against (1 - specificity). (1 - specificity) is also known as the False Positive Rate, and sensitivity is also known as the True Positive Rate. This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds.
The biggest advantage of using the ROC curve is that it is independent of changes in the proportion of responders.
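A minimal sketch with scikit-learn and matplotlib (labels and probabilities invented for the example):

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Hypothetical labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # fpr = 1 - specificity
print(roc_auc_score(y_true, y_prob))              # area under this curve

plt.plot(fpr, tpr)              # ROC curve over all thresholds
plt.plot([0, 1], [0, 1], "--")  # random-guess baseline (AUC = 0.5)
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (sensitivity)")
plt.show()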
Mean Absolute Error (MAE) is the average of the absolute differences between the original values and the predicted values. It gives us a measure of how far the predictions were from the actual output.
Mean Squared Error (MSE) is quite similar to Mean Absolute Error, the only difference being that MSE takes the average of the square of the difference between the original values and the predicted values. The advantage of MSE is that it is easier to compute the gradient, whereas Mean Absolute Error requires complicated linear-programming tools to compute the gradient. As we take the square of the error, the effect of larger errors becomes more pronounced than that of smaller errors, hence the model can now focus more on the larger errors.
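Both are one-liners in NumPy (the actual and predicted values below are invented for the example):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # hypothetical predictions

mae = np.mean(np.abs(y_true - y_pred))  # average absolute difference
mse = np.mean((y_true - y_pred) ** 2)   # squaring amplifies larger errors
print(mae, mse)                         # 0.5 0.375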
K-S Chart:
The Kolmogorov-Smirnov (K-S) chart measures the degree of separation between the positive and negative distributions. The K-S statistic is the maximum difference between the cumulative distribution of responders and that of non-responders; the higher the value, the better the model separates the two classes.
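As a rough sketch, the statistic can be obtained from SciPy's two-sample K-S test applied to the model scores of each class (the scores below are invented for illustration):

from scipy.stats import ks_2samp

# Hypothetical model scores, split by actual class
scores_pos = [0.9, 0.8, 0.75, 0.6, 0.55]  # responders
scores_neg = [0.5, 0.4, 0.35, 0.2, 0.1]   # non-responders

# K-S statistic = maximum distance between the two cumulative distributions
ks_stat, p_value = ks_2samp(scores_pos, scores_neg)
print(ks_stat)  # 1.0 here, since the score separates the classes perfectly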
R-Squared:
R Squared is used to determine the strength of correlation between the predictors and the target. In simple terms, it lets us know how good a regression model is compared to the average. R Squared is computed from the ratio between the residual sum of squares and the total sum of squares:

R² = 1 - SSR / SST
where:
SSR (Sum of Squares of Residuals) is the sum of the squares of the differences between the actual observed values (y) and the predicted values (ŷ): SSR = Σ(y - ŷ)².
SST (Total Sum of Squares) is the sum of the squares of the differences between the actual observed values (y) and the average of the observed y values (ȳ): SST = Σ(y - ȳ)².
Minimising SSR is the fitting criterion for a regression line: the regression algorithm chooses the best line for a given set of observations by comparing candidate lines, and the line with the least value of SSR is the best fitting line.
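A minimal sketch in NumPy (observed and predicted values invented for the example):

import numpy as np

y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])      # hypothetical observed values
y_hat = np.array([2.8, 3.4, 4.0, 4.6, 5.2])  # hypothetical predictions

ssr = np.sum((y - y_hat) ** 2)     # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
print(1 - ssr / sst)               # R Squared = 0.6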
Adjusted R Squared:
Adjusted R Squared has the capability to decrease with the addition of less significant variables, thus resulting in a more reliable and accurate evaluation. It makes use of the degrees of freedom to compensate for, and penalize, the inclusion of a bad variable:

Adjusted R² = 1 - (1 - R²) · (n - 1) / (n - k - 1)

where n is the number of observations and k is the number of predictors. For a fixed R Squared, the value of Adjusted R Squared decreases as k increases, acting as a penalizing factor for a bad variable and a rewarding factor for a good or significant variable. Adjusted R Squared is thus a better model evaluator and can correlate the variables more efficiently than R Squared.
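A minimal sketch of the formula (the R Squared, n and k values are invented for the example):

def adjusted_r_squared(r_squared, n, k):
    # n = number of observations, k = number of predictors
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same R Squared, but more predictors lowers the adjusted value
print(adjusted_r_squared(0.60, n=100, k=2))   # ~0.592
print(adjusted_r_squared(0.60, n=100, k=20))  # ~0.499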