5 - Model For Predictions - ML
Hyper-parameters
• Just like a child learning things for the first time needs her parents' guidance to decide whether she is right or wrong,
• in machine learning someone has to provide some non-learnable parameters, also called hyper-parameters.
• Without these human inputs, machine learning algorithms cannot be successful; a minimal sketch follows below.
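A minimal sketch of what a hyper-parameter looks like in practice, using scikit-learn's KNeighborsClassifier as an illustrative example (the library and dataset choice are assumptions, not part of the slides): the number of neighbours is fixed by a human before training and is never learned from the data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_neighbors is a hyper-parameter: chosen by a human, not learned from data
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)         # the learnable parameters are fitted here
print(model.score(X_test, y_test))  # accuracy on held-out data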
SELECTING A MODEL
Input variables
Input variables are also called predictors, attributes, features, independent variables, or simply variables. The set of input variables can be denoted by X, while individual input variables are represented as X1, X2, X3, …, Xn.
Output variables
The output variable is also called the response or dependent variable, and is denoted by the symbol Y.
The relationship between X and Y is represented in the general form
Y = f(X) + e
where 'f' is the target function and 'e' is a random error term.
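To make the notation concrete, here is a small sketch generating observations of Y = f(X) + e with numpy (the choice of f and the noise level are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # an assumed target function, purely for illustration
    return 3.0 * x + 2.0

X = rng.uniform(0, 10, size=100)   # input variable
e = rng.normal(0, 1.0, size=100)   # random error term
Y = f(X) + e                       # observed output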
Modelling and Evaluation
Cost Function
• Also called the error function; it determines how well a machine learning model performs for a given dataset.
• A measure of how wrong the model is in terms of its ability to estimate the relationship between X and Y.
• It calculates the difference between the expected values and the predicted values across the dataset and represents it as a single real number. A sketch follows below.
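A common cost function for regression is the mean squared error (MSE), averaged over the whole dataset; a minimal sketch (MSE is an assumed choice here, one of several common cost functions):

import numpy as np

def mse_cost(y_true, y_pred):
    # cost: averages the squared error over the entire dataset,
    # summarising how wrong the model is as a single real number
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse_cost([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))  # 0.4166...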
Loss function
A method of evaluating how well your algorithm models your dataset.
It is determined as the difference between the actual output and the predicted output of the model for a single training example (whereas the cost function averages the loss over the whole training set).
Example: Loss function
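A sketch illustrating the idea with squared-error loss for one training example (the numbers and the house-price framing are illustrative assumptions):

def squared_loss(y_actual, y_predicted):
    # loss: the error for a single training example
    return (y_actual - y_predicted) ** 2

# e.g. the model predicts a house price of 310 when the actual price is 300
print(squared_loss(300, 310))  # 100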
Holdout Method
A part of the input data is held back (hence the name) as test data for validating the trained model.
This subset of the input data is used as the test data for evaluating the performance of the trained model.
In general, 70%–80% of the input data (which is obviously labelled) is used as training data, and the remaining 20%–30% is used as test data. A sketch follows below.
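A minimal sketch of the holdout method using scikit-learn's train_test_split (the library and dataset choice are assumptions; any manual random split works the same way):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# hold back 30% of the labelled data as test data; train on the remaining 70%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))  # 105 45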
In stratified random sampling, the whole data is broken into several homogeneous groups or strata, and a random sample is selected from each such stratum.
This ensures that the generated random partitions have equal proportions of each class; see the sketch below.
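train_test_split supports this directly via its stratify parameter; a short sketch continuing the assumed iris example above:

# stratify=y makes each partition preserve the class proportions of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)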
K-fold Cross-validation method
A special variant of the holdout method, called repeated holdout, is sometimes employed to ensure the randomness of the composed data sets. In repeated holdout, several random holdouts are used to measure the model performance; in the end, the average of all performances is taken.
This process of repeated holdout is the basis of the k-fold cross-validation technique. In k-fold cross-validation, the data set is divided into k completely distinct or non-overlapping random partitions called folds. The value of 'k' can be set to any number.
K-fold is thus a validation technique in which we split the data into k subsets and repeat the holdout method k times: each of the k subsets is used in turn as the test set, while the other k−1 subsets together form the training set. A sketch follows below.
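A minimal sketch of k-fold cross-validation with scikit-learn (the library, the classifier, and k = 5 are assumptions):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 non-overlapping folds; each fold serves exactly once as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.mean())  # average performance over all k holdouts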
Detailed approach for fold selection
Leave-one-out cross-validation (LOOCV)
An extreme case of k-fold cross-validation in which k equals the number of data instances, so each fold contains exactly one record; see the sketch below.
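A minimal sketch of LOOCV with scikit-learn (library and classifier are assumptions):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# each of the 150 iterations trains on 149 records and tests on 1
loo = LeaveOneOut()
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)
print(scores.mean())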
Bias-variance trade-off
Errors due to bias arise when the model under-fits the training data, while errors due to variance arise when it over-fits; increasing model complexity typically reduces bias but increases variance, so a well-fitted model balances the two.
In the context of the confusion matrix above, the total count of TPs = 85, FPs = 4, FNs = 2, and TNs = 9 (total = 100).
Error rate
The error rate indicates the percentage of misclassifications:
Error rate = (FP + FN) / (TP + FP + FN + TN) = (4 + 2) / 100 = 6%
Kappa
The Kappa statistic (or value) is a metric that compares an Observed Accuracy with an Expected Accuracy (random chance).
The Kappa value can be 1 at the maximum, which represents perfect agreement between the model's predictions and the actual values.
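A worked computation of Kappa from the counts above, as a sketch in plain Python (no library assumed):

TP, FP, FN, TN = 85, 4, 2, 9
total = TP + FP + FN + TN                           # 100

observed_acc = (TP + TN) / total                    # 0.94

# expected accuracy under random chance, from the marginal frequencies
p_yes = ((TP + FN) / total) * ((TP + FP) / total)   # 0.87 * 0.89
p_no  = ((FP + TN) / total) * ((FN + TN) / total)   # 0.13 * 0.11
expected_acc = p_yes + p_no                         # 0.7886

kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(round(kappa, 3))  # ~0.716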
Sensitivity
A measure of how well a machine learning model can detect positive instances. It is also known as the true positive rate (TPR) or Recall.
It is used to evaluate model performance because it shows how many positive instances the model was able to identify correctly.
Recall indicates the proportion of correctly predicted positives to the total number of actual positives:
Sensitivity = TP / (TP + FN) = 85 / 87 ≈ 0.977 (for the counts above)
Ex:
Sensitivity, or true positive rate, is a measure of the proportion of people suffering from a disease who were correctly predicted as suffering from it.
In other words, a person who is unhealthy (positive) is actually predicted as unhealthy.
Specificity
Measures the proportion of negative examples that have been correctly classified.
It tells us the proportion of actual negative cases that were predicted as negative by our model.
It is the ratio of true negatives to all negatives:
Specificity = TN / (TN + FP) = 9 / 13 ≈ 0.692 (for the counts above)
Precision
The ratio of correctly classified positive samples (True Positives) to the total number of samples classified as positive (whether correctly or incorrectly):
Precision = TP / (TP + FP) = 85 / 89 ≈ 0.955 (for the counts above)
Precision helps us to gauge the reliability of the machine learning model when it classifies a sample as positive.
F-measure / F1 score / F score
• A measure of model performance that combines precision and recall into a single score.
• It is calculated as the harmonic mean of precision and recall:
  F1 = 2 × (Precision × Recall) / (Precision + Recall)
• An evaluation metric for binary classification problems (useful for comparing the efficiency of two models).
• It shows the efficiency of a model in detecting true positives while avoiding false positives. A combined sketch of these metrics follows below.
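A sketch computing all four metrics from the confusion-matrix counts above (plain Python, no library assumed):

TP, FP, FN, TN = 85, 4, 2, 9

sensitivity = TP / (TP + FN)   # recall / TPR: 85/87 ≈ 0.977
specificity = TN / (TN + FP)   # 9/13 ≈ 0.692
precision   = TP / (TP + FP)   # 85/89 ≈ 0.955

# harmonic mean of precision and recall
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(round(f1, 3))  # ~0.966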
Supervised learning: Regression
A regression model that ensures the difference between predicted and actual values is low can be considered a good model.
The distance between the actual value y and the fitted or predicted value ŷ is known as the residual.
The regression model can be considered well fitted if the difference between the actual and predicted values, i.e. the residual, is small. A sketch follows below.
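A minimal sketch fitting a linear regression and inspecting residuals (scikit-learn and the synthetic data are assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

residuals = y - y_hat            # actual minus fitted value per observation
print(np.abs(residuals).mean())  # small residuals indicate a good fit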
R-squared
• R-squared is a good measure to evaluate model fitness.
• It is also known as the coefficient of determination (or, for multiple regression, the coefficient of multiple determination).
• The R-squared value lies between 0 and 1 (0%–100%), with a larger value representing a better fit.
• Sum of Squares Total (SST) = the sum of squared differences of each observation from the overall mean. A worked sketch follows below.
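R-squared can be computed from SST together with the Sum of Squared Errors (SSE, the squared differences of each observation from its predicted value), as R² = 1 − SSE/SST. A sketch with illustrative numbers (the scikit-learn call is an assumed convenience; the formula is the standard definition):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.8])

sst = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
sse = np.sum((y_true - y_pred) ** 2)         # unexplained variation
print(1 - sse / sst)                         # R-squared
print(r2_score(y_true, y_pred))              # same value via scikit-learn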
Unsupervised learning: Clustering
Silhouette coefficient
Uses distance (Euclidean or Manhattan distances are most commonly used) between data elements as a similarity measure.
The value of silhouette width ranges between −1 and +1, with a high value indicating high intra-cluster homogeneity and inter-cluster heterogeneity.
For a data set clustered into 'k' clusters, the silhouette width of the i-th data instance is calculated as:
s(i) = (b(i) − a(i)) / max(a(i), b(i))
where a(i) is the average distance between the i-th data instance and all other data instances belonging to the same cluster, and b(i) is the lowest average distance between the i-th data instance and the data instances of all other clusters.
For example, suppose there are four clusters, namely clusters 1, 2, 3, and 4, and the i-th data element belongs to cluster 1. Then a_i1, a_i2, a_i3, …, a_in are the distances of the different data elements in cluster 1 from the i-th data element, and a(i) is their average. A sketch follows below.
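A minimal sketch computing the average silhouette width with scikit-learn (the library, k-means, and the synthetic blobs are assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# average silhouette width over all instances; a value close to +1 means
# high intra-cluster homogeneity and inter-cluster heterogeneity
print(silhouette_score(X, labels))  # uses Euclidean distance by default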