
Module # 1

Introduction to Machine Learning

Dr. Tatwadarshi P. N.
Introduction
 To solve a problem on a computer, we need an algorithm.
 An algorithm is a sequence of instructions that should be carried out to transform the input into the output. For example, one can devise an algorithm for sorting.
 For some tasks, however, we do not have an algorithm: for example, telling spam emails apart from legitimate emails.
 What we lack in knowledge, we make up for in data.
 We can easily compile thousands of example messages, some of which we know to be spam, and what we want is to "learn" what constitutes spam from them. A minimal sketch of this idea follows.

Classical Approach Vs Machine Learning



Definition
 Machine learning is programming computers to optimize a performance criterion using example data or past experience.
 We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience, as sketched below.
 Machine learning is turning data into information.
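To make the "optimize the parameters" view concrete, here is a minimal sketch, assuming a one-parameter model y = w·x fitted by gradient descent; the data and learning rate are made up for illustration:

```python
# Minimal sketch: learning as optimizing a model's parameters.
# Model: y = w * x, with a single parameter w, fitted to toy data.

# Invented training data drawn from y = 2x, so the ideal w is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0              # initial parameter guess
learning_rate = 0.01

for _ in range(1000):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad   # adjust w to reduce the training error

print(f"learned w = {w:.3f}")   # converges close to 2.0
```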

How does Machine Learning work?
 A Machine Learning system learns from historical data, builds prediction models, and whenever it receives new data, predicts the output for it.
 The accuracy of the predicted output depends on the amount of data: a large amount of data helps build a better model, which predicts the output more accurately.

Why is ML required?
 Rapid increase in the production of data
 Solving complex problems that are difficult for humans
 Decision making in various sectors, including finance
 Finding hidden patterns and extracting useful information from data

Key Terminologies

Key Tasks of Machine Learning
 Classification
 In classification, our job is to predict what class an instance of data should fall into.
 Regression
 Regression is the prediction of a numeric value.
 Clustering
 In unsupervised learning, there is no label or target value given for the data.
 A task where we group similar items together is known as clustering. A small clustering sketch follows this list.
Types of Machine Learning

Machine Learning Classification

Machine Learning
 Supervised Learning
 Regression: Linear, Multivariate
 Classification: Logistic, Trees, KNN, Naïve Bayes, SVM, Neural Network
 Unsupervised Learning
 Clustering: K-means, PCA
 Association: Apriori
 Semi-Supervised Learning
 Reinforcement Learning
Supervised Learning
 In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data.
 Once the training process is completed, the model is tested on the basis of test data (a held-out subset of the dataset), and then it predicts the output.

Unsupervised Learning
 "Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision."
 Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data.
 The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
Semi Supervised Learning

Reinforcement Learning

Issues in Machine Learning
Which algorithm should be selected?

How much training data is sufficient?

At what time, and in what manner, should prior knowledge held by the learner be used to guide the process of generalization from examples?

What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy affect the complexity of the learning problem?

What is the best approach for reducing the learning task to one or more function approximation problems?

How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
Applications of Machine Learning
 Automating Employee Access Control
 Protecting Animals
 Predicting Emergency Room Wait Times
 Identifying Heart Failure
 Predicting Strokes and Seizures
 Predicting Hospital Readmissions
 Stopping Malware
 Understanding Legalese
 Improving Cybersecurity
 Getting Ready for Smart Cars


How to choose the right Algorithm
 First, you need to consider your goal. What are you trying to get out of this? (Do you want a probability that it might rain tomorrow, or do you want to find groups of voters with similar interests?) What data do you have or can you collect?
 If you're trying to predict or forecast a target value, then you need to look into supervised learning. If not, then unsupervised learning is the place you want to be.
 If you've chosen supervised learning, what's your target value?
 Is it a discrete value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black? If so, then you want to look into classification.
 If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or -∞ to +∞, then you need to look into regression.
Contd…
 If you’re not trying to predict a target value, then you need to look
into unsupervised learning.

 Are you trying to fit your data into some discrete groups? If so and
that’s all you need, you should look into clustering.

 Do you need a numerical estimate of how strongly the data fits into each group? If you answer yes, then you probably should look into a density estimation algorithm.

 You should spend some time getting to know your data, and the
more you know about it, the better you’ll be able to build a
successful application.
Steps in developing a machine learning application

Collect data → Prepare the input data → Analyse the input data → Train the algorithm → Test the algorithm → Use it
Reading a dataset
ID   Marital Status   Race    Sex      Income
1    Widowed          White   Female   <=50K
2    Widowed          White   Female   <=50K
3    Widowed          Black   Female   <=50K
4    Divorced         White   Female   <=50K
5    Separated        White   Female   <=50K
6    Divorced         White   Female   <=50K
7    Separated        White   Male     <=50K
8    Never-married    White   Female   >50K
9    Divorced         White   Female   <=50K
10   Never-married    White   Male     >50K

Key terms illustrated by such a dataset:
• Continuous Data / Categorical Data
• Features / Attributes
• Classes
• Targets (Categorical / Continuous)
• Training Set / Testing Set
Training, Testing and Validation Dataset

Cross Validation
 In machine learning, we cannot simply fit the model on the training data and claim that it will work accurately on real data. We must ensure that the model has learned the correct patterns from the data and has not picked up too much noise. For this purpose, we use the cross-validation technique.
 Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds. This process is repeated multiple times, each time using a different fold as the validation set. Finally, the results from each validation step are averaged to produce a more robust estimate of the model's performance.
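As a minimal sketch, the snippet below runs 5-fold cross-validation with scikit-learn on a toy regression dataset; the data generation is invented for illustration:

```python
# Minimal sketch: 5-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Invented toy data: y = 3x + noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=100)

model = LinearRegression()

# Each of the 5 folds takes a turn as the validation set;
# the final estimate averages the per-fold scores.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```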
Overfitting and Underfitting
 When we talk about a machine learning model, we actually talk about how well it performs, which is measured in terms of its prediction error.
 Let us consider that we are designing a machine learning model.
 A model is said to be a good machine learning model if it generalizes to any new input data from the problem domain in a proper way.
 This helps us make predictions about future data that the model has never seen.
 Now, suppose we want to check how well our machine learning model learns and generalizes to new data.
 For that, we have overfitting and underfitting, which are majorly responsible for the poor performance of machine learning algorithms.
Bias and Variance
 Bias: Assumptions made by a model to make a function easier to learn. It corresponds to the error rate on the training data. When the error rate is high, we call it high bias, and when the error rate is low, we call it low bias.
 Variance: The error rate on the testing data is called variance. When the error rate is high, we call it high variance, and when the error rate is low, we call it low variance.

Bias and Variance
 Low Bias: Suggests fewer assumptions about the form of the target function.
 High Bias: Suggests more assumptions about the form of the target function.
 Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
 High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.

Underfitting
 A statistical model or a machine learning algorithm is said to have underfitting when it cannot capture the underlying trend of the data, i.e., it performs poorly on the training data itself and, consequently, on testing data as well.
 Underfitting destroys the accuracy of our machine learning model.
 Its occurrence simply means that our model or the algorithm does not fit the data well enough.
 An underfitted model has high bias and low variance.

Underfitting
 It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
 In such cases, the rules of the machine learning model are too simple to capture such data, and the model will probably make a lot of wrong predictions.
 Underfitting can be avoided by using more data and by increasing the number of features through feature engineering.

Underfitting
 Reasons for Underfitting:
 High bias and low variance
 The size of the training dataset used is not enough.
 The model is too simple.
 Training data is not cleaned and contains noise.
 Techniques to reduce underfitting:
 Increase model complexity
 Increase the number of features by performing feature engineering
 Remove noise from the data.
 Increase the number of epochs or increase the duration of training to
get better results.
Overfitting
 A statistical model is said to be overfitted when it does not make accurate predictions on testing data.
 When a model gets trained with so much data, it starts learning from the noise and the inaccurate data entries in the data set.
 Testing with test data then results in high variance.
 The model then fails to categorize the data correctly, because of too many details and noise.
 Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.

Overfitting
 A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to use parameters such as the maximal depth if we are using decision trees.
 An overfitted model shows very good training accuracy but very poor testing accuracy.
 The overfitted model has low bias and high variance.

Overfitting
 Reasons for Overfitting are as follows:
 High variance and low bias
 The model is too complex
 The size of the training data
 Techniques to reduce overfitting:
 Increase training data.
 Reduce model complexity.
 Early stopping during the training phase (keep an eye on the validation loss during training; as soon as it begins to increase, stop training).
 Ridge regularization and Lasso regularization (see the sketch after this list).
 Use dropout for neural networks to tackle overfitting.
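As a minimal sketch of the regularization idea, assuming scikit-learn's Ridge and Lasso estimators on invented toy data:

```python
# Minimal sketch: Ridge (L2) and Lasso (L1) regularization shrink the
# model's coefficients, which reduces variance and combats overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                 # invented features
y = X @ np.array([3.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(0, 0.5, 30)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # Regularized models have smaller (Lasso: often exactly zero) weights.
    print(type(model).__name__, np.round(model.coef_, 2))
```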
Overfitting and Underfitting
 Use these steps to determine whether your machine learning model, deep learning model or neural network is currently underfit or overfit.
 Ensure that you are tracking validation loss alongside training loss during the training phase.
 While your validation loss is decreasing, the model is still underfit.
 When your validation loss starts increasing, the model is overfit.
 When your validation loss levels off, the model is either perfectly fit or stuck in a local minimum. A sketch of early stopping based on this rule follows.
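Here is a minimal, framework-agnostic sketch of early stopping built on this rule; the train_one_epoch and validation_loss functions are hypothetical placeholders:

```python
# Hypothetical sketch of early stopping: stop when the validation loss
# has not improved for `patience` consecutive epochs.
def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)             # placeholder: one pass over training data
        val_loss = validation_loss(model)  # placeholder: loss on held-out data
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                      # validation loss stopped improving
    return model
```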

Performance Metrics

Performance Metrics for Regression
 There are three error metrics that are commonly used for evaluating and reporting the performance of a regression model; they are:
 Mean Squared Error (MSE)
 Root Mean Squared Error (RMSE)
 Mean Absolute Error (MAE)

MAE, MAPE, MSE, RMSE and R2

 The Mean Absolute Error (MAE) represents the average of the absolute differences between the actual and predicted values in the dataset. It measures the average magnitude of the residuals.
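With $y_i$ the actual value, $\hat{y}_i$ the predicted value and $n$ the number of observations, the standard formula is:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$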

MAE, MAPE, MSE, RMSE and R2

 The Mean Absolute Percentage Error (MAPE), also known as the mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics. It usually expresses the accuracy as a ratio defined by the formula:
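In the same notation as for MAE, the usual form is:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$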

MAE, MAPE, MSE, RMSE and R2

 Mean Squared Error (MSE) represents the average of the squared differences between the original and predicted values in the data set. It measures the variance of the residuals.
 Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error. It measures the standard deviation of the residuals.
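In the same notation:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$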

MAE, MAPE, MSE, RMSE and R2

 The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model. It is a scale-free score, i.e., irrespective of the values being small or large, the value of R-squared will be less than one.
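With $\bar{y}$ the mean of the actual values, the standard formula is:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$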

MAE, MAPE, MSE, RMSE and R2
 Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) penalize large prediction errors more heavily vis-à-vis Mean Absolute Error (MAE). However, RMSE is more widely used than MSE to evaluate the performance of a regression model against other models, as it has the same units as the dependent variable (Y-axis).
 MSE is a differentiable function, which makes it easy to perform mathematical operations, in comparison to a non-differentiable function like MAE. Therefore, in many models, RMSE is used as a default metric for calculating the loss function, despite being harder to interpret than MAE.
 MAE is more robust to data with outliers.
 A lower value of MAE, MSE, or RMSE implies higher accuracy of a regression model, whereas a higher value of R-squared is considered desirable. A small sketch computing all four metrics follows.
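As a minimal sketch, assuming scikit-learn's metrics module and invented actual/predicted values:

```python
# Minimal sketch: computing the regression metrics above with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # invented actual values
y_pred = np.array([2.5, 5.0, 3.0, 8.0])   # invented predictions

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
r2   = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```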
Performance Metrics for Classification
 Classification is a type of supervised machine learning problem where
the goal is to predict, for one or more observations, the category or
class they belong to.
 An important element of any machine learning workflow is the
evaluation of the performance of the model. This is the process
where we use the trained model to make predictions on previously
unseen, labelled data. In the case of classification, we then evaluate
how many of these predictions the model got right.
 In real-world classification problems, it is usually impossible for a model to be 100% correct. When evaluating a model, it is therefore useful to know not only how wrong the model was, but in which way it was wrong.

Performance Metrics for Classification
 7 Metrics to Measure Classification Performance

 Accuracy
 Confusion Matrix
 Precision
 Recall
 F1 score
 AUC/ROC
 Kappa

Performance Metrics for Classification
 Accuracy
 The overall accuracy of a model is simply the number of correct predictions divided by the total number of predictions. An accuracy score gives a value between 0 and 1; a value of 1 would indicate a perfect model.
 This metric should rarely be used in isolation, as on imbalanced data, where one class is much larger than another, the accuracy can be highly misleading.
 Imagine we have a dataset where only 1% of the samples are cancerous. A classifier that simply predicts all outcomes as benign would achieve an accuracy score of 99%. However, this model would in fact be useless and dangerous, as it would never detect a cancerous observation. The sketch below reproduces this pitfall.
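A minimal sketch of that pitfall, with invented labels (1% positive) and a "predict everything benign" baseline:

```python
# Minimal sketch: high accuracy can be meaningless on imbalanced data.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Invented labels: 1% cancerous (1), 99% benign (0).
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros_like(y_true)   # a "classifier" that always predicts benign

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks excellent
print(recall_score(y_true, y_pred))    # 0.0  -- never detects a cancer case
```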
Performance Metrics for Classification
 Confusion Matrix
 A confusion matrix is an extremely useful tool to observe in which
way the model is wrong (or right!). It is a matrix that compares the
number of predictions for each class that are correct and those that
are incorrect.
 In a confusion matrix, there are 4 numbers to pay attention to.
 True positive: The number of positive observations the model correctly predicted as positive.
 False positive: The number of negative observations the model incorrectly predicted as positive.
 True negative: The number of negative observations the model correctly predicted as negative.
 False negative: The number of positive observations the model incorrectly predicted as negative.
Performance Metrics for Classification
 Consider a classifier with the following confusion matrix (rows are actual classes, columns are predicted classes):

                  Predicted negative   Predicted positive
Actual negative         3,383                  46
Actual positive            89                 962

 Using this we can understand the following:
 The model correctly predicted 3,383 negative samples but incorrectly predicted 46 as positive.
 The model correctly predicted 962 positive observations but incorrectly predicted 89 as negative.
 We can see from this confusion matrix that the data sample is imbalanced, with the negative class having a higher volume of observations.
Performance Metrics for Classification
 Precision
 Precision measures how good the model is at correctly identifying the positive class. In other words, out of all predictions for the positive class, how many were actually correct? Optimising a model on this metric alone means minimising false positives. That might be desirable for a fraud detection use case, but it would be less useful for diagnosing cancer, as it tells us nothing about the positive observations that are missed.
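In terms of the confusion-matrix counts defined earlier:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$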

Performance Metrics for Classification
 Recall
 Recall tells us how good the model is at correctly predicting all the positive observations in the dataset. However, it does not include information about the false positives, so it would be more useful in the cancer example.
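In the same confusion-matrix terms:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$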

 Usually, precision and recall are observed together by constructing a precision-recall curve. This can help to visualise the trade-offs between the two metrics at different thresholds.
Performance Metrics for Classification
 F1 score
 The F1 score is the harmonic mean of precision and recall. The F1 score gives a number between 0 and 1. If the F1 score is 1.0, this indicates perfect precision and recall. If the F1 score is 0, this means that either the precision or the recall is 0.
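As a formula:

$$F_1 = 2\cdot\frac{\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$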

Sensitivity vs Specificity
 Sensitivity and Specificity are similar to precision and recall, and likewise evaluate model performance.
 In the data science community, precision and recall are mainly used when evaluating model performance, but in the medical world, Sensitivity and Specificity are used to evaluate medical tests.
 In medical terms, Sensitivity indicates the ability to detect the disease, while Specificity refers to the percentage of people who don't actually have the disease and are tested negative.

Sensitivity vs Specificity
 Sensitivity measures how well a machine learning model can detect positive instances.
 In other words, it measures how likely you are to get a positive result when you test for something.
 It is also known as the True Positive Rate (TPR) or Recall.
 A model with high Sensitivity will have significantly fewer False Negatives.
 The higher the Sensitivity, the better the model.
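In confusion-matrix terms, Sensitivity is the same quantity as Recall:

$$\mathrm{Sensitivity} = \mathrm{TPR} = \frac{TP}{TP + FN}$$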

Sensitivity vs Specificity
 Specificity measures the proportion of True Negatives that are correctly identified by the model.
 It is also called the True Negative Rate (TNR).
 The sum of the True Negative Rate and the False Positive Rate is 1.
 A higher Specificity indicates that the model correctly identifies most of the negative results.
 A lower Specificity value indicates that the model mislabels negative results as positive.
 In medical terms, Specificity is a measure of the proportion of people not suffering from the disease who are correctly predicted as not suffering from the disease.
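In the same notation (FPR is the False Positive Rate):

$$\mathrm{Specificity} = \mathrm{TNR} = \frac{TN}{TN + FP} = 1 - \mathrm{FPR}$$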

