Unit-5 ML Notes

The document provides a comprehensive overview of machine learning, covering its definition, types, and the transition from human learning to machine learning. It discusses various algorithms, including supervised and unsupervised learning, as well as evaluation techniques and challenges associated with machine learning models. Additionally, it delves into specific methods such as linear regression, logistic regression, and K-nearest neighbors, highlighting their applications, advantages, and disadvantages.

Contents

1 Introduction to Machine Learning
1.1 What is Learning?
1.1.1 Key Characteristics of Learning
1.1.2 Types of Learning
1.2 Human Learning to Machine Learning
1.3 Machine Learning (ML)
1.3.1 How Does Machine Learning Work?
1.3.2 Real-World Examples of Machine Learning
1.3.3 Types of Machine Learning
1.4 Evaluating a machine learning model
1.4.1 The train/test/validation split
1.4.2 Bias vs Variance
1.4.2.1 Bias
1.4.2.2 Ways to reduce high bias in Machine Learning
1.4.2.3 Variance
1.4.2.4 Ways to Reduce Variance in Machine Learning
1.4.2.5 Different Combinations of Bias-Variance
1.4.2.6 How to identify High variance or High Bias?
1.4.3 Bias-Variance Trade-Off
1.4.4 Underfitting in Machine Learning
1.4.4.1 Reasons for Underfitting
1.4.4.2 Techniques to Reduce Underfitting
1.4.5 Overfitting in Machine Learning
1.4.5.1 Reasons for Overfitting
1.4.5.2 Techniques to Reduce Overfitting
1.5 Supervised Machine Learning Algorithms
1.5.1 How does Supervised Learning work?
1.5.2 Applications of Supervised learning
1.5.3 Advantages of Supervised learning
1.5.4 Disadvantages of Supervised learning
1.5.5 Types of Supervised Machine Learning Algorithms
1.5.6 Regression
1.5.6.1 Types of Regression
1.5.6.2 Regression Algorithms
1.5.7 Classification
1.5.7.1 Types of Classification
1.5.7.2 Classification Algorithms
1.5.8 Loss Function in Supervised Learning
1.5.9 Evaluating Regression Models
1.5.10 Sensitivity and Specificity
1.5.11 Evaluating Classification Models
1.5.12 Decision Boundary in Classification Algorithms
1.5.13 Cross-validation
1.5.14 Regression vs Classification
1.6 Linear Regression
1.6.1 Linear Regression Line
1.6.2 Types of Linear Regression
1.6.3 Applications of Linear Regression
1.6.4 Advantages of Linear Regression
1.6.5 Disadvantages of Linear Regression
1.7 Simple Linear Regression
1.7.1 How to Calculate m and c values to get the best-fit line?
1.7.2 Least Squares Method
1.7.3 Assumptions of Simple Linear Regression
1.8 Logistic Regression
1.8.1 Types of Logistic Regression
1.8.2 Assumptions of Logistic Regression
1.8.3 Linear Regression vs Logistic Regression
1.8.4 Logistic Function (Sigmoid Function)
1.8.5 Logistic Regression implementation
1.8.6 Logistic Regression Resources
1.9 K-Nearest Neighbor (KNN) Algorithm
1.9.1 KNN Algorithm
1.9.2 Worked Example of the KNN Algorithm
1.9.3 Worked Example of the KNN Algorithm (continued)
1.9.4 How to choose the value of k for the KNN Algorithm?
1.9.5 Distance Metrics Used in the KNN Algorithm
1.9.6 Advantages of the KNN Algorithm
1.9.7 Disadvantages of the KNN Algorithm
1.9.8 Implementation of the KNN Algorithm for Classification
1.10 Unsupervised Machine Learning
1.10.1 Why use Unsupervised Learning?
1.10.2 Challenges of Unsupervised Learning
1.10.3 Advantages of Unsupervised learning
1.10.4 Disadvantages of Unsupervised learning
1.10.5 Applications of Unsupervised learning
1.10.6 Unsupervised Learning Algorithms
1.10.7 Clustering
1.10.7.1 Clustering Algorithms
1.11 K-Means Clustering
1.11.1 When to Use K-Means Clustering
1.11.2 When to Use Hierarchical Clustering
1.12 Hierarchical Clustering
1.12.1 Comparison of K-Means Clustering and Hierarchical Clustering
1.12.2 Association Rule Learning
1.12.2.1 Association Rule Learning Algorithms
1.12.3 Dimensionality Reduction
1.12.3.1 Dimensionality Reduction Algorithms
1.13 Supervised vs Unsupervised Learning
1.14 Reinforcement Learning (RL)
1.14.1 Elements of Reinforcement Learning
1.14.2 Applications
1.14.3 Advantages
1.14.4 Disadvantages
1.15 Supervised vs Unsupervised vs Reinforcement Learning
1.16 Sample Questions
Chapter 1

Introduction to Machine Learning

1.1 What is Learning?


• Learning can be defined as the process of acquiring knowledge, skills, behaviors, or understanding through
experiences, observations, or instruction.
• It involves recognizing patterns, making decisions, and adapting to new situations based on past experiences.

1.1.1 Key Characteristics of Learning


• Experience-Based: Learning often happens through exposure to examples, events, or data.
• Adaptability: It enables individuals (or systems) to adapt to changes in their environment.
• Improvement Over Time: Through repetition and feedback, learning leads to better performance in tasks.
• Memory: Retaining information from past experiences is critical for future decision-making.

1.1.2 Types of Learning


• Explicit Learning: Learning through direct instruction, such as in classrooms or reading textbooks.
• Implicit Learning: Gaining knowledge subconsciously, like learning to recognize faces without formal
instruction.
• Supervised Learning: Learning under guidance, where feedback is given about right and wrong actions
(e.g., a teacher correcting a student).
• Unsupervised Learning: Self-directed learning through exploration without explicit guidance.
• Reinforcement Learning: Learning through rewards and punishments for actions taken in a particular
situation.

1.2 Human Learning to Machine Learning
Just as humans learn from experiences, machines learn from data. While humans rely on senses, memory, and
cognition to process and interpret information, machines use algorithms to analyze data, identify patterns, and
make decisions.

1.3 Machine Learning (ML)
• Machine Learning (ML) is a subset of Artificial Intelligence (AI) that provides systems the ability to learn
and improve automatically from experience (data) without being explicitly programmed.
• Instead of hard-coding rules for every scenario, ML models analyze data, learn relationships, and generalize [1] their findings to new, unseen data.

1.3.1 How Does Machine Learning Work?


• Data Collection: Machines require data (numbers, images, text, etc.) as input to learn.
• Feature Extraction: Extract meaningful attributes from the raw data.
• Model Training: Use algorithms to train models on the data.
• Evaluation: Test the model on new data to check accuracy.
• Prediction: Deploy the model to make real-world predictions.

1.3.2 Real-World Examples of Machine Learning


1. Spam Email Detection:
• Emails are classified as "Spam" or "Not Spam" using supervised learning algorithms.
• The model learns from historical data where emails are labeled with their categories.
2. Product Recommendations
• Platforms like Amazon and Netflix use ML to recommend products or movies.
• Algorithms analyze user behavior (past purchases, viewing history) to make suggestions.
3. Autonomous Vehicles:
• Self-driving cars use ML to interpret sensor data, identify road signs, detect obstacles, and make
driving decisions.
• Reinforcement learning helps improve performance by simulating driving scenarios.
4. Healthcare Diagnostics:
• ML models assist doctors in diagnosing diseases from medical images (e.g., identifying tumors in
X-rays).
• Unsupervised learning detects anomalies, while supervised learning uses labeled data.
5. Fraud Detection: Banks use ML to detect fraudulent transactions by analyzing spending patterns and
flagging anomalies.

1.3.3 Types of Machine Learning


• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Semi-Supervised Learning
• Self-Supervised Learning

[1] https://www.youtube.com/watch?v=cKxRvEZd3Mw

1.4 Evaluating a machine learning model
• So you’ve built a machine learning model and trained it on some data... now what?
• The main goal of every machine learning model is to generalize well. Generalization is the ability of an ML model to produce suitable outputs for inputs it has never seen before.
• Suppose, then, that we want to check how well our machine learning model learns and generalizes to new data.

1.4.1 The train/test/validation split


• The most important thing you can do to properly evaluate your model is to not train the model on the entire
dataset.
• A typical train/test split would be to use 70% of the data for training and 30% of the data for testing.
• It’s important to use new data when evaluating our model to reduce the likelihood of overfitting to the training set. However, sometimes it’s useful to evaluate the model as we’re building it to find the best parameters - but we can’t use the test set for this evaluation, or else we’ll end up selecting the parameters that perform best on the test data rather than the parameters that generalize best. This is what a separate validation set is for.
• I’ll also note that it’s very important to shuffle the data before making these splits so that each split is an accurate representation of the dataset. A minimal sketch of such a split appears below.

1.4.2 Bias vs Variance


• In machine learning, these errors are always present, as there is always some difference between the model's predictions and the actual values.
• If the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as Bias and Variance.
• Bias and Variance help us in parameter tuning and in deciding which of several candidate models fits best.
• There are two types of error in machine learning. Reducible error and Irreducible error. Bias and
Variance come under reducible error.

1.4.2.1 Bias
• In general, a machine learning model analyzes the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction.
• While making predictions, a difference occurs between prediction values made by the model and actual
values/expected values, and this difference is known as bias errors or errors due to bias.
• Bias refers to the error due to overly simplistic assumptions in the learning algorithm. These assumptions
make the model easier to comprehend and learn but might not capture the underlying complexities of the
data. It is the error due to the model’s inability to represent the true relationship between input and output
accurately.
• A model has either:
– Low Bias: Low bias value means fewer assumptions are taken to build the target function. In this
case, the model will closely match the training dataset.
– High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset. In this case, the model will not match the training
dataset closely. A high bias model also cannot perform well on new data.
• The high-bias model will not be able to capture the dataset trend. It is considered as the underfitting model
which has a high error rate.
• Poor performance on both the training and the testing data signals high bias caused by an overly simple model, indicating underfitting.

1.4.2.2 Ways to reduce high bias in Machine Learning


• Use a more complex model: One of the main reasons for high bias is an overly simplified model that cannot capture the complexity of the data. In such cases, we can make the model more complex, for example by increasing the number of hidden layers in a deep neural network, or by using a more complex model such as polynomial regression for non-linear datasets, CNNs for image processing, or RNNs for sequence learning.
• Increase the number of features: Adding more features to the training data increases the complexity of the model and improves its ability to capture the underlying patterns in the data.
• Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve generalization. If the model has high bias, reducing the strength of the regularization, or removing it altogether, can improve its performance.
• Increase the size of the training data: Increasing the size of the training data can help reduce bias by providing the model with more examples to learn from. A sketch of the first two remedies follows this list.

1.4.2.3 Variance
• Variance is the measure of spread in data from its mean position.
• In machine learning variance is the amount by which the performance of a predictive model changes when
it is trained on different subsets of the training data.
• More specifically, variance is the variability of the model: how sensitive it is to a different subset of the training dataset, i.e., how much its predictions change when it is fit on a new subset of the training data.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm
should be good in understanding the hidden mapping between inputs and output variables.
• Variance errors are either low or high-variance errors:
– Low variance: Low variance means that the model is less sensitive to changes in the training data
and can produce consistent estimates of the target function with different subsets of data from the
same distribution.
– Low variance means there is a small variation in the prediction of the target function with changes in
the training data set.
– High variance: High variance means that the model is very sensitive to changes in the training data
and can result in significant changes in the estimate of the target function when trained on different
subsets of data from the same distribution.
– This is the case of overfitting, when the model performs well on the training data but poorly on new, unseen test data: it fits the training data so closely that it fails to generalize.

1.4.2.4 Ways to Reduce Variance in Machine Learning


• Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation can
help identify if a model is overfitting or underfitting and can be used to tune hyperparameters to reduce
variance.
• Feature selection: Choosing only the relevant features decreases the model's complexity and can reduce the variance error.
• Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models
• Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance and improve generalization (see the sketch after this list).
• Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters
or layers in a neural network, can also help reduce variance and improve generalization performance.
• Early stopping: Early stopping is a technique used to prevent overfitting by stopping the training of the
deep learning model when the performance on the validation set stops improving.
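As a sketch of the ensemble idea, the snippet below compares a single unpruned decision tree (high variance) with a bagged ensemble of trees; the dataset and model choices are illustrative assumptions:

```python
# Reducing variance by averaging many models (bagging via a random forest).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A single deep tree fits each training subset very differently (high variance).
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
# Averaging trees trained on bootstrap samples stabilizes the predictions.
forest_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)

print("single tree:", tree_scores.mean(), "+/-", tree_scores.std())
print("forest     :", forest_scores.mean(), "+/-", forest_scores.std())
```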

1.4.2.5 Different Combinations of Bias-Variance
There can be four combinations between bias and variance.

1. High Bias, Low Variance: A model with high bias and low variance is said to be underfitting.
2. High Variance, Low Bias: A model with high variance and low bias is said to be overfitting.
3. High-Bias, High-Variance: A model has both high bias and high variance, which means that the model is
not able to capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the
training data (high variance). As a result, the model will produce inconsistent and inaccurate predictions on
average.
4. Low Bias, Low Variance: A model that has low bias and low variance means that the model is able to
capture the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data
(low variance). This is the ideal scenario for a machine learning model, as it is able to generalize well to
new, unseen data and produce consistent and accurate predictions. In practice, however, this ideal is rarely fully achievable.

1.4.2.6 How to identify High variance or High Bias?

• High variance can be identified if the model has: Low training error and high test error.
• High bias can be identified if the model has: high training error, with test error almost similar to the training error. A small diagnostic sketch follows.
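A minimal diagnostic helper based on these two rules (the error thresholds are illustrative assumptions, not standard values):

```python
# Classify a fitted model's behaviour from its training and test errors.
from sklearn.metrics import accuracy_score

def diagnose(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    train_err = 1 - accuracy_score(y_train, model.predict(X_train))
    test_err = 1 - accuracy_score(y_test, model.predict(X_test))
    if train_err < 0.05 and test_err > train_err + 0.10:
        return train_err, test_err, "low train / high test error -> high variance"
    if train_err > 0.20 and abs(test_err - train_err) < 0.05:
        return train_err, test_err, "high, similar errors -> high bias"
    return train_err, test_err, "no strong signal of high bias or variance"
```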

1.4.3 Bias-Variance Trade-Off
• While building the machine learning model, it is really important to take care of bias and variance in order
to avoid overfitting and underfitting in the model.
• If the model is very simple with fewer parameters, it may have low variance and high bias. Whereas, if the
model has a large number of parameters, it will have high variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this balance between the bias
error and variance error is known as the Bias-Variance trade-off.
• For an accurate prediction of the model, algorithms need a low variance and low bias. But this is not
possible because bias and variance are related to each other. So, we need to find a sweet spot between bias
and variance to make an optimal model.
• An algorithm can’t be more complex and less complex at the same time, so we look for the complexity level at which the total error (bias plus variance) is smallest; graphically, the perfect trade-off sits at that minimum point.

1.4.4 Underfitting in Machine Learning
• A statistical model or a machine learning algorithm is said to have underfitting when a model is too simple
to capture data complexities.
• It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and the testing data.
• In simple terms, underfit models are inaccurate, especially when applied to new, unseen examples. Underfitting mainly happens when we use a very simple model with overly simplified assumptions.
• To address underfitting, we need more complex models, enhanced feature representation, and less regularization.
• The underfitting model has High bias and low variance.

1.4.4.1 Reasons for Underfitting


• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not an adequate representation of the underlying factors influencing the target variable.
• The size of the training dataset is not enough.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
• Features are not scaled.

1.4.4.2 Techniques to Reduce Underfitting


• Increase model complexity.
• Increase the number of features, performing feature engineering.
• Remove noise from the data.
• Increase the number of epochs or increase the duration of training to get better results.

1.4.5 Overfitting in Machine Learning
• A statistical model is said to be overfitted when it does not make accurate predictions on testing data.
• When a model is trained on too much detail, it starts learning from the noise and inaccurate entries in the dataset. Testing on unseen data then reveals high variance: the model fails to categorize the data correctly because it has absorbed too many details and too much noise.
• Non-parametric and non-linear methods are frequent causes of overfitting, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
• The overfitting model has High variance and Low bias.

1.4.5.1 Reasons for Overfitting


• The model is too complex.
• The size of the training data is too small.

1.4.5.2 Techniques to Reduce Overfitting


• Improving the quality of training data reduces overfitting by focusing the model on meaningful patterns and mitigating the risk of fitting noise or irrelevant features.
• Increasing the amount of training data can improve the model's ability to generalize to unseen data and reduce the likelihood of overfitting.
• Reduce model complexity.
• Early stopping during the training phase: monitor the loss during training and stop as soon as the validation loss begins to increase.
• Ridge Regularization and Lasso Regularization (see the sketch after this list).
• Use dropout for neural networks to tackle overfitting.
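A sketch of the regularization remedy: on a dataset with far more features than samples, plain least squares overfits while Ridge and Lasso generalize better (the alpha values are illustrative assumptions):

```python
# Ridge (L2) and Lasso (L1) regularization to curb overfitting.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Few samples, many features: an easy setting in which to overfit.
X, y = make_regression(n_samples=60, n_features=100, noise=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for name, model in [("plain OLS", LinearRegression()),
                    ("ridge    ", Ridge(alpha=1.0)),
                    ("lasso    ", Lasso(alpha=1.0))]:
    model.fit(X_tr, y_tr)
    print(name, "train R^2 =", round(model.score(X_tr, y_tr), 2),
          "test R^2 =", round(model.score(X_te, y_te), 2))
```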

1.5 Supervised Machine Learning Algorithms
• In Supervised Learning, the goal is to learn a mapping between input features (independent variables) and
the target output (dependent variable).
• In supervised learning, the model is trained on a labeled dataset, where the input data (features) is paired
with the correct output (labels).
• The goal is to learn a mapping from inputs to outputs and make predictions on new, unseen data.
• Similar to a human learning with a teacher.
• Supervised machine learning involves training a model on labeled data to learn patterns and relationships,
which it then uses to make accurate predictions on new data.

• In this workflow,


– Training phase involves feeding the algorithm labeled data, where each data point is paired with
its correct output. The algorithm learns to identify patterns and relationships between the input and
output data.
– Testing phase involves feeding the algorithm new, unseen data and evaluating its ability to predict
the correct output based on the learned patterns.

1.5.1 How does Supervised Learning work?
• Data Collection and Preprocessing: Gather a labeled dataset consisting of input features and target output
labels. Clean the data, handle missing values, and scale features as needed to ensure high quality for
supervised learning algorithms.
• Splitting the Data: Divide the data into a training set (e.g., 80%) and a test set (e.g., 20%).
• Choosing the Model: Select appropriate algorithms based on the problem type. This step is crucial for
effective supervised learning in AI.
• Training the Model: Feed the model input data and output labels, allowing it to learn patterns by adjusting
internal parameters.
• Evaluating the Model: Test the trained model on the unseen test set and assess its performance using
various metrics.
• Hyperparameter Tuning: Adjust settings that control the training process (e.g., learning rate) using
techniques like grid search and cross-validation.
• Final Model Selection and Testing: Retrain the model on the complete training data using the best hyperparameters, then test its performance on the held-out test set to ensure readiness for deployment.
• Model Deployment: Deploy the validated model to make predictions on new, unseen data. The sketch below walks through these steps.
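A compact sketch of the workflow on a built-in dataset (the dataset, model, and parameter grid are illustrative assumptions):

```python
# Supervised learning workflow: split, train, tune, evaluate.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection/preprocessing and splitting (scaling happens in the pipeline).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=0)

# Model choice and hyperparameter tuning with cross-validated grid search.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_tr, y_tr)

# Final evaluation on the held-out test set before deployment.
print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_te, y_te))
```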

1.5.2 Applications of Supervised learning


• Predicting housing prices based on size, location, and amenities (regression).
• Classifying emails as spam or not spam (classification).
• Fraud Detection in Banking: Utilizes supervised learning algorithms on historical transaction data, training
models with labeled datasets of legitimate and fraudulent transactions to accurately predict fraud patterns.
• Parkinson's Disease Prediction: Supervised models help predict Parkinson's disease, a progressive disorder that affects the nervous system and the parts of the body controlled by the nerves.
• Customer Churn Prediction: Uses supervised learning techniques to analyze historical customer data,
identifying features associated with churn rates to predict customer retention effectively.
• Cancer cell classification: Implements supervised learning to classify cancer cells based on their features, identifying them as 'malignant' or 'benign'.
• Stock Price Prediction: Applies supervised learning to predict a signal that indicates whether buying a
particular stock will be helpful or not.

1.5.3 Advantages of Supervised learning
• Accuracy and Predictability: Produces highly accurate models since the algorithm learns from labeled
examples.
• Clear Objective: Training data provides a clear mapping between input and output, making it easier to
evaluate model performance.
• Wide Applicability: Useful for a broad range of tasks, such as regression (predicting prices, temperatures)
and classification (spam detection, image recognition).
• Ease of Evaluation: Performance metrics (e.g., accuracy, precision, recall) can be directly calculated because
true outputs are known.
• Reliable for Well-Defined Problems: Works exceptionally well when sufficient labeled data is available
for tasks like fraud detection or medical diagnosis.
• Supervised learning allows us to leverage data collected from previous experience to produce useful outputs.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in the training data.

1.5.4 Disadvantages of Supervised learning


• Overfitting: Models can overfit training data, leading to poor performance on new data due to capturing
noise in supervised machine learning.
• Feature Engineering : Extracting relevant features is crucial but can be time-consuming and requires
domain expertise in supervised learning applications.
• Bias in Models: Bias in the training data may result in unfair predictions in supervised learning algorithms.
• Dependence on Labeled Data: Supervised learning relies heavily on labeled training data, which can be
costly and time-consuming to obtain, posing a challenge for supervised learning techniques.
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time, so it can be slow on large problems.
• Supervised learning cannot handle all complex tasks in Machine Learning.
• It requires a labelled data set and an explicit training process.

1.5.5 Types of Supervised Machine Learning Algorithms
Depending on the type of output, supervised learning can be categorized into:

• Regression: Where the output is a continuous variable (e.g., predicting house prices, stock prices).
• Classification: Where the output is a categorical variable (e.g., spam vs. non-spam emails, yes vs. no).

• Let’s first understand classification and regression data through the two example datasets below (shown as Figure A and Figure B in the original notes):

• Both figures contain labelled datasets, as follows:


– Figure A: It is a dataset of a shopping store that is useful in predicting whether a customer will
purchase a particular product under consideration or not based on his/ her gender, age, and salary.
– Input: Gender, Age, Salary
– Output: Purchased i.e. 0 or 1; 1 means yes the customer will purchase and 0 means that the customer
won’t purchase it.
– Figure B: It is a Meteorological dataset that serves the purpose of predicting wind speed based on
different parameters.
– Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
– Output: Wind Speed

1.5.6 Regression
• Regression is a technique used to predict a continuous output based on input features.
• The output can take any real value within a range.
• How It Works:
– The algorithm establishes a relationship between input variables (independent variables) and the
output (dependent variable).
– The objective is to determine the most suitable function that characterizes the connection between
these variables.
– The relationship is often modeled as a mathematical function, such as a line or a curve.

1.5.6.1 Types of Regression


• Linear Regression
• Non-linear Regression

1.5.6.2 Regression Algorithms


• Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Stepwise Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Gradient Boosting Regression
• AdaBoost Regression
• XGBoost Regression
• K-Nearest Neighbors Regression (KNN Regression)
• Ridge Regression
• Lasso Regression
• Elastic Net Regression
• Bayesian Regression
• Quantile Regression
• Principal Components Regression
• Partial Least Squares Regression
• Logistic Regression (for classification but sometimes considered regression)
• Neural Network Regression

1.5.7 Classification
• Classification is a predictive modeling technique that uses classification models (or classifiers) to categorize input data and assign it to predefined classes.
• Classifiers learn class characteristics from input data, then learn to assign possible classes to new, unseen data according to those learned characteristics.
• For example, a classification model might be trained on a dataset of images labeled as either dogs or cats, and it can then be used to predict the class of new, unseen images as dogs or cats based on features such as color, texture, and shape.
• Classification algorithms can be illustrated with a diagram of two classes, Class A and Class B, where the points within each class share similar features and differ from those of the other class.

• How It Works:
– The algorithm learns decision boundaries that separate classes in the input feature space.
– New data points are classified based on their proximity to these boundaries.
– Data Collection: You start with a dataset where each item is labeled with the correct class (for example, "cat" or "dog").
– Feature Extraction: The system identifies features (like color, shape, or texture) that help distinguish one class from another. These features are what the model uses to make predictions.
– Model Training: The classification algorithm uses the labeled data to learn how to map the features to the correct class, looking for patterns and relationships in the data.
– Model Evaluation: Once the model is trained, it is tested on new, unseen data to check how accurately it can classify items.
– Prediction: After being trained and evaluated, the model can predict the class of new data based on the features it has learned. A minimal sketch of these steps follows.
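A minimal sketch mapping these five steps onto scikit-learn's bundled iris dataset (the classifier choice and the sample measurements are illustrative assumptions):

```python
# The five classification steps on a ready-made labeled dataset.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)        # data collection: labeled flowers
# feature extraction: petal/sepal measurements are already provided as features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier().fit(X_tr, y_tr)   # model training
acc = accuracy_score(y_te, clf.predict(X_te))    # model evaluation
print("test accuracy:", acc)

new_flower = [[5.1, 3.5, 1.4, 0.2]]              # prediction on new data
print("predicted class:", clf.predict(new_flower))
```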

1.5.7.1 Types of Classification
There are four main classification tasks in Machine learning:

• Binary classification
– The task is to assign inputs into one of two distinct categories (e.g., Yes/No, True/False).
– Email classification: Spam vs. Not Spam.
– Disease diagnosis: Positive vs. Negative.

• Multi-class classification
– In multi-class classification, the goal is to classify the input into one of several classes or categories.
– Handwritten digit recognition: Digits 0-9.

• Multi-label classification
– An instance can belong to multiple classes simultaneously.
– Text categorization: A news article tagged as "Politics" and "Economy."
– Medical diagnosis: A patient diagnosed with multiple diseases.

• Imbalanced classifications
– A scenario where one class is heavily overrepresented compared to the other(s).
– Fraud detection: Fraudulent transactions are rare compared to legitimate ones.
– Disease detection: Rare diseases vs. healthy samples.

1.5.7.2 Classification Algorithms
Classification Algorithms can be further divided into the categories below:

• Linear Classifiers: A linear classifier is a model that makes predictions based on a linear decision boundary
(e.g., a straight line, plane, or hyperplane).
– Logistic Regression
– Linear Discriminant Analysis (LDA)
– Support Vector Machines with a linear kernel
– Perceptron
– Stochastic Gradient Descent (SGD) Classifier
• Non-linear Classifiers: A non-linear classifier is a model that uses a non-linear decision boundary to
separate classes. It can capture complex relationships and patterns in the data.
– K-Nearest Neighbours (KNN)
– Support Vector Machines (SVM) (with non-linear kernels like RBF or polynomial).
– Decision Tree
– Gradient Boosting Machines (GBM):
* XGBoost
* LightGBM
* CatBoost
• Ensemble Classification Algorithms
– Random Forests
– Bagging Classifier
– Boosting Algorithms:
* AdaBoost
* Gradient Boosting
* Stacking
– Voting Classifier
• Probabilistic Classification Algorithms
– Naïve Bayes Classifier:
* Gaussian Naïve Bayes
* Multinomial Naïve Bayes
* Bernoulli Naïve Bayes
– Bayesian Networks
• Neural Network-Based Algorithms
– Multilayer Perceptrons (MLPs)
– Convolutional Neural Networks (CNNs)
– Recurrent Neural Networks (RNNs)
– Transformers
• Instance-Based Algorithms
– K-Nearest Neighbors (KNN)
– Locally Weighted Learning (LWL)
– Kernel Density Estimation (KDE)

• Rule-Based Classification Algorithms
– Decision Trees
– Rule-Based Classifiers
– Associative Classifiers
• Deep Learning Algorithms
– Deep Neural Networks (DNNs)
– Autoencoders
– Generative Adversarial Networks (GANs)

1.5.8 Loss Function in Supervised Learning
• A loss function is a mathematical function that quantifies the difference between the predicted output of a
model and the actual target values.
• It serves as the primary tool to measure how well or poorly a model is performing.
• The goal of supervised learning is to minimize the loss function, thereby improving the accuracy of the
model’s predictions.
• How Loss Functions Work (see the numeric sketch after this list):
– Comparison: The loss function compares the predicted outputs with the actual outputs for each data
point in the training set.
– Quantification: It calculates a numeric value that represents the model’s error. The higher the loss,
the worse the predictions.
– Guidance: The loss function provides feedback to the model during training. Optimization algorithms
(e.g. Gradient Descent) use this feedback to update the model’s parameters and reduce the loss.
– Loss Functions in Supervised Learning For Regression:
* Mean Squared Error (MSE)
* Mean Absolute Error (MAE)
* Huber Loss
– Loss Functions in Supervised Learning For Classification
* Binary Cross Entropy Loss
* Categorical Cross-Entropy Loss
* Hinge Loss
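A numeric sketch of three of these losses computed by hand (the labels and predictions are made-up illustrative values):

```python
# Computing common loss functions directly with NumPy.
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error

# Binary cross-entropy: probabilities are clipped for numerical stability.
labels = np.array([1, 0, 1, 1])
probs = np.clip(np.array([0.9, 0.2, 0.7, 0.99]), 1e-12, 1 - 1e-12)
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print("MSE:", mse, "MAE:", mae, "BCE:", bce)
```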

1.5.9 Evaluating Regression Models
• Mean Squared Error (MSE): MSE measures the average squared difference between the predicted values
and the actual values. Lower MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the standard deviation
of the prediction errors. Similar to MSE, lower RMSE values indicate better model performance.
• Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted
values and the actual values. It is less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared measures the proportion of the variance in the target variable that is explained by the model. Higher R-squared values indicate better model fit. A sketch computing all four metrics follows.
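A sketch computing all four metrics with scikit-learn (the values are illustrative):

```python
# MSE, RMSE, MAE and R-squared for a toy set of predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # RMSE is simply the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R^2={r2:.3f}")
```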

1.5.10 Sensitivity and Specificity
• True Positive (TP): The model correctly identifies a positive case (e.g., a medical test accurately diagnosing
a disease in a sick patient).
• False Positive (FP): The model incorrectly identifies a negative case as positive (e.g., a security system
alerts about a potential threat when there is none)
• False Negative (FN): The model incorrectly identifies a positive case as negative (e.g., a medical test
wrongly stating a sick patient is healthy)
• True Negative (TN): The model correctly identifies a negative case as negative (e.g., a security system does not alert when there is no actual threat). Sensitivity and specificity are then defined from these four counts, as shown below.
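From these four counts, the two metrics that give this section its name are defined as follows (sensitivity is also called recall or the true positive rate; specificity is the true negative rate):

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP} \]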

1.5.11 Evaluating Classification Models


• Accuracy:
– Accuracy is a basic metric used to evaluate the performance of a classification model.
– Accuracy is the percentage of predictions that the model makes correctly.
– It is calculated by dividing the number of correct predictions by the total number of predictions.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
• Precision:
– Precision is the percentage of positive predictions that the model makes that are actually correct.
– Precision focuses on the quality of the positive predictions made by the model. Basically, Of all the
instances predicted as positive, how many are actually positive?
– It is calculated by dividing the number of true positives by the total number of positive predictions.

\[ \text{Precision} = \frac{TP}{TP + FP} \]
– High Precision indicates a low false positive rate. The model is very confident when it predicts a
positive class.
• Precision vs. Recall Trade-off:
– There is often a trade-off between precision and recall. Improving one metric can come at the cost of
the other.
– A model with high precision might miss many positive cases, reducing recall.
– A model with high recall might include many false positives, reducing precision.
• Recall:
– Recall is the percentage of all positive examples that the model correctly identifies.
– Recall (also known as sensitivity or true positive rate) focuses on how well the model captures all the
actual positive instances.
– Of all the actual positive instances, how many did the model correctly identify? It is calculated by
dividing the number of true positives by the total number of positive examples.

\[ \text{Recall} = \frac{TP}{TP + FN} \]
– High Recall Indicates a low false negative rate. The model successfully identifies most of the positive
instances.

• F1 score:
– To balance precision and recall, the F1-score is used.
– The F1 score is the harmonic mean of precision and recall:

\[ F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
– High F1-Score indicates a good balance between precision and recall.
• Confusion matrix:
– A confusion matrix is a table that shows the number of predictions for each class, along with the
actual class labels.
– It can be used to visualize the performance of the model and identify areas where the model is struggling, as in the sketch below.
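A sketch computing these metrics and the confusion matrix for a small made-up set of labels:

```python
# Accuracy, precision, recall, F1 and the confusion matrix with scikit-learn.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
# Rows are actual classes, columns predicted: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
```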

1.5.12 Decision Boundary in Classification Algorithms
• A decision boundary is a line or surface that separates the feature space into regions, where each region
corresponds to a specific class.
• In a classification algorithm, the decision boundary represents the threshold or rule the model uses to decide
which class a data point belongs to.
• A decision boundary is the demarcation line (for 2D data) or a hyperplane (in higher dimensions) where the
probability or score for belonging to two or more classes is equal.
• It helps the classifier decide which side of the boundary a new data point falls on, thereby assigning it a
class.
• In 2D data, the boundary is a line, In 3D data, the boundary becomes a plane, In higher dimensions, it is a
hyperplane.

1.5.13 Cross-validation
• Cross-validation is a technique used to assess the generalization ability of a machine learning model.
• It evaluates how well a model will perform on unseen data by splitting the dataset into multiple subsets for
training and testing in a systematic way.
• Why is Cross-Validation Important?
– Prevents Overfitting: Ensures the model is not too closely tailored to the training data by validating
it on unseen subsets.
– Reliable Model Evaluation: Provides a more robust estimate of model performance than a single train-test split.
– Model Comparison: Helps compare the performance of multiple models or configurations to select the best one.
– Utilizes the Entire Dataset: Every data point gets used for both training and testing, which is especially useful when the dataset is small. A minimal k-fold sketch follows.
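A minimal k-fold sketch (the dataset, model, and fold count are illustrative assumptions):

```python
# 5-fold cross-validation: every sample is tested on exactly once.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", scores)
print("mean +/- std:", scores.mean(), scores.std())
```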

1.5.14 Regression vs Classification

1.6 Linear Regression
• Linear regression is one of the easiest and most popular Machine Learning algorithms [2][3][4][5]. It is a statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age,
product price, etc.
• Linear regression is a statistical method that is used to predict a continuous dependent variable (target
variable) based on one or more independent variables (predictor variables).
• Linear regression assumes a linear relationship between the dependent and independent variables, which
implies that the dependent variable changes proportionally with changes in the independent variables.
• Linear regression is a type of supervised machine learning algorithm that computes the linear relationship
between the dependent and independent feature or variable by fitting a linear equation to observed data.

[2] https://www.youtube.com/watch?v=UZPfbG0jNec
[3] https://www.youtube.com/watch?v=dXHIDLPKdmA
[4] https://www.youtube.com/watch?v=jerPVDaHbEA
[5] https://www.youtube.com/watch?v=tFi4Y_y-GNM

1.6.1 Linear Regression Line
A linear line showing the relationship between the dependent and independent variables is called a regression line.
A regression line can show two types of relationship:

• Positive Relationship: A positive relationship exists between the independent variables and the dependent
variable when the slope of the regression line is positive. In other words, as the values of the independent
variables on X-axis increase, the value of the dependent variable on the Y-axis also increases. This can be
seen as an upward slope on a scatter plot of the data.

• Negative Relationship: A negative relationship exists between the independent variables and the dependent
variable when the slope of the regression line is negative. In other words, as the values of the independent
variables on X-axis increases, the value of the dependent variable on the Y-axis decreases. This can be seen
as a downward slope on a scatter plot of the data.

1.6.2 Types of Linear Regression
• When there is only one independent feature, it is known as Simple Linear Regression, and when there are
more than one independent feature, it is known as Multiple Linear Regression.

• In the case of a simple linear regression, the aim is to examine the influence of an independent variable on
one dependent variable. In case of multiple linear regression, the influence of several independent variables
on one dependent variable is analyzed.
• Example: Simple Linear Regression: Does height have an influence on the weight of a person?
• Example: Multiple Linear Regression: Do height and gender have an influence on the weight of a person?

1.6.3 Applications of Linear Regression
Linear regression is used in many different fields, including finance, economics, and psychology, to understand and
predict the behavior of a particular variable. For example, in finance, linear regression might be used to understand
the relationship between a company’s stock price and its earnings or to predict the future value of a currency based
on its past performance.

1.6.4 Advantages of Linear Regression


• Linear regression is a relatively simple algorithm, making it easy to understand and implement. The
coefficients of the linear regression model can be interpreted as the change in the dependent variable for a
one-unit change in the independent variable, providing insights into the relationships between variables.
• Linear regression is computationally efficient and can handle large datasets effectively. It can be trained
quickly on large datasets, making it suitable for real-time applications.
• Compared with highly flexible models, linear regression is less prone to chasing noise in the data, although individual outliers can still pull the fitted line and distort the coefficients.
• Linear regression often serves as a good baseline model for comparison with more complex machine
learning algorithms.
• Linear regression is a well-established algorithm with a rich history and is widely available in various
machine learning libraries and software packages.

1.6.5 Disadvantages of Linear Regression


• Linear regression assumes a linear relationship between the dependent and independent variables. If the
relationship is not linear, the model may not perform well.
• Linear regression is sensitive to multicollinearity, which occurs when there is a high correlation between
independent variables. Multicollinearity can inflate the variance of the coefficients and lead to unstable
model predictions.
• Linear regression assumes that the features are already in a suitable form for the model. Feature engineering
may be required to transform features into a format that can be effectively used by the model.
• Linear regression is susceptible to both overfitting and underfitting. Overfitting occurs when the model
learns the training data too well and fails to generalize to unseen data. Underfitting occurs when the model
is too simple to capture the underlying relationships in the data.
• Linear regression provides limited explanatory power for complex relationships between variables. More
advanced machine learning techniques may be necessary for deeper insights.

1.7 Simple Linear Regression
• This is the simplest form of linear regression, and it involves only one independent variable and one
dependent variable.
• This involves predicting a dependent variable based on a single independent variable.
• Let us consider a dataset where we have a value of response y for every feature x:

• A scatter plot of the above dataset looks like this:

• Now, the task is to find the straight line that best fits the scatter plot above, so that we can predict the response for any new feature value (i.e., a value of x not present in the dataset). This line is called a regression line.

• The equation of the regression line is represented as:

y = mx + c
where:
– y is the dependent variable
– x is the independent variable
– c is the intercept
– m is the slope or gradient
• Interpretation
– Slope (m): Indicates how much the dependent variable (y) changes for a unit change in the indepen-
dent variable (x).
– Intercept (c): The predicted value of y when x = 0.
• The regression coefficient m can have different signs, which are interpreted as follows:
– m > 0: there is a positive correlation between x and y (the greater x, the greater y).
– m < 0: there is a negative correlation between x and y (the greater x, the smaller y).
– m = 0: there is no correlation between x and y.

• If all points (measured values) were exactly on one straight line, the estimate would be perfect. However,
this is almost never the case and therefore, in most cases a straight line must be found, which is as close as
possible to the individual data points.

• The attempt is thus made to keep the error in the estimation as small as possible so that the distance between
the estimated value and the true value is as small as possible.
• This distance or error is called the "residual"; it is abbreviated as "e" (error) and can be represented by the Greek letter epsilon (ε).

y = mx + c + ε
• Hence, the Best fit line means that the error between predicted values and actual values should be minimized.
In other words, The best-fitting line is the line that has the smallest difference between the predicted values
and the actual values.

• When calculating the regression line, an attempt is made to determine the regression coefficients (m and c)
so that the sum of the squared residuals is minimal.

\[ \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} d_i^2 = \min \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \]

1.7.1 How to Calculate m and c values to get the best-fit line?
• Closed-form solution: Ordinary Least Squares [6]
• Open-form (iterative) solution: Gradient Descent

1.7.2 Least Squares Method


• The Least Squares Method is a standard approach used to find the best-fitting line through a set of points in
linear regression.
• It minimizes the sum of the squares of the vertical differences (residuals) between the observed data points
and the predicted values on the line.
• Why Minimize the Sum of Squares?
– Negative and positive residuals don’t cancel each other out.
– Larger errors are penalized more, emphasizing a better overall fit.
• Steps to derive m and c: set the partial derivatives of the sum of squared residuals with respect to m and c to zero and solve; the resulting closed-form estimates are given below.

[6] https://www.youtube.com/watch?v=VmbA0pi2cRQ
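Carrying those steps out (a standard calculus result for Ordinary Least Squares) yields the closed-form estimates:

\[ m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad c = \bar{y} - m\,\bar{x} \]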

• Example [7]:

[7] https://www.youtube.com/watch?v=P8hT5nDai6A

• Code of Simple Linear Regression
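A minimal stand-in implementing the least-squares formulas above with NumPy (the data points are made-up illustrative values):

```python
# Simple linear regression via the closed-form least-squares estimates.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
c = y_mean - m * x_mean                                              # intercept

y_hat = m * x + c
sse = np.sum((y - y_hat) ** 2)  # the sum of squared residuals being minimized
print(f"best-fit line: y = {m:.3f}x + {c:.3f}, SSE = {sse:.3f}")
```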

1.7.3 Assumptions of Simple Linear Regression
Let’s take a look now at the main simple linear regression model assumptions. If these assumptions are violated,
we might want to consider a different approach. The first three, in particular, are strong assumptions and shouldn’t
be ignored.

• Linearity: The independent and dependent variables have a linear relationship with one another. This
implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion.
This means that there should be a straight line that can be drawn through the data points. If the relationship
is not linear, then linear regression will not be an accurate model.

• Independence: The observations in the dataset are independent of each other. This means that the value
of the dependent variable for one observation does not depend on the value of the dependent variable for
another observation. If the observations are not independent, then linear regression will not be an accurate
model.

• Homoscedasticity: Since in practice the regression model never exactly predicts the dependent variable,
there is always an error. This very error must have a constant variance over the predicted range.

To test homoscedasticity, i.e., the constant variance of the residuals, the dependent variable is plotted on the x-axis and the error on the y-axis. The errors should scatter evenly over the entire range; if they do, homoscedasticity is present. If not, heteroscedasticity is present, meaning the error has different variances depending on the value range of the dependent variable.
• Normal distribution of the error: The error epsilon (the residuals) should be normally distributed, i.e., follow a bell-shaped curve. If the residuals are not normally distributed, linear regression will not be an accurate model. Analytically, you can use either the Kolmogorov-Smirnov test or the Shapiro-Wilk test; graphically, you can inspect a histogram or, even better, the so-called QQ-plot (quantile-quantile plot). The more closely the data lie on the line, the better the normal distribution. A small diagnostic sketch follows.

1.8 Logistic Regression
• Suppose you want to predict whether today is going to be a sunny day or not. There are two possible
outcomes: "sunny" or "not sunny".
• The outcome variable is also known as a "target variable" or a "dependent variable".
• There are many variables that could influence the outcome such as ‘temperature the day before’, ‘air
pressure’ etc. the influencing variables are known as features, independent variables, or predictors —
all these terms mean the same thing.
• Classification techniques are an essential part of machine learning and data mining applications; by some estimates, roughly 70% of data science problems are classification problems.
• Logistic regression is a supervised machine learning algorithm used for classification tasks where the goal
is to predict the probability that an instance belongs to a given class or not.

1.8.1 Types of Logistic Regression


• Binary: In binary logistic regression, the dependent variable has only two possible values, such as 0 or 1, Pass or Fail, Spam or Not Spam, Cancer or No Cancer.
• Multinomial: In multinomial logistic regression, the dependent variable has 3 or more possible unordered values, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal logistic regression, the dependent variable has 3 or more possible ordered values, such as "low", "medium", or "high", or a restaurant or product rating from 1 to 5.

1.8.2 Assumptions of Logistic Regression


• Logistic Regression model requires the dependent variable to be binary, multinomial or ordinal in nature.
• Independent observations: It requires the observations to be independent of each other, meaning the value of one observation does not depend on, and is not correlated with, any other observation.
• No Multicollinearity: Logistic Regression algorithm requires little or no multicollinearity among the
independent variables. It means that the independent variables should not be too highly correlated with each
other.
• Linearity relationship between independent variables and log odds: The relationship between the
independent variables and the log odds of the dependent variable should be linear.
• Large sample size: The success of a Logistic Regression model depends on the sample size. Typically, it requires a large sample size to achieve high accuracy.
• Absence of Outliers: While not a strict requirement, outliers can significantly influence the model. It’s
important to check for and address any outliers that might distort the results.

1.8.3 Linear Regression vs Logistic Regression
• Let us consider a problem where we are given a dataset containing Height and Weight for a group of people.
Our task is to predict the Weight for new entries in the Height column.
• So we can figure out that this is a regression problem where we will build a Linear Regression model.
• Now suppose we have an additional field Obesity and we have to classify whether a person is obese or not
depending on their provided height and weight.
• This is clearly a classification problem where we have to segregate the dataset into two classes (Obese and
Not-Obese).
• So, for the new problem, we can again follow the Linear Regression steps and build a regression line. However, it will not do a good job of separating the two classes.

• Now we have a classification problem, and we want to predict the binary output variable Y (2 values: either
1 or 0). For example, the case of flipping a coin (Head/Tail). The response yi is binary: 1 if the coin is Head,
0 if the coin is Tail.
• If we simply fit a regression line, however, predicted values between plus and minus infinity can occur. The goal of logistic regression is to estimate the probability of occurrence, not the value of the variable itself, so the equation must still be transformed.
So... how can we predict a classification problem?
• To do this, we must restrict the value range of the prediction to the interval between 0 and 1: since we are now looking for a model of probabilities, the model should only predict values on the scale from 0 to 1.
• To ensure that only values between 0 and 1 are possible, the logistic function is used.

1.8.4 Logistic Function (Sigmoid Function)
• The sigmoid function is a mathematical function used to map predicted values to probabilities: σ(z) = 1 / (1 + e^(−z)).
• It maps any real value into another value within the range 0 to 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a curve shaped like the letter "S". This S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which separates the two classes: probability values above the threshold tend to class 1, and values below the threshold tend to class 0.

• Decision boundary: The sigmoid function returns a probability value between 0 and 1, which is then mapped to a discrete class, either "Class 0" or "Class 1". To map this probability value to a discrete class (pass/fail, yes/no, true/false), we select a threshold value, called the decision boundary. Above this threshold we map probability values to Class 1, and below it to Class 0.
• Mathematically, it can be expressed as follows:-

p ≥ 0.5 ⟹ class = 1
p < 0.5 ⟹ class = 0

• Generally, the decision boundary is set to 0.5. So, if the probability value is 0.8 (> 0.5), we will map this observation to class 1. Similarly, if the probability value is 0.2 (< 0.5), we will map this observation to class 0.
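As a quick illustrative sketch (not from the original notes), the sigmoid mapping and the 0.5 decision boundary can be written in a few lines of Python; the raw scores below are made up.

import numpy as np

def sigmoid(z):
    # Map any real value z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw model outputs (log-odds) for three observations
z = np.array([-2.0, 0.1, 3.0])
probabilities = sigmoid(z)

# Apply the 0.5 decision boundary to map probabilities to classes
classes = (probabilities >= 0.5).astype(int)

print(probabilities)  # approximately [0.119 0.525 0.953]
print(classes)        # [0 1 1]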

1.8.5 Logistic Regression implementation
• https://github.com/clareyan/From-Linear-to-Logistic-Regression-Explained-Step-by-Step/tree/master
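In addition to the linked walkthrough, here is a minimal, self-contained sketch of binary logistic regression; it is my own illustration, and the choice of scikit-learn's breast cancer dataset is an arbitrary assumption made for demonstration.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a built-in binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a logistic regression classifier
# (max_iter raised so the solver converges on this dataset)
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Predicted probabilities and hard class labels (0.5 threshold by default)
probs = clf.predict_proba(X_test)[:, 1]
preds = clf.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, preds):.2f}")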

1.8.6 Logistic Regression Resources


• Video Sources:
– https://www.youtube.com/watch?v=XNXzVfItWGY&list=PLKnIA16_Rmvb-ZTsM1QS-tlwmlkeGSnru
• Websites:
– https://www.geeksforgeeks.org/understanding-logistic-regression/#what-is-logistic-regression
– https://datatab.net/tutorial/logistic-regression
– https://www.kaggle.com/code/prashant111/logistic-regression-classifier-tutorial
– https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
– https://www.analyticsvidhya.com/blog/2020/12/beginners-take-how-logistic-regression-is-related-to-linear-regression/
– https://www.kdnuggets.com/2020/03/linear-logistic-regression-explained.html
– https://www.geeksforgeeks.org/ml-cost-function-in-logistic-regression/

1.9 K-Nearest Neighbor(KNN) Algorithm
• K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method used for classification
and regression problems, although it’s more commonly applied in classification.
• KNN is based on the idea that the observations closest to a given data point are the most "similar" observations in the data set, so we can classify unseen points based on the values of the closest existing points.
• In k-Nearest Neighbours (k-NN) algorithm k is just a number that tells the algorithm how many nearby
points (neighbours) to look at when it makes a decision.
• During the training phase, the KNN algorithm simply memorizes the entire dataset. When presented with new data, it assigns the new point to the class whose members it most closely resembles.

1.9.1 KNN Algorithm


The working of K-NN can be explained on the basis of the algorithm below:

1. Select the number K of neighbors: Start by deciding how many neighbors (data points from your dataset) you want to consider when making predictions. This is your 'K' value.
2. Calculate the distance: Compute the distance between your new data point and every point in the training data.
3. Find the nearest neighbors: The k data points with the smallest distances to the target point are its nearest neighbors.
4. Count data points in each category: Among these k nearest neighbors, count how many belong to each category. For instance, count how many are in Category A and how many are in Category B.
5. Assign to the majority category: Assign the new data point to the category with the most neighbors among the k. If most of them are in Category A, your new point goes into Category A. A minimal sketch of these steps appears below.
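The following from-scratch sketch is my own illustration (not from the original notes); it implements the five steps above with NumPy on a small made-up 2-D dataset, and the function name is hypothetical.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count labels among the neighbors and take the majority
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two clusters labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.8])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([6.4, 6.6])))  # expected: 1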

Video references: https://www.youtube.com/watch?v=abnL_GUGub4 and https://www.youtube.com/watch?v=BYaoDZM1IcU&list=PLKnIA16_RmvZiE-lEdN5RDi18-u-T43zd

1.9.2 Example of KNN Algorithm Working
• Let’s take a simple case to understand this algorithm. Following is a spread of red circles (RC) and green
squares (GS):

• You intend to find out the class of the blue star (BS). BS can either be RC or GS and nothing else. The "K" in the KNN algorithm is the number of nearest neighbors we wish to take the vote from.
• Let's say K = 3. Hence, we will now make a circle with BS as the center, just big enough to enclose only three data points on the plane.

• The three closest points to BS are all RC. Hence, with a good confidence level, we can say that the BS
should belong to the class RC. Here, the choice became obvious as all three votes from the closest neighbor
went to RC.

1.9.3 Another Example of KNN Algorithm Working
• Suppose we have a new data point and we need to put it in the required category. Consider the below image:

• Firstly, we will choose the number of neighbors; let's say we choose k = 5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as shown in Section 1.9.5.
• By calculating the Euclidean distance, we obtain the nearest neighbors: three in category A and two in category B.
• Since the majority of the 5 nearest neighbors (3 of 5) are from category A, this new data point must belong to category A.

1.9.4 How to choose the value of k for KNN Algorithm?
• The value of k is very crucial in the KNN algorithm to define the number of neighbors in the algorithm.
• As seen in example below, changing ’K’ changes predictions. With K=3, we predict Category B, while with
K=5, we predict Category A.
• So, picking the right ’K’ is a big deal in making KNN work well.

• Low k values make predictions unstable: Take this example: a query point is surrounded by 2 green dots and one red triangle. With k = 1, the prediction follows whichever single point happens to be closest, so a single noisy or outlying neighbor can flip the result. Low k values mean high variance (the model fits too closely to the training data), high complexity, and low bias (the model is complex enough to fit the training data well).
• High k values oversmooth: A moderately higher k value can make predictions more stable, because the mode or mean is calculated over more neighbors. However, if the k value is too high, it will likely result in low variance, low complexity, and high bias (the model is NOT complex enough to fit the training data well).
• Ideally, you want to find a k value that balances high variance and high bias. It is also recommended to choose an odd number for k to avoid ties in classification.
• The right k value is also relative to your data set. As a starting point, you might try the square root of N, where N is the number of data points in the training dataset.
• Cross-validation can also help you choose the k value best suited to your dataset, as in the sketch below.
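A minimal sketch of such a cross-validation search, my own illustration reusing the Iris dataset that also appears later in these notes:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd k values with 5-fold cross-validation
best_k, best_score = None, -np.inf
for k in range(1, 22, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()

print(f"Best k = {best_k} with mean CV accuracy = {best_score:.3f}")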

1.9.5 Distance Metrics Used in KNN Algorithm
The KNN algorithm uses distance metrics to identify which data points are closest to a given query point. Generally, Euclidean distance is the most common, but other metrics like Manhattan distance or Minkowski distance are also used.

1. Euclidean Distance: Euclidean distance is defined as the straight-line distance between two points in a
plane or space. You can think of it like the shortest path you would walk if you were to go directly from one
point to another.

d(x, y) = √[(x₂ − x₁)² + (y₂ − y₁)²] = √[ ∑ᵢ₌₁ⁿ (xᵢ − yᵢ)² ]

2. Manhattan Distance:
• This is another popular distance metric, which measures the sum of the absolute differences between two points' coordinates.
• This is the total distance you would travel if you could only move along horizontal and vertical lines (like a grid or city streets).
• It's also called "taxicab distance" because a taxi can only drive along the grid-like streets of a city.

Manhattan Distance = |x₂ − x₁| + |y₂ − y₁| = ∑ᵢ₌₁ⁿ |xᵢ − yᵢ|

If the distance from the origin is 1, the set of points at Manhattan distance 1 forms a diamond, while the set at Euclidean distance 1 forms a circle.

3. Minkowski Distance:
• Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan
distances as special cases
• This distance measure is the generalized form of Euclidean and Manhattan distance metrics.
• The parameter, p, in the formula below, allows for the creation of other distance metrics.
• Manhattan distance is obtained when p is equal to one, and Euclidean distance when p is equal to two:

Minkowski Distance = ( ∑ᵢ₌₁ⁿ |xᵢ − yᵢ|ᵖ )^(1/p)
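A small sketch (my own, with made-up points) computing all three metrics directly from the formulas above:

import numpy as np

def minkowski(x, y, p):
    # Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

print(minkowski(x, y, p=1))  # Manhattan: |4-1| + |6-2| = 7.0
print(minkowski(x, y, p=2))  # Euclidean: sqrt(3^2 + 4^2) = 5.0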

1.9.6 Advantages of the KNN Algorithm
• Easy to implement: The K-NN algorithm is easy to implement because its complexity is relatively low as
compared to other machine learning algorithms.
• Easily Adaptable – K-NN stores all data in memory, so when new data points are added, it automatically adjusts and uses the new data for future predictions.
• Few Hyperparameters – The only parameters required to train a KNN algorithm are the value of k and the choice of the distance metric, which is few compared to other machine learning algorithms.

1.9.7 Disadvantages of the KNN Algorithm


• Difficult to scale: KNN stores the entire training set in memory, which drives up storage costs; this reliance on memory also makes the algorithm computationally intensive at prediction time.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at classification time.
• At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category that is most similar to the new data.
• Curse of Dimensionality: When the number of features increases, K-NN struggles to classify data accurately, a problem known as the curse of dimensionality: the algorithm has a hard time finding meaningful nearest neighbors when the dimensionality is too high.
• Prone to Overfitting:
– Because the algorithm is affected by the curse of dimensionality, KNN is prone to the problem of overfitting as well.
– While feature selection and dimensionality reduction techniques are leveraged to prevent this from occurring, the value of k can also impact the model's behavior.
– Lower values of k can overfit the data, whereas higher values of k tend to "smooth out" the prediction values, since the prediction is averaged over a greater area, or neighborhood. However, if the value of k is too high, it can underfit the data.

1.9.8 Implementation of KNN Algorithm for Classification

# Import necessary libraries


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load the dataset (Iris dataset)


data = load_iris()
X = data.data[:, :2] # Select only the first two features for 2D visualization
y = data.target # Target labels

# Filter the dataset to include only two classes (Setosa and Versicolor)
binary_filter = y < 2 # Keep only class 0 (Setosa) and class 1 (Versicolor)
X = X[binary_filter]
y = y[binary_filter]

# Step 2: Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Create a KNN classifier


k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Step 4: Train the KNN model


knn.fit(X_train, y_train)

# Step 5: Make predictions


y_pred = knn.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Display a detailed classification report


print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names[:2]))

# Step 7: Plot classification boundary lines

# Create a mesh grid for plotting


x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
np.arange(y_min, y_max, 0.01))

# Predict for each point in the mesh grid


Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the classification boundary lines


plt.figure(figsize=(10, 6))
plt.contour(xx, yy, Z, colors='k', linewidths=1) # Draw boundary lines

# Scatter plot of training and testing data


plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, marker='o', label="Train Data",
edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, marker='s', label="Test Data", edgecolor='k')

# Label the axes and add legend

plt.title("KNN Classification with Boundary Lines (Two Classes)", fontsize=16)
plt.xlabel(data.feature_names[0], fontsize=12)
plt.ylabel(data.feature_names[1], fontsize=12)
plt.legend(fontsize=10)
plt.show()

1.10 Unsupervised Machine Learning
• In contrast to supervised learning, unsupervised machine learning models are given unlabeled data and left to discover patterns and insights on their own, without explicit direction or instruction.
• Machine learning that takes place in the absence of human supervision is known as unsupervised machine
learning.
• There are no explicit outputs to guide the learning process.
• Resembles self-directed human learning without clear instructions.

• Unsupervised learning is a type of machine learning in which models are trained using unlabeled dataset
and are allowed to act on that data without any supervision.
• Unsupervised learning is often used for tasks such as clustering, dimensionality reduction, and anomaly
detection.

1.10.1 Why use Unsupervised Learning?
• Unsupervised learning is helpful for finding useful insights from data.
• Unsupervised learning is much like how a human learns to think through their own experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes it especially important.
• In the real world, we do not always have input data with corresponding outputs; to solve such cases, we need unsupervised learning.

1.10.2 Challenges of Unsupervised Learning


• Evaluation: Assessing the performance of unsupervised learning algorithms is difficult without predefined
labels or categories.
• Interpretability: Understanding the decision-making process of unsupervised learning models is often
challenging.
• Overfitting: Unsupervised learning algorithms can overfit to the specific dataset used for training, limiting
their ability to generalize to new data.
• Data quality: Unsupervised learning algorithms are sensitive to the quality of the input data. Noisy or
incomplete data can lead to misleading or inaccurate results.
• Computational complexity: Some unsupervised learning algorithms, particularly those dealing with
high-dimensional data or large datasets, can be computationally expensive.

1.10.3 Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished using unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from unlabeled data that you might not have been able to
get otherwise.
• Unsupervised learning is good at finding patterns and relationships in data without being told what to look
for. This can help you learn new things about your data.
• No Need for Labeled Data: Saves time and effort required for manual labeling, which can be expensive
and time-consuming.
• Discovering Hidden Patterns: Identifies clusters, segments, or anomalies that may not be evident through
manual analysis.
• Ideal for exploring data when little is known about its structure or relationships.
• Applicable in scenarios where labeling is impractical, such as identifying abnormal network traffic in
cybersecurity.
• Handles Complex Data: Can work with high-dimensional data and uncover structures, such as in gene
expression analysis.

1.10.4 Disadvantages of Unsupervised learning


• It is difficult to measure accuracy or effectiveness due to the lack of predefined answers during training.
• The results often have lower accuracy.
• The user needs to spend time interpreting and labeling the classes that result from the grouping.
• Unsupervised learning can be sensitive to data quality, including missing values, outliers, and noisy data.
• Without labeled data, it can be difficult to evaluate the performance of unsupervised learning models, making
it challenging to assess their effectiveness.

1.10.5 Applications of Unsupervised learning


• Customer segmentation: Unsupervised learning can be used to segment customers into groups based on
their demographics, behavior, or preferences. This can help businesses to better understand their customers
and target them with more relevant marketing campaigns.
• Fraud detection: Unsupervised learning can be used to detect fraud in financial data by identifying
transactions that deviate from the expected patterns. This can help to prevent fraud by flagging these
transactions for further investigation.
• Recommendation systems: Unsupervised learning can be used to recommend items to users based on their
past behavior or preferences. For example, a recommendation system might use unsupervised learning to
identify users who have similar taste in movies, and then recommend movies that those users have enjoyed.
• Natural language processing (NLP): Unsupervised learning is used in a variety of NLP tasks, including
topic modeling, document clustering, and part-of-speech tagging.
• Image analysis: Unsupervised learning is used in a variety of image analysis tasks, including image
segmentation, object detection, and image pattern recognition.

1.10.6 Unsupervised Learning Algorithms
There are mainly three types of algorithms used for unsupervised datasets:

• Clustering
• Association Rule Learning
• Dimensionality Reduction

1.10.7 Clustering
• Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of other groups.
• In other words, clustering in unsupervised machine learning is the process of grouping unlabeled data into clusters based on their similarities.
• The goal of clustering is to identify patterns and relationships in the data without any prior knowledge of the data's meaning.
• Clustering can be broken down further into different types; for example:
– Exclusive clustering: Data is grouped such that a single data point belongs to exactly one cluster.
– Overlapping clustering: A soft clustering in which a single data point may belong to multiple clusters with varying degrees of membership.
– Hierarchical clustering: Clusters are organized into a hierarchy (a tree), so that similar instances are nested within progressively larger groups.
– Probabilistic clustering: Clusters are created using probability distributions.

1.10.7.1 Clustering Algorithms


• K-means Clustering
• Hierarchical Clustering
• Density-Based Clustering (DBSCAN)
• Mean-Shift Clustering
• Spectral Clustering
• Singular Value Decomposition
• Principal Component Analysis
• Independent Component Analysis
• Gaussian Mixture Models (GMMs)

1.11 K-Means Clustering
https://www.youtube.com/watch?v=5shTLzwAdEc
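The linked video walks through the algorithm step by step. As a companion, here is a minimal sketch (my own illustration, on synthetic blob data) of running and visualizing k-means with scikit-learn:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit k-means with k = 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Visualize the clusters and their centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="X", s=200, label="Centroids")
plt.title("K-Means Clustering (k = 3)")
plt.legend()
plt.show()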

1.12 Hierarchical Clustering
https://www.youtube.com/watch?v=0jPGHniVVNc&pp=ygUXaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmc%3D
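Similarly, a minimal sketch (my own illustration, not from the linked video) of agglomerative hierarchical clustering, using SciPy to draw the dendrogram:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

# Small synthetic dataset (hierarchical clustering scales poorly to large N)
X, _ = make_blobs(n_samples=30, centers=3, cluster_std=0.8, random_state=42)

# Build the cluster hierarchy with Ward linkage (merges that minimize variance)
Z = linkage(X, method="ward")

# The dendrogram shows the order and distance at which clusters merge
dendrogram(Z)
plt.title("Hierarchical Clustering Dendrogram (Ward linkage)")
plt.xlabel("Sample index")
plt.ylabel("Merge distance")
plt.show()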

1.12.1 Comparison of K-Means Clustering and Hierarchical Clustering

• K-means requires the number of clusters k to be fixed in advance and scales well to large datasets; hierarchical clustering builds a tree of nested clusters without fixing k up front, but is computationally heavier.
• K-means assumes roughly spherical, evenly sized clusters; hierarchical clustering can handle irregular shapes and reveals the relationships between clusters through the dendrogram.

1.12.2 When to Use K-Means Clustering


• Large datasets where computational efficiency is important.
• Clusters are approximately spherical and evenly sized.
• A predefined number of clusters (k) is available or can be estimated.

1.12.3 When to Use Hierarchical Clustering


• Small to medium-sized datasets due to computational limitations.
• Need to understand the relationships or hierarchy between clusters.
• Irregular or non-spherical cluster shapes

1.12.4 Association Rule Learning
• Association rule learning, also known as association rule mining, is a common unsupervised technique used to discover relationships between variables in large databases.
• It determines the sets of items that occur together in the dataset.
• This technique is commonly used for market basket analysis, which helps to better understand the relationships between different products.
• For example, shopping stores use algorithms based on this technique to find the relationship between the sales of one product and another based on customer behavior.
• For instance, people who buy item X (say, bread) also tend to purchase item Y (butter or jam).

1.12.4.1 Association Rule Learning Algorithms


• Apriori Algorithm
• Eclat Algorithm
• FP-Growth Algorithm
• Efficient Tree-based Algorithms
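As a hedged sketch of the Apriori algorithm in practice, the following assumes the third-party mlxtend library and a made-up basket of transactions; the support and confidence thresholds are illustrative choices, not prescribed by the notes.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up market-basket transactions
transactions = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
    ["bread", "butter", "milk"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Find itemsets appearing in at least 40% of baskets
frequent = apriori(df, min_support=0.4, use_colnames=True)

# Derive rules such as {bread} -> {butter} with confidence >= 0.6
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])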

1.12.5 Dimensionality Reduction
• Dimensionality reduction is the process of reducing the number of features in a dataset while preserving as
much information as possible.
• These algorithms seek to transform data from high-dimensional spaces to low-dimensional spaces without
compromising meaningful properties in the original data.
• These techniques are typically deployed during exploratory data analysis (EDA) or data processing to
prepare the data for modeling.
• This technique is useful for improving the performance of machine learning algorithms and for data
visualization.
• It’s helpful to reduce the dimensionality of a dataset during EDA to help visualize data: this is because
visualizing data in more than three dimensions is difficult.

1.12.5.1 Dimensionality Reduction Algorithms


• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Non-negative Matrix Factorization (NMF)
• Locally Linear Embedding (LLE)
• Isomap
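For instance, a minimal sketch (my own illustration) of PCA reducing the four-dimensional Iris data to two components for visualization:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Project the 4-dimensional data onto its 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# How much of the original variance the 2 components retain
print("Explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", s=30)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("Iris data after PCA (4D to 2D)")
plt.show()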

1.13 Supervised vs Unsupervised Learning

• In short: supervised learning trains on labeled input-output pairs to predict known targets, while unsupervised learning finds structure (clusters, associations, lower-dimensional representations) in unlabeled data. A fuller three-way comparison appears in Section 1.15.

1.14 Reinforcement Learning (RL)
• Reinforcement learning involves training a model to make sequences of decisions in an environment to
maximize a cumulative reward.
• The model learns through trial and error, receiving feedback in the form of rewards or penalties.
• Mimics learning from rewards and punishments, like training a pet.
• Reinforcement learning is a type of machine learning method where an intelligent agent (a computer program) interacts with its environment and learns to act within it.
• Key Concepts of Reinforcement Learning:
– Agent: The learner or decision-maker.
– Environment: Everything the agent interacts with.
– State: A specific situation in which the agent finds itself.
– Action: All possible moves the agent can make.
– Reward: Feedback from the environment based on the action taken.
• How It Works: RL operates on the principle of learning optimal behavior through trial and error. The agent
takes actions within the environment, receives rewards or penalties, and adjusts its behavior to maximize the
cumulative reward.
– An agent interacts with an environment.
– The agent takes actions based on its state and receives feedback (reward or penalty).
– Over time, the agent learns the best actions to maximize rewards.

• Example: Navigating a Maze

– In the accompanying figure, a robot, a diamond, and fire are shown. The goal of the robot is to get the reward, the diamond, while avoiding the hurdles, the fire.
– The robot learns by trying all the possible paths and then choosing the path which gives it the reward with the fewest hurdles.
– Each right step gives the robot a reward, and each wrong step subtracts from the robot's reward.
– The total reward is calculated when it reaches the final reward, the diamond.

1.14.1 Elements of Reinforcement Learning
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and action.
• Value Function: A function that estimates the expected cumulative reward from a given state.
• Model of the Environment: A representation of the environment that helps in planning by predicting
future states and rewards.
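To make these elements concrete, here is a self-contained sketch of tabular Q-learning on a toy five-cell corridor; the environment, reward scheme, and hyperparameters are my own illustrative assumptions, not from the notes.

import numpy as np

# Toy environment: 5 cells in a row; start at cell 0, reward at cell 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = np.zeros((N_STATES, N_ACTIONS))  # value table: Q[state, action]
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy policy: mostly exploit, sometimes explore
        if rng.random() < epsilon:
            action = rng.integers(N_ACTIONS)
        else:
            action = int(np.argmax(Q[state]))

        # Environment dynamics: move left/right, clipped to the corridor
        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0

        # Q-learning update: nudge Q[s, a] toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned policy: mostly 1s, i.e. "move right"

Taking the argmax over each row of Q recovers the learned policy, illustrating how the value function (the Q-table) and the policy relate to the elements listed above.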

1.14.2 Applications
• Robotics: Automating tasks in structured environments like manufacturing. Teaching robots to walk or
perform tasks.
• Game playing: Developing strategies in complex games like chess. AI systems like AlphaGo and chess
engines.
• Autonomous driving: Learning to navigate roads safely. Self-driving cars learn to navigate by interacting
with simulated environments.
• Industrial Control: Real-time adjustments in operations like refinery controls.
• Personalized Training Systems: Customizing instruction based on individual needs.

1.14.3 Advantages
• Learning from Interaction: Models learn optimal behavior through continuous interaction with their
environment.
• No Need for Pre-Labeled Data: Learns directly from the environment without requiring labeled datasets.
• Adaptability: Adapts to changing environments, making it suitable for complex, evolving systems like
robotics or autonomous vehicles.
• Focus on Long-Term Goals: Balances immediate and future rewards, ensuring strategies optimize cumula-
tive rewards over time.
• Problem-Specific Optimization: Excels in solving problems like game playing, resource management, or
scheduling tasks where step-by-step actions impact overall outcomes.
• Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional
techniques.
• The model can correct the errors that occurred during the training process.
• In RL, training data is obtained via the direct interaction of the agent with the environment.
• Reinforcement learning can handle environments that are non-deterministic, meaning that the outcomes of
actions are not always predictable. This is useful in real-world applications where the environment may
change over time or is uncertain.
• Reinforcement learning can be used to solve a wide range of problems, including those that involve decision
making, control, and optimization.
• Reinforcement learning is a flexible approach that can be combined with other machine learning techniques,
such as deep learning, to improve performance.

1.14.4 Disadvantages
• Reinforcement learning is not preferable for solving simple problems.
• Reinforcement learning needs a lot of data and a lot of computation.
• Reinforcement learning is highly dependent on the quality of the reward function. If the reward function is
poorly designed, the agent may not learn the desired behavior.
• Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is
behaving in a certain way, which can make it difficult to diagnose and fix problems.

1.15 Supervised vs Unsupervised vs Reinforcement Learning

• Supervised Learning is best for tasks where labeled data is available and precise predictions are needed.
• Unsupervised Learning is ideal for exploring data without prior labels and discovering hidden patterns.
• Reinforcement Learning is suited for dynamic systems requiring sequential decision-making and long-term
optimization.

1.16 Sample Questions
• What are the three main types of machine learning?
• Describe one real-world application of supervised learning and one of unsupervised learning.
• Explain the concept of overfitting in machine learning and why it is problematic.
• Write the steps involved in implementing a basic regression model in supervised learning.
• How would you implement k-nearest neighbors (KNN) for classification? Outline the steps and provide a
code example.
• What are the key characteristics of supervised learning?
• Describe a scenario where unsupervised learning would be more appropriate than supervised learning.
• Provide an example of a real-world problem that can be solved using reinforcement learning. Describe the
key components involved.
• Discuss the difference between precision and recall in the context of classification. Provide examples of
when each metric is crucial.
• Write Python code to implement k-means clustering and visualize the clusters.
• What is reinforcement learning, and how does it differ from supervised learning?
• Describe how logistic regression can be used for binary classification. Include a real-world example.
• Write a code snippet in Python to perform a simple linear regression and plot the regression line on a scatter plot.
• Explain the primary differences between supervised and unsupervised machine learning.
• What are the advantages of using machine learning techniques in data analysis?
• How does reinforcement learning differ from supervised and unsupervised learning? Provide an example to
illustrate.
• Compare and contrast k-means clustering and hierarchical clustering. What are the advantages and disad-
vantages of each?
• Discuss how overfitting can be detected and mitigated in a machine learning model.
• Explain the concept of cross-validation and its importance in machine learning.
• Write a Python code snippet to split a dataset into training and testing sets and fit a simple linear regression
model.
• Define clustering and explain its importance in unsupervised learning.
• Explain how the concept of a decision boundary is used in classification algorithms.
• What is the role of a loss function in supervised learning? How does it impact the model’s performance?
• Explain the bias-variance tradeoff in machine learning. How does it affect model selection?
• How would you use cross-validation to select the best model from a set of candidate models? Write the
steps and include a code example.
• Define overfitting and underfitting. How do they affect the performance of a machine learning model?
• How does clustering differ from classification in machine learning? Provide examples of each.
• A company wants to predict the salary of an employee based on their years of experience. The dataset
provided includes the following salary values (in $1000) corresponding to 1, 2, 3, 4, 5, and 6 years of
experience: 30, 35, 40, 50, 55, and 60. Using linear regression, find the equation of the line y = mx + b,
where y represents the salary and x represents the years of experience. Predict the salary for an employee
with 7 years of experience.
