Unit-5 ML Notes

Contents
1.6.2 Types of Linear Regression
1.6.3 Applications of Linear Regression
1.6.4 Advantages of Linear Regression
1.6.5 Disadvantages of Linear Regression
1.7 Simple Linear Regression
1.7.1 How to Calculate m and c values to get the best-fit line?
1.7.2 Least Squares Method
1.7.3 Assumptions of Simple Linear Regression
1.8 Logistic Regression
1.8.1 Types of Logistic Regression
1.8.2 Assumptions of Logistic Regression
1.8.3 Linear Regression vs Logistic Regression
1.8.4 Logistic Function (Sigmoid Function)
1.8.5 Logistic Regression implementation
1.8.6 Logistic Regression Resources
1.9 K-Nearest Neighbor (KNN) Algorithm
1.9.1 KNN Algorithm
1.9.2 Example of the KNN Algorithm at Work (K = 3)
1.9.3 Example of the KNN Algorithm at Work (K = 5)
1.9.4 How to choose the value of k for KNN Algorithm?
1.9.5 Distance Metrics Used in KNN Algorithm
1.9.6 Advantages of the KNN Algorithm
1.9.7 Disadvantages of the KNN Algorithm
1.9.8 Implementation of KNN Algorithm for Classification
1.10 Unsupervised Machine Learning
1.10.1 Why use Unsupervised Learning?
1.10.2 Challenges of Unsupervised Learning
1.10.3 Advantages of Unsupervised learning
1.10.4 Disadvantages of Unsupervised learning
1.10.5 Applications of Unsupervised learning
1.10.6 Unsupervised Learning Algorithms
1.10.7 Clustering
1.10.7.1 Clustering Algorithms
1.11 K-Means Clustering
1.11.1 When to Use K-Means Clustering
1.11.2 When to Use Hierarchical Clustering
1.12 Hierarchical Clustering
1.12.1 Comparison of K-Means Clustering and Hierarchical Clustering
1.12.2 Association Rule Learning
1.12.2.1 Association Rule Learning Algorithms
1.12.3 Dimensionality Reduction
1.12.3.1 Dimensionality Reduction Algorithms
1.13 Supervised vs Unsupervised Learning
1.14 Reinforcement Learning (RL)
1.14.1 Elements of Reinforcement Learning
1.14.2 Applications
1.14.3 Advantages
1.14.4 Disadvantages
1.15 Supervised vs Unsupervised vs Reinforcement Learning
1.16 Sample Questions
Chapter 1
1.2 Human Learning to Machine Learning
Just as humans learn from experiences, machines learn from data. While humans rely on senses, memory, and
cognition to process and interpret information, machines use algorithms to analyze data, identify patterns, and
make decisions.
1.3 Machine Learning (ML)
• Machine Learning (ML) is a subset of Artificial Intelligence (AI) that provides systems the ability to learn
and improve automatically from experience (data) without being explicitly programmed.
• Instead of hard-coding rules for every scenario, ML models analyze data, learn relationships, and generalize
their findings to new, unseen data.
Video: https://www.youtube.com/watch?v=cKxRvEZd3Mw
1.4 Evaluating a machine learning model
• So you’ve built a machine learning model and trained it on some data... now what?
• The main goal of each machine learning model is to generalize well. Here, generalization refers to the ability of an ML model to produce suitable outputs for previously unseen inputs.
• Now, suppose we want to check how well our machine learning model learns and generalizes to the new
data.
1.4.2.1 Bias
• In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction.
• While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
• Bias refers to the error due to overly simplistic assumptions in the learning algorithm. These assumptions
make the model easier to comprehend and learn but might not capture the underlying complexities of the
data. It is the error due to the model’s inability to represent the true relationship between input and output
accurately.
• A model has either:
– Low Bias: Low bias value means fewer assumptions are taken to build the target function. In this
case, the model will closely match the training dataset.
– High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset. In this case, the model will not match the training
dataset closely. A high bias model also cannot perform well on new data.
• A high-bias model will not be able to capture the dataset trend. It is considered an underfitting model and has a high error rate.
• When a model performs poorly on both the training and testing data, it has high bias caused by an overly simple model, indicating underfitting.
1.4.2.3 Variance
• Variance is the measure of spread in data from its mean position.
• In machine learning variance is the amount by which the performance of a predictive model changes when
it is trained on different subsets of the training data.
• More specifically, variance is the variability of the model: how sensitive it is to a different subset of the training dataset, i.e., how much its predictions change when it is trained on a new subset of the training data.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm
should be good in understanding the hidden mapping between inputs and output variables.
• Variance errors are either low or high-variance errors:
– Low variance: Low variance means that the model is less sensitive to changes in the training data
and can produce consistent estimates of the target function with different subsets of data from the
same distribution.
– Low variance means there is a small variation in the prediction of the target function with changes in
the training data set.
– High variance: High variance means that the model is very sensitive to changes in the training data
and can result in significant changes in the estimate of the target function when trained on different
subsets of data from the same distribution.
– This is the case of overfitting, when the model performs well on the training data but poorly on new, unseen test data. It fits the training data so closely that it fails to generalize to new data.
1.4.2.5 Different Combinations of Bias-Variance
There can be four combinations between bias and variance.
1. High Bias, Low Variance: A model with high bias and low variance is said to be underfitting.
2. High Variance, Low Bias: A model with high variance and low bias is said to be overfitting.
3. High-Bias, High-Variance: A model has both high bias and high variance, which means that the model is
not able to capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the
training data (high variance). As a result, the model will produce inconsistent and inaccurate predictions on
average.
4. Low Bias, Low Variance: A model that has low bias and low variance means that the model is able to
capture the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data
(low variance). This is the ideal scenario for a machine learning model, as it is able to generalize well to
new, unseen data and produce consistent and accurate predictions. In practice, however, this ideal is rarely fully achievable.
• High variance can be identified if the model has: Low training error and high test error.
• High Bias can be identified if the model has: High training error and the test error is almost similar to
training error.
1.4.3 Bias-Variance Trade-Off
• While building the machine learning model, it is really important to take care of bias and variance in order
to avoid overfitting and underfitting in the model.
• If the model is very simple with fewer parameters, it may have low variance and high bias. Whereas, if the
model has a large number of parameters, it will have high variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this balance between the bias
error and variance error is known as the Bias-Variance trade-off.
• For an accurate prediction of the model, algorithms need a low variance and low bias. But this is not
possible because bias and variance are related to each other. So, we need to find a sweet spot between bias
and variance to make an optimal model.
• An algorithm cannot be both more complex and less complex at the same time. On a plot of error against model complexity, the optimal trade-off lies where the combined bias and variance error is at its minimum.
1.4.4 Underfitting in Machine Learning
• A statistical model or a machine learning algorithm is said to have underfitting when a model is too simple
to capture data complexities.
• It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and testing data.
• In simple terms, an underfit model is inaccurate, especially when applied to new, unseen examples. Underfitting mainly happens when we use a very simple model with overly simplified assumptions.
• To address the underfitting problem, we need to use more complex models with enhanced feature representation and less regularization.
• An underfitting model has high bias and low variance.
1.4.5 Overfitting in Machine Learning
• A statistical model is said to be overfitted when the model does not make accurate predictions on testing
data.
• When a model is trained too closely on the training data, it starts learning from the noise and inaccurate data entries in the data set. Testing on new data then reveals high variance: the model does not categorize the data correctly, because it has captured too many details and too much noise.
• Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
• An overfitting model has high variance and low bias.
1.5 Supervised Machine Learning Algorithms
• In Supervised Learning, the goal is to learn a mapping between input features (independent variables) and
the target output (dependent variable).
• In supervised learning, the model is trained on a labeled dataset, where the input data (features) is paired
with the correct output (labels).
• The goal is to learn a mapping from inputs to outputs and make predictions on new, unseen data.
• Similar to a human learning with a teacher.
• Supervised machine learning involves training a model on labeled data to learn patterns and relationships,
which it then uses to make accurate predictions on new data.
1.5.1 How does Supervised Learning work?
• Data Collection and Preprocessing: Gather a labeled dataset consisting of input features and target output
labels. Clean the data, handle missing values, and scale features as needed to ensure high quality for
supervised learning algorithms.
• Splitting the Data: Divide the data into training set (80%) and the test set (20%).
• Choosing the Model: Select appropriate algorithms based on the problem type. This step is crucial for
effective supervised learning in AI.
• Training the Model: Feed the model input data and output labels, allowing it to learn patterns by adjusting
internal parameters.
• Evaluating the Model: Test the trained model on the unseen test set and assess its performance using
various metrics.
• Hyperparameter Tuning: Adjust settings that control the training process (e.g., learning rate) using
techniques like grid search and cross-validation.
• Final Model Selection and Testing: Retrain the model on the complete training data using the best hyperparameters, then test its performance on the test set to ensure readiness for deployment.
• Model Deployment: Deploy the validated model to make predictions on new, unseen data.
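A minimal end-to-end sketch of these steps with scikit-learn (the dataset, model, and hyperparameter grid are illustrative assumptions, not part of the notes):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Collect data and split it 80/20 into training and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose and train a model on the labeled training data
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Evaluate on the unseen test set
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Hyperparameter tuning with grid search and 5-fold cross-validation
grid = GridSearchCV(LogisticRegression(max_iter=5000), {"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_, "Tuned test accuracy:", grid.score(X_test, y_test))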
1.5.3 Advantages of Supervised learning
• Accuracy and Predictability: Produces highly accurate models since the algorithm learns from labeled
examples.
• Clear Objective: Training data provides a clear mapping between input and output, making it easier to
evaluate model performance.
• Wide Applicability: Useful for a broad range of tasks, such as regression (predicting prices, temperatures)
and classification (spam detection, image recognition).
• Ease of Evaluation: Performance metrics (e.g., accuracy, precision, recall) can be directly calculated because
true outputs are known.
• Reliable for Well-Defined Problems: Works exceptionally well when sufficient labeled data is available
for tasks like fraud detection or medical diagnosis.
• Supervised learning makes it possible to collect data and produce outputs informed by previous experience.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in the training data.
1.5.5 Types of Supervised Machine learning Algorithms:
Depending on the type of output, supervised learning can be categorized into:
• Regression: Where the output is a continuous variable (e.g., predicting house prices, stock prices).
• Classification: Where the output is a categorical variable (e.g., spam vs. non-spam emails, yes vs. no).
1.5.6 Regression
• Regression is a technique used to predict a continuous output based on input features.
• The output can take any real value within a range.
• How It Works:
– The algorithm establishes a relationship between input variables (independent variables) and the
output (dependent variable).
– The objective is to determine the most suitable function that characterizes the connection between
these variables.
– The relationship is often modeled as a mathematical function, such as a line or a curve.
1.5.7 Classification
• Classification is a predictive modeling technique that uses a classification model (or classifier) to categorize input data and assign it to predefined classes.
• Classifiers learn class characteristics from input data, then assign classes to new, unseen data according to those learned characteristics.
• For example, a classification model might be trained on a dataset of images labeled as either dogs or cats, and it can then be used to predict the class of new, unseen images based on features such as color, texture, and shape.
• As an illustration, consider two classes, Class A and Class B. Points within each class have features that are similar to one another and dissimilar to points in the other class.
• How It Works:
– The algorithm learns decision boundaries that separate classes in the input feature space.
– New data points are classified based on their proximity to these boundaries.
Data Collection: You start with a dataset where each item is labeled with the correct class (for
example, “cat” or “dog”).
Feature Extraction: The system identifies features (like color, shape, or texture) that help distinguish
one class from another. These features are what the model uses to make predictions.
Model Training: The classification algorithm uses the labeled data to learn how to map the features to the correct class. It looks for patterns and relationships in the data.
Model Evaluation: Once the model is trained, it’s tested on new, unseen data to check how accurately
it can classify the items.
Prediction: After being trained and evaluated, the model can be used to predict the class of new data
based on the features it has learned.
1.5.7.1 Types of Classification
There are four main classification tasks in Machine learning:
• Binary classification
– The task is to assign inputs into one of two distinct categories (e.g., Yes/No, True/False).
– Email classification: Spam vs. Not Spam.
– Disease diagnosis: Positive vs. Negative.
• Multi-class classification
– In multi-class classification, the goal is to classify the input into one of several classes or categories.
– Handwritten digit recognition: Digits 0-9.
• Multi-label classification
– An instance can belong to multiple classes simultaneously.
– Text categorization: A news article tagged as "Politics" and "Economy."
– Medical diagnosis: A patient diagnosed with multiple diseases.
• Imbalanced classification
– A scenario where one class is heavily overrepresented compared to the other(s).
– Fraud detection: Fraudulent transactions are rare compared to legitimate ones.
– Disease detection: Rare diseases vs. healthy samples.
1.5.7.2 Classification Algorithms
Classification Algorithms can be further divided into the categories below:
• Linear Classifiers: A linear classifier is a model that makes predictions based on a linear decision boundary
(e.g., a straight line, plane, or hyperplane).
– Logistic Regression
– Linear Discriminant Analysis (LDA)
– Support Vector Machines having kernel = ‘linear’
– Perceptron
– Stochastic Gradient Descent (SGD) Classifier
• Non-linear Classifiers: A non-linear classifier is a model that uses a non-linear decision boundary to
separate classes. It can capture complex relationships and patterns in the data.
– K-Nearest Neighbours (KNN)
– Support Vector Machines (SVM) (with non-linear kernels like RBF or polynomial).
– Decision Tree
– Gradient Boosting Machines (GBM):
* XGBoost
* LightGBM
* CatBoost
• Ensemble Classification Algorithms
– Random Forests
– Bagging Classifier
– Boosting Algorithms:
* AdaBoost
* Gradient Boosting
* Stacking
– Voting Classifier
• Probabilistic Classification Algorithms
– Naïve Bayes Classifier:
* Gaussian Naïve Bayes
* Multinomial Naïve Bayes
* Bernoulli Naïve Bayes
– Bayesian Networks
• Neural Network-Based Algorithms
– Multilayer Perceptrons (MLPs)
– Convolutional Neural Networks (CNNs)
– Recurrent Neural Networks (RNNs)
– Transformers
• Instance-Based Algorithms
– K-Nearest Neighbors (KNN)
– Locally Weighted Learning (LWL)
– Kernel Density Estimation (KDE)
• Rule-Based Classification Algorithms
– Decision Trees
– Rule-Based Classifiers
– Associative Classifiers
• Deep Learning Algorithms
– Deep Neural Networks (DNNs)
– Autoencoders
– Generative Adversarial Networks (GANs)
1.5.8 Loss Function in Supervised Learning
• A loss function is a mathematical function that quantifies the difference between the predicted output of a
model and the actual target values.
• It serves as the primary tool to measure how well or poorly a model is performing.
• The goal of supervised learning is to minimize the loss function, thereby improving the accuracy of the
model’s predictions.
• How Loss Functions Work:
– Comparison: The loss function compares the predicted outputs with the actual outputs for each data
point in the training set.
– Quantification: It calculates a numeric value that represents the model’s error. The higher the loss,
the worse the predictions.
– Guidance: The loss function provides feedback to the model during training. Optimization algorithms
(e.g. Gradient Descent) use this feedback to update the model’s parameters and reduce the loss.
– Loss Functions in Supervised Learning For Regression:
* Mean Squared Error (MSE)
* Mean Absolute Error (MAE)
* Huber Loss
– Loss Functions in Supervised Learning For Classification
* Binary Cross Entropy Loss
* Categorical Cross-Entropy Loss
* Hinge Loss
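For reference, two of the most common loss functions, in their standard form over n training examples with targets yᵢ and predictions ŷᵢ:

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Binary Cross-Entropy = −(1/n) Σᵢ₌₁ⁿ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]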
1.5.9 Evaluating Regression Models
• Mean Squared Error (MSE): MSE measures the average squared difference between the predicted values
and the actual values. Lower MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the standard deviation
of the prediction errors. Similar to MSE, lower RMSE values indicate better model performance.
• Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted
values and the actual values. It is less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared measures the proportion of the variance in the
target variable that is explained by the model. Higher R-squared values indicate better model fit.
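A short sketch computing these four metrics with scikit-learn (the small arrays are made-up illustrative values):

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # actual target values (illustrative)
y_pred = np.array([2.8, 5.4, 6.9, 9.5])   # model predictions (illustrative)

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)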
1.5.10 Sensitivity and Specificity
• True Positive (TP): The model correctly identifies a positive case (e.g., a medical test accurately diagnosing
a disease in a sick patient).
• False Positive (FP): The model incorrectly identifies a negative case as positive (e.g., a security system alerts about a potential threat when there is none).
• False Negative (FN): The model incorrectly identifies a positive case as negative (e.g., a medical test wrongly states that a sick patient is healthy).
• True Negative (TN): The model correctly identifies a negative case as negative (e.g., a security system does
not alert when there is no actual threat).
Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision:
– Precision is the percentage of positive predictions that the model makes that are actually correct.
– Precision focuses on the quality of the positive predictions made by the model. Basically, Of all the
instances predicted as positive, how many are actually positive?
– It is calculated by dividing the number of true positives by the total number of positive predictions.
Precision = TP / (TP + FP)
– High Precision indicates a low false positive rate. The model is very confident when it predicts a
positive class.
• Precision vs. Recall Trade-off:
– There is often a trade-off between precision and recall. Improving one metric can come at the cost of
the other.
– A model with high precision might miss many positive cases, reducing recall.
– A model with high recall might include many false positives, reducing precision.
• Recall:
– Recall is the percentage of all positive examples that the model correctly identifies.
– Recall (also known as sensitivity or true positive rate) focuses on how well the model captures all the
actual positive instances.
– Of all the actual positive instances, how many did the model correctly identify? It is calculated by
dividing the number of true positives by the total number of positive examples.
Recall = TP / (TP + FN)
– High Recall Indicates a low false negative rate. The model successfully identifies most of the positive
instances.
• F1 score:
– To balance precision and recall, the F1-score is used.
– The F1 score combines precision and recall into a single measure.
– It is calculated as the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
– High F1-Score indicates a good balance between precision and recall.
• Confusion matrix:
– A confusion matrix is a table that shows the number of predictions for each class, along with the
actual class labels.
– It can be used to visualize the performance of the model and identify areas where the model is
struggling.
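A short sketch computing these classification metrics with scikit-learn (the labels are made-up illustrative values):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))   # rows = actual, columns = predicted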
1.5.12 Decision Boundary in Classification Algorithms
• A decision boundary is a line or surface that separates the feature space into regions, where each region
corresponds to a specific class.
• In a classification algorithm, the decision boundary represents the threshold or rule the model uses to decide
which class a data point belongs to.
• A decision boundary is the demarcation line (for 2D data) or a hyperplane (in higher dimensions) where the
probability or score for belonging to two or more classes is equal.
• It helps the classifier decide which side of the boundary a new data point falls on, thereby assigning it a
class.
• In 2D data the boundary is a line; in 3D it becomes a plane; and in higher dimensions it is a hyperplane.
1.5.13 Cross-validation
• Cross-validation is a technique used to assess the generalization ability of a machine learning model.
• It evaluates how well a model will perform on unseen data by splitting the dataset into multiple subsets for
training and testing in a systematic way.
• Why is Cross-Validation Important?
– Prevents Overfitting: Ensures the model is not too closely tailored to the training data by validating
it on unseen subsets.
– Reliable Model Evaluation: Provides a more robust estimate of model performance than a single train-test split.
– Model Comparison: Helps compare the performance of multiple models or configurations to select the best one.
– Utilizes the Entire Dataset: Every data point gets used for both training and testing, which is
especially useful when the dataset is small.
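A minimal sketch of k-fold cross-validation with scikit-learn (the dataset and model are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# 5-fold CV: train on 4 folds, validate on the 5th, rotating through all folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, "mean:", scores.mean())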
1.5.14 Regression vs Classification
1.6 Linear Regression
• Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age,
product price, etc.
• Linear regression is a statistical method that is used to predict a continuous dependent variable (target
variable) based on one or more independent variables (predictor variables).
• Linear regression assumes a linear relationship between the dependent and independent variables, which
implies that the dependent variable changes proportionally with changes in the independent variables.
• Linear regression is a type of supervised machine learning algorithm that computes the linear relationship
between the dependent and independent feature or variable by fitting a linear equation to observed data.
Videos:
https://www.youtube.com/watch?v=UZPfbG0jNec
https://www.youtube.com/watch?v=dXHIDLPKdmA
https://www.youtube.com/watch?v=jerPVDaHbEA
https://www.youtube.com/watch?v=tFi4Y_y-GNM
1.6.1 Linear Regression Line
A linear line showing the relationship between the dependent and independent variables is called a regression line.
A regression line can show two types of relationship:
• Positive Relationship: A positive relationship exists between the independent variables and the dependent
variable when the slope of the regression line is positive. In other words, as the values of the independent
variables on X-axis increase, the value of the dependent variable on the Y-axis also increases. This can be
seen as an upward slope on a scatter plot of the data.
• Negative Relationship: A negative relationship exists between the independent variables and the dependent variable when the slope of the regression line is negative. In other words, as the values of the independent variables on the X-axis increase, the value of the dependent variable on the Y-axis decreases. This can be seen as a downward slope on a scatter plot of the data.
1.6.2 Types of Linear Regression
• When there is only one independent feature, it is known as Simple Linear Regression; when there is more than one independent feature, it is known as Multiple Linear Regression.
• In the case of a simple linear regression, the aim is to examine the influence of an independent variable on
one dependent variable. In case of multiple linear regression, the influence of several independent variables
on one dependent variable is analyzed.
• Example (Simple Linear Regression): Does height have an influence on the weight of a person?
• Example (Multiple Linear Regression): Do height and gender have an influence on the weight of a person?
1.6.3 Applications of Linear Regression
Linear regression is used in many different fields, including finance, economics, and psychology, to understand and
predict the behavior of a particular variable. For example, in finance, linear regression might be used to understand
the relationship between a company’s stock price and its earnings or to predict the future value of a currency based
on its past performance.
1.7 Simple Linear Regression
• This is the simplest form of linear regression, and it involves only one independent variable and one
dependent variable.
• This involves predicting a dependent variable based on a single independent variable.
• Let us consider a dataset where we have a value of response y for every feature x:
• Now, the task is to find the straight line that best fits the scatter plot of these points, so that we can predict the response for any new feature value (i.e., a value of x not present in the dataset). This line is called a regression line.
• The equation of the regression line is represented as:
y = mx + c
where:
– y is the dependent variable
– x is the independent variable
– c is the intercept
– m is the slope or gradient
• Interpretation
– Slope (m): Indicates how much the dependent variable (y) changes for a unit change in the indepen-
dent variable (x).
– Intercept (c): The predicted value of y when x = 0.
• The regression coefficient m can have different signs, which can be interpreted as follows:
– m > 0: there is a positive correlation between x and y (the greater x, the greater y).
– m < 0: there is a negative correlation between x and y (the greater x, the smaller y).
– m = 0: there is no linear correlation between x and y.
• If all points (measured values) were exactly on one straight line, the estimate would be perfect. However,
this is almost never the case and therefore, in most cases a straight line must be found, which is as close as
possible to the individual data points.
• The attempt is thus made to keep the error in the estimation as small as possible so that the distance between
the estimated value and the true value is as small as possible.
• This distance or error is called the "residual"; it is abbreviated as "e" (error) and represented by the Greek letter epsilon (ε):
y = mx + c + ε
• Hence, the Best fit line means that the error between predicted values and actual values should be minimized.
In other words, The best-fitting line is the line that has the smallest difference between the predicted values
and the actual values.
• When calculating the regression line, an attempt is made to determine the regression coefficients (m and c)
so that the sum of the squared residuals is minimal.
min Σᵢ₌₁ⁿ eᵢ² = min Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
1.7.1 How to Calculate m and c values to get the best-fit line?
• Closed-form solution: Ordinary Least Squares (video: https://www.youtube.com/watch?v=VmbA0pi2cRQ)
• Open-form solution: Gradient Descent
• Worked example (video): https://www.youtube.com/watch?v=P8hT5nDai6A
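For the closed-form (OLS) solution, the coefficients that minimize the sum of squared residuals are given by the standard formulas:

m = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²,  c = ȳ − m·x̄

where x̄ and ȳ denote the sample means of x and y.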
• Code of Simple Linear Regression: a minimal runnable sketch follows (synthetic data and scikit-learn are assumptions, as the original listing is not reproduced here).
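import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data around y = 2x + 1 with noise (illustrative assumption)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50).reshape(-1, 1)
y = 2 * x.ravel() + 1 + rng.normal(0, 1, 50)

# Fit the line and report the learned slope (m) and intercept (c)
reg = LinearRegression().fit(x, y)
print("slope m:", reg.coef_[0], "intercept c:", reg.intercept_)

# Plot the data points and the fitted regression line
plt.scatter(x, y, label="data")
xs = np.linspace(0, 10, 100).reshape(-1, 1)
plt.plot(xs, reg.predict(xs), color="red", label="fitted line")
plt.legend()
plt.show()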
1.7.3 Assumptions of Simple Linear Regression
Let’s take a look now at the main simple linear regression model assumptions. If these assumptions are violated,
we might want to consider a different approach. The first three, in particular, are strong assumptions and shouldn’t
be ignored.
• Linearity: The independent and dependent variables have a linear relationship with one another. This
implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion.
This means that there should be a straight line that can be drawn through the data points. If the relationship
is not linear, then linear regression will not be an accurate model.
• Independence: The observations in the dataset are independent of each other. This means that the value
of the dependent variable for one observation does not depend on the value of the dependent variable for
another observation. If the observations are not independent, then linear regression will not be an accurate
model.
• Homoscedasticity: Since in practice the regression model never exactly predicts the dependent variable,
there is always an error. This very error must have a constant variance over the predicted range.
To test for homoscedasticity, i.e. the constant variance of the residuals, the dependent variable is plotted on the x-axis and the error on the y-axis. The error should scatter evenly over the entire range; if it does, homoscedasticity is present. If not, heteroscedasticity is present, meaning the error has different variances depending on the value range of the dependent variable.
• Normal distribution of the error: The error epsilon or residuals should be normally distributed. This
means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then
linear regression will not be an accurate model. Analytically, you can use either the Kolmogorov-Smirnov test or the Shapiro-Wilk test. Graphically, you can inspect a histogram or, better still, a QQ-plot (quantile-quantile plot): the more closely the points lie on the diagonal line, the closer the residuals are to a normal distribution.
1.8 Logistic Regression
• Suppose you want to predict whether today is going to be a sunny day or not. There are two possible
outcomes: "sunny" or "not sunny".
• The outcome variable is also known as a "target variable" or a "dependent variable".
• There are many variables that could influence the outcome, such as 'temperature the day before', 'air pressure', etc. The influencing variables are known as features, independent variables, or predictors; all these terms mean the same thing.
• Classification techniques are an essential part of machine learning and data mining applications; by some estimates, the majority of data science problems are classification problems.
• Logistic regression is a supervised machine learning algorithm used for classification tasks where the goal
is to predict the probability that an instance belongs to a given class or not.
1.8.3 Linear Regression vs Logistic Regression
• Let us consider a problem where we are given a dataset containing Height and Weight for a group of people.
Our task is to predict the Weight for new entries in the Height column.
• So we can figure out that this is a regression problem where we will build a Linear Regression model.
• Now suppose we have an additional field Obesity and we have to classify whether a person is obese or not
depending on their provided height and weight.
• This is clearly a classification problem where we have to segregate the dataset into two classes (Obese and
Not-Obese).
• So, for the new problem, we could again follow the Linear Regression steps and build a regression line. However, it will not do a good job of separating the two classes.
• Now we have a classification problem, and we want to predict the binary output variable Y (2 values: either
1 or 0). For example, the case of flipping a coin (Head/Tail). The response yi is binary: 1 if the coin is Head,
0 if the coin is Tail.
• With a straight regression line, however, predicted values between plus and minus infinity can occur. The goal of logistic regression is to estimate the probability of occurrence, not the value of the variable itself. Therefore, the equation must still be transformed.
So... how can we make predictions for a classification problem?
• To do this, it is necessary to restrict the value range for the prediction to the range between 0 and 1. As we
are now looking for a model for probabilities, we should ensure the model predicts values on the scale from
0 to 1.
• To ensure that only values between 0 and 1 are possible, the logistic function is used.
1.8.4 Logistic Function (Sigmoid Function)
• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an S-shaped curve. This S-shaped curve is called the sigmoid function or logistic function.
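Formally, the logistic (sigmoid) function is σ(z) = 1 / (1 + e⁻ᶻ), where z = mx + c is the output of the underlying linear model; σ(z) approaches 1 as z → +∞ and 0 as z → −∞.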
• In logistic regression, we use the concept of a threshold value, which separates the predictions 0 and 1: values above the threshold are mapped to 1, and values below the threshold are mapped to 0.
• Decision boundary: The sigmoid function returns a probability value between 0 and 1. This probability
value is then mapped to a discrete class which is either “Class - 0” or “Class - 1”. In order to map this
probability value to a discrete class (pass/fail, yes/no, true/false), we select a threshold value. This threshold
value is called Decision boundary. Above this threshold value, we will map the probability values into Class
- 1 and below which we will map values into Class - 0.
• Mathematically, it can be expressed as follows:-
p ≥ 0.5 ⟹ class = 1
p < 0.5 ⟹ class = 0
1.8.5 Logistic Regression implementation
• https://github.com/clareyan/From-Linear-to-Logistic-Regression-Explained-Step-by-Step/tree/master
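In addition to the linked walkthrough, a minimal sketch with scikit-learn (the dataset choice is an illustrative assumption):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Binary classification data, split into train and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))  # sigmoid outputs: probabilities in [0, 1]
print(clf.predict(X_test[:3]))        # classes after applying the 0.5 threshold
print("Test accuracy:", clf.score(X_test, y_test))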
1.9 K-Nearest Neighbor(KNN) Algorithm
• K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method used for classification
and regression problems, although it’s more commonly applied in classification.
• KNN is based on the idea that the observations closest to a given data point are the most "similar" observations in a data set, so we can classify unseen points based on the values of the closest existing points.
• In k-Nearest Neighbours (k-NN) algorithm k is just a number that tells the algorithm how many nearby
points (neighbours) to look at when it makes a decision.
• During the training phase, the KNN algorithm simply memorizes the entire dataset. When presented with new data, it assigns the new point to the class whose members it most closely resembles, following the steps below.
1. Select the number K of the neighbors: Start by deciding how many neighbors (data points from your
dataset) you want to consider when making predictions. This is your ’K’ value.
2. Calculate the distance: Find the distance between your new data point and the chosen number of neighbors.
3. Finding Nearest Neighbors: The k data points with the smallest distances to the target point are its nearest neighbors.
4. Count Data Points in Each Category: Among these k-nearest neighbors, count how many belong to each
category. For instance, count how many are in Category A and how many are in Category B.
5. Assign to the Majority Category: Assign the new data point to the category with the largest number of neighbors among the k. If most of them are in Category A, your new point goes into Category A.
Videos:
https://www.youtube.com/watch?v=abnL_GUGub4
https://www.youtube.com/watch?v=BYaoDZM1IcU&list=PLKnIA16_RmvZiE-lEdN5RDi18-u-T43zd
1.9.2 Example of the KNN Algorithm at Work (K = 3)
• Let’s take a simple case to understand this algorithm. Following is a spread of red circles (RC) and green
squares (GS):
• You intend to find out the class of the blue star (BS). BS can be either RC or GS and nothing else. The "K" in the KNN algorithm is the number of nearest neighbors we wish to take the vote from.
• Let's say K = 3. Hence, we will now draw a circle with BS as the center, just big enough to enclose only three data points on the plane.
• The three closest points to BS are all RC. Hence, with a good confidence level, we can say that the BS
should belong to the class RC. Here, the choice became obvious as all three votes from the closest neighbor
went to RC.
1.9.3 Example of the KNN Algorithm at Work (K = 5)
• Suppose we have a new data point and we need to put it in the required category. Consider the below image:
• Firstly, we will choose the number of neighbors; let's say k = 5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the
distance between two points, which we have already studied in geometry. It can be calculated as:
• By calculating the Euclidean distance, we find the nearest neighbors: three in Category A and two in Category B.
• Since the majority (3 of the 5) of the nearest neighbors are from Category A, the new data point is assigned to Category A.
1.9.4 How to choose the value of k for KNN Algorithm?
• The value of k is very crucial in the KNN algorithm to define the number of neighbors in the algorithm.
• Changing 'K' changes the prediction: for example, with K = 3 we might predict Category B, while with K = 5 we predict Category A.
• So, picking the right ’K’ is a big deal in making KNN work well.
• Low k values make predictions unstable: Take this example: a query point is surrounded by two green dots and one red triangle. If k = 1 and the single closest point happens to be an atypical neighbor, the algorithm will make an incorrect prediction for the query. Low k values give high variance (the model fits too closely to the training data), high complexity, and low bias (the model is complex enough to fit the training data well).
• High k values oversmooth: A higher k value can stabilize predictions because there are more neighbors over which to compute the mode or mean. However, if the k value is too high, it will likely result in low variance, low complexity, and high bias (the model is not complex enough to fit the training data well).
• Ideally, you want a k value that balances the extremes of high variance and high bias. It is also recommended to choose an odd number for k to avoid ties in binary classification.
• The right k value is also relative to your data set. To choose that value, you might try to find the square root
of N, where N is the number of data points in the training dataset.
• Cross-validation tactics can also help you choose the k value best suited to your dataset.
1.9.5 Distance Metrics Used in KNN Algorithm
The KNN algorithm uses distance metrics to identify which data points are closest to a given query point. Euclidean distance is the most common, but other metrics such as Manhattan distance and Minkowski distance are also used.
1. Euclidean Distance: Euclidean distance is defined as the straight-line distance between two points in a
plane or space. You can think of it like the shortest path you would walk if you were to go directly from one
point to another.
d(x, y) = √((y₂ − y₁)² + (x₂ − x₁)²) = √( Σᵢ₌₁ⁿ (yᵢ − xᵢ)² )
2. Manhattan Distance:
• This is another popular distance metric, which sums the absolute differences between the coordinates of two points.
• This is the total distance you would travel if you could only move along horizontal and vertical lines
(like a grid or city streets).
• It’s also called “taxicab distance” because a taxi can only drive along the grid-like streets of a city.
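In the same notation as above, the Manhattan distance is d(x, y) = Σᵢ₌₁ⁿ |xᵢ − yᵢ| (the standard formula, stated here for parallelism with the other metrics).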
3. Minkowski Distance:
• Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan
distances as special cases
• This distance measure is the generalized form of Euclidean and Manhattan distance metrics.
• The parameter p in the formula below allows for the creation of other distance metrics.
• With p equal to one, it reduces to the Manhattan distance.
• With p equal to two, it reduces to the Euclidean distance:
Minkowski Distance = ( Σᵢ₌₁ⁿ |xᵢ − yᵢ|ᵖ )^(1/p)
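A small sketch computing all three metrics directly with NumPy (the example vectors are made up):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))          # Minkowski with p = 2
manhattan = np.sum(np.abs(x - y))                  # Minkowski with p = 1
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)  # general p

print(euclidean, manhattan, minkowski)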
1.9.6 Advantages of the KNN Algorithm
• Easy to implement: The K-NN algorithm is easy to implement because its complexity is relatively low as
compared to other machine learning algorithms.
• Easily Adaptable: K-NN stores all data in memory, so when new data points are added, it automatically adjusts and uses the new data for future predictions.
• Few Hyperparameters – The only parameters which are required in the training of a KNN algorithm are
the value of k and the choice of the distance metric, which is low when compared to other machine learning
algorithms.
1.9.8 Implementation of KNN Algorithm for Classification
A minimal runnable listing, assuming scikit-learn's Iris dataset (which the feature names and the Setosa/Versicolor classes imply):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load Iris; keep only the first two features so the boundary can be drawn in 2D
data = load_iris()
X, y = data.data[:, :2], data.target

# Filter the dataset to include only two classes (Setosa and Versicolor)
binary_filter = y < 2  # keep only class 0 (Setosa) and class 1 (Versicolor)
X, y = X[binary_filter], y[binary_filter]

# Fit a KNN classifier and predict over a mesh grid to draw the decision regions
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
for cls, name in enumerate(data.target_names[:2]):
    plt.scatter(X[y == cls, 0], X[y == cls, 1], label=name)
plt.title("KNN Classification with Boundary Lines (Two Classes)", fontsize=16)
plt.xlabel(data.feature_names[0], fontsize=12)
plt.ylabel(data.feature_names[1], fontsize=12)
plt.legend(fontsize=10)
plt.show()
1.10 Unsupervised Machine Learning
• In contrast to supervised learning, unsupervised machine learning models are given unlabeled data and left to discover patterns and insights on their own, without explicit direction or instruction.
• Machine learning that takes place in the absence of human supervision is known as unsupervised machine
learning.
• There are no explicit outputs to guide the learning process.
• Resembles self-directed human learning without clear instructions.
• Unsupervised learning is a type of machine learning in which models are trained using unlabeled dataset
and are allowed to act on that data without any supervision.
• Unsupervised learning is often used for tasks such as clustering, dimensionality reduction, and anomaly
detection.
1.10.1 Why use Unsupervised Learning?
• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is much like how a human learns to think through their own experience, which brings it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
• In the real world, we do not always have input data with the corresponding output; unsupervised learning is needed to solve such cases.
1.10.3 Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished using unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from unlabeled data that you might not have been able to
get otherwise.
• Unsupervised learning is good at finding patterns and relationships in data without being told what to look
for. This can help you learn new things about your data.
• No Need for Labeled Data: Saves time and effort required for manual labeling, which can be expensive
and time-consuming.
• Discovering Hidden Patterns: Identifies clusters, segments, or anomalies that may not be evident through
manual analysis.
• Ideal for exploring data when little is known about its structure or relationships.
• Applicable in scenarios where labeling is impractical, such as identifying abnormal network traffic in
cybersecurity.
• Handles Complex Data: Can work with high-dimensional data and uncover structures, such as in gene
expression analysis.
1.10.6 Unsupervised Learning Algorithms
There are mainly three types of algorithms used for unsupervised datasets:
• Clustering
• Association Rule Learning
• Dimensionality Reduction
1.10.7 Clustering
• Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
• In other words, Clustering in unsupervised machine learning is the process of grouping unlabeled data into
clusters based on their similarities.
• The goal of clustering is to identify patterns and relationships in the data without any prior knowledge of
the data’s meaning.
• Clustering can be broken down further into different types, for example:
– Exclusive clustering: Data is grouped such that a single data point belongs exclusively to one cluster.
– Overlapping clustering: A soft form of clustering in which a single data point may belong to multiple clusters with varying degrees of membership.
– Hierarchical clustering: A type of clustering in which groups are created such that similar instances are within the same group and different objects are in other groups.
– Probabilistic clustering: Clusters are created using probability distributions.
1.11 K-Means Clustering
https://www.youtube.com/watch?v=5shTLzwAdEc
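K-Means partitions the data into k clusters by repeatedly assigning each point to its nearest centroid and recomputing the centroids. A minimal sketch with scikit-learn (synthetic blobs as an illustrative assumption):

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unlabeled synthetic data with three natural groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)          # points colored by cluster
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker="x", color="red")                     # learned centroids
plt.show()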
1.12 Hierarchical Clustering
https://www.youtube.com/watch?v=0jPGHniVVNc&pp=ygUXaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmc%3D
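Agglomerative hierarchical clustering starts with each point as its own cluster and repeatedly merges the closest clusters; the merge history is drawn as a dendrogram. A minimal sketch with SciPy (random data as an illustrative assumption):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((12, 2))          # small unlabeled dataset
Z = linkage(X, method="ward")    # agglomerative merges using Ward's criterion
dendrogram(Z)                    # tree of successive merges
plt.show()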
1.12.2 Association Rule Learning
• Association rule learning, also known as association rule mining, is a common unsupervised learning technique used to discover relationships between variables in large databases.
• It determines the set of items that occurs together in the dataset.
• This technique is basically used for market basket analysis that helps to better understand the relationship
between different products.
• For example, shopping stores use algorithms based on this technique to find relationships between the sales of one product and the sales of another based on customer behavior.
• For instance, people who buy item X (say, bread) also tend to purchase item Y (butter or jam).
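A minimal sketch using the third-party mlxtend library (an assumption: mlxtend is installed; the tiny basket data is made up):

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded baskets: rows = transactions, columns = items
baskets = pd.DataFrame({
    "bread":  [1, 1, 0, 1, 1],
    "butter": [1, 1, 0, 0, 1],
    "jam":    [0, 1, 0, 0, 1],
}).astype(bool)

frequent = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])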
1.12.3 Dimensionality Reduction
• Dimensionality reduction is the process of reducing the number of features in a dataset while preserving as
much information as possible.
• These algorithms seek to transform data from high-dimensional spaces to low-dimensional spaces without
compromising meaningful properties in the original data.
• These techniques are typically deployed during exploratory data analysis (EDA) or data processing to
prepare the data for modeling.
• This technique is useful for improving the performance of machine learning algorithms and for data
visualization.
• It’s helpful to reduce the dimensionality of a dataset during EDA to help visualize data: this is because
visualizing data in more than three dimensions is difficult.
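A minimal sketch using scikit-learn's PCA to project the 4-dimensional Iris data down to 2D for visualization (the dataset choice is an illustrative assumption):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)             # 4 features -> 2 principal components
X2 = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X2[:, 0], X2[:, 1], c=y)  # labels used only to color the plot
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()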
1.13 Supervised vs Unsupervised Learning
1.14 Reinforcement Learning (RL)
• Reinforcement learning involves training a model to make sequences of decisions in an environment to
maximize a cumulative reward.
• The model learns through trial and error, receiving feedback in the form of rewards or penalties.
• Mimics learning from rewards and punishments, like training a pet.
• Reinforcement learning is a type of machine learning method in which an intelligent agent (a computer program) interacts with an environment and learns to act within it.
• Key Concepts of Reinforcement Learning:
– Agent: The learner or decision-maker.
– Environment: Everything the agent interacts with.
– State: A specific situation in which the agent finds itself.
– Action: All possible moves the agent can make.
– Reward: Feedback from the environment based on the action taken.
• How It Works: RL operates on the principle of learning optimal behavior through trial and error. The agent
takes actions within the environment, receives rewards or penalties, and adjusts its behavior to maximize the
cumulative reward.
– An agent interacts with an environment.
– The agent takes actions based on its state and receives feedback (reward or penalty).
– Over time, the agent learns the best actions to maximize rewards.
• Example: Navigating a Maze
– Picture a robot in a maze containing a diamond and fire cells. The goal of the robot is to reach the reward, the diamond, while avoiding the hurdles, the fire.
– The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles.
– Each right step gives the robot a reward, and each wrong step subtracts from the robot's reward.
– The total reward is calculated when it reaches the final reward, the diamond.
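A minimal tabular Q-learning sketch in the same spirit (a 1-D corridor instead of the maze; the environment, rewards, and hyperparameters are illustrative assumptions):

import numpy as np

# States 0..4 along a corridor; the goal (reward) is at state 4.
n_states, n_actions = 5, 2             # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))    # value table the agent learns
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != n_states - 1:           # episode ends at the goal state
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else -0.01   # reward at goal, small step cost
        # Q-learning update: move the estimate toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: action 1 (right) in every state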
1.14.1 Elements of Reinforcement Learning
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and action.
• Value Function: A function that estimates the expected cumulative reward from a given state.
• Model of the Environment: A representation of the environment that helps in planning by predicting
future states and rewards.
1.14.2 Applications
• Robotics: Automating tasks in structured environments like manufacturing. Teaching robots to walk or
perform tasks.
• Game playing: Developing strategies in complex games like chess. AI systems like AlphaGo and chess
engines.
• Autonomous driving: Learning to navigate roads safely. Self-driving cars learn to navigate by interacting
with simulated environments.
• Industrial Control: Real-time adjustments in operations like refinery controls.
• Personalized Training Systems: Customizing instruction based on individual needs.
1.14.3 Advantages
• Learning from Interaction: Models learn optimal behavior through continuous interaction with their
environment.
• No Need for Pre-Labeled Data: Learns directly from the environment without requiring labeled datasets.
• Adaptability: Adapts to changing environments, making it suitable for complex, evolving systems like
robotics or autonomous vehicles.
• Focus on Long-Term Goals: Balances immediate and future rewards, ensuring strategies optimize cumula-
tive rewards over time.
• Problem-Specific Optimization: Excels in solving problems like game playing, resource management, or
scheduling tasks where step-by-step actions impact overall outcomes.
• Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional
techniques.
• The model can correct the errors that occurred during the training process.
• In RL, training data is obtained via the direct interaction of the agent with the environment.
• Reinforcement learning can handle environments that are non-deterministic, meaning that the outcomes of
actions are not always predictable. This is useful in real-world applications where the environment may
change over time or is uncertain.
• Reinforcement learning can be used to solve a wide range of problems, including those that involve decision
making, control, and optimization.
• Reinforcement learning is a flexible approach that can be combined with other machine learning techniques,
such as deep learning, to improve performance.
1.14.4 Disadvantages
• Reinforcement learning is not preferable for solving simple problems.
• Reinforcement learning needs a lot of data and a lot of computation.
• Reinforcement learning is highly dependent on the quality of the reward function. If the reward function is
poorly designed, the agent may not learn the desired behavior.
• Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is
behaving in a certain way, which can make it difficult to diagnose and fix problems.
1.15 Supervised vs Unsupervised vs Reinforcement Learning
• Supervised Learning is best for tasks where labeled data is available and precise predictions are needed.
• Unsupervised Learning is ideal for exploring data without prior labels and discovering hidden patterns.
• Reinforcement Learning is suited for dynamic systems requiring sequential decision-making and long-term
optimization.
1.16 Sample Questions
• What are the three main types of machine learning?
• Describe one real-world application of supervised learning and one of unsupervised learning.
• Explain the concept of overfitting in machine learning and why it is problematic.
• Write the steps involved in implementing a basic regression model in supervised learning.
• How would you implement k-nearest neighbors (KNN) for classification? Outline the steps and provide a
code example.
• What are the key characteristics of supervised learning?
• Describe a scenario where unsupervised learning would be more appropriate than supervised learning.
• Provide an example of a real-world problem that can be solved using reinforcement learning. Describe the
key components involved.
• Discuss the difference between precision and recall in the context of classification. Provide examples of
when each metric is crucial.
• Write Python code to implement k-means clustering and visualize the clusters.
• What is reinforcement learning, and how does it differ from supervised learning?
• Describe how logistic regression can be used for binary classification. Include a real-world example.
• Write a code snippet in Python to perform a simple linear regression and plot the regression line on a scatter plot.
• Explain the primary differences between supervised and unsupervised machine learning.
• What are the advantages of using machine learning techniques in data analysis?
• How does reinforcement learning differ from supervised and unsupervised learning? Provide an example to
illustrate.
• Compare and contrast k-means clustering and hierarchical clustering. What are the advantages and disad-
vantages of each?
• Discuss how overfitting can be detected and mitigated in a machine learning model.
• Explain the concept of cross-validation and its importance in machine learning.
• Write a Python code snippet to split a dataset into training and testing sets and fit a simple linear regression
model.
• Define clustering and explain its importance in unsupervised learning.
• Explain how the concept of a decision boundary is used in classification algorithms.
• What is the role of a loss function in supervised learning? How does it impact the model’s performance?
• Explain the bias-variance tradeoff in machine learning. How does it affect model selection?
• How would you use cross-validation to select the best model from a set of candidate models? Write the
steps and include a code example.
• Define overfitting and underfitting. How do they affect the performance of a machine learning model?
• How does clustering differ from classification in machine learning? Provide examples of each.
• A company wants to predict the salary of an employee based on their years of experience. The dataset
provided includes the following salary values (in $1000) corresponding to 1, 2, 3, 4, 5, and 6 years of
experience: 30, 35, 40, 50, 55, and 60. Using linear regression, find the equation of the line y = mx + b,
where y represents the salary and x represents the years of experience. Predict the salary for an employee
with 7 years of experience.