Supervised Learning (Regression/Classification)
BTech III Year – II Semester
Computer Science & Engineering
UNIT-II
SYLLABUS
MACHINE LEARNING
Unit II: Supervised Learning (Regression/Classification):
Basic Methods:
▪ Distance based Methods,
▪ Nearest Neighbours,
▪ Decision Trees,
▪ Naive Bayes,
Linear Models:
▪ Linear Regression,
▪ Logistic Regression,
▪ Generalized Linear Models,
▪ Support Vector Machines,
Binary Classification:
▪ Multiclass/Structured outputs,
▪ MNIST, Ranking.
Types of Learning
The learning methods in ML can be broadly classified into three basic types: supervised, unsupervised,
and reinforcement learning.
Supervised and Unsupervised Learning.
Supervised Learning
1. As its name suggests, supervised machine learning is based on supervision.
2. It means that in the supervised learning technique, we train the machines using a "labelled"
dataset, and based on the training, the machine predicts the output.
3. Here, the labelled data specifies that some of the inputs are already mapped to the output.
More precisely, we can say: first, we train the machine with the input and corresponding
output, and then we ask the machine to predict the output using the test dataset. Supervised
machine learning can be classified into two types of problems: Regression and
Classification.
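As a minimal sketch of this train-then-predict workflow (assuming scikit-learn is available; the tiny feature/label dataset below is invented purely for illustration):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Labelled dataset: each input (features) is already mapped to an output (label).
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# First, train the machine with the inputs and their corresponding outputs...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)
model = LogisticRegression().fit(X_train, y_train)

# ...then ask the machine to predict the output for the unseen test dataset.
print(model.predict(X_test))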
Machine Learning Basic Methods: Distance Based Methods
1. Euclidean distance
2. Manhattan distance
3. Minkowski distance
4. Hamming distance
5. Cosine similarity
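As a quick sketch of how these measures are computed (assuming NumPy is available; the vectors x, y, a and b are arbitrary examples):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))           # square root of summed squared differences
manhattan = np.sum(np.abs(x - y))                   # sum of absolute differences
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)   # generalizes Euclidean (p=2) and Manhattan (p=1)

a = np.array([1, 0, 1, 1])                          # Hamming distance applies to discrete vectors
b = np.array([1, 1, 0, 1])
hamming = np.sum(a != b)                            # number of positions where they differ

cosine_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # similarity of directions

print(euclidean, manhattan, minkowski, hamming, cosine_sim)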
Basic Methods: K-Nearest Neighbor (KNN)
1. K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised
Learning technique.
2. The K-NN algorithm assumes similarity between the new case/data and the available cases and puts
the new case into the category that is most similar to the available categories.
3. The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.
4. The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used
for Classification problems.
5. K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
6. It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an action on
the dataset.
Basic Methods: K-Nearest Neighbor (KNN)
[Figures: KNN illustration and a worked example of classifying a new data point]
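A short sketch of K-NN in code (using scikit-learn; the 2-D points and the query points are invented toy data):

from sklearn.neighbors import KNeighborsClassifier

# Stored training data: K-NN keeps all of it and defers work to prediction time.
X = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 6]]
y = ['A', 'A', 'A', 'B', 'B', 'B']

knn = KNeighborsClassifier(n_neighbors=3)   # choose K = 3
knn.fit(X, y)                               # "lazy" learning: essentially just stores the data

print(knn.predict([[2, 2]]))   # its 3 nearest neighbours are all 'A' -> classified as 'A'
print(knn.predict([[6, 7]]))   # its 3 nearest neighbours are all 'B' -> classified as 'B'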
Machine Learning Basic Methods: Decision Trees
1. The decision tree is one of the predictive modeling approaches used in statistics, data
mining and machine learning.
2. Decision Tree is a Supervised learning technique that can be used for both Classification and
Regression problems.
3. It is commonly used for solving Classification problems.
4. It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules, and each leaf node represents the outcome.
5. In a Decision tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
outputs of those decisions and do not contain any further branches.
6. The decisions or tests are performed on the basis of features of the given dataset.
Machine Learning Basic Methods: Decision Trees
It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
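A short sketch of a decision tree in code (using scikit-learn; the integer-encoded weather-style dataset is invented for illustration):

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook (0=sunny, 1=overcast, 2=rainy), humidity (0=normal, 1=high)]
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = ['No', 'Yes', 'Yes', 'No', 'Yes', 'Yes']   # outcome: play or not

tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
tree.fit(X, y)

# Print the learned decision rules: internal nodes test features, leaves give the outcome.
print(export_text(tree, feature_names=['outlook', 'humidity']))
print(tree.predict([[0, 1]]))   # e.g., sunny outlook with high humidity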
Machine Learning Basic Methods: Naive Bayes
A fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if
these features depend on each other or upon the existence of the other features, all of these properties
independently contribute to the probability that this fruit is an apple, and that is why it is known as
'Naive'.
The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:
1. Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent
of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and
taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually
contributes to identifying that it is an apple, without depending on the others.
2. Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Machine Learning Basic Methods: Naive Bayes Methods
1. The Naive Bayes model is easy to build and particularly useful for very large data sets.
2. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification
methods.
3. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

   P(c|x) = P(x|c) · P(c) / P(x)

Where
• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.
Machine Learning Basic Methods: Naive Bayes algorithm
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Example:
Consider a training data set of weather and the corresponding target variable 'Play'
(suggesting possibilities of playing). Now, we need to classify whether players
will play or not based on the weather conditions.
Machine Learning Basic Methods: Naive Bayes Example
Here we have
P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64
Now, P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = 0.33 × 0.64 / 0.36 = 0.60, which has the higher probability.
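The same posterior can be reproduced in a few lines (the counts 3/9, 5/14 and 9/14 are read off the frequency table of the weather dataset described above):

p_sunny_given_yes = 3 / 9    # P(Sunny | Yes), the likelihood
p_sunny = 5 / 14             # P(Sunny), the prior probability of the predictor
p_yes = 9 / 14               # P(Yes), the prior probability of the class

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))   # 0.6 -> players are predicted to play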
Machine Learning Basic Methods: Naive Bayes Method
Advantages
1. It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
2. When the assumption of independence holds, a Naive Bayes classifier performs better compared to
other models like logistic regression, and you need less training data.
3. It performs well in the case of categorical input variables compared to numerical variable(s). For
numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
Disadvantages
1. If a categorical variable has a category (in the test data set) that was not observed in the training data
set, then the model will assign it a 0 (zero) probability and will be unable to make a prediction. This
is often known as "Zero Frequency". To solve this, we can use a smoothing technique; one of
the simplest is Laplace estimation (see the sketch after this list).
2. On the other side, Naive Bayes is also known to be a bad estimator, so the probability outputs from
predict_proba are not to be taken too seriously.
3. Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is
almost impossible to get a set of predictors that are completely independent.
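A minimal sketch of Laplace (add-one) estimation, the smoothing technique mentioned in disadvantage 1 (the counts below are invented for illustration):

# Add 1 to every category count so an unseen category gets a small non-zero probability.
counts = {'sunny': 3, 'overcast': 4, 'rainy': 0}   # 'rainy' never observed with this class

k = len(counts)                # number of categories
total = sum(counts.values())

for category, count in counts.items():
    unsmoothed = count / total                     # 0 for 'rainy' -> zero-frequency problem
    smoothed = (count + 1) / (total + k)           # Laplace estimate, never exactly 0
    print(category, round(unsmoothed, 3), round(smoothed, 3))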
Machine Learning Basic Methods: Linear Models
Linear Models:
▪ Linear Regression
▪ Logistic Regression
Machine Learning Basic Methods: Regression
1. Linear models describe a continuous response variable as a function of one or more predictor
variables.
2. A regression model provides a function that describes the relationship between one or more
independent variables and a response (dependent or target) variable.
3. Regression is a method to determine the statistical relationship between a dependent variable and
one or more independent variables.
4. Regression analysis is a predictive modeling technique that analyzes the relation between the target
or dependent variable and the independent variables in a dataset.
Types of regression techniques:
• Linear Regression
• Logistic Regression
• Ridge Regression
• Lasso Regression
• Polynomial Regression
• Bayesian Linear Regression
Machine Learning Basic Methods: Linear Regression
1. Linear regression analysis is used to predict the value of a variable based on the value of another
variable.
2. The variable you want to predict is called the dependent variable. The variable you are using
to predict the other variable's value is called the independent variable.
3. It is a supervised learning and statistical method that is used for predictive analysis.
4. Linear regression performs the task of predicting a dependent variable value (y) based on a given
independent variable (x).
5. General function for Linear Regression: y = a0 + a1·x + ε, where y is the dependent variable, x is the
independent variable, a0 is the intercept, a1 is the regression coefficient (slope), and ε is the random error.
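A short sketch of fitting y = a0 + a1·x by least squares (using scikit-learn; the (x, y) pairs are invented toy data lying roughly on y = 1 + 2x):

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])       # independent variable
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])      # dependent variable

reg = LinearRegression().fit(x, y)
print(reg.intercept_, reg.coef_[0])           # estimated a0 (intercept) and a1 (slope)
print(reg.predict([[6]]))                     # predict y for a new value of x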
Machine Learning Basic Methods: Logistic Regression
1. Logistic regression is a statistical model that uses the logistic function to model the conditional
probability.
2. It is an example of supervised learning. It is used to calculate or predict the probability of a
binary (yes/no) event occurring.
3. Since the outcome is a probability, the dependent variable is bounded between 0 and 1.
4. Logistic regression is used for solving classification problems.
5. A logistic regression model predicts a dependent data variable by analyzing the relationship
between one or more existing independent variables.
For example, logistic regression could be used to predict whether:
• a political candidate will win or lose an election;
• a high school student will be admitted or not to a particular college;
• an employee can buy a car or not, based on salary.
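A short sketch of predicting a binary (yes/no) event with logistic regression (using scikit-learn; the "salary vs. buys a car" data is invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

salary = np.array([[20], [25], [30], [45], [55], [60]])   # salary in thousands
buys_car = np.array([0, 0, 0, 1, 1, 1])                   # 0 = no, 1 = yes

clf = LogisticRegression().fit(salary, buys_car)
print(clf.predict([[40]]))         # hard yes/no decision for a new salary
print(clf.predict_proba([[40]]))   # the probability, bounded between 0 and 1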
Machine Learning Basic Methods: Support Vector Machines
1. Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression.
2. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional
space (N = the number of features) that distinctly classifies the data points.
3. The dimension of the hyperplane depends upon the number of features.
4. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
5. If the number of input features is two, then the hyperplane is just a line. If the number of input
features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to imagine when
the number of features exceeds three.
Support Vector Machines
The classifier is f(x) = sign(wᵀx + b). The hyperplane wᵀx + b = 0 separates the input space: points
with wᵀx + b > 0 fall on one side of it and are assigned to one class, while points with wᵀx + b < 0
fall on the other side and are assigned to the other class.
Support Vector Machines: Linear Separators
[Figures: examples of linear separators]
Support Vector Machines: Classification Margin
[Figure: the classification margin, with r the distance from an example to the separating hyperplane]
Support Vector Machines: Maximum Margin Classification
• Let the training set {(xᵢ, yᵢ)}, i = 1..n, with xᵢ ∈ Rᵈ and yᵢ ∈ {-1, 1}, be separated by a hyperplane with
margin ρ. Then for each training example (xᵢ, yᵢ):

    yᵢ(wᵀxᵢ + b) ≥ ρ/2

• For every support vector xₛ the above inequality is an equality. After rescaling w and b by ρ/2 in
the equality, we obtain that the distance between each xₛ and the hyperplane is

    r = yₛ(wᵀxₛ + b) / ‖w‖ = 1 / ‖w‖

• Then the margin can be expressed through the (rescaled) w and b as:

    ρ = 2r = 2 / ‖w‖
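A short sketch that recovers w, b and the margin 2/‖w‖ from a fitted linear SVM (using scikit-learn; the separable 2-D data is invented, and a large C approximates the hard-margin case):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = SVC(kernel='linear', C=1e6)   # large C -> (nearly) hard margin
svm.fit(X, y)

w = svm.coef_[0]                    # hyperplane: w^T x + b = 0
b = svm.intercept_[0]
margin = 2 / np.linalg.norm(w)      # rho = 2 / ||w||

print(w, b, margin)
print(svm.support_vectors_)         # the extreme points that define the hyperplane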
Binary and Multiclass Classification
1. Classification means categorizing data and forming groups based on similarities.
2. In a dataset, the independent variables or features play a vital role in classifying the data.
3. In multiclass classification, we have more than two classes in our dependent or target variable.
4. Algorithms such as Naïve Bayes, Decision Trees, SVM, Random Forest and KNN can be used for
multiclass classification.
                  Binary Classification                  Multi-class Classification
Definition        It is a classification into two        There can be any number of classes
                  groups.                                in it.
No. of classes    Classifies objects into at most        Classifies the object into more than
                  two classes.                           two classes.
Algorithms used   The most popular algorithms used       Popular algorithms that can be used for
                  for binary classification are:         multi-class classification include:
                  • Logistic Regression                  • k-Nearest Neighbors
                  • k-Nearest Neighbors                  • Decision Trees
                  • Decision Trees                       • Naive Bayes
                  • Support Vector Machine               • Random Forest
                  • Naive Bayes                          • Gradient Boosting
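As a closing sketch of multiclass classification in the spirit of MNIST (using scikit-learn's small built-in digits dataset of 8x8 images, which stands in here for the full MNIST data):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()   # 1,797 handwritten digit images with labels 0-9 (ten classes)
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Any multi-class algorithm from the table above would work here.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy across all ten classes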