
Unit III Classification

Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Supervised learning is when we teach or train the machine using data that is well labelled, which means each example is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome based on the labelled data.

For example, a labelled dataset of images of Elephants, Camels and Cows would have each image tagged with either “Elephant”, “Camel” or “Cow”.

The k-nearest neighbors (k-NN) algorithm is a simple yet powerful supervised learning method used for both classification and regression tasks. Here’s a brief overview:

How k-NN Works

Training Phase: The algorithm stores all the training data points and their
corresponding labels.

Classification Phase:
For a given test data point, the algorithm calculates the distance between this
point and all the training data points.

It then selects the k nearest neighbors (where k is a user-defined constant).

The test data point is assigned the class that is most common among its k nearest neighbors.

Key Features

Non-parametric: k-NN does not make any assumptions about the underlying
data distribution.

Distance Metrics: Commonly used metrics include Euclidean, Manhattan, and Minkowski distances.

Versatility: It can handle both numerical and categorical data.

Example

Imagine we have a dataset with two features: Brightness and Saturation, and two
classes: Red and Blue. Here’s a simplified version of the dataset:

Brightness   Saturation   Class
40           20           Red
50           50           Blue
60           90           Blue
10           25           Red
70           70           Blue
60           10           Red
25           80           Blue

Now, we want to classify a new data point with Brightness = 55 and Saturation = 45. We’ll use the KNN algorithm with k = 3.

Steps:

Calculate the Distance: Compute the distance between the new data point and all other points in the dataset. We’ll use the Euclidean distance formula:

d = sqrt((x1 - x2)^2 + (y1 - y2)^2)

Find the Nearest Neighbors: Identify the 3 nearest neighbors to the new data
point based on the calculated distances.
Majority Voting: Assign the class of the new data point based on the majority
class of its 3 nearest neighbors.

Calculation:

Let’s calculate the distances from the new point (55, 45):

Distance to (40, 20): sqrt(15^2 + 25^2) ≈ 29.15
Distance to (50, 50): sqrt(5^2 + 5^2) ≈ 7.07
Distance to (60, 90): sqrt(5^2 + 45^2) ≈ 45.28
Distance to (10, 25): sqrt(45^2 + 20^2) ≈ 49.24
Distance to (70, 70): sqrt(15^2 + 25^2) ≈ 29.15
Distance to (60, 10): sqrt(5^2 + 35^2) ≈ 35.36
Distance to (25, 80): sqrt(30^2 + 35^2) ≈ 46.10

Nearest Neighbors:

The 3 nearest neighbors are:

(50, 50) - Blue

(40, 20) - Red

(70, 70) - Blue

Majority Voting:

Out of the 3 nearest neighbors, 2 are Blue and 1 is Red. Therefore, the new
data point will be classified as Blue.
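The same classification can be reproduced with scikit-learn's KNeighborsClassifier. The following is a minimal sketch (the feature values and labels come from the table above; scikit-learn is an assumed, illustrative choice of library):

# k-NN sketch with scikit-learn; any k-NN implementation would do
from sklearn.neighbors import KNeighborsClassifier

# Training data from the table: [Brightness, Saturation] and class labels
X_train = [[40, 20], [50, 50], [60, 90], [10, 25], [70, 70], [60, 10], [25, 80]]
y_train = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

# k = 3 neighbours, Euclidean distance (the default metric)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify the new point (Brightness = 55, Saturation = 45)
print(knn.predict([[55, 45]]))  # expected output: ['Blue']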

Applications

Pattern Recognition

Data Mining

Intrusion Detection

Decision tree
A decision tree classifier is a type of supervised learning algorithm used for classification
tasks. It works by splitting the data into subsets based on the values of the input features,
creating a tree-like model of decisions.
Structure of a Decision Tree

- Root Node: Represents the entire dataset and the initial decision to be made.
- Internal Nodes: Represent decisions or tests on attributes.
- Branches: Represent the outcome of a decision or test, leading to another node.
- Leaf Nodes: Represent the final decision or prediction.
How It Works

- Selecting the Best Attribute: Using metrics like Gini impurity, entropy, or information gain, the best attribute to split the data is selected.

Gini impurity and entropy are both metrics used in decision trees to measure the impurity or disorder of a dataset, helping to determine the best split at each node.

Gini Impurity

Formula: The Gini impurity for a node t is calculated as:

Gini(t) = 1 - Σ (p_i)^2, summed over the classes i = 1, …, C

where p_i is the probability of an element being classified as class i and C is the total number of classes.

Entropy

Definition: Entropy measures the amount of uncertainty or disorder in the dataset. It quantifies the impurity in a more information-theoretic sense.

Formula: The entropy for a node t is calculated as:

Entropy(t) = - Σ p_i log2(p_i), summed over the classes i = 1, …, C

where p_i is the probability of an element being classified as class i and C is the total number of classes. A short numeric example of both metrics is given after the list below.

- Splitting the Dataset: The dataset is split into subsets based on the selected attribute.
- Repeating the Process: This process is repeated recursively for each subset, creating new internal nodes or leaf nodes until a stopping criterion is met (e.g., all instances in a node belong to the same class or a predefined depth is reached).
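As a quick numeric check of the two formulas, the sketch below computes the Gini impurity and entropy of a hypothetical node containing 6 samples of one class and 4 of another (the counts are made up purely for illustration):

# Gini impurity and entropy for a hypothetical node with class counts 6 and 4
import math

counts = [6, 4]
total = sum(counts)
probs = [c / total for c in counts]                # p_i for each class

gini = 1 - sum(p ** 2 for p in probs)              # Gini(t) = 1 - sum(p_i^2)
entropy = -sum(p * math.log2(p) for p in probs)    # Entropy(t) = -sum(p_i * log2(p_i))

print(round(gini, 3))     # 0.48
print(round(entropy, 3))  # 0.971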

Pruning

To overcome overfitting, pruning techniques are used. Pruning reduces the size of the tree by removing nodes that provide little power in classifying instances. There are two main types of pruning:

Pre-pruning (Early Stopping): Stops the tree from growing once it meets certain criteria (e.g., maximum depth, minimum number of samples per leaf).

Post-pruning: Removes branches from a fully grown tree that do not provide significant power.
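The following minimal scikit-learn sketch ties these pieces together (the Iris dataset and the parameter values are arbitrary, illustrative choices): criterion selects Gini or entropy, max_depth and min_samples_leaf act as pre-pruning, and ccp_alpha enables cost-complexity post-pruning.

# Decision-tree sketch: split criterion plus pre- and post-pruning controls
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",     # split-quality metric: "gini" or "entropy"
    max_depth=3,             # pre-pruning: stop growing beyond this depth
    min_samples_leaf=5,      # pre-pruning: minimum samples per leaf
    ccp_alpha=0.01,          # post-pruning: cost-complexity pruning strength
    random_state=0,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on held-out data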

Support vector machine


A Support Vector Machine (SVM) is a supervised machine learning algorithm
used for classification and regression tasks. Here’s a brief overview:

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
Since SVM is a supervised algorithm, it uses a labeled dataset to
train itself. A labeled dataset is one in which the output data is
already present for the input data.

When new data comes in, the algorithm makes a prediction based on what it has learned from the labeled dataset.

Let’s understand the basic terminology of SVM with the help of a classification problem.

Suppose we have a dataset in which the data points belong to two classes, i.e. a circle class and a square class. Using SVM, our goal is to identify whether a new data point that comes in belongs to the circle class or the square class.

Hyperplane (Decision Boundary)

As the first step in SVM, we want to separate the two classes (circles and squares). We don’t want the data points of the two classes to mix with each other.

So how do we separate the two classes?

We draw a separating line between the two classes such that the data points belonging to the circle class are on one side of the line and the data points belonging to the square class are on the other side.

This line that separates the two classes is known as the “Decision Boundary”. In its generalized form, it is known as the “Hyperplane”.

Why is it called a decision boundary?

It is called a decision boundary because it acts as a boundary between the two classes and it decides whether newly arrived data points belong to the circle class or the square class.

In the next step, we have to identify the data points in each class that are closest to the hyperplane.

After identifying the data points that are closest to the hyperplane, we draw a line that just touches the closest data point and is parallel to the hyperplane. This step is done on both sides of the hyperplane, i.e. for both classes.

Let the distance between the hyperplane and the parallel line of class A (circle class) be D1. Similarly, let the distance between the hyperplane and the parallel line of class B (square class) be D2.

When we sum up these two distances, we get the distance known as the “Margin” or the “Marginal Distance”. Hence, the Margin is nothing but:

Margin = D1 + D2

This margin has huge significance when deciding which hyperplane is the best one to use while making predictions.

Support Vectors
The data points touching the marginal lines are known as the Support Vectors. These are the data points closest to the hyperplane that were considered while drawing the marginal lines parallel to the hyperplane.

Note that there can be multiple data points touching a marginal line, and hence there can be multiple support vectors present at the same time.
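To make these terms concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel on a tiny made-up dataset (the points and labels are purely illustrative): support_vectors_ returns the points touching the marginal lines, and for a linear SVM the margin equals 2 / ||w||.

# Linear SVM sketch: hyperplane, support vectors and margin on made-up 2-D data
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])  # two separable groups
y = np.array([0, 0, 0, 1, 1, 1])                                 # 0 = "circle", 1 = "square"

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)          # the data points touching the marginal lines
w = clf.coef_[0]                     # normal vector of the separating hyperplane
print(2 / np.linalg.norm(w))         # Margin = D1 + D2 = 2 / ||w||
print(clf.predict([[3, 3]]))         # classify a new data point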
Non-Linear SVM: If the data is linearly separable, then we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line.

So to separate these data points, we need to add one more dimension. For linear data we have used two dimensions, x and y, so for non-linear data we will add a third dimension z. It can be calculated as:

z = x^2 + y^2

By adding this third dimension, the sample space becomes three-dimensional, and SVM can now divide the two classes with a separating plane. Since we are in 3-D space, this boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-D space with z = 1, it becomes a circular boundary (x^2 + y^2 = 1) around the origin.
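A minimal sketch of this trick (all data here is synthetic and purely illustrative): we append the extra feature z = x^2 + y^2 and fit a linear SVM in the augmented space, which amounts to learning a circular boundary in the original two dimensions. In practice the same effect is usually obtained with a kernel, e.g. SVC(kernel="rbf") or a polynomial kernel.

# Non-linear data: class 1 = points inside a circle, class 0 = points outside (synthetic)
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)    # not linearly separable in (x, y)

# Add the third dimension z = x^2 + y^2, then fit a *linear* SVM in (x, y, z)
z = (X ** 2).sum(axis=1, keepdims=True)
clf = SVC(kernel="linear").fit(np.hstack([X, z]), y)

# Classify a new point (0.5, 0.5) by applying the same transformation
p = np.array([[0.5, 0.5]])
p3 = np.hstack([p, (p ** 2).sum(axis=1, keepdims=True)])
print(clf.predict(p3))   # expected: [1], the point lies inside the circle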

Key Concepts:

Classification and Regression: SVMs can be used to classify data into different
categories or predict continuous values.

Hyperplane: The main goal of an SVM is to find the optimal hyperplane that
best separates the data into different classes. In a 2D space, this hyperplane is
a line, while in higher dimensions, it becomes a plane or a hyperplane.

Margin: SVM aims to maximize the margin, which is the distance between the
hyperplane and the nearest data points from each class. This helps in achieving
better generalization on unseen data.

Support Vectors: These are the data points that are closest to the hyperplane
and influence its position and orientation. They are critical in defining the
optimal hyperplane.

Applications:
Text Classification: Categorizing emails as spam or not spam.

Image Classification: Identifying objects in images.

Handwriting Recognition: Recognizing handwritten characters.

Bioinformatics: Classifying genes or proteins.


Bagging
Bagging (Bootstrap Aggregating) is an ensemble learning technique designed to
improve the stability and accuracy of machine learning models. Here’s a
breakdown of how it works and its benefits:

How Bagging Works


Bootstrap Sampling: Multiple subsets of the training data are created by randomly sampling with replacement. This means some data points may appear multiple times in a subset, while others may not appear at all.

Training Multiple Models: Each subset is used to train a separate model independently and in parallel.

Aggregating Predictions: The predictions from all the models are combined. For classification tasks, this is typically done by majority voting, and for regression tasks, by averaging the predictions.

Benefits of Bagging

Reduces Variance: By training multiple models on different subsets of the data, bagging reduces the variance of the overall model, making it less sensitive to fluctuations in the training data.

Prevents Overfitting: Since each model is trained on a different subset of the data, the ensemble model is less likely to overfit compared to a single model trained on the entire dataset.

Improves Accuracy: The combined predictions of multiple models often result in better performance than any single model.
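A minimal bagging sketch with scikit-learn (the dataset, base learner and parameter values are illustrative choices; in older scikit-learn versions the keyword is base_estimator rather than estimator): BaggingClassifier draws bootstrap samples, trains one decision tree per sample and combines their votes.

# Bagging sketch: decision trees on bootstrap samples, combined by majority vote
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # weak learner trained on each bootstrap subset
    n_estimators=50,                     # number of models trained independently
    bootstrap=True,                      # sample the training data with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))     # accuracy of the aggregated (voted) predictions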

Boosting

Boosting is a powerful ensemble technique in machine learning designed to improve the accuracy of predictive models by combining multiple weak learners into a single strong learner. Here’s a breakdown of how it works and some common algorithms:

How Boosting Works

Initial Weights: Assign initial weights to all data points.

Sequential Training: Train the first weak learner on the data. Evaluate its
performance and increase the weights of misclassified instances.

Iterative Process: Repeat the process of adjusting weights and training subsequent learners. Each new model focuses on the weaknesses of the ensemble so far.

Combining Results: Aggregate the predictions of all weak learners to form the
final output, typically using weighted voting.
Types of Boosting Algorithms

AdaBoost (Adaptive Boosting):

One of the first boosting algorithms.

Focuses on reweighting the training examples each time a learner is added, putting more emphasis on incorrectly classified instances.

Particularly effective for binary classification problems.

Gradient Boosting:

Builds models sequentially and corrects errors along the way.

Each new model is trained to correct the errors of the previous models.

Variants include Gradient Boosting Machines (GBM) and XGBoost, which are
known for their high performance and efficiency.

XGBoost (Extreme Gradient Boosting):

An optimized version of gradient boosting.

Known for its speed and performance.

Incorporates regularization to prevent overfitting.
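A minimal sketch of the first two variants using scikit-learn (the synthetic dataset and parameters are illustrative; XGBoost lives in the separate xgboost package and exposes a similar fit/predict interface):

# Boosting sketch: AdaBoost and gradient boosting on the same synthetic data
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: reweights misclassified samples before each new weak learner is trained
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Gradient boosting: each new tree corrects the errors of the ensemble built so far
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0).fit(X_train, y_train)

print(ada.score(X_test, y_test), gbm.score(X_test, y_test))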

Applications

Classification: Boosting is widely used in classification tasks, such as spam detection, sentiment analysis, and image recognition.

Regression: It is also used in regression tasks to predict continuous outcomes, like house prices or stock prices.

Boosting has significantly impacted machine learning by enhancing model accuracy and robustness.

Bagging and boosting are both ensemble techniques in machine learning that aim to improve the performance of models by combining multiple weak learners. However, they do so in different ways:

Bagging: Reduces variance and prevents overfitting.

Boosting: Reduces both bias and variance by focusing on difficult-to-predict instances.

Binary Classification

Definition: Involves classifying data into one of two possible classes.

In a binary classification task, the goal is to classify the input data into two mutually exclusive categories. The training data in such a situation is labeled in a binary format: true and false; positive and negative; 0 and 1; spam and not spam, etc., depending on the problem being tackled. For instance, we might want to detect whether a given image shows a truck or a boat.
Examples:

Spam detection (spam vs. not spam)

Disease diagnosis (disease vs. no disease)

Algorithms: Logistic Regression, Support Vector Machines (SVM), Decision Trees, etc.
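A minimal binary-classification sketch using one of these algorithms (logistic regression on scikit-learn's built-in breast-cancer dataset; both choices are illustrative):

# Binary classification: every sample belongs to exactly one of two classes (0 or 1)
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)     # labels: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(clf.score(X_test, y_test))               # accuracy on unseen data
print(clf.predict(X_test[:5]))                 # predicted class (0 or 1) for 5 samples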
Multiclass Classification

Definition: Involves classifying data into one of three or more possible classes.

One-versus-one: this strategy trains as many classifiers as there are pairs of labels. If we have a 3-class classification problem, we will have three pairs of labels and thus three classifiers.
In general, for N labels, we will have N × (N − 1) / 2 classifiers. Each classifier is trained on a single binary dataset, and the final class is predicted by a majority vote among all the classifiers. The one-vs-one approach works best for SVM and other kernel-based algorithms.

One-versus-rest: in this strategy, we treat each label as an independent label and combine all the remaining labels into a single “rest” label. With 3 classes, we will have three classifiers. In general, for N labels, we will have N binary classifiers.
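Both strategies are available as wrappers in scikit-learn; below is a minimal sketch on a 3-class problem (the Iris dataset and the linear-SVM base estimator are illustrative choices):

# One-vs-one vs. one-vs-rest on a 3-class problem, each wrapping a linear SVM
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)                 # 3 classes

ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)   # N*(N-1)/2 = 3 classifiers
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)  # N = 3 classifiers

print(len(ovo.estimators_), len(ovr.estimators_))  # 3 3
print(ovo.predict(X[:3]), ovr.predict(X[:3]))      # predictions from each strategy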

Multi-Label Classification

In multi-label classification tasks, we try to predict 0 or more classes for each input example. In this case, there is no mutual exclusion, because an input example can have more than one label.

Such a scenario can be observed in different domains, such as auto-tagging in Natural Language Processing, where a given text can contain multiple topics. Similarly, in computer vision an image can contain multiple objects; for example, a model might predict that a single image contains a plane, a boat, a truck, and a dog.
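A minimal multi-label sketch with scikit-learn (the synthetic data stands in for image tags or text topics; the specific helpers are illustrative choices): each example is paired with a binary indicator vector, so several labels can be active at once.

# Multi-label classification: each sample may carry several (or zero) labels at once
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data with 4 possible labels, encoded as a binary indicator matrix
X, Y = make_multilabel_classification(n_samples=300, n_classes=4, random_state=0)

# One binary classifier per label; a sample receives every label whose classifier fires
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

print(Y[0])                 # e.g. [1 0 1 0] -> this sample carries labels 0 and 2
print(clf.predict(X[:1]))   # predicted label vector for the first sample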
