For example, a labelled dataset of images of elephants, camels and cows would
have each image tagged with either “Elephant”, “Camel” or “Cow”.
Training Phase: The algorithm stores all the training data points and their
corresponding labels.
Classification Phase:
For a given test data point, the algorithm calculates the distance between this
point and all the training data points.
It then selects the k nearest neighbors (where k is a user-defined constant).
The test data point is assigned the class that is most common among its k
nearest neighbors.
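A minimal from-scratch sketch of these two phases (pure standard library; the function and variable names are only illustrative):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" in k-NN is simply storing the points and labels,
    # so this function receives them directly.
    distances = [
        (math.dist(query, point), label)   # Euclidean distance to each stored point
        for point, label in zip(train_points, train_labels)
    ]
    neighbours = sorted(distances)[:k]     # the k nearest neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]      # most common class among the neighbours

# Tiny usage example with made-up points:
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels = ["A", "A", "B", "B"]
print(knn_classify(points, labels, query=(2, 1), k=3))   # -> "A"
```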
Key Features
Non-parametric: k-NN does not make any assumptions about the underlying
data distribution.
Example 1
Example 2
Imagine we have a dataset with two features: Brightness and Saturation, and two
classes: Red and Blue. Here’s a simplified version of the dataset:
Brightness   Saturation   Class
40           20           Red
50           50           Blue
60           90           Blue
10           25           Red
70           70           Blue
60           10           Red
25           80           Blue
Steps:
Calculate the Distance: Compute the distance between the new data point
and all other points in the dataset. We’ll use the Euclidean distance formula:
d = √((x₂ − x₁)² + (y₂ − y₁)²)
Find the Nearest Neighbors: Identify the 3 nearest neighbors to the new data
point based on the calculated distances.
Majority Voting: Assign the class of the new data point based on the majority
class of its 3 nearest neighbors.
Calculation (new data point: (55, 45)):
Distance to (40, 20): ≈ 29.15 (Red)
Distance to (50, 50): ≈ 7.07 (Blue)
Distance to (60, 90): ≈ 45.28 (Blue)
Distance to (10, 25): ≈ 49.25 (Red)
Distance to (70, 70): ≈ 29.15 (Blue)
Distance to (60, 10): ≈ 35.36 (Red)
Distance to (25, 80): ≈ 46.10 (Blue)
Nearest Neighbors: The 3 closest points are (50, 50) at ≈ 7.07 (Blue), (40, 20) at ≈ 29.15 (Red) and (70, 70) at ≈ 29.15 (Blue).
Majority Voting:
Out of the 3 nearest neighbors, 2 are Blue and 1 is Red. Therefore, the new
data point will be classified as Blue.
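A short Python sketch that reproduces this worked example (pure standard library; the query point (55, 45) and k = 3 follow the calculation above):

```python
import math
from collections import Counter

# (Brightness, Saturation) -> Class, taken from the table above.
data = [
    ((40, 20), "Red"),  ((50, 50), "Blue"), ((60, 90), "Blue"),
    ((10, 25), "Red"),  ((70, 70), "Blue"), ((60, 10), "Red"),
    ((25, 80), "Blue"),
]

query = (55, 45)   # the new data point being classified
k = 3

# Euclidean distance from the query to every training point.
distances = sorted((math.dist(query, point), label) for point, label in data)
neighbours = distances[:k]              # the three closest (distance, class) pairs
votes = Counter(label for _, label in neighbours)

print(neighbours)
print(votes.most_common(1)[0][0])       # "Blue": 2 of the 3 neighbours are Blue
```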
Applications
Pattern Recognition
Data Mining
Intrusion Detection
Decision Tree
A decision tree classifier is a type of supervised learning algorithm used for classification
tasks. It works by splitting the data into subsets based on the values of the input features,
creating a tree-like model of decisions.
Structure of a Decision Tree
Root Node: Represents the entire dataset and the initial decision to be made.
Internal Nodes: Represent decisions or tests on attributes.
Branches: Represent the outcome of a decision or test, leading to another node.
Leaf Nodes: Represent the final decision or prediction.
How It Works
Selecting the Best Attribute: Using metrics like Gini impurity, entropy, or
information gain, the best attribute to split the data is selected.
Gini impurity and entropy are both metrics that measure the impurity (disorder) of the
class labels in a node; they are used to judge the quality of a candidate split and so
help determine the best split at each node (a short sketch computing both appears
below, after Pruning).
Entropy: H = −Σ pᵢ log₂ pᵢ, where pᵢ is the proportion of class i in the node; it is 0 for a pure node and largest when the classes are evenly mixed. Gini impurity: G = 1 − Σ pᵢ².
Splitting the Dataset: The dataset is split into subsets based on the selected
attribute.
Repeating the Process: This process is repeated recursively for each subset,
creating new internal nodes or leaf nodes until a stopping criterion is met
(e.g., all instances in a node belong to the same class or a predefined depth is
reached).
Pruning: After the tree is built, branches that add little predictive power can be removed (pruned) to reduce overfitting and improve generalization on unseen data.
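A minimal sketch of the two impurity measures referenced above (pure standard library; the example labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p * log2(p))."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p**2)."""
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in counts.values())

labels = ["Red", "Red", "Blue", "Blue", "Blue"]
print(entropy(labels))   # ~0.971 bits for a 2-vs-3 split
print(gini(labels))      # 0.48
```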
Support Vector Machine (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes, so that new data points
can easily be placed in the correct category in the future. This best decision
boundary is called a hyperplane.
Since SVM is a supervised algorithm, it uses a labeled dataset to
train itself. A labeled dataset is one in which the output data is
already present for the input data.
When new data comes in, the algorithm makes a prediction based on
what it learned from the labeled dataset.
The line that separates the two classes is known as the “Decision
Boundary”; in its generalized (higher-dimensional) form, it is known as
the “Hyperplane”.
Let the distance between the hyperplane and the parallel line of class
A (circle class) be D1. Similarly, let the distance between the
hyperplane and the parallel line of class B (square class) be D2.
The sum D1 + D2 is the margin, and it has a huge significance when
deciding which hyperplane is best to use while making predictions:
SVM chooses the hyperplane with the largest margin.
Support Vectors
The data points touching the marginal lines are known as
the Support Vectors.
These are the data points closest to the hyperplane; they are the points
considered while drawing the marginal lines parallel to the hyperplane.
Note that more than one data point can touch a marginal line, so there
can be multiple support vectors at the same time.
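A small scikit-learn sketch, assuming a made-up 2-D dataset, that fits a linear maximum-margin hyperplane and reports which points ended up as support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative 2-D data: class 0 (the "circle" class) in the lower-left,
# class 1 (the "square" class) in the upper-right.
X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel fits the maximum-margin hyperplane (a straight line in 2-D).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)       # the points lying on the marginal lines
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w·x + b = 0
```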
Non-Linear SVM: If data is linearly arranged, we can separate it with a straight
line, but non-linear data cannot be separated by a single straight line.
So to separate these data points, we need to add one more dimension. For linear data,
we have used two dimensions x and y, so for non-linear data, we will add a third
dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, points that were not linearly separable in two
dimensions become separable. SVM now divides the dataset with a linear boundary
in this 3-D space: since we are in 3-D, the decision boundary is a plane (parallel
to the x–y plane). Converting it back to 2-D space at z = 1, this boundary becomes
a circle of radius 1 around the origin.
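A sketch of this feature-map idea, assuming an illustrative ring-shaped dataset: the two classes cannot be split by a straight line in 2-D, but after adding z = x² + y² a plain linear SVM separates them in 3-D:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Illustrative non-linear data: an inner disc (class 0) surrounded by an outer ring (class 1).
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(50), np.ones(50)])

# No straight line in 2-D separates the classes, so add the third dimension z = x^2 + y^2.
z = (X ** 2).sum(axis=1)
X3 = np.column_stack([X, z])

# In 3-D the two groups sit at different heights, so a plain linear SVM separates them.
clf = SVC(kernel="linear").fit(X3, y)
print(clf.score(X3, y))   # expected: 1.0
```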
Key Concepts:
Classification and Regression: SVMs can be used to classify data into different
categories or predict continuous values.
Hyperplane: The main goal of an SVM is to find the optimal hyperplane that
best separates the data into different classes. In a 2D space, this hyperplane is
a line, while in higher dimensions, it becomes a plane or a hyperplane.
Margin: SVM aims to maximize the margin, which is the distance between the
hyperplane and the nearest data points from each class. This helps in achieving
better generalization on unseen data.
Support Vectors: These are the data points that are closest to the hyperplane
and influence its position and orientation. They are critical in defining the
optimal hyperplane.
Applications:
Text Classification: Categorizing emails as spam or not spam.
Bagging
Bagging (Bootstrap Aggregating) is an ensemble learning technique designed to
improve the stability and accuracy of machine learning models. Here’s a
breakdown of how it works and its benefits:
Bootstrap Sampling and Training: Multiple subsets of the training data are created
by sampling with replacement (bootstrapping), and a separate model is trained
independently on each subset.
Aggregating Predictions: The predictions from all the models are combined. For
classification tasks, this is typically done by majority voting, and for regression
tasks, by averaging the predictions.
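A minimal scikit-learn sketch of bagging with decision trees as the base learners (the dataset is synthetic and purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, purely illustrative dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample of the training data;
# the ensemble prediction is the majority vote of the 50 trees.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))
```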
Benefits of Bagging: Reduced variance and less overfitting, greater stability on noisy data, and improved accuracy compared with a single model.
Boosting
Sequential Training: Train the first weak learner on the data. Evaluate its
performance and increase the weights of misclassified instances.
Combining Results: Aggregate the predictions of all weak learners to form the
final output, typically using weighted voting.
Types of Boosting Algorithms
Gradient Boosting:
Each new model is trained to correct the errors of the previous models.
Variants include Gradient Boosting Machines (GBM) and XGBoost, which are
known for their high performance and efficiency.
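A minimal scikit-learn sketch showing both flavours on the same synthetic, purely illustrative dataset: AdaBoost re-weights misclassified samples between rounds, while gradient boosting fits each new tree to the errors of the current ensemble:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, purely illustrative dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: each new weak learner gives more weight to previously misclassified samples.
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Gradient boosting: each new tree is fitted to the errors of the current ensemble.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(ada.score(X_test, y_test), gbm.score(X_test, y_test))
```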
Applications
Bagging and boosting are both ensemble techniques in machine learning that
aim to improve the performance of models by combining multiple weak
learners. However, they do so in different ways: bagging trains the learners
independently on bootstrap samples and averages (or votes on) their outputs,
whereas boosting trains them sequentially, with each new learner focusing on
the errors made by the previous ones.
Binary Classification
In a binary classification task, the goal is to classify the input data into two
mutually exclusive categories. The training data in such a situation is labeled
in a binary format: true and false; positive and negative; 0 and 1; spam and
not spam, etc., depending on the problem being tackled. For instance, we
might want to detect whether a given image is of a truck or a boat.
Multi-Class Classification
Definition: Involves classifying data into one of three or more mutually exclusive classes, for example labelling an image as “Elephant”, “Camel” or “Cow”.
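A small illustrative sketch contrasting the two settings (the feature values and labels below are made up): scikit-learn's LogisticRegression accepts either a binary 0/1 label vector or a multi-class label vector with three or more values:

```python
from sklearn.linear_model import LogisticRegression

X = [[0.2], [0.9], [0.1], [0.8], [0.7]]          # a single made-up feature

# Binary classification: exactly two mutually exclusive labels (e.g. 0 = not spam, 1 = spam).
y_binary = [0, 1, 0, 1, 1]

# Multi-class classification: one label per sample drawn from three or more classes.
y_multiclass = ["Elephant", "Camel", "Cow", "Camel", "Elephant"]

print(LogisticRegression().fit(X, y_binary).predict([[0.6]]))      # one of {0, 1}
print(LogisticRegression().fit(X, y_multiclass).predict([[0.6]]))  # one of the three class names
```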
Multi-Label Classification