
UNIT-5

Clustering and Association Rules


5.1 Unsupervised learning
• Unsupervised learning is a type of machine learning where the algorithm learns from unlabelled data, without any predefined outputs or target variables.
• It finds patterns, similarities, or groupings within the data, which can be used to gain insights and make data-driven decisions.
• It is particularly useful when dealing with large datasets.

5.2 Working of Unsupervised Learning

In unsupervised learning, raw, unlabelled input data is fed directly to the algorithm, which processes it to discover hidden structure, such as groups of similar items, and presents these patterns as its output, without any target labels to guide it. (The diagram that accompanied this section is not reproduced here.)
5.3 Distance Measure
• A distance measure determines the similarity between two elements, and it influences the shape of the clusters.
• Some of the ways we can calculate distance measures include:
o Euclidean distance measure
o Manhattan distance measure
o Cosine distance measure

5.3.1 Euclidean distance measure

• The Euclidean distance formula gives the length of the line segment between two points. Let us assume two points (x₁, y₁) and (x₂, y₂) in the two-dimensional coordinate plane.
• The Euclidean distance formula is then given by:

  d = √[(x₂ − x₁)² + (y₂ − y₁)²]

• Euclidean Distance Example

Find the distance between the two points P(0, 4) and Q(6, 2).
Solution:
Given: P(0, 4) = (x₁, y₁), Q(6, 2) = (x₂, y₂)
The distance between the points P and Q is
PQ = √[(x₂ − x₁)² + (y₂ − y₁)²]
PQ = √[(6 − 0)² + (2 − 4)²]
PQ = √[6² + (−2)²]
PQ = √(36 + 4)
PQ = √40 = 2√10
Therefore, the distance between the two points P(0, 4) and Q(6, 2) is 2√10.
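As a quick check, the worked example can be reproduced in Python. This is just an illustrative sketch; the point names mirror the example above, and math.dist is a standard-library helper (Python 3.8+):

```python
# Distance between P(0, 4) and Q(6, 2) using the Euclidean formula.
import math

P, Q = (0, 4), (6, 2)

# d = sqrt[(x2 - x1)^2 + (y2 - y1)^2]
d = math.sqrt((Q[0] - P[0]) ** 2 + (Q[1] - P[1]) ** 2)
print(d)                # 6.3245..., i.e. 2*sqrt(10)

# The standard library computes the same thing directly.
print(math.dist(P, Q))  # 6.3245...
```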

5.3.2 Manhattan distance

• Manhattan distance: the distance measured by calculating the sum of the absolute differences of two points or vectors.
• Suppose we have to find the Manhattan distance between points A(x₁, y₁) and B(x₂, y₂). Then, it is given by:

  d = |x₁ − x₂| + |y₁ − y₂|

• The generalized formula for an n-dimensional space, for points P = (p₁, …, pₙ) and Q = (q₁, …, qₙ), is:

  d = |p₁ − q₁| + |p₂ − q₂| + … + |pₙ − qₙ|
• Example: Calculate the Manhattan distance between point P₁(4, 4) and P₂(9, 9).
• Solution: The Manhattan distance between P₁ and P₂ is given by,

Given: First point, P₁ = (4, 4)
Second point, P₂ = (9, 9)

P₁P₂ = |x₁ − x₂| + |y₁ − y₂|
P₁P₂ = |4 − 9| + |4 − 9|
     = |−5| + |−5|
     = 5 + 5
     = 10 units
Hence, the Manhattan distance between point P₁(4, 4) and P₂(9, 9) is 10 units.
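The same check can be written as a short Python sketch; the helper below is hypothetical but also covers the generalized n-dimensional formula:

```python
# Manhattan distance: sum of absolute coordinate differences.
def manhattan(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan((4, 4), (9, 9)))        # 10, matching the worked example
print(manhattan((1, 2, 3), (4, 6, 8)))  # 12, the n-dimensional case
```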
5.3.3 Cosine distance measure
• Cosine similarity is a metric that measures the cosine of the angle between two vectors projected in a multi-dimensional space.
• Mathematically, the cosine similarity is the dot product of the vectors divided by the product of their Euclidean norms (magnitudes):

  cos(θ) = (A · B) / (‖A‖ × ‖B‖)

  where A and B are vectors in a multidimensional space.
• Cosine similarity measures how similar two vectors are, and cosine distance measures how different they are. In real applications, which one to use depends on the task; for example, cosine similarity can serve as a loss function or as a similarity measure for clustering.
• Since the cos(θ) value lies in the range [−1, 1]:
  o −1 indicates strongly opposite vectors, i.e., no similarity
  o 0 indicates independent (orthogonal) vectors
  o 1 indicates a high similarity between the vectors
• Cosine Distance: Usually, cosine similarity is used as the similarity metric between vectors; the cosine distance is then defined as:

  Cosine Distance = 1 − Cosine Similarity
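A minimal NumPy sketch of both quantities, using made-up vectors for illustration:

```python
# Cosine similarity = (A . B) / (||A|| * ||B||); cosine distance = 1 - similarity.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])     # same direction as a
c = np.array([-1.0, -2.0, -3.0])  # opposite direction to a

print(cosine_similarity(a, b))      #  1.0 -> high similarity
print(cosine_similarity(a, c))      # -1.0 -> strongly opposite vectors
print(1 - cosine_similarity(a, b))  #  0.0 -> cosine distance
```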

5.4. Types of Unsupervised Learning Algorithms

5.4.1 Clustering Algorithms

• Clustering is a type of unsupervised learning wherein data points are grouped into different sets based on their degree of similarity.
• The various types of clustering are:
o Hierarchical clustering
o Partitioning clustering
• Hierarchical clustering is further subdivided into:
o Agglomerative clustering (bottom-up approach)
o Divisive clustering (top-down approach)
• Partitioning clustering is further subdivided into:
o K-Means clustering
o Fuzzy C-Means clustering

5.4.2 Hierarchical clustering

• Hierarchical clustering is a method of grouping similar objects into clusters in a tree-like structure.
• Agglomerative clustering is a bottom-up approach: we begin with each element as a separate cluster and merge them into successively larger clusters (see the sketch below).
• Divisive clustering is a top-down approach: we begin with the whole set and proceed to divide it into successively smaller clusters.
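As an illustration, agglomerative clustering can be run with SciPy. The sketch below assumes SciPy is installed and uses made-up points:

```python
# Agglomerative (bottom-up) hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 1], [1.5, 1], [1, 1.5], [5, 5], [5, 5.5], [5.5, 5]])

# linkage() starts from singleton clusters and records each merge,
# building the tree bottom-up (Ward's method merges by minimum variance).
Z = linkage(X, method="ward")

# Cut the tree at the level that yields 2 clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]: the two groups of nearby points
```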
5.5 Unsupervised Learning Algorithms:

5.5.1 K-means clustering

• K-Means Clustering is an unsupervised learning algorithm which groups an unlabelled dataset into different clusters. Here K defines the number of pre-defined clusters to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between each data point and the centroid of its cluster.

The simple K-means algorithm is as follows (a NumPy sketch of these steps appears below):

Step 1: Select K points in the data space and mark them as the initial centroids
loop
Step 2: Assign each point in the data space to the nearest centroid to form K clusters
Step 3: Measure the distance of each point in a cluster from its centroid
Step 4: Calculate the Sum of Squared Errors (SSE) to measure the quality of the clusters
Step 5: Identify the new centroid of each cluster as the mean of the points assigned to it
Step 6: Repeat Steps 2 to 5 until the centroids no longer change
end loop
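The steps above translate almost line by line into NumPy. The following is only an illustrative sketch under simplifying assumptions (for instance, it does not handle empty clusters); in practice one would typically use a library implementation such as sklearn.cluster.KMeans:

```python
# Minimal K-means sketch following Steps 1-6 above.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K points from the data as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Steps 2-3: distance of every point to every centroid; assign nearest.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: SSE measures the quality of the clusters (lower is better).
        sse = (dists[np.arange(len(X)), labels] ** 2).sum()
        # Step 5: new centroid = mean of the points assigned to each cluster.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 6: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids, sse

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
labels, centroids, sse = kmeans(X, k=2)
print(labels, sse)  # two clusters: the two left points and the two right points
```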
Choosing the Optimal Number of Clusters
The number of clusters we choose for the algorithm shouldn't be arbitrary. Each cluster is formed by calculating and comparing the mean distances of the data points within a cluster from its centroid.

We can choose the right number of clusters with the help of the Within-Cluster Sum of Squares (WCSS): the sum of the squared distances of the data points in each cluster from that cluster's centroid.

The main idea is to minimize the distance (e.g., the Euclidean distance) between the data points and the centroids of their clusters; the process is iterated until the sum of distances reaches a minimum.

Elbow Method
Here are the steps to find the optimal number of clusters using the elbow method (see the sketch after these steps):
Step 1: Execute K-means clustering on the given dataset for different values of K (ranging from 1 to 10).
Step 2: For each value of K, calculate the WCSS value.
Step 3: Plot a curve of the WCSS values against the respective number of clusters K.
Step 4: The sharp point of bend in the plot (which looks like the elbow joint of an arm) is taken as the best/optimal value of K.
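A sketch of these four steps using scikit-learn and matplotlib (both assumed to be installed): KMeans.inertia_ is exactly the WCSS described above, and make_blobs merely generates toy data:

```python
# Elbow method: plot WCSS against K and look for the bend.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # toy dataset

wcss = []
ks = range(1, 11)  # Step 1: run K-means for K = 1..10
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)  # Step 2: WCSS for each K

plt.plot(ks, wcss, marker="o")  # Step 3: WCSS vs. number of clusters
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()  # Step 4: the K at the elbow of the curve is the optimal choice
```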

Let's understand the above working with a visual example. (The original scatter plots are not reproduced here; the narration describes them.)

Suppose we have two variables, M1 and M2, plotted on an x-y scatter plot.
o Let the number of clusters be K=2; that is, we will try to group the dataset into two different clusters.
o We need to choose K random points or centroids to form the clusters. These points can be points from the dataset or any other points; here we select two points that are not part of our dataset.
o Now we assign each data point of the scatter plot to its closest centroid, computing closeness with a distance measure such as the Euclidean distance studied earlier. Geometrically, this amounts to drawing the median line (perpendicular bisector) between the two centroids: points on the left side of the line are nearer to the K1 (blue) centroid, and points on the right side are nearer to the yellow centroid. We colour them blue and yellow for clear visualization.
o To find better clusters, we repeat the process with new centroids: each new centroid is the centre of gravity (mean) of the points currently assigned to its cluster.
o Next, we reassign each data point to the nearest new centroid by drawing the median line again. After this step, some points switch sides; for example, one yellow point falls on the left of the line and two blue points fall on the right, so these three points are assigned to the other centroid.
o Since a reassignment has taken place, we return to the centroid-update step: we again compute the centre of gravity of each cluster to obtain new centroids, draw the median line, and reassign the data points.
o Eventually no data point lies on the wrong side of the line, meaning no further reassignments occur and the model has converged.
o As the model is ready, we can remove the assumed centroids and the median line, leaving the two final clusters.

• Challenges of Unsupervised Learning

While unsupervised learning has many benefits, challenges can arise because it allows machine learning models to execute without any human intervention. These challenges include:
o Computational complexity due to a high volume of training data
o Longer training times
o Higher risk of inaccurate results
o The need for human intervention to validate output variables
o Lack of transparency into the basis on which data was clustered

5.6. Advantages and Disadvantages of Unsupervised Learning

• Advantages of Unsupervised Learning
o Use of Unlabelled Data
Unsupervised learning helps us to find hidden patterns or structures
in data that don’t have any labels. It gives us valuable insights and
knowledge by uncovering meaningful connections and information
that we may not have noticed before.

o Scalability
Unsupervised learning algorithms handle large-scale datasets without manual labelling, which makes them more scalable than supervised learning in certain scenarios.
o Anomaly Detection
Unsupervised learning can effectively detect anomalies or outliers in
data, which is particularly useful for fraud detection, network security,
or identifying rare events.
o Data Preprocessing
Unsupervised learning techniques like dimensionality reduction can
help preprocess data by reducing noise, removing irrelevant features,
and improving efficiency in subsequent supervised learning tasks.

• Disadvantages of Unsupervised Learning


Unsupervised learning has some limitations and challenges:
o Lack of Ground Truth
Since unsupervised learning deals with unlabelled data, there is no
definitive measure of correctness or accuracy. Evaluation and
interpretation of results become subjective and rely heavily on domain
expertise.

o Interpretability
Unsupervised learning algorithms often provide clusters or patterns
without explicit labels or explanations. Interpreting and understanding
the meaning of these clusters can be challenging and subjective.
o Overfitting and Model Selection
Unsupervised learning models are susceptible to overfitting, and choosing the optimal model or parameters can be challenging due to the absence of a labelled validation set.

o Limited Guidance
Unlike supervised learning, where the algorithm learns from explicit
feedback, unsupervised learning lacks explicit guidance, which can
result in the algorithm discovering irrelevant or noisy patterns.
5.7. Difference between Supervised and Unsupervised learning

| Supervised Learning | Unsupervised Learning |
| --- | --- |
| Algorithms are trained using labelled data. | Algorithms are trained using unlabelled data. |
| The model takes direct feedback to check whether it is predicting the correct output. | The model does not take any feedback. |
| The model predicts the output. | The model finds the hidden patterns in the data. |
| Input data is provided to the model along with the output. | Only input data is provided to the model. |
| The goal is to train the model so that it can predict the output when given new data. | The goal is to find hidden patterns and useful insights in an unknown dataset. |
| Needs supervision to train the model. | Does not need any supervision to train the model. |
| Can be categorized into Classification and Regression problems. | Can be categorized into Clustering and Association problems. |
| Used where we know the inputs as well as the corresponding outputs. | Used where we have only input data and no corresponding output data. |
| Produces more accurate results. | May give less accurate results than supervised learning. |
| Includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | Includes algorithms such as Hierarchical Clustering, K-Means, and the Apriori algorithm. |

5.8 Association rule

• Association rule learning is a technique that identifies dependencies between two data items; based on such a dependency, items can be mapped to one another so that the relationship can be exploited profitably. Association rule learning also looks for interesting associations among the variables of a dataset. It is one of the important concepts of machine learning and has been used in cases such as association in data mining and continuous production, among others. However, like all other techniques, association in data mining has its own set of disadvantages.
• An association rule has two parts:
  o an antecedent (if), and
  o a consequent (then)
• An antecedent is something that's found in the data, and a consequent is an item found in combination with the antecedent. Have a look at this rule, for instance:
  "If a customer buys bread, he's 70% likely to buy milk."
  In the above association rule, bread is the antecedent and milk is the consequent. Simply put, it can be understood as a retail store's association rule for targeting its customers better. If the above rule is the result of a thorough analysis of some datasets, it can be used not only to improve customer service but also to increase the company's revenue.
• Association rules are created by thoroughly analyzing data and looking for frequent if/then patterns. Then, depending on the following two parameters, the important relationships are identified (see the sketch after this list):
  o Support: Support indicates how frequently the if/then relationship appears in the database.
  o Confidence: Confidence tells how often these relationships have been found to be true.
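The sketch below computes both parameters for the bread-and-milk rule on a hypothetical five-transaction dataset (the transactions and the resulting numbers are made up for illustration):

```python
# Support and confidence of the rule {bread} => {milk}.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)

support_rule = sum({"bread", "milk"} <= t for t in transactions) / n
support_bread = sum("bread" in t for t in transactions) / n
confidence = support_rule / support_bread

print(support_rule)  # 0.6  -> the rule appears in 60% of transactions
print(confidence)    # 0.75 -> 75% of bread buyers also buy milk
```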

5.8.1 Types of Association Rules

There are typically four different types of association rules in data mining:
o Multi-relational association rules
o Generalized association rules
o Quantitative association rules
o Interval information association rules
• Multi-Relational Association Rule


Also known as MRAR, the multi-relational association rule is defined as
a new class of association rules that are usually derived from different or
multi-relational databases. Each rule under this class has one entity with
different relationships that represent the indirect relationships between
entities.
• Generalized Association Rule
o Moving on to the next type of association rule, the generalized
association rule is largely used for getting a rough idea about the
interesting patterns that often tend to stay hidden in data.
• Quantitative Association Rules
o This particular type is one of the most distinctive of the four association rules. What sets it apart from the others is the presence of a numeric (quantitative) attribute on at least one side of the rule. This is in contrast to the generalized association rule, where the left and right sides consist of categorical attributes.

• Algorithms of Association Rule Mining

Apriori Algorithm
o The name of the algorithm comes from the fact that it uses prior knowledge of frequent-itemset properties, as we shall see below.
o Support
The rule A→B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, or both A and B):
  Support(A⇒B) = P(A ∪ B)
o Confidence
The rule A→B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B:
  Confidence(A⇒B) = P(B|A) = Support(A ∪ B) / Support(A)

• Example: Let's look at a concrete example, based on the AllElectronics transaction database D. (The transaction table itself is not reproduced here.) We will find the frequent itemsets and generate association rules for them, with:
  minimum support count = 2
  minimum confidence = 60%

Step 1 (K=1): Create a table containing the support count of each item present in the dataset; this is the candidate set C1. Prune the items whose support count is below the minimum to obtain the frequent 1-itemset L1.
Step 2 (K=2): Generate candidate set C2 from L1 (the join step); the condition for joining two itemsets of Lk-1 is that they have (K−2) elements in common. Check whether all subsets of each candidate itemset are frequent, and remove the itemset if they are not (the prune step). For example, the subsets of {I1, I2} are {I1} and {I2}, and both are frequent, so {I1, I2} is kept; this check is made for each itemset. Then find the support count of the surviving itemsets by searching the dataset, and prune against the minimum support count to obtain L2.
Step 3 onwards: Repeat the join, prune, and count steps for K = 3, 4, … until no new frequent itemsets are found.
Final step (generating association rules): From each frequent itemset, form candidate rules A⇒B and keep those whose confidence is at least the minimum confidence of 60%. (An end-to-end sketch follows below.)
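Since the transaction table is not reproduced above, the sketch below uses a stand-in dataset modelled on the classic AllElectronics example (nine transactions over items I1-I5). It is an illustrative implementation of the join, prune, and rule-generation steps, not a production Apriori:

```python
# Apriori sketch: min support count 2, min confidence 60%.
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_COUNT, MIN_CONF = 2, 0.6

def count(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

# C1 -> L1: keep the single items that meet the minimum support count.
items = {i for t in transactions for i in t}
L = [frozenset([i]) for i in items if count(frozenset([i])) >= MIN_COUNT]
frequent = list(L)

k = 2
while L:
    # Join step: union pairs of (k-1)-itemsets sharing k-2 elements.
    candidates = {a | b for a in L for b in L if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be frequent.
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    # Count step: keep the candidates meeting the minimum support count.
    L = [c for c in candidates if count(c) >= MIN_COUNT]
    frequent.extend(L)
    k += 1

# Rule generation: confidence(A => B) = support(A U B) / support(A).
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = count(itemset) / count(antecedent)
            if conf >= MIN_CONF:
                print(set(antecedent), "=>", set(itemset - antecedent),
                      f"conf={conf:.2f}")
```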
