Clustering Part-A
Someone Messed Up the Library!
French
German
Spanish
Supervised Learning (figure: scatter plot, axes Feature 1 and Feature 2)
Unsupervised Learning (figure: scatter plot, axes Feature 1 and Feature 2)
Supervised vs Unsupervised Learning
Definition
• Unsupervised Learning: a type of machine learning that happens without human supervision; the machine tries to find the patterns in the data by itself.
• Supervised Learning: a type of machine learning that happens under human supervision, meaning people label the input data with answer keys that guide (supervise) the machine to learn the desired outputs.
Input data
• Unsupervised Learning: unlabeled
• Supervised Learning: labeled
Use of data
• Unsupervised Learning: the model is given only the input variables (X) and no corresponding output data.
• Supervised Learning: the model is given input variables (X), output variables (Y), and an algorithm to learn the function from input to output.
When to use
• Unsupervised Learning: you don’t know what you are looking for in the data.
• Supervised Learning: you know what you are looking for in the data.
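To make the contrast concrete, here is a minimal sketch in plain Python. The toy numbers and the helper names `predict_nearest` and `split_at_largest_gap` are illustrative assumptions, not part of the slides. The supervised helper relies on the labels (Y); the unsupervised one sees only the inputs (X) and has to find the grouping by itself.

```python
# Supervised: every input x comes with a label y (the "answer key").
labeled = [(1.0, "short"), (1.2, "short"), (8.9, "long"), (9.3, "long")]

def predict_nearest(x, labeled_data):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    nearest_x, nearest_y = min(labeled_data, key=lambda pair: abs(pair[0] - x))
    return nearest_y

# Unsupervised: only inputs X, no labels.
unlabeled = [1.0, 1.2, 8.9, 9.3]

def split_at_largest_gap(xs):
    """Group 1-D values into two groups by cutting at the widest gap."""
    xs = sorted(xs)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return xs[:cut], xs[cut:]

print(predict_nearest(8.0, labeled))    # long
print(split_at_largest_gap(unlabeled))  # ([1.0, 1.2], [8.9, 9.3])
```

Note that the second function never learns what the groups mean; it only discovers that two groups exist.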
When to Use Unsupervised Learning?
• If you need to identify patterns and relationships in data
Applications of Unsupervised Learning
• Medical diagnosis
• Customer segmentation
• Recommendation systems
• Anomaly detection
• Cyber security (data preparation for unknown threats)
• Preparing data for supervised learning (image segmentation)
Unsupervised ML Approaches
• Clustering: identifies similarities and differences between
unlabelled data entries and groups them based on their
properties.
• Dimensionality reduction: reduces the number of features (dimensions) while
preserving the integrity of the data; useful when there is so much data
to analyse that the algorithms' performance would suffer.
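As one possible illustration, the sketch below reduces two features to one by projecting each point onto the direction of greatest variance. This is principal component analysis restricted to two features; the slides do not name a specific method, so the technique, the function name `project_to_1d`, and the sample points are all assumptions made for this example.

```python
import math

def project_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(cx * cx for cx, _ in centered) / n
    var_y = sum(cy * cy for _, cy in centered) / n
    cov_xy = sum(cx * cy for cx, cy in centered) / n
    # Angle of the principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    ux, uy = math.cos(theta), math.sin(theta)
    # One coordinate per point instead of two.
    return [cx * ux + cy * uy for cx, cy in centered]

# Hypothetical data lying roughly along a diagonal line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
reduced = project_to_1d(points)
print(reduced)
```

Because the points lie close to a line, a single coordinate along that line keeps their relative order and spacing, which is the "integrity" the bullet refers to.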
• Association: finds relationships between variables, i.e.
identifies sets of items that often occur together in a dataset.
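A minimal sketch of the idea, assuming a handful of invented shopping baskets: count how often each pair of items appears in the same basket and keep the pairs that reach a chosen threshold. The name `frequent_pairs` and the `min_support` parameter are hypothetical, and real association-rule miners do considerably more than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sorting makes ("bread", "butter") and ("butter", "bread") the same key.
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets, min_support=2))  # {('bread', 'butter'): 3}
```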
Clustering
K-means Algorithm
K-means Clustering
Motivation:
• Pick K random points as cluster centroids. Here, K=2.
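The initial pick can be sketched as follows. One assumption here: the K random points are drawn from the data points themselves, which is a common choice but not something the slide specifies. The function name `init_centroids` and the `seed` parameter are hypothetical.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids."""
    rng = random.Random(seed)  # a seed makes the random pick repeatable
    return rng.sample(points, k)

# Hypothetical 2-D data points.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0)]
centroids = init_centroids(points, k=2, seed=0)
print(centroids)
```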
Iterative Step 1
• Assign each data point to its closest cluster centroid.
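A minimal sketch of the assignment step, assuming Euclidean distance as the measure of "closest" (the slide does not state the distance measure). The function name `assign_to_nearest` and the sample data are illustrative.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
centroids = [(1.0, 1.0), (5.0, 7.0)]
print(assign_to_nearest(points, centroids))  # [0, 0, 1, 1]
```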
Iterative Step 2
• Compute the average position of all data points assigned to each centroid.
Iterative Step 3
Repeat:
• Calculate the average of the data points assigned to each centroid.
• Move the centroid to the new average position.
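Computing the averages and moving the centroids might look like the sketch below. The function name `update_centroids` is hypothetical, and leaving a centroid in place when no points are assigned to it is an assumption; the slides do not say how that case is handled.

```python
def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == j]
        if not members:
            # Assumption: a centroid with no assigned points stays where it is.
            new_centroids.append(old)
            continue
        # zip(*members) groups the coordinates dimension by dimension.
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

points = [(1.0, 1.0), (2.0, 3.0), (5.0, 7.0), (7.0, 9.0)]
assignments = [0, 0, 1, 1]
print(update_centroids(points, assignments, [(0.0, 0.0), (9.0, 9.0)]))
# [(1.5, 2.0), (6.0, 8.0)]
```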
Repeat:
• Until convergence, i.e. until no reassignment of data points occurs.
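Putting the steps together, a minimal self-contained sketch of the whole loop could look like this. It assumes Euclidean distance, initial centroids drawn from the data points, and a `max_iter` safety cap; none of these are stated on the slides, and the function name `kmeans` is only illustrative.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Run K-means until no point changes cluster (or max_iter is reached)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k data points
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no reassignment occurred
        assignments = new_assignments
        # Move each centroid to the average of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
centroids, assignments = kmeans(points, k=2, seed=0)
print(centroids, assignments)
```

Which group ends up as cluster 0 depends on the random start, so only the grouping itself, not the cluster numbers, should be relied on.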
Converged or not converged?
When does the K-means algorithm end?