A Seminar on Techniques of Cluster Analysis
Group members
Munner Mohammad 47
Vaibhav Nanaware 52
Nishant Nirmal 55
Tejas Pawar 60
Bhagwat Shinde 72
Talal Saeed ..
Contents
5. Working of Clustering
7. Clustering Algorithms
8. Conclusion
9. References
What is Clustering?
Clustering groups a set of objects so that objects in the same group (cluster) are more similar to one another than to objects in other groups.
Cluster Variate
- the mathematical representation of the selected set of variables on which the objects' similarities are compared.
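As a small, hypothetical illustration (the customer data and variable names below are made up), each object's cluster variate can be written as a vector of the selected variables, and the similarity of two objects can then be measured, for example, by the Euclidean distance between their vectors:

```python
import math

# Hypothetical objects described by a selected set of variables (the cluster
# variate), e.g. [age, annual spend, visits per month] for two customers.
# In practice the variables would usually be standardized first, since they
# are measured on very different scales.
customer_a = [34, 1200.0, 5]
customer_b = [29, 1100.0, 6]

def euclidean_distance(x, y):
    """Smaller distance = more similar objects on the chosen variables."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance(customer_a, customer_b))  # ~100.13
```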
Cluster Analysis in Marketing Research
Market Segmentation
Use of cluster analysis in marketing
Data Reduction
Hypothesis generation
How does a cluster analysis work?
The primary objective of cluster analysis is to
define the structure of the data by placing the
most similar observations into groups.
Deriving Clusters
Hierarchical Clustering Analysis
1. Agglomerative Clustering:
• Also known as the bottom-up approach.
• Each observation starts in its own cluster, and the two most similar clusters are merged at every step until all of the data ends up in one cluster.
Hierarchical Clustering Analysis - continued
2. Divisive Clustering:
• Also known as the top-down approach.
• The algorithm does not require the number of clusters to be specified in advance.
• Top-down clustering starts with a single cluster containing the whole data set and proceeds by splitting clusters recursively until every individual data point sits in its own singleton cluster.
Agglomerative Algorithm
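A minimal sketch of the agglomerative (bottom-up) procedure, assuming Euclidean distance, single linkage, and a small made-up data set; it only illustrates the repeated merging of the two closest clusters:

```python
import math

# Made-up 2-D points; indices 0-4 are used as cluster members below.
points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (5.5, 6)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def single_linkage(c1, c2):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

# Start with every point in its own (singleton) cluster.
clusters = [[i] for i in range(len(points))]

# Repeatedly merge the two closest clusters until the desired number remains.
while len(clusters) > 2:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(clusters)  # [[0, 1], [2, 3, 4]]
```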
Divisive Algorithm
We can say that the Divisive Hierarchical clustering is precisely
the opposite of the Agglomerative Hierarchical clustering.
In Divisive Hierarchical clustering, we take into account all of the
data points as a single cluster and in every iteration, we separate
the data points from the clusters which aren’t comparable.
In the end, we are left with N clusters.
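A minimal sketch of the divisive (top-down) procedure on made-up one-dimensional data; the splitting rule used here (seeding each split with the two farthest-apart points of a cluster) is just one simple choice among many:

```python
# Made-up 1-D data; each value is referred to by its index below.
points = [1.0, 1.2, 5.0, 5.3, 9.0]

def split(cluster):
    """Split one cluster in two, seeded by its two farthest-apart points."""
    a, b = max(
        ((i, j) for i in cluster for j in cluster if i < j),
        key=lambda ij: abs(points[ij[0]] - points[ij[1]]),
    )
    left = [i for i in cluster
            if abs(points[i] - points[a]) <= abs(points[i] - points[b])]
    right = [i for i in cluster if i not in left]
    return left, right

# Start with a single cluster holding the whole data set, then split
# recursively until every point sits in its own singleton cluster.
clusters = [list(range(len(points)))]
while any(len(c) > 1 for c in clusters):
    c = next(c for c in clusters if len(c) > 1)
    clusters.remove(c)
    clusters.extend(split(c))

print(clusters)  # every index ends up in its own singleton cluster
```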
Non-Hierarchical Clustering
Non-hierarchical clustering methods are divided into four subclasses [1], including:
a) K-means
b) Density-based
K-Means
The K-Means algorithm consists of four basic steps:
1) Determination of the initial centers.
2) Assigning each point that is not a center to the cluster of its nearest center, according to the distance between the point and the centers.
3) Calculation of new centers.
4) Repeating these steps until the clusters stabilize.
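A minimal sketch of these four steps on made-up two-dimensional data with k = 2; initializing the centers by randomly sampling data points is just one common choice:

```python
import math
import random

points = [(1, 1), (1.5, 2), (1, 1.8), (8, 8), (8.5, 7.5), (9, 8)]
k = 2

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Step 1: determine the initial centers (here: k random data points).
random.seed(0)
centers = random.sample(points, k)

while True:
    # Step 2: assign every point to its nearest center.
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)

    # Step 3: calculate the new centers as the mean of each cluster.
    new_centers = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
        for i, c in enumerate(clusters)
    ]

    # Step 4: repeat until the centers stop moving.
    if new_centers == centers:
        break
    centers = new_centers

print(centers)  # one center near (1.2, 1.6) and one near (8.5, 7.8)
```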
K-Means - continued
Input: the number of clusters k and a data set of n points.
Output: a set of k clusters.
K-Medoids Clustering
K-Medoids Clustering - continued
Algorithm
1. Initialize: select k random points out of the n data points as the medoids.
2. Assignment: associate each remaining data point with its closest medoid.
3. Update: for each medoid m and each non-medoid point o, try swapping m and o and recompute the total cost (the sum of the distances of the points to their nearest medoid); keep the swap only if the total cost decreases.
Continued
Step 1:
Let k = 2 and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).
Step 2: Calculating the cost.
The points 1, 2, 5 go to cluster C1 and the points 0, 3, 6, 7, 8 go to cluster C2.
Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Continued
Step 3: Select a non-medoid point, swap it with one of the medoids, and recalculate the cost.
Each point is again assigned to the cluster with the smaller dissimilarity, so the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost − Previous cost = 22 − 20 = 2, which is greater than 0.
As the swap cost is not less than zero, we undo the swap.
Hence (3, 4) and (7, 4) are the final medoids, and the final clusters are formed around these two medoids.
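A minimal sketch of the swap test used in this example, with hypothetical data points and medoids (not the exact values from the slides, whose full dissimilarity table is not reproduced here) and Manhattan distance assumed as the dissimilarity measure:

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_cost(points, medoids):
    """Cost = sum of every point's distance to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

# Hypothetical data set for illustration only.
points = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5)]

current = [(3, 4), (7, 4)]      # current medoids
candidate = [(3, 4), (7, 3)]    # try swapping medoid (7, 4) for non-medoid (7, 3)

swap_cost = total_cost(points, candidate) - total_cost(points, current)
if swap_cost > 0:
    print("swap cost", swap_cost, "> 0: undo the swap, keep", current)
else:
    print("accept the swap:", candidate)
```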
Density-Based Clustering
Algorithmic steps for DBSCAN clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires
two parameters: ε (eps) and the minimum number of points required to
form a cluster (minPts).
Step 1.
Start with an arbitrary starting point that has
not been visited.
Step 2.
Extract the ε-neighborhood of this point.
Step 3.
If there are sufficiently many points in this neighborhood, the clustering process starts and the point is marked as visited; otherwise, the point is labeled as noise.
Continued
Step 4.
If a point is found to be part of a cluster, its ε-neighborhood is also part of that cluster, and the procedure from Step 2 is repeated for every point in that neighborhood until all points of the cluster have been determined.
Step 5.
A new unvisited point is then retrieved and processed, leading to the discovery of a further cluster or of noise.
Step 6.
This process continues until all points have been marked as visited.
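A minimal sketch of these six steps; eps, minPts and the one-dimensional data below are made up for illustration:

```python
def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (its eps-neighborhood)."""
    return [j for j, q in enumerate(points) if abs(points[idx] - q) <= eps]

def dbscan(points, eps, min_pts):
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):                   # Step 1: pick an unvisited point
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)   # Step 2: extract its neighborhood
        if len(neighbors) < min_pts:               # Step 3: too few neighbors -> noise,
            labels[i] = NOISE                      # otherwise start a new cluster
            continue
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:                               # Step 4: grow the cluster through the
            j = seeds.pop()                        # neighborhoods of its core points
            if labels[j] == NOISE:
                labels[j] = cluster_id             # a noise point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)
        cluster_id += 1
    # Steps 5-6: the outer loop retrieves the next unvisited point and the
    # process continues until all points have been visited.
    return labels

points = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0]
print(dbscan(points, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]  (-1 = noise)
```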
Conclusion
Clustering is one of the important methods for knowledge discovery and data mining applications.
References
[1] Gulagiz, F. K., Sahin, S., Comparison of Hierarchical and Non-Hierarchical Clustering Algorithms, International Journal of Computer Engineering and Information Technology, January 2017, pp. 6-14 (available online).
[2] Alpaydın, E., Zeki Veri Madenciliği: Ham Veriden Altın Bilgiye Ulaşma Yöntemler, Bilişim 2000,
Veri madenciliği Eğitim Semineri, 2000.
[3] Likas, A., Vlassisb, N., Verbeekb, J. J., The Global K-Means Clustering Algorithm, Pattern
Recognition, 2003, 36(2), pp 451-461.
[5] Moreira, A., Santos, M. Y., Carneiro, S., Density-Based Clustering Algorithms – DBSCAN and SNN.
[6] Kaufman, L., Rousseeuw, P. J., Clustering by Means of Medoids, Statistical Data Analysis
Based on The L1– Norm and Related Methods, Springer, 1987.
THANK YOU