A Seminar on Techniques of Cluster Analysis
Group members
Munner Mohammad 47
Vaibhav Nanaware 52
Nishant Nirmal 55
Tejas Pawar 60
Bhagwat Shinde 72
Talal Saeed ..
Contents
5. Working of Clustering
7. Clustering Algorithms
8. Conclusion
9. References
What is Clustering?
Clustering groups a set of objects so that objects in the same group (cluster) are more similar to one another than to objects in other groups.
Cluster Variate
- the mathematical representation of the selected set of variables on which the objects' similarities are compared.
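As a small, hypothetical illustration (the customer data and variable names below are made up), each object's cluster variate can be written as a vector of the selected variables, and the similarity of two objects can then be measured, for example, by the Euclidean distance between their vectors:

```python
import math

# Hypothetical objects described by a selected set of variables (the cluster
# variate), e.g. [age, annual spend, visits per month] for two customers.
# In practice the variables would usually be standardized first, since they
# are measured on very different scales.
customer_a = [34, 1200.0, 5]
customer_b = [29, 1100.0, 6]

def euclidean_distance(x, y):
    """Smaller distance = more similar objects on the chosen variables."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance(customer_a, customer_b))  # ~100.13
```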
Cluster Analysis in Marketing Research
Market Segmentation
Use of cluster analysis in marketing
Data Reduction
Hypothesis generation
How does a cluster analysis work?
The primary objective of cluster analysis is to
define the structure of the data by placing the
most similar observations into groups.
Deriving Clusters
Hierarchical Clustering Analysis
1. Agglomerative Clustering:
• Also known as the bottom-up approach.
• Each observation starts in its own cluster, and the two most similar clusters are merged at every step until all of the data ends up in one cluster.
Hierarchical Clustering Analysis - continued
2. Divisive Clustering:
• Also known as the top-down approach.
• The algorithm does not require the number of clusters to be specified in advance.
• Top-down clustering starts with a single cluster containing the whole data set and proceeds by splitting clusters recursively until every individual data point sits in its own singleton cluster.
Agglomerative Algorithm
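A minimal sketch of the agglomerative (bottom-up) procedure, assuming Euclidean distance, single linkage, and a small made-up data set; it only illustrates the repeated merging of the two closest clusters:

```python
import math

# Made-up 2-D points; indices 0-4 are used as cluster members below.
points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (5.5, 6)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def single_linkage(c1, c2):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

# Start with every point in its own (singleton) cluster.
clusters = [[i] for i in range(len(points))]

# Repeatedly merge the two closest clusters until the desired number remains.
while len(clusters) > 2:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(clusters)  # [[0, 1], [2, 3, 4]]
```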
Divisive Algorithm
We can say that the Divisive Hierarchical clustering is precisely
the opposite of the Agglomerative Hierarchical clustering.
In Divisive Hierarchical clustering, we take into account all of the
data points as a single cluster and in every iteration, we separate
the data points from the clusters which aren’t comparable.
In the end, we are left with N clusters.
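A minimal sketch of the divisive (top-down) procedure on made-up one-dimensional data; the splitting rule used here (seeding each split with the two farthest-apart points of a cluster) is just one simple choice among many:

```python
# Made-up 1-D data; each value is referred to by its index below.
points = [1.0, 1.2, 5.0, 5.3, 9.0]

def split(cluster):
    """Split one cluster in two, seeded by its two farthest-apart points."""
    a, b = max(
        ((i, j) for i in cluster for j in cluster if i < j),
        key=lambda ij: abs(points[ij[0]] - points[ij[1]]),
    )
    left = [i for i in cluster
            if abs(points[i] - points[a]) <= abs(points[i] - points[b])]
    right = [i for i in cluster if i not in left]
    return left, right

# Start with a single cluster holding the whole data set, then split
# recursively until every point sits in its own singleton cluster.
clusters = [list(range(len(points)))]
while any(len(c) > 1 for c in clusters):
    c = next(c for c in clusters if len(c) > 1)
    clusters.remove(c)
    clusters.extend(split(c))

print(clusters)  # every index ends up in its own singleton cluster
```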
Non-Hierarchical Clustering
Non-hierarchical clustering methods are divided into four subclasses [1], including:
a) K-means
b) Density-based
K-Means
The K-Means algorithm consists of four basic steps:
1) Determination of the initial centers.
2) Assigning each point that is not a center to the cluster of its nearest center, according to the distance between the point and the centers.
3) Calculation of new centers.
4) Repeating these steps until the clusters stabilize.
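A minimal sketch of these four steps on made-up two-dimensional data with k = 2; initializing the centers by randomly sampling data points is just one common choice:

```python
import math
import random

points = [(1, 1), (1.5, 2), (1, 1.8), (8, 8), (8.5, 7.5), (9, 8)]
k = 2

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Step 1: determine the initial centers (here: k random data points).
random.seed(0)
centers = random.sample(points, k)

while True:
    # Step 2: assign every point to its nearest center.
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)

    # Step 3: calculate the new centers as the mean of each cluster.
    new_centers = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
        for i, c in enumerate(clusters)
    ]

    # Step 4: repeat until the centers stop moving.
    if new_centers == centers:
        break
    centers = new_centers

print(centers)  # one center near (1.2, 1.6) and one near (8.5, 7.8)
```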
K-Means - continued
Input: the number of clusters k and a data set of n points.
Output: a set of k clusters.
K-Medoids Clustering
K-Medoids Clustering - continued
Algorithm
1. Initialize: select k random points out of the n data points as the medoids.
2. Assignment: associate each remaining data point with its closest medoid.
3. Update: for each medoid m and each non-medoid point o, try swapping m and o and recompute the total cost (the sum of the distances of the points to their nearest medoid); keep the swap only if the total cost decreases.
Continued
Step 1:
Let k = 2 and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).
Step 2: Calculating the cost.
The points 1, 2, 5 go to cluster C1 and the points 0, 3, 6, 7, 8 go to cluster C2.
Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Continued
Step 3: Select a non-medoid point, swap it with one of the medoids, and recalculate the cost.
Each point is again assigned to the cluster with the smaller dissimilarity, so the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost − Previous cost = 22 − 20 = 2, which is greater than 0.
As the swap cost is not less than zero, we undo the swap.
Hence (3, 4) and (7, 4) are the final medoids, and the final clusters are formed around these two medoids.
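A minimal sketch of the swap test used in this example, with hypothetical data points and medoids (not the exact values from the slides, whose full dissimilarity table is not reproduced here) and Manhattan distance assumed as the dissimilarity measure:

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_cost(points, medoids):
    """Cost = sum of every point's distance to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

# Hypothetical data set for illustration only.
points = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5)]

current = [(3, 4), (7, 4)]      # current medoids
candidate = [(3, 4), (7, 3)]    # try swapping medoid (7, 4) for non-medoid (7, 3)

swap_cost = total_cost(points, candidate) - total_cost(points, current)
if swap_cost > 0:
    print("swap cost", swap_cost, "> 0: undo the swap, keep", current)
else:
    print("accept the swap:", candidate)
```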
Density-Based Clustering
Algorithmic steps for DBSCAN clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires
two parameters: ε (eps) and the minimum number of points required to
form a cluster (minPts).
Step 1.
Start with an arbitrary starting point that has
not been visited.
Step 2.
Extract the ε-neighborhood of this point.
Step 3.
If there are sufficiently many points in this neighborhood, the clustering process starts and the point is marked as visited; otherwise, the point is labeled as noise.
Continued
Step 4.
If a point is found to be part of a cluster, its ε-neighborhood is also part of that cluster, and the procedure from Step 2 is repeated for every point in that neighborhood until all points of the cluster have been determined.
Step 5.
A new unvisited point is then retrieved and processed, leading to the discovery of a further cluster or of noise.
Step 6.
This process continues until all points have been marked as visited.
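A minimal sketch of these six steps; eps, minPts and the one-dimensional data below are made up for illustration:

```python
def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (its eps-neighborhood)."""
    return [j for j, q in enumerate(points) if abs(points[idx] - q) <= eps]

def dbscan(points, eps, min_pts):
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):                   # Step 1: pick an unvisited point
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)   # Step 2: extract its neighborhood
        if len(neighbors) < min_pts:               # Step 3: too few neighbors -> noise,
            labels[i] = NOISE                      # otherwise start a new cluster
            continue
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:                               # Step 4: grow the cluster through the
            j = seeds.pop()                        # neighborhoods of its core points
            if labels[j] == NOISE:
                labels[j] = cluster_id             # a noise point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)
        cluster_id += 1
    # Steps 5-6: the outer loop retrieves the next unvisited point and the
    # process continues until all points have been visited.
    return labels

points = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0]
print(dbscan(points, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]  (-1 = noise)
```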
Conclusion
Clustering is one of the important methods for knowledge discovery and data mining applications.
References
[1] Gulagiz, F. K., Sahin, S., Comparison of Hierarchical and Non-Hierarchical Clustering Algorithms, International Journal of Computer Engineering and Information Technology, January 2017, pp. 6-14 (available online).
[2] Alpaydın, E., Zeki Veri Madenciliği: Ham Veriden Altın Bilgiye Ulaşma Yöntemler, Bilişim 2000,
Veri madenciliği Eğitim Semineri, 2000.
[3] Likas, A., Vlassisb, N., Verbeekb, J. J., The Global K-Means Clustering Algorithm, Pattern
Recognition, 2003, 36(2), pp 451-461.
[5] Moreira, A., Santos, M. Y., Carneiro, S., Density-Based Clustering Algorithms – DBSCAN and SNN.
[6] Kaufman, L., Rousseeuw, P. J., Clustering by Means of Medoids, Statistical Data Analysis
Based on The L1– Norm and Related Methods, Springer, 1987.
THANK YOU