CT075-3-2 DTM Topic 10 Cluster Analysis

This document discusses cluster analysis and the k-means clustering algorithm. It defines cluster analysis as grouping a set of data objects into clusters based on similarity. The k-means algorithm partitions observations into k clusters by minimizing distances between observations and assigned cluster centers, iteratively updating cluster centers until convergence. An example applies k-means to movie rating data to generate two clusters.

Uploaded by

kishanselvarajah80

Data Management

CT075-3-2

Cluster Analysis
Learning Outcomes

By the end of this lecture, you should be able to:
• Understand the clustering concept
• Apply some algorithms used for clustering
• Explain the partitioning algorithm k-means

Key Terms you must be able to use

• If you have mastered this topic, you should be able to use the following terms correctly in your assignments and exams:
– Clustering
– Cluster analysis
– K-means


What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no
predefined classes
• Typical applications
– As a stand-alone tool to get insight into data
distribution
– As a preprocessing step for other algorithms
General Applications of Clustering

• Pattern Recognition
• Spatial Data Analysis
– create thematic maps in GIS by clustering feature
spaces
– detect spatial clusters and explain them in spatial data
mining
• Image Processing
• Economic Science (especially market research)
• WWW
– Document classification
– Cluster Weblog data to discover groups of similar
access patterns
Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop
targeted marketing programs
• Land use: Identification of areas of similar land use in an
earth observation database
• Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost
• City-planning: Identifying groups of houses according to
their house type, value, and geographical location
What Is Good Clustering?

• A good clustering method will produce high-quality clusters with
– high intra-class similarity (within a cluster)
– low inter-class similarity (between clusters)
• The quality of a clustering result depends on both the
similarity measure used by the method and its
implementation.
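As a rough illustration of "similar within the same cluster, dissimilar to the objects in other clusters", the sketch below compares the average distance inside each cluster with the average distance across two clusters. The toy points, the function names and the use of Euclidean distance as the (dis)similarity measure are all assumptions made for this example, not part of the lecture.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_pairwise_distance(points):
    """Average distance over all pairs inside one cluster (needs >= 2 points)."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_cross_distance(cluster_a, cluster_b):
    """Average distance between points drawn from two different clusters."""
    total = sum(dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical toy clusters, invented for this illustration.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

within_a = mean_pairwise_distance(cluster_a)
within_b = mean_pairwise_distance(cluster_b)
between = mean_cross_distance(cluster_a, cluster_b)
print(within_a, within_b, between)
```

Small within-cluster distances together with a large between-cluster distance correspond to the kind of result the slide calls good; other quality measures are possible.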
Typical Requirements of Clustering in
Data Mining
• Scalability: works well on large data sets, not only on small ones
• Ability to deal with different types of attributes
• Minimal requirements for domain knowledge to
determine input parameters
• Able to deal with noise and outliers
• Ability to handle high dimensionality
• Interpretability and usability
Partitioning Algorithms: Basic Concept

• Partitioning method: Construct a partition of a database D of n objects into a set of k clusters
• Given a k, find a partition of k clusters that optimizes the chosen partitioning condition.
– k-means: Each cluster is represented by the center of the cluster.
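The slides do not state which partitioning condition is optimized. One common choice for k-means, given here as an assumption, is the within-cluster sum of squared distances to the cluster centers. The symbols J, x and \mu_j are notation introduced for this sketch; C_j denotes the j-th cluster.

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```

Under this condition, a smaller J means the records lie closer to the centers of the clusters they belong to.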
The K-Means Clustering Method
k-means algorithm is implemented in 5 steps:
• Step 1: Ask the user how many clusters k the data set should be
partitioned into.
• Step 2: Randomly assign k records to be the initial cluster center
locations.
• Step 3: For each record, find the nearest cluster center. Thus, in a
sense, each cluster center “owns” a subset of the records, thereby
representing a partition of the data set. We therefore have k clusters,
C1,C2, . . . ,Ck .
• Step 4: For each of the k clusters, find the cluster centroid, and
update the location of each cluster center to the new value of the
centroid.
• Step 5: Repeat steps 3 and 4 until convergence or termination.
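The five steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `k_means`, the optional `seeds` parameter, the `max_iter` limit, the use of Euclidean distance and the rule that ties go to the lowest-numbered cluster are all assumptions made here.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(records, k, seeds=None, max_iter=100):
    """Sketch of the five steps; k (Step 1) is supplied by the caller."""
    # Step 2: use k records as the initial cluster centers.
    centers = list(seeds) if seeds is not None else random.sample(records, k)
    labels = None
    for _ in range(max_iter):  # Step 5: repeat steps 3 and 4
        # Step 3: assign each record to its nearest center
        # (on a tie, min() keeps the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda j: dist(r, centers[j])) for r in records
        ]
        if new_labels == labels:
            break  # convergence: no record changed cluster
        labels = new_labels
        # Step 4: move each center to the centroid (mean) of its records.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = k_means(points, k=2, seeds=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing explicit seeds makes the run repeatable; with random seeds, different runs can end in different partitions.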
The K-Means Clustering Method
• Example
[Figure: four scatter-plot panels, each with both axes running from 0 to 10; the plotted points were not preserved.]
Equations required

Euclidean distance: used to find the cluster center nearest to each record.
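For reference, the Euclidean distance between a record x and a cluster center c with m attributes can be written as follows. The symbols x, c and m are notation assumed for this sketch, not taken from the slides.

```latex
d(x, c) = \sqrt{\sum_{i=1}^{m} (x_i - c_i)^{2}}
```

With two attributes A and B this reduces to \sqrt{(x_A - c_A)^2 + (x_B - c_B)^2}.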


Example
k-means algorithm: consider the following data set consisting of the
ratings of two variables on each of seven movies.

Movie A B
M1 1.0 1.0
M2 1.5 2.0
M3 3.0 4.0
M4 5.0 7.0
M5 3.5 5.0
M6 4.5 5.0
M7 3.5 4.5
Example

Steps 1 and 2: Let's choose two seeds at random (k = 2)
Movie A B
M1 1.0 1.0
M4 5.0 7.0
Example

Steps 3 & 4: Compute the distance from each movie to each cluster center using the two attributes A and B. For simplicity, the sum of absolute differences is used as the distance (k-means method).
Example

DISTANCE FROM CLUSTERS
Cluster centers: C1 = (1, 1), C2 = (5, 7)

Movie  A    B    Distance to C1  Distance to C2  Allocation to nearest cluster
M1     1    1    0               10              C1
M2     1.5  2    1.5             8.5             C1
M3     3    4    5               5               C1, C2 (tie)
M4     5    7    10              0               C2
M5     3.5  5    6.5             3.5             C2
M6     4.5  5    7.5             2.5             C2
M7     3.5  4.5  6               4               C2
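The first-pass distances can be reproduced with a short script. This is a sketch under the example's own assumptions (seeds M1 and M4, sum of absolute differences as the distance); the names `movies`, `seeds`, `abs_diff_distance` and `first_pass` are introduced here for illustration.

```python
# Ratings (A, B) for the seven movies in the example.
movies = {
    "M1": (1.0, 1.0), "M2": (1.5, 2.0), "M3": (3.0, 4.0), "M4": (5.0, 7.0),
    "M5": (3.5, 5.0), "M6": (4.5, 5.0), "M7": (3.5, 4.5),
}
# The two seeds chosen in Steps 1 and 2.
seeds = {"C1": movies["M1"], "C2": movies["M4"]}


def abs_diff_distance(p, q):
    """Sum of absolute differences across attributes (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


first_pass = {}
for name, rating in movies.items():
    d = {c: abs_diff_distance(rating, center) for c, center in seeds.items()}
    nearest = min(d.values())
    # Keep every center at the minimum distance so that ties stay visible.
    first_pass[name] = (d["C1"], d["C2"], [c for c in d if d[c] == nearest])

for name, row in first_pass.items():
    print(name, *row)
```

Each printed row gives the movie, its distance to C1, its distance to C2, and the nearest center or centers; M3 is equally far from both seeds.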
Example

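STEP 5: Recompute the cluster centers and repeat. C1 is the mean of M1, M2 and M3; C2 is the mean of M3, M4, M5, M6 and M7 (the tied record M3 is counted in both clusters). The original seeds are shown for comparison.

       A     B
C1     1.83  2.33
C2     3.9   5.1
SEED1  1     1
SEED2  5     7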
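The updated centers can be checked by averaging the ratings of the movies in each cluster. The sketch below assumes that M3, which was equally near both seeds, is counted in both clusters; the values 3.9 and 5.1 for C2 are reproduced only under that assumption. The helper name `centroid` is introduced here for illustration.

```python
# Ratings (A, B) for the seven movies in the example.
movies = {
    "M1": (1.0, 1.0), "M2": (1.5, 2.0), "M3": (3.0, 4.0), "M4": (5.0, 7.0),
    "M5": (3.5, 5.0), "M6": (4.5, 5.0), "M7": (3.5, 4.5),
}


def centroid(names):
    """Attribute-wise mean of the named movies, rounded to two decimals."""
    rows = [movies[n] for n in names]
    return tuple(round(sum(col) / len(rows), 2) for col in zip(*rows))


# Assumption: the tied record M3 contributes to both clusters.
c1 = centroid(["M1", "M2", "M3"])
c2 = centroid(["M3", "M4", "M5", "M6", "M7"])
print(c1, c2)  # (1.83, 2.33) (3.9, 5.1)
```

A more conventional tie rule would place M3 in only one cluster, which would give different centers.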
Example
DISTANCE FROM CLUSTERS
Cluster centers: C1 = (1.83, 2.33), C2 = (3.9, 5.1)

Movie  A    B    Distance to C1  Distance to C2  Allocation to nearest cluster
M1     1    1    2.16            7               C1
M2     1.5  2    0.66            5.5             C1
M3     3    4    2.84            2               C2
M4     5    7    7.84            3               C2
M5     3.5  5    4.34            0.5             C2
M6     4.5  5    5.34            0.7             C2
M7     3.5  4.5  3.84            1               C2

Cluster 1 -> M1, M2

Cluster 2 -> M3, M4, M5, M6, M7
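A pass like this can be checked by recomputing it. The sketch below assumes the rounded centers (1.83, 2.33) and (3.9, 5.1) and the sum-of-absolute-differences distance used in the example; the names `centers`, `abs_diff_distance` and `second_pass` are introduced here for illustration.

```python
# Ratings (A, B) for the seven movies in the example.
movies = {
    "M1": (1.0, 1.0), "M2": (1.5, 2.0), "M3": (3.0, 4.0), "M4": (5.0, 7.0),
    "M5": (3.5, 5.0), "M6": (4.5, 5.0), "M7": (3.5, 4.5),
}
# Rounded cluster centers taken from the example.
centers = {"C1": (1.83, 2.33), "C2": (3.9, 5.1)}


def abs_diff_distance(p, q):
    """Sum of absolute differences across attributes (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


second_pass = {}
for name, rating in movies.items():
    # Round to two decimals to match the precision used in the tables.
    d = {c: round(abs_diff_distance(rating, ctr), 2) for c, ctr in centers.items()}
    second_pass[name] = (d["C1"], d["C2"], min(d, key=d.get))

for name, row in second_pass.items():
    print(name, *row)
```

Each printed row gives the movie, its distance to C1, its distance to C2, and the nearer center.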
Summary

• Clustering algorithm and its applications

• The k-means algorithm.
References

• Larose, D. T. (2005), Discovering Knowledge in Data: An Introduction to Data Mining, Wiley.
