
UNIT-V
Cluster Analysis
Cluster Analysis: Basic Concepts and Algorithms
 Overview: What Is Cluster Analysis? Different Types of Clustering, Different Types of Clusters
 K-means: The Basic K-means Algorithm, K-means Additional Issues, Bisecting K-means, Strengths and Weaknesses
 Agglomerative Hierarchical Clustering: Basic Agglomerative Hierarchical Clustering Algorithm
 DBSCAN: Traditional Density Center-Based Approach, DBSCAN Algorithm, Strengths and Weaknesses
Cluster Analysis
 Cluster: a collection of data objects
 Similar to one another within the same cluster
 Dissimilar to the objects in other clusters
 Cluster analysis
 Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
 Unsupervised learning: no predefined classes
 Typical applications
 As a stand-alone tool to get insight into data distribution
 As a preprocessing step for other algorithms
Clustering Applications
 Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
 Land use: Identification of areas of similar land use in an earth observation database
 Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
 City planning: Identifying groups of houses according to their house type, value, and geographical location
 Earthquake studies: Observed earthquake epicenters should be clustered along continent faults
Clustering Applications and Multidisciplinary Efforts
 Pattern Recognition
 Spatial Data Analysis
 Create thematic maps in GIS by clustering feature spaces
 Detect spatial clusters or support other spatial mining tasks
 Image Processing
 Economic Science (especially market research)
 WWW
 Document classification
 Cluster Weblog data to discover groups of similar access patterns
Quality - Good Clustering
 A good clustering method will produce high-quality clusters with
 high intra-class similarity
 low inter-class similarity
 The quality of a clustering result depends on both the similarity measure used by the method and its implementation
 The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
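To make the intra-class / inter-class idea concrete, here is a small sketch (not from the original slides; the data and function name are illustrative) that computes the average within-cluster and between-cluster distances for a labelled toy data set. A good clustering keeps the first number small and the second large.

```python
import numpy as np

def intra_inter_distances(X, labels):
    """Average pairwise distance within clusters and between clusters."""
    intra, inter = [], []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(X[i] - X[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return np.mean(intra), np.mean(inter)

# Toy data: two compact groups -> low intra-cluster, high inter-cluster distance
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])
labels = [0, 0, 0, 1, 1, 1]
print(intra_inter_distances(X, labels))   # roughly (0.34, 9.9) for this toy data
```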
Different Types of Clustering

 Hierarchical versus Partitional

 Exclusive versus Overlapping versus Fuzzy

 Complete versus Partial

Different Types of Clusters

 Well-separated Cluster

 Prototype-Based cluster

 Graph-Based Cluster

 Density-Based Cluster

 Shared-property or Conceptual Clusters

Different Types of Clusters
 Well-separated clusters: Each point is closer to all of the points in its cluster than to any point in another cluster.
 Prototype-based clusters: Each point is closer to the center of its cluster than to the center of any other cluster.
 Graph-based clusters: Each point is closer to at least one point in its cluster than to any point in another cluster.
 Density-based clusters: Clusters are regions of high density separated by regions of low density.
 Shared-property (conceptual) clusters: Points in a cluster share some general property that derives from the entire set of points.
Requirements of Clustering in Data
Mining
 Scalability
 Ability to deal with different types of attributes
 Ability to handle dynamic data
 Discovery of clusters with arbitrary shape
 Minimal requirements for domain knowledge to
determine input parameters
 Able to deal with noise and outliers
 Insensitive to order of input records
 High dimensionality
 Incorporation of user-specified constraints
 Interpretability and usability
Major Clustering Approaches
 Partitioning approach:
 Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
 Typical methods: k-means, k-medoids, CLARANS
 Hierarchical approach:
 Create a hierarchical decomposition of the set of data (or objects) using some criterion
 Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
 Density-based approach:
 Based on connectivity and density functions
 Typical methods: DBSCAN, OPTICS, DenClue
Major Clustering Approaches
 Grid-based approach:
 Based on a multiple-level granularity structure
 Typical methods: STING, WaveCluster, CLIQUE
 Model-based:
 A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to the given model
 Typical methods: EM, SOM, COBWEB
 Frequent pattern-based:
 Based on the analysis of frequent patterns
 Typical methods: pCluster
 User-guided or constraint-based:
 Clustering by considering user-specified or application-specific constraints
 Typical methods: COD (obstacles), constrained clustering
Typical Alternatives to Calculate the Distance between Clusters
 Single link: smallest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = min(tip, tjq)
 Complete link: largest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = max(tip, tjq)
 Average: average distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = avg(tip, tjq)
 Centroid: distance between the centroids of two clusters, i.e., dis(Ki, Kj) = dis(Ci, Cj)
 Medoid: distance between the medoids of two clusters, i.e., dis(Ki, Kj) = dis(Mi, Mj)
 Medoid: one chosen, centrally located object in the cluster
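As a concrete illustration (added here; the two clusters and the pairwise helper are made up for the example), the following sketch evaluates the single-link, complete-link, average, and centroid distances between two small clusters of 2-D points.

```python
import numpy as np

def pairwise(Ki, Kj):
    """All pairwise Euclidean distances between points of cluster Ki and cluster Kj."""
    return np.array([[np.linalg.norm(p - q) for q in Kj] for p in Ki])

Ki = np.array([[1.0, 1.0], [2.0, 1.0]])
Kj = np.array([[6.0, 5.0], [7.0, 6.0]])

D = pairwise(Ki, Kj)
print("single link  :", D.min())     # smallest pairwise distance
print("complete link:", D.max())     # largest pairwise distance
print("average link :", D.mean())    # average pairwise distance
print("centroid     :", np.linalg.norm(Ki.mean(axis=0) - Kj.mean(axis=0)))  # distance between centroids
```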
Centroid, Radius and Diameter of a Cluster (for numerical data sets)
 Centroid: the "middle" of a cluster

    C_m = \frac{\sum_{i=1}^{N} t_{ip}}{N}

 Radius: square root of the average distance from any point of the cluster to its centroid

    R_m = \sqrt{\frac{\sum_{i=1}^{N} (t_{ip} - c_m)^2}{N}}

 Diameter: square root of the average mean squared distance between all pairs of points in the cluster

    D_m = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{N} (t_{ip} - t_{jq})^2}{N(N-1)}}
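A short sketch of these three quantities (illustrative, not part of the original slides), applying the definitions above to a small numerical cluster:

```python
import numpy as np

def centroid(points):
    return points.mean(axis=0)

def radius(points):
    c = centroid(points)
    # sqrt of the average squared distance from each point to the centroid
    return np.sqrt(((points - c) ** 2).sum(axis=1).mean())

def diameter(points):
    n = len(points)
    # sqrt of the average squared distance over all pairs of points (i == j terms are zero)
    sq = sum(((p - q) ** 2).sum() for p in points for q in points)
    return np.sqrt(sq / (n * (n - 1)))

cluster = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 2.0]])
print(centroid(cluster), radius(cluster), diameter(cluster))
```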
K-means
Partitioning Algorithms: Basic Concept
 Partitioning method: Construct a partition of a database D of n objects into a set of k clusters such that the sum of squared distances is minimized
 Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion
 Global optimal: exhaustively enumerate all partitions
 Heuristic methods: k-means and k-medoids algorithms
 k-means (MacQueen'67): Each cluster is represented by the center of the cluster
 k-medoids or PAM (Partition Around Medoids) (Kaufman & Rousseeuw'87): Each cluster is represented by one of the objects in the cluster
The K-Means Clustering Method
 Given k, the k-means algorithm is implemented in four steps:
 1. Partition objects into k nonempty subsets
 2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
 3. Assign each object to the cluster with the nearest seed point
 4. Go back to Step 2; stop when there are no more new assignments
Basic K-means algorithm

1: Select K points as initial centroids.
2: repeat
3:   Form K clusters by assigning each point to its closest centroid.
4:   Recompute the centroid of each cluster.
5: until Centroids do not change.
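A minimal Python sketch of this pseudocode (illustrative; it simply takes the first K points as the initial centroids, whereas a practical implementation would choose them more carefully):

```python
import numpy as np

def kmeans(X, k, max_iter=100):
    centroids = X[:k].copy()                          # Step 1: select K points as initial centroids
    for _ in range(max_iter):
        # Step 3: assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute the centroid of each cluster (keep the old one if a cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):     # Step 5: stop when centroids do not change
            break
        centroids = new_centroids
    return centroids, labels
```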
The K-Means Clustering Method: Example
 (Figure: with K = 2, two objects are arbitrarily chosen as the initial cluster centers; each object is assigned to the most similar center, the cluster means are updated, and objects are reassigned until the assignments no longer change.)
Table of notation

K-Means Method
 Strength: Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n.
 Weaknesses
 Applicable only when the mean is defined; what about categorical data?
 Need to specify k, the number of clusters, in advance
 Unable to handle noisy data and outliers
 Not suitable for discovering clusters with non-convex shapes
Example
 Cluster the following eight points (with (x, y) representing locations) into three clusters:
 A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
 Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
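The worked iterations shown on the original slides are not reproduced above, but the computation itself can be sketched as follows (illustrative code; it runs plain K-means from the stated initial centers and prints whatever clusters it converges to):

```python
import numpy as np

# The eight points A1..A8 and the initial centers A1, A4, A7 given above.
points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centers = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)

for _ in range(100):
    # Assign each point to its nearest center (Euclidean distance).
    labels = np.linalg.norm(points[:, None] - centers[None, :], axis=2).argmin(axis=1)
    # Recompute each center as the mean of its assigned points
    # (for this particular data set no cluster ever becomes empty).
    new_centers = np.array([points[labels == j].mean(axis=0) for j in range(3)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

for j in range(3):
    print("cluster", j + 1, ":", points[labels == j].tolist(), "center", centers[j])
```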
K-Means
 Advantages
 1. Easy to implement.
 2. With a large number of variables, K-means may be computationally faster than hierarchical clustering (if K is small).
 Disadvantages
 1. Difficult to predict the number of clusters (K-value).
 2. Can converge on local minima.
 3. Sensitive to outliers.
K-means: Additional Issues
Handling Empty Clusters:
 One of the problems with the basic K-means algorithm is that empty clusters can be obtained if no points are allocated to a cluster during the assignment step.
 One approach is to choose as a replacement centroid the point that is farthest away from any current centroid.
 Another approach is to choose the replacement centroid from the cluster that has the highest SSE.
Reducing the SSE with Postprocessing
 Two strategies that decrease the total SSE by increasing the number of clusters are the following:
 Split a cluster: The cluster with the largest SSE is chosen and split
 Introduce a new cluster centroid: The point that is farthest from any cluster center is chosen; this point is easy to determine if we keep track of the SSE contributed by each point. Alternatively, choose randomly from all points or from the points with the highest SSE.
 Two strategies that decrease the number of clusters, while trying to minimize the increase in total SSE, are the following:
 Disperse a cluster: Remove the centroid that corresponds to the cluster and reassign its points to other clusters
 Merge two clusters: The clusters with the closest centroids are chosen, or the two clusters whose merge results in the smallest increase in total SSE
 Updating Centroids Incrementally: The centroids can also be updated incrementally, after each assignment of a point to a cluster
Bisecting K-means
 The bisecting K-means algorithm is a straightforward extension of the basic K-means algorithm.
 To obtain K clusters, split the set of all points into two clusters, select one of these clusters to split, and so on, until K clusters have been produced.
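A rough sketch of the idea (illustrative only; it uses scikit-learn's KMeans for the 2-way splits, which is an assumption about the available tooling, and always splits the cluster with the largest SSE, one common selection rule):

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available; used only for the 2-way splits

def bisecting_kmeans(X, K):
    """Repeatedly split the cluster with the largest SSE until K clusters exist."""
    clusters = [X]
    while len(clusters) < K:
        # Pick the cluster with the largest SSE (sum of squared distances to its mean).
        sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sse)))    # assumes this cluster has >= 2 points
        # Bisect it with ordinary 2-means.
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(target)
        clusters.append(target[km.labels_ == 0])
        clusters.append(target[km.labels_ == 1])
    return clusters

# Usage on synthetic data with three well-separated groups.
X = np.vstack([np.random.randn(20, 2),
               np.random.randn(20, 2) + 5,
               np.random.randn(20, 2) + 10])
for i, c in enumerate(bisecting_kmeans(X, K=3)):
    print("cluster", i, "has", len(c), "points")
```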
Bisecting K-means
 (Figures: bisecting K-means algorithm and a worked example.)
Hierarchical Clustering
 Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but needs a termination condition.
 (Figure: agglomerative clustering (AGNES) merges objects a, b, c, d, e step by step into ab, de, cde, and finally abcde; divisive clustering (DIANA) proceeds in the reverse order, splitting abcde back into the individual objects.)
AGNES (Agglomerative Nesting)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages, e.g., S-Plus
 Uses the Single-Link method and the dissimilarity matrix
 Merges nodes that have the least dissimilarity
 Goes on in a non-descending fashion
 Eventually all nodes belong to the same cluster
 (Figure: three scatter plots showing the data points being progressively merged into larger clusters.)
Dendrogram: Shows How the Clusters are Merged
 Decompose data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
 A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
Defining Proximity between Clusters
 Cluster proximity is what differentiates the various agglomerative hierarchical techniques.
 Agglomerative hierarchical clustering techniques such as MIN, MAX, and Group Average come from a graph-based view of clusters.
 MIN defines cluster proximity as the proximity between the closest two points that are in different clusters.
 MAX takes the proximity between the farthest two points in different clusters to be the cluster proximity.
 The Group Average technique defines cluster proximity to be the average pairwise proximity of all pairs of points from different clusters.
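For illustration (not part of the original slides), SciPy's hierarchical clustering routines expose exactly these MIN, MAX, and Group Average choices as the 'single', 'complete', and 'average' linkage methods:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

for method in ("single", "complete", "average"):       # MIN, MAX, Group Average
    Z = linkage(X, method=method)                      # merge table: clusters merged, distance, size
    labels = fcluster(Z, t=2, criterion="maxclust")    # cut the dendrogram into 2 clusters
    print(method, labels)
```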
Agglomerative Clustering: Example
 (Figures: step-by-step agglomerative merging for a worked example, the resulting dendrograms, and a plot of the data.)
Complete Link
 (Figure: complete-link example.)
DIANA (Divisive Analysis)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages, e.g., S-Plus
 Inverse order of AGNES
 Eventually each node forms a cluster on its own
 (Figure: three scatter plots showing a single cluster being progressively split into smaller clusters.)
Ward's Method and Centroid Methods
 In Ward's method, the proximity between two clusters is defined as the increase in the squared error that results when the two clusters are merged.
 Although Ward's method is somewhat distinct from the other hierarchical techniques, it can be shown mathematically that it is very similar to the group average method when the proximity between two points is taken to be the square of the distance between them.
 Centroid methods calculate the proximity between two clusters as the distance between the centroids of the clusters, similar to K-means.
Strengths and Weaknesses
 Agglomerative hierarchical clustering algorithms can produce better-quality clusters.
 However, they are expensive in terms of their computational and storage requirements.
 The fact that all merges are final can also cause trouble for noisy, high-dimensional data, such as document data.
 These two problems can be addressed to some degree by first partially clustering the data using another technique, such as K-means.
Density-Based Clustering Methods
 Clustering based on density (a local cluster criterion), such as density-connected points
 Major features:
 Discover clusters of arbitrary shape
 Handle noise
 One scan
 Need density parameters as termination condition
 Several interesting studies:
 DBSCAN: Ester, et al. (KDD'96)
 OPTICS: Ankerst, et al. (SIGMOD'99)
 DENCLUE: Hinneburg & D. Keim (KDD'98)
 CLIQUE: Agrawal, et al. (SIGMOD'98) (more grid-based)
Density-Based Clustering: Basic Concepts
 Two parameters:
 Eps: Maximum radius of the neighbourhood
 MinPts: Minimum number of points in an Eps-neighbourhood of that point
 NEps(p): {q belongs to D | dist(p, q) <= Eps}
 Directly density-reachable: A point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
 p belongs to NEps(q)
 core point condition: |NEps(q)| >= MinPts
 (Figure: p lies in the Eps-neighbourhood of core point q, with MinPts = 5 and Eps = 1 cm.)
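A small sketch of these definitions (the data, Eps, and MinPts values are chosen arbitrarily for illustration): the Eps-neighbourhood of a point and the core-point test.

```python
import numpy as np

def eps_neighborhood(D, p, eps):
    """Indices q with dist(D[p], D[q]) <= eps (the point itself included, as in NEps(p))."""
    return [q for q in range(len(D)) if np.linalg.norm(D[p] - D[q]) <= eps]

def is_core(D, p, eps, min_pts):
    """Core point condition: |NEps(p)| >= MinPts."""
    return len(eps_neighborhood(D, p, eps)) >= min_pts

D = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.4, 0.4], [5, 5]])
print(eps_neighborhood(D, 0, eps=1.0))       # neighbours of point 0
print(is_core(D, 0, eps=1.0, min_pts=4))     # True: 4 points fall within Eps of point 0
print(is_core(D, 4, eps=1.0, min_pts=4))     # False: the isolated point is not a core point
```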
Classification of Points According to Center-Based Density
 Core points: These points are in the interior of a density-based cluster. A point is a core point if the number of points within a given neighborhood around the point, as determined by the distance function and a user-specified distance parameter Eps, exceeds a certain threshold MinPts, which is also a user-specified parameter.
 Border points: A border point is not a core point, but falls within the neighborhood of a core point. A border point can fall within the neighborhoods of several core points.
 Noise points: A noise point is any point that is neither a core point nor a border point.
Density-Reachable and Density-Connected
 Density-reachable:
 A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, ..., pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi
 Density-connected:
 A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
 Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
 Discovers clusters of arbitrary shape in spatial databases with noise
 (Figure: core, border, and outlier (noise) points illustrated with Eps = 1 cm and MinPts = 5.)
DBSCAN: The Algorithm
 Arbitrarily select a point p.
 Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
 If p is a core point, a cluster is formed.
 If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
 Continue the process until all of the points have been processed.
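A from-scratch sketch of this procedure (illustrative; a production implementation such as scikit-learn's DBSCAN would use a spatial index instead of the brute-force neighbour search below):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def region_query(i):
        # Brute-force Eps-neighbourhood of point i (includes i itself).
        return list(np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0])

    n = len(X)
    labels = np.full(n, -1)                 # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0
    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        seeds = region_query(p)
        if len(seeds) < min_pts:            # p is not a core point; leave it as noise for now
            continue
        labels[p] = cluster_id              # start a new cluster from core point p
        while seeds:                        # expand the cluster via density-reachability
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster_id      # q is reached from the cluster (border or core)
            if not visited[q]:
                visited[q] = True
                q_neighbors = region_query(q)
                if len(q_neighbors) >= min_pts:   # q is also a core point: keep expanding
                    seeds.extend(q_neighbors)
        cluster_id += 1
    return labels

# Toy example: two dense groups and one isolated noise point.
X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.0],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.0],
              [4.0, 5.0]])
print(dbscan(X, eps=0.5, min_pts=3))   # e.g. [0 0 0 1 1 1 -1]
```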
Strengths and Weaknesses
 Because DBSCAN uses a density-based definition of a cluster, it is relatively resistant to noise and can handle clusters of arbitrary shapes and sizes.
 DBSCAN can find many clusters that could not be found using K-means.
 DBSCAN has trouble when the clusters have widely varying densities. It also has trouble with high-dimensional data, because density is more difficult to define for such data.
 DBSCAN can be expensive when the computation of nearest neighbors requires computing all pairwise proximities, as is usually the case for high-dimensional data.
Outlier
 What are outliers?
 Objects that are considerably dissimilar from the remainder of the data
 Example: Sports: Michael Jordan, Wayne Gretzky, ...
 Problem: Define and find outliers in large data sets
 Applications:
 Credit card fraud detection
 Telecom fraud detection
 Customer segmentation
 Medical analysis
