DS143 Group 13 Presentation-1
Methods
Group 13 Members :
Density-Based Methods
Density-based clustering algorithms:
DBSCAN grows clusters according to a density-based connectivity analysis.
- DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
- It is a density-based clustering algorithm.
- The algorithm grows regions with sufficiently high density into clusters and discovers clusters of arbitrary shape in spatial databases with noise.
- It defines a cluster as a maximal set of density-connected points.
Definitions:
ε-neighborhood of an object: the neighborhood
within a radius ε of a given object is called the ε-neighborhood
of the object.
Density-based cluster: a density-based cluster is a set of
density-connected objects that is maximal with respect to
density-reachability. Every object not contained in any cluster
is considered to be noise.
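The ε-neighborhood definition can be illustrated with a few lines of Python. This is only a minimal sketch: it assumes objects are numeric tuples compared with Euclidean distance, and the function name `eps_neighborhood` and the sample data are made up for illustration.

```python
import math

def eps_neighborhood(points, p, eps):
    """Return every object in points that lies within distance eps of p.

    Assumption: Euclidean distance; p itself is included if it is in points.
    """
    return [q for q in points if math.dist(p, q) <= eps]

# Hypothetical sample data: three nearby objects and one distant object.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
near_origin = eps_neighborhood(points, (0.0, 0.0), eps=1.0)
```

Here `near_origin` holds the three objects close to the origin, while the distant object at (5, 5) has only itself in its ε-neighborhood.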
DBSCAN
DBSCAN searches for clusters by checking the ε-
neighborhood of each point in the database.
- If the ε-neighborhood of a point p contains at least
MinPts points, a new cluster with p as a core object is
created.
- DBSCAN then iteratively collects directly
density-reachable objects from these core objects,
which may involve merging a few density-reachable
clusters.
- The process terminates when no new point can be
added to any cluster.
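The steps above can be sketched in Python as follows. This is a simplified illustration, not a reference implementation: it assumes Euclidean distance, uses a brute-force neighborhood search, and the names `dbscan`, `eps`, `min_pts` and `NOISE` are chosen here for readability.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE for noise."""
    def neighbors(i):
        # Brute-force eps-neighborhood of points[i] (includes i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may be relabeled later as a border point
            continue
        cluster += 1                   # i is a core object: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:     # j is also a core object: keep expanding
                queue.extend(nj)
    return labels

# Hypothetical data: two dense groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
labels = dbscan(data, eps=1.5, min_pts=3)
```

The loop ends once the queue is empty for every cluster, which corresponds to the termination condition that no new point can be added to any cluster.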
Grid-Based Clustering
•Grid-based clustering methods use a multi-resolution grid data structure. They quantize the object
space into a finite number of cells that form a grid structure, on which all clustering operations
are performed. The main benefit is fast processing time, which is generally independent of the number
of data objects and depends only on the number of cells in each dimension of the quantized space.
•Examples of the grid-based approach include STING, which explores statistical information stored in
the grid cells; WaveCluster, which clusters objects using a wavelet transform; and CLIQUE, which is a
grid- and density-based approach for clustering in high-dimensional data space.
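The quantization step can be sketched as below. This is a minimal illustration under stated assumptions: every dimension shares the same value range `[lo, hi]` and the same number of cells, and the function name `quantize` is hypothetical.

```python
from collections import Counter

def quantize(points, lo, hi, cells_per_dim):
    """Count how many points fall into each grid cell.

    Assumption: all dimensions span [lo, hi] and use cells_per_dim cells.
    Returns a Counter keyed by the integer index tuple of each cell.
    """
    width = (hi - lo) / cells_per_dim
    counts = Counter()
    for p in points:
        # Clamp so that a coordinate equal to hi lands in the last cell.
        cell = tuple(min(int((x - lo) / width), cells_per_dim - 1) for x in p)
        counts[cell] += 1
    return counts

# Hypothetical sample: four 2-D points on the unit square, 2 cells per dimension.
sample = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0)]
cell_counts = quantize(sample, lo=0.0, hi=1.0, cells_per_dim=2)
```

After this single pass over the data, later steps work on `cell_counts` only, so their cost depends on the number of cells rather than on the number of objects.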
- Partitional Clustering Methods
- Hierarchical Clustering Methods
- Density-Based Clustering Methods
a.) STING - A Statistical Information Grid Approach
The statistical information of each cell is calculated and stored beforehand and is
used to answer queries.
For each cell in the current level, compute the confidence interval that the cell is
relevant to the query.
After examining the current level, proceed to the next lower level.
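A rough sketch of the STING idea is given below. It makes several simplifying assumptions that are not from the slides: cells are 2-D and indexed by (row, column), each parent summarizes up to 2x2 children, the stored statistics are only count, mean, min and max, and a plain predicate `relevant` stands in for the confidence-interval test.

```python
def leaf_stats(values_by_cell):
    """Bottom level: {(r, c): [values]} -> {(r, c): (count, mean, min, max)}."""
    return {cell: (len(v), sum(v) / len(v), min(v), max(v))
            for cell, v in values_by_cell.items() if v}

def parent_level(stats):
    """Build the next coarser level from precomputed child statistics."""
    groups = {}
    for (r, c), s in stats.items():
        groups.setdefault((r // 2, c // 2), []).append(s)
    parents = {}
    for cell, children in groups.items():
        n = sum(ch[0] for ch in children)
        mean = sum(ch[0] * ch[1] for ch in children) / n
        parents[cell] = (n, mean,
                         min(ch[2] for ch in children),
                         max(ch[3] for ch in children))
    return parents

def query(levels, relevant):
    """Top-down search; levels[0] is the coarsest level, levels[-1] the finest."""
    candidates = set(levels[0])
    for depth, level in enumerate(levels):
        # Keep only the cells on this level that pass the relevance test.
        candidates = {cell for cell in candidates
                      if cell in level and relevant(level[cell])}
        if depth + 1 < len(levels):
            # Descend: examine only the children of the surviving cells.
            candidates = {(2 * r + dr, 2 * c + dc)
                          for r, c in candidates
                          for dr in (0, 1) for dc in (0, 1)}
    return candidates

# Hypothetical 4x4 bottom level with a few non-empty cells.
leaf = leaf_stats({(0, 0): [1, 2], (0, 1): [3], (2, 0): [4], (3, 3): [10, 12]})
mid = parent_level(leaf)
top = parent_level(mid)
hits = query([top, mid, leaf], relevant=lambda s: s[3] >= 10)  # max value >= 10
```

Because the parent statistics are derived from the children, a query never needs to touch the raw data; it only walks down through the cells that remain relevant.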
b.) WaveCluster
It was proposed by Sheikholeslami, Chatterjee, and Zhang
(VLDB’98).
Input parameters:
Number of grid cells for each dimension
The wavelet, and the number of applications of the wavelet transform.
How to apply the wavelet transform to find
clusters
It summarizes the data by imposing a
multidimensional grid structure onto the data
space.
These multidimensional spatial data objects are
represented in an n-dimensional feature space.
Next, apply a wavelet transform to the feature space
to find the dense regions in the feature space.
Then apply the wavelet transform multiple times,
which results in clusters at different scales from
fine to coarse.
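The procedure above can be approximated with the following sketch. It is a deliberately simplified stand-in, not the published WaveCluster algorithm: it works on a 2-D grid of cell counts, uses only the low-pass (averaging) band of a Haar transform instead of a hat-shaped filter, and finds clusters as 4-connected groups of cells above a threshold. All function names are hypothetical.

```python
def haar_approx(grid):
    """One level of a 2-D Haar transform, keeping only the averaged low-pass band.

    Assumption: grid has an even number of rows and columns.
    """
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [[(grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
              + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4
             for c in range(cols)] for r in range(rows)]

def connected_components(grid, threshold):
    """Label 4-connected groups of cells whose value exceeds threshold."""
    labels, next_label = {}, 0
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            if grid[r][c] <= threshold or (r, c) in labels:
                continue
            labels[(r, c)] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                            and grid[ny][nx] > threshold
                            and (ny, nx) not in labels):
                        labels[(ny, nx)] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

def wavecluster_sketch(counts, levels, threshold):
    """Apply the transform `levels` times, then cluster the dense cells."""
    grid = counts
    for _ in range(levels):
        grid = haar_approx(grid)
    return connected_components(grid, threshold)

# Hypothetical 4x4 grid of cell counts with two dense corners and one stray point.
counts = [[5, 6, 0, 0],
          [4, 5, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 7, 8]]
fine = wavecluster_sketch(counts, levels=0, threshold=2)
coarse = wavecluster_sketch(counts, levels=1, threshold=2)
```

Changing `levels` corresponds to the second input parameter: more applications of the transform give a coarser view of the same data.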
Why is wavelet transformation useful for clustering?
It uses hat-shaped filters to emphasize regions where points cluster while
simultaneously suppressing weaker information at their boundaries.
It removes outliers effectively.
It is a multi-resolution method.
It is cost-efficient.
Major features:
The time complexity of this method is O(N), where N is the number of data objects.
It detects arbitrarily shaped clusters at different scales.
It is not sensitive to noise or to input order.
It is only applicable to low-dimensional data.
c.) CLIQUE - Clustering In QUEst
It is insensitive to the order of the input records and does not presume any canonical
data distribution.
It scales linearly with the size of the input and has good scalability as the number of
dimensions in the data increases.
Disadvantages
The accuracy of the clustering result may be degraded in exchange for the simplicity
of the method.
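The slides do not spell out the CLIQUE steps, so the sketch below is based on the commonly described bottom-up idea only: split each dimension into equal intervals, find dense 1-D units, and build 2-D candidates only from dense 1-D units. The parameter names `xi` (intervals per dimension) and `tau` (density threshold as a fraction of the points), the shared range `[lo, hi]`, and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, xi, lo, hi, tau):
    """Return (dense 1-D units, dense 2-D units).

    A unit is a frozenset of (dimension, interval_index) pairs. It is dense
    when more than tau * len(points) points fall inside it.
    """
    width = (hi - lo) / xi

    def interval(x):
        return min(int((x - lo) / width), xi - 1)

    cells = [tuple(interval(x) for x in p) for p in points]
    min_count = tau * len(points)

    # Pass 1: count points per 1-D unit and keep the dense ones.
    c1 = Counter((d, cell[d]) for cell in cells for d in range(len(cell)))
    dense1 = {u for u, n in c1.items() if n > min_count}

    # Pass 2: a 2-D unit can only be dense if both of its 1-D projections are,
    # so candidates are built from pairs of dense 1-D units.
    c2 = Counter()
    for cell in cells:
        present = sorted(u for u in ((d, cell[d]) for d in range(len(cell)))
                         if u in dense1)
        for a, b in combinations(present, 2):
            c2[frozenset((a, b))] += 1
    dense2 = {u for u, n in c2.items() if n > min_count}
    return {frozenset([u]) for u in dense1}, dense2

# Hypothetical data: three points in one corner of the unit square, two elsewhere.
pts = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.15), (0.9, 0.1), (0.1, 0.9)]
one_d, two_d = dense_units(pts, xi=2, lo=0.0, hi=1.0, tau=0.5)
```

Fixing the grid and the threshold in advance keeps the method simple, which is also where the loss of accuracy noted under Disadvantages comes from.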
Summary
Grid-based clustering is one of the methods of cluster analysis; it uses a
multi-resolution grid data structure.
The End!