Research on the k-Means Clustering Algorithm
Abstract:
Cluster analysis is one of the main analytical methods in data mining, and the
choice of clustering algorithm directly influences the clustering results. This
paper discusses the standard k-means clustering algorithm and analyzes its
shortcomings: in particular, the standard k-means algorithm has to calculate the
distance between each data object and all cluster centers in each iteration, which
makes clustering inefficient. To address this problem, this paper proposes an
improved k-means algorithm that uses a simple data structure to store some
information in every iteration, to be reused in the next iteration. The improved
method avoids repeatedly computing the distance from each data object to the
cluster centers, saving running time. Experimental results show that the improved
method can effectively improve both the speed and accuracy of clustering while
reducing the computational complexity of k-means.
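The improvement described above can be read, on one common interpretation (not necessarily the authors' exact procedure), as keeping a per-point record of its current cluster and its distance to that cluster's centre, so that a full scan over all centres is only needed when a point may have moved closer to a different centre. A minimal NumPy sketch of that bookkeeping idea, with hypothetical names such as `improved_kmeans`:

```python
# A minimal sketch (assumed interpretation) of the per-iteration bookkeeping idea:
# each point stores its current label and its distance to that cluster's centre,
# and a full scan over all centres is done only when that distance has grown.
import numpy as np

def improved_kmeans(X, k, max_iter=100, seed=0):   # hypothetical name
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.full(len(X), -1, dtype=int)
    dists = np.full(len(X), np.inf)                # stored distance to own centre

    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            if labels[i] >= 0:
                d_own = np.linalg.norm(x - centers[labels[i]])
                if d_own <= dists[i]:
                    # No farther from its centre than before: keep the label and
                    # skip recomputing the distance to every other centre.
                    dists[i] = d_own
                    continue
            # First pass, or the point drifted away from its centre: full scan.
            all_d = np.linalg.norm(centers - x, axis=1)
            j = int(np.argmin(all_d))
            if j != labels[i]:
                changed = True
            labels[i], dists[i] = j, all_d[j]
        # Update each centre as the mean of its current members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        if not changed:
            break
    return labels, centers
```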
Introduction
Extracting meaningful and tangible information from collected data is the primary
goal of data mining [4]. However, most data are collected in arbitrary forms and
categories, making such data difficult to analyse, especially when the data objects’
features are unknown. Appropriate organization of unlabeled data is an aspect of
data mining handled by cluster analysis. The meaningful grouping of such unlabeled
data is regarded as data clustering: data objects with similar characteristics and
attributes are placed in the same cluster, so that the similarity among objects
within a cluster is higher than their similarity to objects in other clusters. In
other words, cluster analysis classifies unlabeled data to ensure higher
intra-cluster similarity and lower inter-cluster similarity [59]. Clustering can be
viewed as a learning process; since the data are unlabeled, it corresponds to
unsupervised learning [55]. Fig. 1 clearly
illustrates this spectrum of different categories of learning problems of interest in
pattern recognition and machine learning, as discussed in Jain [95].
Cluster analysis has been successfully applied to address data clustering problems
in different domains such as medical science, manufacturing, robotics, the financial
sector, privacy protection, artificial intelligence, urban development, aviation,
industries, sales, and marketing [61], [7], [180], [59], [20], [111], [49]. Extracting
useful information from data in these domains is essential for providing better
services and generating more profits [181], [148], [172]. Real-world data generated
are mostly voluminous, unlabeled, and of varying dimensionality, which makes data
clustering difficult. The number of clusters in a real-world dataset cannot be
identified in advance, so determining the optimal number of clusters in a dataset
characterized by high density and dimensionality is difficult for standard
clustering algorithms. This poses a significant challenge to conventional
clustering algorithms, in which the number of clusters must be specified as an
input to the algorithm.
Algorithms for data clustering are grouped into two major categories [97], [224],
[68], [60], namely, hierarchical clustering algorithms and partitional clustering
algorithms. Hierarchical clustering algorithms partition data objects into clusters in
a hierarchical form either in a bottom-up approach (agglomerative method) or a
top-down approach (divisive method). In the agglomerative method, individual data
objects are merged iteratively based on their similarity. In the divisive method, the
initial dataset is taken as a single cluster and broken down iteratively using data
object similarity until each data object forms a single cluster or a set criterion is
met. The hierarchical clustering algorithm produces a dendrogram of merged
(agglomerative) or split (divisive) data objects depicting the corresponding cluster
hierarchy generated as output for the cluster analysis [60]. The dendrogram is a
pictorial representation of the data objects’ nested grouping showing the similarity
level at which each grouping changes [97].
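As an illustration of the agglomerative (bottom-up) approach described above, the short SciPy sketch below builds a linkage matrix, from which a dendrogram or a flat clustering can be derived; the two-blob toy dataset and parameter choices are illustrative assumptions, not taken from the survey:

```python
# Illustrative only: agglomerative (bottom-up) hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated blobs of 2-D points (assumed toy data).
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])

Z = linkage(X, method="ward")                     # iterative bottom-up merges
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
print(labels)
```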
In the partitional clustering approach, a single partition of the initial dataset is
produced instead of a nested clustering structure such as a dendrogram. Clusters are
produced heuristically while optimizing a criterion function defined globally on all
the data objects in the set or locally on a subset of the data objects [246], [9],
[189]. Optimizing a criterion function on a set of the data objects using a
combinatorial search of all possible values to get the optimum value is
computationally prohibitive. Therefore, partitional clustering algorithms require the
specification of different k values supplied at different runs to obtain the best
configuration to produce the optimum clusters.
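Because partitional methods such as K-means take k as an input, practitioners commonly run the algorithm for several candidate k values and compare a criterion such as the within-cluster sum of squares (the "elbow" heuristic). The snippet below, which uses scikit-learn and a synthetic dataset purely for illustration, sketches that procedure:

```python
# Illustrative only: run K-means for several candidate k values on a synthetic
# dataset and print the within-cluster sum of squares (inertia) for each run.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  within-cluster SSE={km.inertia_:.1f}")

# The k at which the SSE curve flattens (the "elbow") is taken as a reasonable
# estimate of the number of clusters.
```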
The K-means clustering algorithm was proposed independently by different researchers,
including Steinhaus [203], Lloyd [132], MacQueen [135], and Jancey [98] from
different disciplines during the 1950s and 1960s [171]. These researchers’ various
versions of the algorithms show four common processing steps with differences in
each step [171]. The K-means clustering algorithm generates clusters using the
cluster’s object mean value [197], [34]. In the standard K-means algorithm, the
cluster number is required as a user parameter and is used in the arbitrary cluster
center selection from the dataset. However, the K-means algorithm may converge
to a local minimum because of its greedy nature [95]. Therefore, it requires several
runs for a given k value with different initial cluster center selections to obtain the
optimal cluster result [243], [59], [19]. In addition, the standard algorithm detects
ball-shaped or spherical clusters only because of the use of the Euclidean metric as
its distance measure [95]. A typical K-means clustering process is illustrated in Fig.
2. With a set of input data supplied to the K-means clustering algorithm, the
centroid vector C={c1,c2,...,ck} can easily be identified with K being the number of
centroids defined by the user. Fig. 2a illustrates a data set in 2D space distributed
randomly with -100≤xi,yi≤100, and Fig. 2b presents the K-means clustering result
with the number of centroids set to K=3.
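A rough, assumed reconstruction of the setup described for Fig. 2 (only the data range and K=3 are stated in the text): uniformly distributed 2-D points with -100 ≤ xi, yi ≤ 100 are clustered with K-means restarted from several initial centre selections, keeping the run with the lowest within-cluster sum of squares to reduce the risk of a poor local minimum:

```python
# Illustrative reconstruction of the Fig. 2 setup (assumptions: 500 points,
# uniform distribution); only the range [-100, 100] and K = 3 come from the text.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(500, 2))          # -100 <= xi, yi <= 100

# n_init restarts the algorithm from several initial centre selections and keeps
# the run with the lowest within-cluster sum of squares (inertia), mitigating
# convergence to a poor local minimum.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("centroid vector C:", km.cluster_centers_)
print("within-cluster SSE of the best run:", km.inertia_)
```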
Despite these limitations, the K-means clustering algorithm is credited with
flexibility, efficiency, and ease of implementation. It is also among the top ten
clustering algorithms in data mining [59], [217], [105], [94]. The simplicity and low
computational complexity have given the K-means clustering algorithm a wide
acceptance in many domains for solving clustering problems. Several K-means
clustering algorithm variants have been developed to enhance its performance. This
work presents an overview of the K-means clustering algorithm and its variants with
a proposed taxonomy for the variants. The algorithm’s research progression from its
inception, the current trends, open issues, and challenges with recommended future
research perspectives are also discussed in detail.
In this paper, the following focal research question was proposed to reflect the
purpose of this comprehensive review work:
“What are the existing variants of the K-means algorithm for solving clustering
problems, from its inception to date?”
In providing answers to the main research question, the following sub-research
questions were considered:
a) What research has been conducted to improve on the standard K-means clustering algorithm?
b) What methods have been adopted in the various research found in (a) for improving the performance of the K-means clustering algorithm?
c) What are the performances of the reported K-means clustering algorithm variants?
d) What are the current research progressions involving the K-means clustering algorithm?
This review work is presented from four perspectives: first, a systematic review
of the K-means clustering algorithm and its variants. Second, a presentation of a
proposed novel taxonomy of K-means clustering methods in the literature. Third,
verifications of the findings on all aspects of K-means clustering methods through
an in-depth analysis. Fourth, an outline of open issues and challenges and
recommended future trends. The main idea is to present a comprehensive
systematic review that will provide current researchers and practitioners with a
pathway for future novel research involving the K-means clustering algorithm. The
main contributions of this research work are summarized below:
• A comprehensive review of the K-means algorithm is presented, including a proposed taxonomy of recent variants and trending application areas of the K-means clustering algorithm.
• Open research issues relating to adopting metaheuristic algorithms as automatic cluster number generators to improve the K-means algorithm's performance quality are identified and discussed.
• Finally, research gaps and the future scope of the K-means algorithm in general, particularly in outlining a new perspective for solving the challenges of the K-means clustering algorithm and its variants, are identified.
The rest of the paper is organized as follows: Section 1 introduces the background
work on the proposed review study; Section 2 outlines the methodology approach;
Section 3 presents a proposed taxonomy of K-means clustering methods found in the
literature, followed by a detailed discussion of the review of the K-means algorithm
variants; Section 4 discusses the review findings; Section 5 reports the current
trending areas of application of the K-means algorithm; Section 6 outlines the open
issues and challenges of K-means clustering methods with recommended future
trends; and Section 7 concludes the review.
Research methodology
This study aims to conduct a review of the K-means clustering algorithm
variants. The research methodology adopted for the study is presented in this
section. Kitchenham et al.’s [113] guidelines for a systematic literature review of
computer technology were adopted for the study. Four phases are involved in the
review: planning, study search and selection, data acquisition, and data analysis.
The planning phase, reported in Section 1, includes establishing the problem
statement.
Standard K-means clustering algorithm
The K-means clustering algorithm is categorized as a partitional clustering
algorithm. Partitioning a given dataset into clusters involves minimizing the
squared error between each data point and the mean of its cluster; each data
point is assigned to the cluster centre nearest to it.
Mathematically, given a dataset $X = \{x_i \mid i = 1, 2, \ldots, n\}$ of $n$ $d$-dimensional data points, $X$ is partitioned into $k$ clusters $C = \{c_j \mid j = 1, 2, \ldots, k\}$ such that
$$J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$$
where $\mu_k$ is the mean (centre) of cluster $c_k$.
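A compact NumPy sketch of the standard (Lloyd-style) iteration that the objective above describes, assuming random selection of the initial centres: each point is assigned to its nearest centre, each centre is then moved to the mean of its members, and the loop stops when the centres no longer change. Function and variable names here are illustrative only:

```python
# A compact NumPy sketch of the standard K-means loop (illustrative names only).
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)].copy()   # initial centres

    for _ in range(max_iter):
        # Squared Euclidean distance of every point to every centre: shape (n, k).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                         # nearest-centre assignment
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):                        # centres stopped moving
            break
        mu = new_mu

    sse = ((X - mu[labels]) ** 2).sum()                    # sum of squared errors J
    return labels, mu, sse
```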
Discussion
This study presents an extensive literature review of the various improvements to
the K-means algorithm, mainly from 2010 to date. The review has surveyed the
variety of available modifications to the standard K-means algorithm's design and
implementation that are intended to enhance its clustering performance and speed.
The current study found that the improvements span all the major aspects of
algorithm design, including the algorithm's input, processes, output, and concept
modification.
Trending application areas of the K-means algorithm
The K-means clustering algorithm and variants have been applied widely in many
research areas, including: image recognition [12], image processing [151], market
analysis [70], data processing [152], medical image segmentation [153], [151], risk
evaluation [248], medical diagnosis [200], [245], medical services [215], [100], etc.
In the health sector, some parts of the human body have been tested and examined
for tumor detection (e.g., brain tumors and cancers).
Open issues and challenges
The main aim of the K-means algorithm and its variants is to group any given
dataset into k clusters such that the data objects within clusters are similar but
different from the ones in other clusters. Open issues and challenges in the K-means
algorithm and its variants include the challenges common to the generality of
clustering techniques as well as those peculiar to it.
Initialization Problem: The initialization problem in K-means is twofold: defining the
accurate number of clusters to be used and selecting appropriate initial cluster centres.
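For the centre-selection half of the initialization problem, one widely used remedy (k-means++ seeding, mentioned here as a general technique rather than a finding of this review) picks the first centre at random and each subsequent centre with probability proportional to its squared distance from the nearest centre already chosen. A short sketch:

```python
# Illustrative sketch of k-means++ seeding (a general technique, not specific
# to this review): later centres are drawn with probability proportional to
# their squared distance from the nearest centre already chosen.
import numpy as np

def kmeanspp_init(X, k, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                    # first centre: uniform
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                              # D^2 weighting
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```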
Conclusion
The K-means clustering algorithm is known for its simplicity and is applied in
clustering datasets from different domains. Despite this advantage, its performance
is greatly hampered due to some of the problems inherent in its implementation. As
a result, much research has been conducted to improve the algorithm’s general
performance. This review work has been able to identify the various limitations of
the standard algorithm and the numerous variants developed to solve the identified
problems.