Abstract
DPC (clustering by fast search and find of density peaks) cannot find cluster centers that lie in sparse clusters. To solve this problem, this paper proposes a new clustering algorithm that uses the local standard deviation of point i to define its local density \(\rho _i\), so that every cluster center, whether it comes from a dense cluster or a sparse one, is found as a density peak. We name the new algorithm SD_DPC. SD_DPC was tested on several synthetic data sets: three comprise both dense and sparse clusters with various numbers of points, and the other is a typical synthetic data set often used to test clustering algorithms. Its performance is compared with that of DPC and of our previous work, KNN-DPC (K-nearest neighbors DPC) and FKNN-DPC (fuzzy weighted K-nearest neighbors DPC). The experimental results show that SD_DPC is superior to DPC, KNN-DPC, and FKNN-DPC both in finding cluster centers and in clustering a data set.
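The abstract does not give the density formula, so the following is only a minimal sketch of the general idea, not the authors' SD_DPC. It assumes that the local standard deviation of point i is the root mean square of the distances to its k nearest neighbors and that \(\rho _i\) is the reciprocal of that value; both choices, the parameter `k`, and all function names are illustrative assumptions. The distance \(\delta _i\) to the nearest point of higher density and the ranking by \(\rho _i \delta _i\) follow the standard DPC procedure.

```python
import math
import random


def local_std_density(points, k=5):
    """Return (rho, dist). ASSUMPTION: rho_i = 1 / RMS distance to the k nearest neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = []
    for i in range(n):
        # Distances from point i to its k nearest neighbors (self excluded).
        knn = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        local_std = math.sqrt(sum(d * d for d in knn) / k)
        rho.append(1.0 / (local_std + 1e-12))  # small constant avoids division by zero
    return rho, dist


def dpc_delta(rho, dist):
    """Standard DPC delta: distance to the nearest point that ranks higher in density."""
    n = len(rho)
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    # By convention the densest point gets its largest distance to any point.
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return delta


def pick_centers(points, n_centers, k=5):
    """Pick the n_centers points with the largest gamma = rho * delta."""
    rho, dist = local_std_density(points, k)
    delta = dpc_delta(rho, dist)
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(points)), key=lambda i: -gamma[i])[:n_centers]


# Toy data (hypothetical): one dense and one sparse Gaussian cluster, far apart.
rng = random.Random(0)
dense = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
sparse = [(rng.gauss(100.0, 1.5), rng.gauss(100.0, 1.5)) for _ in range(50)]
points = dense + sparse

centers = pick_centers(points, n_centers=2)
print(centers)  # indices below 100 belong to the dense cluster, the rest to the sparse one
```

On this toy data the two clusters are so well separated that one center is selected from each. The example therefore illustrates only the mechanics of a neighborhood-based density; it does not reproduce the comparison with DPC reported in the paper.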
References
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
Feldman, D., Schmidt, M., Sohler, C.: Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, pp. 1434–1453. SIAM (2013), http://dl.acm.org/citation.cfm?id=2627817.2627920
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann (2011)
Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
Karkkainen, I., Franti, P.: Dynamic local search for clustering with unknown number of clusters. In: Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 240–243. IEEE (2002)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Statistics, Oakland, CA, USA, pp. 281–297 (1967)
Mehmood, R., El-Ashram, S., Bie, R., Dawood, H., Kos, A.: Clustering by fast search and merge local density peaks for gene expression microarray data. Sci. Rep. 7, 45602 (2017)
Tong, H., Kang, U.: Big data clustering. In: Aggarwal, C.C., Reddy, C.K. (eds.) Data Clustering: Algorithms and Applications, chap. 11, pp. 259–276. CRC Press (2013)
Von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? J. Mach. Learn. Res. Proc. Track 27, 65–80 (2012)
Xie, J., Gao, H.: Statistical correlation and k-means based distinguishable gene subset selection algorithms. J. Softw. 25(9), 2050–2075 (2014)
Xie, J., Gao, H., Xie, W.: K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset. SCIENTIA SINICA Informationis 46(2), 258–280 (2016)
Xie, J., Gao, H., Xie, W., Liu, X., Grant, P.W.: Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf. Sci. 354, 19–40 (2016)
Xie, J., Jiang, S., Xie, W., Gao, X.: An efficient global K-means clustering algorithm. J. Comput. 6(2), 271–279 (2011)
Xie, J., Li, Y., Zhou, Y., Wang, M.: Differential feature recognition of breast cancer patients based on minimum spanning tree clustering and F-statistics. In: Yin, X., Geller, J., Li, Y., Zhou, R., Wang, H., Zhang, Y. (eds.) HIS 2016. LNCS, vol. 10038, pp. 194–204. Springer, Cham (2016). doi:10.1007/978-3-319-48335-1_21
Xu, R., Wunsch, D.I.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
Acknowledgments
We are grateful to those who made the public data sets available. This work was supported in part by the National Natural Science Foundation of China under Grant No. 61673251, the Key Science and Technology Program of Shaanxi Province of China under Grant No. 2013K12-03-24, the Fundamental Research Funds for the Central Universities under Grant No. GK201701006, and the Innovation Funds of Graduate Programs at Shaanxi Normal University under Grant Nos. 2015CXS028 and 2016CSY009.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Xie, J., Jiang, W., Ding, L. (2017). Clustering by Searching Density Peaks via Local Standard Deviation. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2017. IDEAL 2017. Lecture Notes in Computer Science, vol 10585. Springer, Cham. https://doi.org/10.1007/978-3-319-68935-7_33
DOI: https://doi.org/10.1007/978-3-319-68935-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68934-0
Online ISBN: 978-3-319-68935-7
eBook Packages: Computer Science (R0)