Abstract
Clustering algorithms for multidimensional numerical data must overcome special difficulties due to irregularities in the data distribution. We present a clustering algorithm for numerical data that combines ideas from random projection techniques and density-based clustering. The algorithm consists of two phases: the first uses random projections to detect clusters, and the second post-processes the clusters obtained from several random projections. Experiments were performed on synthetic data consisting of randomly generated points in ℝⁿ, on synthetic images containing randomly distributed colored regions, and, finally, on real images. Our results suggest the potential of our algorithm for image segmentation.
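The abstract gives only the outline of the two phases, so the following Python sketch is an illustration under stated assumptions rather than the authors' algorithm. It assumes that each random projection maps the points onto a random unit direction, that one-dimensional clusters are found by splitting the sorted projections wherever the gap between neighbors exceeds a threshold (a simple stand-in for a density-based step), and that the post-processing keeps two points together only if they share a cluster in every projection. The function names and the parameters `n_projections`, `gap`, and `seed` are hypothetical.

```python
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing a vector of Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def cluster_1d(values, gap):
    """Label 1-D values; a new cluster starts where sorted neighbors differ by more than `gap`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] > gap:
            current += 1
        labels[idx] = current
    return labels


def cluster_by_projections(points, n_projections=5, gap=1.0, seed=0):
    """Phase 1: cluster each random 1-D projection. Phase 2: combine the labelings."""
    rng = random.Random(seed)
    dim = len(points[0])
    signatures = [() for _ in points]
    for _ in range(n_projections):
        direction = random_unit_vector(dim, rng)
        projected = [sum(p * d for p, d in zip(point, direction)) for point in points]
        labels = cluster_1d(projected, gap)
        # Append this projection's label to each point's signature.
        signatures = [sig + (lab,) for sig, lab in zip(signatures, labels)]
    # Assumed post-processing: points share a final cluster only if they
    # received the same label in every projection.
    ids = {}
    return [ids.setdefault(sig, len(ids)) for sig in signatures]


if __name__ == "__main__":
    blob_a = [(0.0, 0.0), (0.1, 0.2), (0.3, 0.1)]
    blob_b = [(100.0, 100.0), (100.2, 100.1), (100.1, 100.3)]
    print(cluster_by_projections(blob_a + blob_b))
```

Because projection onto a unit vector never increases distances, points that lie within `gap` of one another stay together in every projection, while well-separated groups are split as soon as one direction separates them. Intersecting the labelings is only one possible combination rule; the paper's actual post-processing may differ.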
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Urruty, T., Djeraba, C., Simovici, D.A. (2007). Clustering by Random Projections. In: Perner, P. (ed.) Advances in Data Mining. Theoretical Aspects and Applications. ICDM 2007. Lecture Notes in Computer Science, vol 4597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73435-2_9
DOI: https://doi.org/10.1007/978-3-540-73435-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73434-5
Online ISBN: 978-3-540-73435-2
eBook Packages: Computer Science (R0)