Abstract
This paper presents a speed-up version of the Dynamic Hierarchical Compact (DHC) algorithm. The approach exploits the cluster hierarchy already built to reduce the number of similarities that must be calculated. Experimental results on several benchmark text collections show that the proposed method is significantly faster than DHC while achieving approximately the same clustering quality.
© 2009 Springer-Verlag Berlin Heidelberg
Gil-García, R., Pons-Porrata, A. (2009). A Speed-Up Hierarchical Compact Clustering Algorithm for Dynamic Document Collections. In: Bayro-Corrochano, E., Eklundh, JO. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2009. Lecture Notes in Computer Science, vol 5856. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10268-4_45
DOI: https://doi.org/10.1007/978-3-642-10268-4_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10267-7
Online ISBN: 978-3-642-10268-4
eBook Packages: Computer Science (R0)