Abstract
Hierarchical clustering methods are important in many data mining and pattern recognition tasks. In this paper we present an efficient coarse grained parallel algorithm for Single Link Clustering, a standard inter-cluster linkage metric. Our approach is to first describe algorithms for the Prefix Larger Integer Set and the Closest Larger Ancestor problems and then to show how these can be applied to solve the Single Link Clustering problem. In an extensive performance analysis, an implementation of these algorithms on a Linux-based cluster was shown to scale well, exhibiting near-linear relative speedup.
Research partially supported by the Natural Sciences and Engineering Research Council of Canada
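The abstract names the Closest Larger Ancestor problem without defining it. The following is a minimal sequential sketch in Python, assuming the natural reading: for every node of a rooted, node-weighted tree, find the nearest proper ancestor whose weight is strictly larger. The function name, the parent-array representation, and the strict comparison are assumptions made for illustration; this is not the coarse grained parallel algorithm of the paper.

```python
from typing import List, Optional


def closest_larger_ancestors(
    parent: List[Optional[int]], weight: List[float]
) -> List[Optional[int]]:
    """For each node, return the nearest proper ancestor with a strictly
    larger weight, or None if no such ancestor exists.

    parent[v] is the parent of node v, or None for a root (assumed input format).
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    order: List[int] = []
    for v, p in enumerate(parent):
        if p is None:
            order.append(v)  # roots are visited first
        else:
            children[p].append(v)

    # Breadth-first order guarantees each parent is handled before its children.
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    answer: List[Optional[int]] = [None] * n
    for v in order:
        u = parent[v]
        # If u is not larger than v, no ancestor between u and answer[u]
        # can be larger either, so jump straight to answer[u].
        while u is not None and weight[u] <= weight[v]:
            u = answer[u]
        answer[v] = u
    return answer


# Hypothetical example: a path 0 -> 1 -> 2 -> 3 plus a second child 4 of the root.
example_parent = [None, 0, 1, 2, 0]
example_weight = [5, 3, 4, 1, 7]
print(closest_larger_ancestors(example_parent, example_weight))
```

Jumping through already computed answers, rather than walking up one parent at a time, avoids rescanning ancestors that are known to be too small.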
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chan, A., Gao, C., Rau-Chaplin, A. (2005). A Coarse Grained Parallel Algorithm for Closest Larger Ancestors in Trees with Applications to Single Link Clustering. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_96
DOI: https://doi.org/10.1007/11557654_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29031-5
Online ISBN: 978-3-540-32079-1
eBook Packages: Computer Science, Computer Science (R0)