Abstract
Parallel and Distributed Knowledge Discovery (PDKD) is emerging as a possible killer application for clusters and grids of computers. The need to process large volumes of data, together with the availability of parallel data mining algorithms, makes it possible to exploit the increasing computational power of clusters at low cost. On the other hand, grid computing is an emerging “standard” for developing and deploying distributed, high-performance applications over geographic networks in different domains, particularly data-intensive applications. This paper proposes an approach for integrating clusters of computers within a grid infrastructure so that, enriched with specific data mining services, they can serve as the deployment platform for high-performance distributed data mining and knowledge discovery.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cannataro, M. (2000). Clusters and Grids for Distributed and Parallel Knowledge Discovery. In: Bubak, M., Afsarmanesh, H., Hertzberger, B., Williams, R. (eds) High Performance Computing and Networking. HPCN-Europe 2000. Lecture Notes in Computer Science, vol 1823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45492-6_86
DOI: https://doi.org/10.1007/3-540-45492-6_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67553-2
Online ISBN: 978-3-540-45492-2