Abstract
This paper proposes a fault tolerance service that satisfies quality of service (QoS) requirements in grid computing. The probability of failure in grid computing is higher than in traditional parallel computing. Because the failure of a resource can be fatal to job execution, a fault tolerance service is essential in grid computing. Grid services are also often expected to meet minimum levels of QoS for desirable operation. However, the Globus toolkit does not provide a fault tolerance service that supports fault detection and fault management and that satisfies QoS requirements. To provide a fault tolerance service and satisfy QoS requirements, we expand the definition of failure (such as process failure, processor failure, and network failure). We then propose a fault detection service and a fault management service, and present simulation results.
This work was funded by the University Research Program supported by the Ministry of Information & Communication in the Republic of Korea.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, H.M. et al. (2003). A Fault Tolerance Service for QoS in Grid Computing. In: Sloot, P.M.A., Abramson, D., Bogdanov, A.V., Gorbachev, Y.E., Dongarra, J.J., Zomaya, A.Y. (eds) Computational Science — ICCS 2003. ICCS 2003. Lecture Notes in Computer Science, vol 2659. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44863-2_29
DOI: https://doi.org/10.1007/3-540-44863-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40196-4
Online ISBN: 978-3-540-44863-1
eBook Packages: Springer Book Archive