Abstract
Assigning additional processors to a parallel application may slow it down or lead to poor computer utilization. This paper demonstrates that it is possible for an application to automatically choose its own, optimal degree of parallelism. The technique is based on a simple binary search procedure for finding the optimal number of processors, subject to one of the following criteria:
- maximum speed,
- maximum benefit-cost ratio, or
- maintaining an efficiency threshold.
The technique has been implemented and evaluated on a Cray T3E with 512 processors using both kernels and real applications from Mathematics, Electrical Engineering, and Geophysics. In all tests, the optimal parallelism is found quickly. The technique can be used to determine the optimal degree of parallelism without manual timing runs. It thus can help shorten application runtime, reduce costs, and lead to better overall utilization of parallel computers.
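As a rough illustration of the idea (not the authors' implementation), the sketch below applies binary search to two of the criteria: maximum speed and an efficiency threshold. Everything in it is an assumption made for the example. The function `runtime_model` is a hypothetical stand-in for a timed run, with made-up constants. The speed search assumes runtime first falls and then rises as processors are added. The efficiency search assumes efficiency, taken here as T(1) / (p · T(p)), never increases with p. The benefit-cost criterion is left out because the abstract does not define it.

```python
from typing import Callable


def runtime_model(p: int, work: float = 1000.0, overhead: float = 0.5) -> float:
    """Hypothetical runtime on p processors: divisible work plus a
    per-processor overhead term. Stands in for a real timed run."""
    return work / p + overhead * p


def fastest_p(runtime: Callable[[int], float], p_max: int) -> int:
    """Return the smallest p in [1, p_max] where one more processor no
    longer reduces runtime. Assumes runtime(p) is unimodal."""
    lo, hi = 1, p_max
    while lo < hi:
        mid = (lo + hi) // 2
        if runtime(mid + 1) < runtime(mid):
            lo = mid + 1      # still getting faster: optimum lies to the right
        else:
            hi = mid          # no gain from mid + 1: optimum is mid or to the left
    return lo


def largest_p_with_efficiency(
    runtime: Callable[[int], float], p_max: int, threshold: float
) -> int:
    """Return the largest p in [1, p_max] whose efficiency
    T(1) / (p * T(p)) is at least `threshold` (0 < threshold <= 1).
    Assumes efficiency is non-increasing in p."""
    t1 = runtime(1)
    lo, hi = 1, p_max
    while lo < hi:
        mid = (lo + hi + 1) // 2          # round up so the loop always shrinks
        if t1 / (mid * runtime(mid)) >= threshold:
            lo = mid                      # threshold met: try more processors
        else:
            hi = mid - 1                  # threshold missed: use fewer
    return lo


if __name__ == "__main__":
    print(fastest_p(runtime_model, 512))
    print(largest_p_with_efficiency(runtime_model, 512, 0.5))
```

On a 512-processor machine each search needs about nine probes of the runtime function, instead of one timing run per candidate processor count.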
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Werner-Kytölä, O., Tichy, W.F. (2000). Self-Tuning Parallelism. In: Bubak, M., Afsarmanesh, H., Hertzberger, B., Williams, R. (eds) High Performance Computing and Networking. HPCN-Europe 2000. Lecture Notes in Computer Science, vol 1823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45492-6_30
DOI: https://doi.org/10.1007/3-540-45492-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67553-2
Online ISBN: 978-3-540-45492-2
eBook Packages: Springer Book Archive