Self-Tuning Parallelism

Werner-Kytölä, Otilia; Tichy, Walter F.

doi:10.1007/3-540-45492-6_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1823)

Included in the following conference series:

International Conference on High-Performance Computing and Networking

Abstract

Assigning additional processors to a parallel application may slow it down or lead to poor computer utilization. This paper demonstrates that an application can automatically choose its own optimal degree of parallelism. The technique is based on a simple binary search procedure for finding the optimal number of processors, subject to one of the following criteria:

maximum speed,
maximum benefit-cost ratio, or
maintaining an efficiency threshold.

The technique has been implemented and evaluated on a Cray T3E with 512 processors using both kernels and real applications from Mathematics, Electrical Engineering, and Geophysics. In all tests, the optimal parallelism is found quickly. The technique can be used to determine the optimal degree of parallelism without manual timing runs. It thus can help shorten application runtime, reduce costs, and lead to better overall utilization of parallel computers.
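
The abstract does not spell out the search procedure, so the following is only a minimal sketch of how a binary search over the processor count could look for two of the criteria. It assumes the standard definitions of speedup, S(p) = T(1)/T(p), and efficiency, E(p) = S(p)/p. It further assumes that runtime is unimodal in p (it falls, then rises) and that efficiency does not increase with p. The function names and the synthetic `runtime_model` are hypothetical stand-ins for real timing measurements; none of them come from the paper.

```python
from typing import Callable


def runtime_model(p: int) -> float:
    """Hypothetical runtime on p processors (an assumption, not measured data):
    a serial part, perfectly divided work, and an overhead that grows with p."""
    return 1.0 + 1000.0 / p + 0.05 * p


def fastest_processor_count(runtime: Callable[[int], float], max_p: int) -> int:
    """Binary search for the p in [1, max_p] with the shortest runtime.

    Assumes runtime is unimodal in p, so "one more processor still helps"
    is true up to the optimum and false from the optimum onwards.
    """
    lo, hi = 1, max_p
    while lo < hi:
        mid = (lo + hi) // 2
        if runtime(mid + 1) < runtime(mid):
            lo = mid + 1  # still improving: optimum lies to the right
        else:
            hi = mid      # no longer improving: optimum is mid or to the left
    return lo


def largest_efficient_count(
    runtime: Callable[[int], float], max_p: int, threshold: float
) -> int:
    """Binary search for the largest p in [1, max_p] with E(p) >= threshold.

    Assumes efficiency is non-increasing in p and threshold <= 1,
    so p = 1 (efficiency 1.0) always qualifies.
    """
    t1 = runtime(1)

    def efficiency(p: int) -> float:
        return t1 / (p * runtime(p))

    lo, hi = 1, max_p
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if efficiency(mid) >= threshold:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    print("fastest:", fastest_processor_count(runtime_model, 512))
    print("largest with E >= 0.5:", largest_efficient_count(runtime_model, 512, 0.5))
```

Each search needs only on the order of log2(512) = 9 probing steps instead of a timing run for every candidate processor count. With real measurements, timing noise can violate the unimodality assumption, so repeated or averaged timings would likely be needed. A benefit-cost criterion could be handled by the first function if it is given the negated ratio in place of the runtime, provided that ratio is also unimodal in p.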

Author information

Authors and Affiliations

Department of Computer Science, Karlsruhe University, Am Fasanengarten 5, 76128, Karlsruhe, Germany
Otilia Werner-Kytölä & Walter F. Tichy

Editor information

Editors and Affiliations

Institute of Computer Science and Academic Computer Center CYFRONET, University of Mining and Metallurgy (AGH), al. Mickiewicza 30, 30-059, Cracow, Poland
Marian Bubak
Faculteit der Natuurwetenschappen, Wiskunde en Informatica, Universiteit van Amsterdam, 1098 SJ, Amsterdam, The Netherlands
Hamideh Afsarmanesh & Bob Hertzberger
California Institute of Technology, Caltech 158-79, Pasadena, CA, 91125, USA
Roy Williams

About this paper

Cite this paper

Werner-Kytölä, O., Tichy, W.F. (2000). Self-Tuning Parallelism. In: Bubak, M., Afsarmanesh, H., Hertzberger, B., Williams, R. (eds) High Performance Computing and Networking. HPCN-Europe 2000. Lecture Notes in Computer Science, vol 1823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45492-6_30

DOI: https://doi.org/10.1007/3-540-45492-6_30
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67553-2
Online ISBN: 978-3-540-45492-2
eBook Packages: Springer Book Archive
