Abstract
In this paper, we introduce a novel approach to support concurrent offloading for OpenMP tasks based on hidden helper threads. We contrast our design with alternative implementations and explain why the approach we have chosen provides the most consistent performance across a wide range of use cases. In addition to a theoretical discussion of the trade-offs, we detail our implementation in the LLVM compiler infrastructure. Finally, we provide evaluation results for four extreme offloading situations on the Summit supercomputer, showing that we achieve a speedup of up to \(6.7\times\) over synchronous offloading and a speedup comparable to that of the commercial IBM XL C/C++ compiler.
Notes
1. The fallback case, execution on the issuing device, is sufficiently similar.
2. This is CUDA terminology, but almost all heterogeneous programming models have a similar concept, such as the command queue in OpenCL.
Acknowledgments
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Tian, S., Doerfert, J., Chapman, B. (2022). Concurrent Execution of Deferred OpenMP Target Tasks with Hidden Helper Threads. In: Chapman, B., Moreira, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2020. Lecture Notes in Computer Science(), vol 13149. Springer, Cham. https://doi.org/10.1007/978-3-030-95953-1_4
DOI: https://doi.org/10.1007/978-3-030-95953-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95952-4
Online ISBN: 978-3-030-95953-1
eBook Packages: Computer Science (R0)