Abstract
GPUs are widely used in High-Performance Computing (HPC) and cloud applications. However, the closed-source nature of CUDA has hindered the adoption of virtualization techniques that are otherwise commonly used. In this paper, we evaluate the feasibility of building a GPU virtualization layer that isolates the GPU and CPU parts of CUDA applications, giving finer control over the interactions between applications and the CUDA libraries. We present our open-source tool, which transparently intercepts CUDA library calls and executes them in a separate process using remote procedure calls. This allows CUDA applications to run on machines without a GPU and provides a basis for tools that require fine-grained control of GPU resources, such as checkpoint/restore and job schedulers.
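The abstract describes forwarding CUDA library calls from the application to a separate process via remote procedure calls. The following Python sketch illustrates only that forwarding pattern, under several assumptions: the procedure numbers, the message layout, the status codes, and the names `FakeDeviceServer`, `RemoteCuda` and `start_local` are all hypothetical; the "device" is a dictionary of fake allocations rather than a GPU; and the server runs in a thread connected through a socket pair, standing in for a separate process or a remote machine. It uses no real CUDA API and no RPC framework such as ONC RPC, and it omits the interception step itself (in a C implementation, substituting such stubs for the real library functions could be done, for example, by dynamic-linker interposition).

```python
import socket
import struct
import threading

# Hypothetical procedure numbers for two forwarded calls; they are NOT
# taken from the paper's tool or from any real RPC interface.
PROC_MALLOC = 1
PROC_FREE = 2

# Hypothetical status codes (0 = success, following the CUDA convention).
STATUS_OK = 0
STATUS_INVALID_VALUE = 1
STATUS_UNKNOWN_PROC = 2

# Hypothetical fixed-size messages in network byte order:
#   request  = (procedure number, 64-bit argument)
#   response = (signed status code, 64-bit result)
REQUEST = struct.Struct("!IQ")
RESPONSE = struct.Struct("!iQ")


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


class FakeDeviceServer:
    """Server side: owns the simulated device state and executes calls."""

    def __init__(self, sock):
        self.sock = sock
        self.next_ptr = 0x10000  # start of the fake device address space
        self.allocations = {}    # device pointer -> size in bytes

    def dispatch(self, proc, arg):
        if proc == PROC_MALLOC:
            ptr = self.next_ptr
            self.allocations[ptr] = arg
            # Keep fake pointers 256-byte aligned.
            self.next_ptr += (max(arg, 1) + 255) // 256 * 256
            return STATUS_OK, ptr
        if proc == PROC_FREE:
            if self.allocations.pop(arg, None) is None:
                return STATUS_INVALID_VALUE, 0  # unknown or already freed
            return STATUS_OK, 0
        return STATUS_UNKNOWN_PROC, 0

    def serve(self):
        # Handle requests until the client closes its end of the channel.
        while True:
            try:
                proc, arg = REQUEST.unpack(recv_exact(self.sock, REQUEST.size))
            except ConnectionError:
                self.sock.close()
                return
            self.sock.sendall(RESPONSE.pack(*self.dispatch(proc, arg)))


class RemoteCuda:
    """Client-side stubs: marshal each call, send it, wait for the reply."""

    def __init__(self, sock):
        self.sock = sock

    def call(self, proc, arg):
        self.sock.sendall(REQUEST.pack(proc, arg))
        return RESPONSE.unpack(recv_exact(self.sock, RESPONSE.size))

    def malloc(self, size):
        return self.call(PROC_MALLOC, size)  # -> (status, device pointer)

    def free(self, ptr):
        return self.call(PROC_FREE, ptr)[0]  # -> status

    def close(self):
        self.sock.close()


def start_local():
    """Connect a client to a server thread over a local socket pair."""
    client_sock, server_sock = socket.socketpair()
    server = FakeDeviceServer(server_sock)
    threading.Thread(target=server.serve, daemon=True).start()
    return RemoteCuda(client_sock), server
```

In this sketch the client only ever holds opaque device pointers returned by the server, so the client side needs no GPU at all. Keeping all device state on the server side is also what would let a checkpoint/restore tool or a job scheduler inspect and manage it without modifying the application.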
Notes
1. The Top500 list (https://www.top500.org/) of November 2019, which ranks the fastest HPC clusters, contains no cluster that uses GPUs from different vendors.
2. The code is available at https://github.com/RWTH-ACS/cricket.
3. At the time of writing, the latest GPU generation is Turing and the latest CUDA version is CUDA 10.2.
Acknowledgment
This research and development was supported by the German Federal Ministry of Education and Research under Grant 01IH16010C (Project ENVELOPE).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Eiling, N., Lankes, S., Monti, A. (2021). An Open-Source Virtualization Layer for CUDA Applications. In: Balis, B., et al. Euro-Par 2020: Parallel Processing Workshops. Euro-Par 2020. Lecture Notes in Computer Science, vol. 12480. Springer, Cham. https://doi.org/10.1007/978-3-030-71593-9_13
DOI: https://doi.org/10.1007/978-3-030-71593-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71592-2
Online ISBN: 978-3-030-71593-9
eBook Packages: Computer Science, Computer Science (R0)