Abstract
In systems with multiple memories, software may need to explicitly copy data from one memory location to another. This copying is required to enable access or to unlock performance, and it is especially important in heterogeneous systems. When the data includes pointers to other data, the copying process has to recursively follow the pointers to perform a deep copy of the entire data structure. It is tedious and error-prone to require users to manually program the deep copy code for each pointer-based data structure used. Instead, a compiler and runtime system can automatically handle deep copies if they can identify pointers in the data and can determine the size and type of the data pointed to by each pointer. This is possible if the language provides reflection capabilities or uses smart pointers that encapsulate this information, e.g., Fortran pointers that intrinsically include dope vectors to describe the data pointed to. In this paper, we describe our implementation of automatic deep copy in a Fortran compiler targeting a heterogeneous system with GPUs. We measure the runtime overheads of the deep copies, propose techniques to reduce this overhead, and evaluate the efficacy of these techniques.
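To make the idea of a descriptor-driven deep copy concrete, the following is a minimal sketch in C, not the paper's Fortran implementation. It assumes a hypothetical `Descriptor` struct that plays the role of a dope vector (base address, element count, element size), a hypothetical `Node` type, and a `copy_to_other_memory` helper that stands in for a real host-to-device transfer by using `malloc` and `memcpy`. It also assumes the structure is acyclic, and it walks the list iteratively rather than recursively.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Fortran dope vector: the pointer carries
   the extent and element size of the data it points to. */
typedef struct {
    void  *base;       /* address of the pointed-to data */
    size_t count;      /* number of elements */
    size_t elem_size;  /* size of one element in bytes */
} Descriptor;

/* A list node holding a described array and a link to the next node. */
typedef struct Node {
    Descriptor   values;
    struct Node *next;
} Node;

/* Assumption: models a transfer into a second memory with malloc/memcpy. */
static void *copy_to_other_memory(const void *src, size_t bytes) {
    void *dst = malloc(bytes);
    if (dst != NULL) memcpy(dst, src, bytes);
    return dst;
}

/* Release a list produced by deep_copy_list. */
void free_list(Node *n) {
    while (n != NULL) {
        Node *next = n->next;
        free(n->values.base);
        free(n);
        n = next;
    }
}

/* Copy every node and every described array, fixing up the pointers in
   the copy so that none of them refer back to the source structure. */
Node *deep_copy_list(const Node *src) {
    Node  *head = NULL;
    Node **tail = &head;
    for (; src != NULL; src = src->next) {
        Node *copy = copy_to_other_memory(src, sizeof *copy);
        if (copy == NULL) { free_list(head); return NULL; }
        copy->next = NULL;          /* drop the stale source pointers */
        copy->values.base = NULL;
        *tail = copy;
        tail = &copy->next;

        /* The descriptor tells us how many bytes the pointer covers. */
        size_t bytes = src->values.count * src->values.elem_size;
        if (bytes > 0) {
            copy->values.base = copy_to_other_memory(src->values.base, bytes);
            if (copy->values.base == NULL) { free_list(head); return NULL; }
        }
    }
    return head;
}

/* Small self-check: returns 0 if the copy is independent of the source. */
int deep_copy_demo(void) {
    double a[3] = {1.0, 2.0, 3.0};
    double b[2] = {4.0, 5.0};
    Node second = { { b, 2, sizeof(double) }, NULL };
    Node first  = { { a, 3, sizeof(double) }, &second };

    Node *copy = deep_copy_list(&first);
    if (copy == NULL) return 1;

    a[0] = -1.0;  /* changing the source must not affect the copy */
    const double *ca = copy->values.base;
    const double *cb = copy->next->values.base;
    int ok = copy != &first && copy->next != &second
          && ca != a && ca[0] == 1.0 && cb[1] == 5.0
          && copy->next->next == NULL;
    free_list(copy);
    return ok ? 0 : 1;
}
```

The point of the sketch is that `deep_copy_list` never needs type-specific knowledge beyond the descriptor: the size of each target is read from the pointer's own metadata, which is the property the abstract attributes to Fortran pointers with dope vectors.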
Notes
1. Intel OpenMP Runtime Library: https://www.openmprtl.org.
2. This excludes the use of OpenMP directives such as target enter data and target exit data.
Acknowledgement
This work was supported in part by the United States Department of Energy CORAL program (contract B604142).
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, T., Sura, Z., Sung, H. (2017). Automatic Copying of Pointer-Based Data Structures. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_20
DOI: https://doi.org/10.1007/978-3-319-52709-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science, Computer Science (R0)