Abstract
Efficient utilization of processors in heterogeneous CPU–GPU systems is crucial for improving overall application performance by reducing workload completion time. This article introduces a framework for scheduling sparse matrix-based applications on a heterogeneous CPU–GPU system with maximum performance. The framework splits the matrix into chunks and employs machine learning to determine the chunk size that maximizes scheduling efficiency, treating the number of GPU streams as a critical factor. The proposed scheduling algorithm is inspired by the statistical concept of quartiles and operates in real time, imposing minimal overhead on the system. The framework was evaluated on the SpMV (sparse matrix–vector multiplication) kernel, which is essential for applications such as matrix-based graph processing, using a system equipped with an NVIDIA GTX 1070 GPU. On real-world sparse matrices, the proposed scheduling algorithm significantly outperforms no offloading, full offloading, and the Alternate Assignment method.
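The abstract describes the approach only at a high level. The sketch below illustrates one way a quartile-guided CPU–GPU split of matrix chunks could look; it is a minimal illustration under stated assumptions, not the paper's algorithm. In particular, the chunking into row blocks, the use of per-chunk nonzero counts, and the Q1/Q3 thresholds are all assumptions made for demonstration.

```python
# Hypothetical sketch of a quartile-guided CPU/GPU chunk assignment for SpMV.
# Assumptions (not from the paper): chunks are row blocks of a sparse matrix,
# each summarized by its nonzero count (nnz); chunks above the upper quartile
# (Q3) go to the GPU, chunks below the lower quartile (Q1) stay on the CPU,
# and the middle half is assigned to whichever side currently has less load.

import statistics

def quartile_schedule(chunk_nnz):
    """Partition chunk indices into CPU and GPU work lists."""
    q1, _, q3 = statistics.quantiles(chunk_nnz, n=4)
    cpu, gpu = [], []
    for i, nnz in enumerate(chunk_nnz):
        if nnz >= q3:      # dense chunks favor GPU throughput
            gpu.append(i)
        elif nnz <= q1:    # very sparse chunks avoid transfer overhead
            cpu.append(i)
        else:              # middle half: send to the lighter side
            cpu_load = sum(chunk_nnz[j] for j in cpu)
            gpu_load = sum(chunk_nnz[j] for j in gpu)
            (cpu if cpu_load <= gpu_load else gpu).append(i)
    return cpu, gpu

# Example: nonzero counts for 8 row chunks of a sparse matrix.
cpu_chunks, gpu_chunks = quartile_schedule([120, 4500, 80, 3900, 650, 700, 5200, 90])
print("CPU:", cpu_chunks, "GPU:", gpu_chunks)
```

In a real system the per-chunk loads would feed asynchronous GPU streams, and the thresholds would come from the learned chunk-size model rather than fixed quartile cut points.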
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
ASB contributed to methodology, algorithms, software, validation, writing—original draft, and editing. AS contributed to methodology, review and editing, and supervision. MN contributed to review and editing.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
No datasets were generated or analyzed during the current study.
Code availability
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shokrani Baigi, A., Savadi, A. & Naghibzadeh, M. A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU–GPU environment. J Supercomput 80, 25071–25098 (2024). https://doi.org/10.1007/s11227-024-06394-1