Abstract
An application composed of a set of dependent distributed tasks is generally modeled as a directed acyclic graph (DAG) or, more specifically, as an out-tree. Tree-shaped task graphs arise in a variety of computational domains, including electronic structure computations and sparse matrix factorization. As most relevant publications have pointed out, efficient algorithms for partitioning and allocating tree-shaped tasks can largely determine the performance of heterogeneous computing systems. This paper presents efficient algorithms for partitioning and allocating tree-shaped tasks on heterogeneous multiprocessor systems with limited memory in order to improve task-parallel computing. The proposed main algorithm consists of two stages: partition and allocation. In the partition stage, an algorithm partitions a task tree into multiple subtrees by iteratively partitioning the subtrees on the critical path of the quotient tree. In the allocation stage, two algorithms are proposed to minimize the execution time of the task tree: the first preferentially allocates the largest subtree of the whole tree, and the second preferentially allocates the subtree located on the critical path of the quotient tree. Experimental results on both randomly generated trees and a real-world dataset show that the proposed algorithms significantly outperform the latest related works in terms of average makespan. On the real-world dataset, the average makespan is approximately \(6.28\times 10^8\) for existing work and approximately \(2.13\times 10^8\) for our proposed algorithm, a reduction of 64.33%. On randomly generated trees, the average makespan is approximately 8973 for existing work and approximately 4040 for our proposed algorithm, a reduction of 54.96%.
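The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes a simplified model in which each quotient-tree node (one subtree) carries a single total work value, processors differ only in speed, a subtree may start once all of its child subtrees have finished, and memory limits and communication costs are ignored. `critical_path` finds the heaviest root-to-leaf path of the quotient tree, i.e. the path whose subtrees the partition stage would split further (the splitting itself is not modeled). `schedule` is a generic earliest-finish-time list scheduler into which the two allocation priorities are plugged. All names (`Subtree`, `schedule`, `largest_first`, `critical_path_first`) and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subtree:
    """One quotient-tree node, i.e. one subtree of the original task tree."""
    name: str
    work: float  # total work of the tasks grouped into this subtree (assumed model)
    children: list = field(default_factory=list)


def critical_path(node):
    """Return (length, path) of the heaviest root-to-leaf path, weighted by work."""
    if not node.children:
        return node.work, [node]
    length, path = max((critical_path(c) for c in node.children),
                       key=lambda t: t[0])
    return node.work + length, [node] + path


def schedule(root, speeds, priority):
    """Earliest-finish-time list scheduling of quotient-tree nodes.

    A subtree becomes ready once all of its child subtrees have finished; the
    ready subtree with the smallest priority key is placed on the processor
    that finishes it earliest. Returns (makespan, {subtree name: processor}).
    Memory limits and communication costs are deliberately ignored.
    """
    nodes, parent, stack = [], {}, [root]
    while stack:  # collect all nodes and remember each node's parent
        n = stack.pop()
        nodes.append(n)
        for c in n.children:
            parent[id(c)] = n
            stack.append(c)
    pending = {id(n): len(n.children) for n in nodes}  # unfinished children
    ready = [n for n in nodes if not n.children]
    free = [0.0] * len(speeds)  # time at which each processor becomes idle
    finish, placement = {}, {}
    while ready:
        ready.sort(key=priority)
        n = ready.pop(0)
        inputs_done = max((finish[id(c)] for c in n.children), default=0.0)
        p = min(range(len(speeds)),
                key=lambda q: max(free[q], inputs_done) + n.work / speeds[q])
        finish[id(n)] = max(free[p], inputs_done) + n.work / speeds[p]
        free[p] = finish[id(n)]
        placement[n.name] = p
        par = parent.get(id(n))
        if par is not None:
            pending[id(par)] -= 1
            if pending[id(par)] == 0:
                ready.append(par)
    return finish[id(root)], placement


def largest_first(n):
    """Allocation priority 1 (hypothetical form): larger subtrees first."""
    return -n.work


def critical_path_first(root):
    """Allocation priority 2 (hypothetical form): critical-path subtrees first, then by size."""
    on_cp = {id(n) for n in critical_path(root)[1]}
    return lambda n: (id(n) not in on_cp, -n.work)


# Toy quotient tree with made-up work values: R <- {E <- {A, B}, F <- {C, D}}
A, B, C, D = (Subtree(n, w) for n, w in [("A", 4), ("B", 2), ("C", 6), ("D", 1)])
E = Subtree("E", 3, [A, B])
F = Subtree("F", 2, [C, D])
root = Subtree("R", 5, [E, F])
speeds = [1.0, 2.0]  # processor 1 is twice as fast as processor 0

length, path = critical_path(root)
print(length, [n.name for n in path])                         # 13 ['R', 'F', 'C']
print(schedule(root, speeds, largest_first)[0])              # 9.0
print(schedule(root, speeds, critical_path_first(root))[0])  # 9.0
```

On this toy instance both priority orders yield the same makespan; in general, which order performs better depends on the tree shape and the processor speeds. The setting studied in the paper additionally includes limited memory, which this sketch omits.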
Data availability
The assembly-tree dataset was generated from a set of sparse matrices, which can be obtained from the University of Florida Sparse Matrix Collection at http://www.cise.ufl.edu/research/sparse/matrices/.
Acknowledgements
Part of this work was presented at the 2021 IEEE International Conference on Parallel & Distributed Processing with Applications, Sept. 30 – Oct. 3, 2021, New York, USA.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62072118 and 62202108, in part by the Huangpu International Sci & Tech Cooperation Foundation of Guangzhou, China, under Grant No. 2021GH12, and in part by the Guangdong Natural Science Foundation under Grant Nos. 2023A1515011230, 2023A1515030183 and 2021B1515120010.
Author information
Contributions
S.H., J.W. and B.W. conceived of the presented idea. J.W. encouraged S.H. to investigate the critical path and supervised the findings of this work. S.H. carried out the experiment and wrote the main manuscript text. All authors discussed the results and revised the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, S., Wu, J., Wei, B. et al. Algorithms for tree-shaped task partition and allocation on heterogeneous multiprocessors. J Supercomput 79, 13210–13240 (2023). https://doi.org/10.1007/s11227-023-05186-3