Abstract
This paper addresses the parallelization of a recursive algorithm for large-scale triangular matrix inversion based on the ‘Divide and Conquer’ (D&C) paradigm. We first present several versions of an original sequential algorithm, and a theoretical performance study provides an accurate comparison between them. In the second part of the paper, we develop an optimal parallel communication-avoiding algorithm for a given number of available processors, whether homogeneous or heterogeneous. To this end, we use a so-called ‘non-equitable and incomplete’ version of the D&C paradigm, which recursively decomposes the original problem into two sub-problems of unequal sizes and then further decomposes only one of the two sub-problems in the same manner. The theoretical study is validated by a series of experiments carried out on three target platforms: an 8-core shared-memory machine, a distributed-memory cluster, and a heterogeneous CPU-GPU cluster. The results obtained illustrate the benefit of the contribution.
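The recursive scheme summarized in the abstract can be illustrated with a minimal sequential sketch in Python. It is not the authors' algorithm: it only shows the standard block identity for a lower triangular matrix T = [[A, 0], [C, B]], whose inverse is [[A⁻¹, 0], [−B⁻¹ C A⁻¹, B⁻¹]], combined with an unequal split in which only one sub-block is decomposed further. The names `invert_direct` and `invert_dc`, the `ratio` and `cutoff` parameters, and the choice to recurse on the trailing block B are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product (no triangular structure exploited)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def invert_direct(t: Matrix) -> Matrix:
    """Invert a lower triangular matrix column by column (forward substitution)."""
    n = len(t)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / t[j][j]
        for i in range(j + 1, n):
            s = sum(t[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / t[i][i]
    return inv


def invert_dc(t: Matrix, ratio: float = 0.5, cutoff: int = 2) -> Matrix:
    """Recursive inversion with an unequal split (hypothetical parameters).

    The leading block A (about ratio * n rows) is inverted directly and is
    NOT decomposed further; only the trailing block B is split recursively.
    Which block is decomposed is an assumption of this sketch.
    """
    n = len(t)
    if n <= cutoff:
        return invert_direct(t)
    k = min(n - 1, max(1, int(n * ratio)))   # size of the leading block A
    a = [row[:k] for row in t[:k]]
    c = [row[:k] for row in t[k:]]
    b = [row[k:] for row in t[k:]]
    a_inv = invert_direct(a)                 # sub-problem left undivided
    b_inv = invert_dc(b, ratio, cutoff)      # sub-problem divided again
    x = matmul(matmul(b_inv, c), a_inv)      # B^-1 * C * A^-1
    top = [a_inv[i] + [0.0] * (n - k) for i in range(k)]
    bottom = [[-v for v in x[i]] + b_inv[i] for i in range(n - k)]
    return top + bottom
```

In this sketch the two inversions `a_inv` and `b_inv` are independent of each other, which is the property a parallel version would exploit; the `ratio` parameter stands in for whatever size balance suits the processors available.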
Acknowledgments
We extend our deep thanks to Dr. N. Jaïdane for his invaluable help and to an anonymous referee for his judicious comments and suggestions.
Cite this article
Mahfoudhi, R., Mahjoub, Z. & Nasri, W. Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms. Int J Parallel Prog 43, 631–655 (2015). https://doi.org/10.1007/s10766-014-0310-0
DOI: https://doi.org/10.1007/s10766-014-0310-0