0% found this document useful (0 votes)

62 views126 pages

Userguide 5.6.0

This document provides a 3 sentence summary of the MUMPS Users' Guide: The MUMPS Users' Guide describes how to use the MUMPS solver package to solve sparse systems of linear equations, detailing the data structures, parameters, calling sequences and error diagnostics. It explains the main functionalities of MUMPS including preprocessing, factorization, solving, and post-processing features. The guide also provides information on the application programming interface and user interface for MUMPS.

Uploaded by

Mark W

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

62 views126 pages

Userguide 5.6.0

Uploaded by

Mark W

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 126

MUltifrontal Massively Parallel Solver

(MUMPS 5.6.0)
Users’ guide *

April 19, 2023

Abstract
This document describes the Fortran 95 and C user interfaces to MUMPS5.6.0, a software package
to solve sparse systems of linear equations, with many features. For some classes of problems, the
complexity of the factorization and the memory footprint of MUMPS can be reduced thanks to the Block
Low-Rank (BLR) feature.
We describe in detail the data structures, parameters, calling sequences, and error diagnostics. Basic
example programs using MUMPS are also provided. Users are allowed to save to disk MUMPS internal
data before or after any of the main steps of MUMPS (analysis, factorization, solve) and then restart.

* Information on how to obtain updated copies of MUMPS can be obtained from the Web page http://mumps-solver.org/

1
Contents
1 Introduction 5

2 Notes for users of other versions of MUMPS 6

2.1 ChangeLog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Binary compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Upgrading between minor releases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Upgrading from MUMPS 5.5.1 to MUMPS 5.6.0 . . . . . . . . . . . . . . . . . . . 8
2.5 Upgrading from MUMPS 5.4.1 to MUMPS 5.5.0 . . . . . . . . . . . . . . . . . . . 8
2.6 Upgrading from MUMPS 5.3.5 to MUMPS 5.4.0 . . . . . . . . . . . . . . . . . . . 8
2.7 Upgrading from MUMPS 5.2.1 to MUMPS 5.3.5 . . . . . . . . . . . . . . . . . . . 8
2.8 Upgrading from MUMPS 5.1.2 to MUMPS 5.2.1 . . . . . . . . . . . . . . . . . . . 9
2.9 Upgrading from MUMPS 5.0.2 to MUMPS 5.1.2 . . . . . . . . . . . . . . . . . . . 9
2.9.1 Changes on installation issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.9.2 Selective 64-bit integer feature . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.10 Upgrading from MUMPS 4.10.0 to MUMPS 5.0.2 . . . . . . . . . . . . . . . . . . 10
2.10.1 Interface with the Metis and ParMetis orderings . . . . . . . . . . . . . . . . . 10
2.10.2 Interface with the SCOTCH and PT-SCOTCH orderings . . . . . . . . . . . . . 10
2.10.3 ICNTL(10): iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.10.4 ICNTL(11): error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.10.5 ICNTL(20): sparse right-hand sides . . . . . . . . . . . . . . . . . . . . . . . . 11
2.10.6 ICNTL(4): Control of the level of printing . . . . . . . . . . . . . . . . . . . . 12

3 Main functionalities of MUMPS 5.6.0 12

3.1 Input matrix format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Post-processing facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.1 Iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.2 Error analysis and statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4 Null pivot row detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Computation of a solution of a deficient matrix and of a null space basis . . . . . . . . . 15
3.6 Solving the transposed system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.7 Forward elimination during factorization . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.8 Arithmetic versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.9 Numerical pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.10 The working host processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.11 MPI-free version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.12 Multithreaded parallelism (possibly combined with MPI parallelism) . . . . . . . . . . 17
3.13 Using the BLAS extension GEMMT for symmetric matrix-matrix products . . . . . . . 17
3.14 Out-of-core facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.15 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.16 Computing selected entries of A−1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.17 Reduce/condense a problem on an interface (Schur complement and reduced/condensed
right-hand side) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.18 Block Low-Rank (BLR) multifrontal factorization . . . . . . . . . . . . . . . . . . . . 19
3.18.1 BLR factorization variants and their complexity . . . . . . . . . . . . . . . . . 20
3.18.2 Reducing the memory consumption using BLR . . . . . . . . . . . . . . . . . . 21
3.19 Save / Restore feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 User interface and available routines 21

2
5 Application Program Interface 24
5.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.1.1 Initialization, Analysis, Factorization, Solve, Save, Restore, Termination (JOB) . 24
5.1.2 Special use of JOB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1.2.1 JOB=9 before solution with distributed right-hand sides: . . . . . . . 26
5.1.2.2 JOB=-4 after factorization or solve phases: . . . . . . . . . . . . . . 26
5.1.3 Version number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1.4 Control of parallelism (COMM, PAR) . . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Input Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2.1 Matrix type (SYM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2.2 Matrix format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2.2.1 Centralized assembled matrix (ICNTL(5)=0 and ICNTL(18)=0). . 28
5.2.2.2 Distributed assembled matrix (ICNTL(5)=0 and ICNTL(18)=1,2,3). 29
5.2.2.3 Elemental matrix (ICNTL(5)=1 and ICNTL(18)=0). . . . . . . . . 30
5.2.3 Writing the input matrix to a file . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3 Preprocessing: permutation to zero-free diagonal and scaling . . . . . . . . . . . . . . . 32
5.3.1 Permutation to a zero-free diagonal (ICNTL(6)) . . . . . . . . . . . . . . . . 34
5.3.2 Scaling (ICNTL(6) or ICNTL(8)) . . . . . . . . . . . . . . . . . . . . . . . 34
5.4 Preprocessing: symmetric permutations . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.4.1 Symmetric permutation vector (ICNTL(7) and ICNTL(29)) . . . . . . . . . 37
5.4.2 Given ordering (ICNTL(7)=1 and ICNTL(28)=1) . . . . . . . . . . . . . . . 37
5.5 Preprocessing: exploit compression of the input matrix resulting from a block format
(ICNTL(15)̸= 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.6 Post-processing: iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.7 Post-processing: error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.8 Out-of-core (ICNTL(22)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.9 Workspace parameters (ICNTL(14) and ICNTL(23)) and user workspace . . . . . . 42
5.10 Null pivot row detection (ICNTL(24)) . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.11 Discard matrix factors (ICNTL(31)) . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.12 Computation of the determinant (ICNTL(33)) . . . . . . . . . . . . . . . . . . . . . 47
5.13 Forward elimination during factorization (ICNTL(32)) . . . . . . . . . . . . . . . . . 47
5.14 Right-hand side and solution vectors/matrices . . . . . . . . . . . . . . . . . . . . . . . 48
5.14.1 Dense right-hand side (ICNTL(20)=0) . . . . . . . . . . . . . . . . . . . . . 50
5.14.2 Sparse right-hand side (ICNTL(20)=1,2,3) . . . . . . . . . . . . . . . . . . . 50
5.14.3 Distributed right-hand side (ICNTL(20)=10,11) . . . . . . . . . . . . . . . . 51
5.14.4 A particular case of sparse right-hand side: computing entries of A−1
(ICNTL(30)=1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.14.5 Centralized solution (ICNTL(21)=0) . . . . . . . . . . . . . . . . . . . . . . 53
5.14.6 Distributed solution (ICNTL(21)=1) . . . . . . . . . . . . . . . . . . . . . . 54
5.15 Schur complement with reduced or condensed right-hand side (ICNTL(19) and
ICNTL(26)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.15.1 Centralized Schur complement stored by rows (ICNTL(19)=1) . . . . . . . . 55
5.15.2 Distributed Schur complement (ICNTL(19)=2 or 3) . . . . . . . . . . . . . . 55
5.15.3 Centralized Schur complement stored by columns (ICNTL(19)=2 or 3) . . . . 57
5.15.4 Using partial factorization during solution phase (ICNTL(26)= 0, 1 or 2) . . . 58
5.16 Block Low-Rank (BLR) feature (ICNTL(35) and CNTL(7)) . . . . . . . . . . . . 59
5.16.1 Enabling the BLR functionality at installation . . . . . . . . . . . . . . . . . . 59
5.16.2 BLR API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.16.3 BLR output: statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.17 Save (JOB=7) / Restore (JOB=8) feature . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.17.1 Location and names of the save files . . . . . . . . . . . . . . . . . . . . . . . 64
5.17.2 Deletion of the save files (JOB=-3) . . . . . . . . . . . . . . . . . . . . . . . . 64
5.17.3 Important remarks for the restore feature (JOB=8) . . . . . . . . . . . . . . . . 64
5.17.4 Combining the save/restore feature with out-of-core . . . . . . . . . . . . . . . 65
5.17.5 Combining the save/restore feature with WK USER . . . . . . . . . . . . . . . 66

3
5.17.6 Error management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.18 Setting the number of OpenMP threads (ICNTL(16)) . . . . . . . . . . . . . . . . . . 66
5.19 Compact workarray id%S at the end of factorization phase . . . . . . . . . . . . . . . . 67

6 Control parameters 67
6.1 Integer control parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Real/complex control parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3 Compatibility between options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

7 Information parameters 92
7.1 Information local to each processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.2 Information available on all processors . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8 Error and warning diagnostics 100

9 Calling MUMPS from C 105

9.1 Array indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.2 Issues related to the C and Fortran communicators . . . . . . . . . . . . . . . . . . . . 107
9.3 Fortran I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.4 Runtime libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.5 Integer, real and complex datatypes in C and Fortran . . . . . . . . . . . . . . . . . . . 108
9.6 Sequential version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

10 Scilab and MATLAB/Octave interfaces 108

11 Examples of use of MUMPS 110

11.1 An assembled problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
11.2 An elemental problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
11.3 An example of calling MUMPS from C . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
11.4 An example of calling MUMPS from fortran using the Save/Restore feature and Out Of Core115
11.5 An example of calling MUMPS from C using the Save/Restore feature . . . . . . . . . . 118

12 License 122

13 Credits 122

4
1 Introduction
MUMPS (“MUltifrontal Massively Parallel Solver”) is a package for solving systems of linear equations of
the form Ax = b, where A is a square sparse matrix that can be either unsymmetric, symmetric positive
definite, or general symmetric, on distributed memory computers. MUMPS implements a direct method
based on a multifrontal approach which performs a Gaussian factorization

A = LU (1)

where L is a lower triangular matrix and U an upper triangular matrix. If the matrix is symmetric then
the factorization
A = LDLT (2)
where D is block diagonal matrix with blocks of order 1 or 2 on the diagonal is performed. We refer the
reader to the papers [8, 9, 12, 27, 28, 33, 43, 31, 32, 17, 1, 48, 14, 44, 4, 37, 50, 18, 38, 47] for full details
of the techniques used, algorithms and related research.
The system Ax = b is solved in three main steps:
1. Analysis.
During analysis, preprocessing (see Subsection 3.2), including an ordering based on the
symmetrized pattern A + AT , and a symbolic factorization are performed. During the symbolic
factorization, a mapping of the multifrontal computational graph, the so called elimination tree
[40], is computed and used to estimate the number of operations and memory necessary for
factorization and solution. Both parallel and sequential implementations of the analysis phase are
available. Let Apre denote the preprocessed matrix (further defined in Subsection 3.2).
2. Factorization.
During factorization Apre = LU or Apre = LDLT , depending on the symmetry of the
preprocessed matrix, is computed. The original matrix is first distributed (or redistributed)
onto the processors depending on the mapping computed during the analysis. The numerical
factorization is then a sequence of dense factorization on so called frontal matrices. In addition to
standard threshold pivoting and two-by-two pivoting (not so standard in distributed memory codes)
there is an option to perform static pivoting. The elimination tree also expresses independency
between tasks and enables multiple fronts to be processed simultaneously. This approach is
called multifrontal approach. After factorization, the factor matrices are kept distributed (in-core
memory or on disk); they will be used at the solution phase.
3. Solution.
The solution xpre of LUxpre = bpre or LDLT xpre = bpre where xpre and bpre are
respectively the transformed solution x and right-hand side b associated to the preprocessed matrix
Apre , is obtained through a forward elimination step

Ly = bpre or LDy = bpre , (3)

followed by a backward elimination step

Uxpre = y or LT xpre = y . (4)

The solution xpre is finally postprocessed to obtain the solution x of the original system Ax = b,
where x is either assembled on an identified processor (the host) or kept distributed on the working
processors. Iterative refinement and backward error analysis are also postprocessing options of the
solution phase.
Each of these 3 phases can be called separately (see Subsection 5.1.1). A special case is the one
where the forward elimination step is performed during factorization (see Subsection 3.7), instead of
during the solve phase. This allows accessing the L factors right after they have been computed, with a
better locality, and can avoid writing the L factors to disk in an out-of-core context. In this case (forward
elimination during factorization), only the backward elimination is performed during the solution phase .
The software is mainly written in Fortran 95 although a C interface is available (see Section 9). Scilab
and MATLAB/Octave interfaces are also available in the case of sequential executions. The parallel

5
version of MUMPS requires MPI [49] for message passing and makes use of the BLAS [22, 23], LAPACK,
BLACS, and ScaLAPACK [21] libraries. The sequential version only relies on BLAS and LAPACK.
MUMPS exploits both parallelism arising from sparsity in the matrix A and from dense factorization
kernels. It distributes the work tasks among the processors, but an identified processor (the host) is
required to perform most of the analysis phase, to distribute the incoming matrix to the other processors
(slaves) in the case where the matrix is centralized, and to collect the solution if it is not kept distributed.
Several instances of MUMPS can be handled simultaneously. MUMPS allows the host processor to
participate to the factorization and solve phases, just like any other processor (see Subsection 3.10).
For both the symmetric and the unsymmetric algorithms used in the code, we have chosen a
fully asynchronous approach with dynamic scheduling of the computational tasks. Asynchronous
communication is used to enable overlapping between communication and computation. Dynamic
scheduling is used to accommodate numerical pivoting in the factorization and to remap work and data
to appropriate processors at execution time. In fact, we combine the main features of static and dynamic
approaches; we use the estimation obtained during the analysis to map some of the main computational
tasks; the other tasks are dynamically scheduled at execution time. The main data structures (the original
matrix and the factors) are similarly partially mapped during the analysis phase.
The main features of the MUMPS package include:
• various arithmetics (real or complex, single or double precision)
• input of the matrix in assembled format (distributed or centralized) or elemental format
• sequential or parallel analysis phase
• use of several built-in ordering algorithms, a tight interface to some external ordering packages
such as PORD [46], SCOTCH [42] or Metis [34] (strongly recommended), and the possibility for
the user to input a given ordering.
• scaling of the original matrix
• out-of-core capability
• save and restore feature
• detection of null pivot rows, basic estimate of rank deficiency and computation of a null space basis
• computation of a Schur complement matrix
• computation of the determinant
• computation of selected entries of the inverse of A
• exploiting sparsity of the right-hand sides
• forward elimination during factorization
• solution of the transposed system
• error analysis
• iterative refinement
• selective 64-bit integer
• exploitation of a Block Low-Rank (BLR) format for factorization and solution
MUMPS is downloaded from the web site many times per day and has been run on very many machines,
compilers and operating systems, although our experience is really only with UNIX-based systems. We
have tested it extensively on many parallel computers. Please visit our webstite for recommendations,
from our users, on how to use the solver on Windows platforms.

2 Notes for users of other versions of MUMPS

Each MUMPS 5.6.0 release incorporates new features and bug fixes with respect to the previous major
release (see a list of changes in Subsection 2.1). We strongly advise to use the latest version.
In this section, we also describe backward compatibility issues and what should be done if you are
using a version of MUMPS anterior to MUMPS 5.6.0 and if you would like to upgrade your application
to use MUMPS 5.6.0.

6
We discuss binary compatibility in Subsection 2.2 and how to upgrade between minor versions in
Subsection 2.3. We then discuss in Subsection 2.9 and in Subsection 2.10 some minor modifications
to control parameters or to interfaces with ordering packages. Such control parameters are normally
backward compatible, but it may happen that their range of possible values or their meaning has been
slightly modified or extended. Please read this section if you are using an anterior version of MUMPS and
want to use MUMPS 5.6.0.
The interface is backward compatible so that a code working with the previous version should also
work with the new one (all codes including MUMPS headers should be recompiled). Still, the following
points can be noted.

2.1 ChangeLog
Changes from 5.5.1 to 5.6.0
* Analysis by blocks and out-of-range entries compatible with parallel analysis
* Compact workarray S before solution phase (ICNTL(49)=1,2)
* JOB=-4 frees data from factorization and keeps results from analysis
* NEC vector engine version: tuned block low rank and GEMMT usage
* Reduced memory for symbolic datastructures (order-N arrays, arrowheads)
* Use new symbolic factorization (column counts) by default (+allow forest in case of Schur
* Improved amalgamation algorithm to reduce the number tiny nodes
* Discard factors option is now compatible with BLR
* Discard L factors option in case of LU factorization available for in-core
* Count null negative pivots (INFOG(50))
* Compilation with -DBLR_MT is no longer needed (see also -DBLR_NOOPENMP)
* Improve performance of A-1 entries computation when #MPI > 1
* Fixed risk of hanging in case of OOC error (e.g. disk full)
* Free internal data earlier (e.g., factors in case of failed factorization, etc.)
* Avoid setenv/unsetenv on MinGW environments (Scotch versions >= 7)
* Fix error during ordering when processing matrices of order 1
* Out-Of-Core, values of ICNTL(22) different from 1 are treated as zero (in-core)
* BLKPTR ignored when ICNTL(15) < 0 (INFO(1:2)=-57 4 no longer occurs)
* BLR OpenMP fac_lr.F fix for possible non-increasing scheduling of loop indices
* Fix: id%LRGROUPS was not nullified in CMUMPS_FREE_ONENTRY_ANA_DRIVER
* Fixed parameter error 1202 in PDGETRS after restoring a factorization
* Workaround MPI_SSEND intel MPI 2021.6 issue
* NEC: Fix MUMPS_WRAP_GINP94 offloading to host
* Shared libraries: minor update in Makefiles
* Fixed missing printings of preprocessing constants in mumps_print_defined.F

2.2 Binary compatibility

In general, successive versions of the MUMPS package do not ensure binary compatibility. This means
that you should recompile the source files of your application that include MUMPS include files when
you switch to a new version of the MUMPS package. However, we normally ensure binary compatibility
between minor releases (e.g., MUMPS x.y.z to MUMPS x.y.z’).

2.3 Upgrading between minor releases

Between minor releases (e.g., MUMPS x.y.z to MUMPS x.y.z’), the number of source files (and
general Makefile) may have changed. However, the interface is fully backward-compatible. Building and
linking the new version of MUMPS with your application should thus be enough to use the latest version.
Furthermore, if you use dynamic MUMPS libraries, you will not need to recompile your code.

7
2.4 Upgrading from MUMPS 5.5.1 to MUMPS 5.6.0
• The -DBLR MT, indicating that the BLAS is compatible with OpenMP is deprecated. MUMPS
now assumes by default that the BLAS library is compatible with OpenMP and that the number
of threads in the BLAS adapts to the number of OpenMP threads available in the current
OpenMP region. If this is not the case, then MUMPS should be compiled with OPTF containing
-DBLR NOOPENMP, indicating that the BLAS should not be called inside OpenMP parallel
regions.
• Values of ICNTL(22) different from 1 (Out-Of-Core activated during factorization) are now
treated as 0 (In-Core factorization), in previous versions they were treated as 1 (Out-Of-Core
activated).
• The feature to discard factors (ICNTL(31)=1) now also discards the low-rank factors in case
the block low-rank feature is activated (ICNTL(35)), leading to a smaller memory usage when
ICNTL(31)=1 and ICNTL(35)=1 or 2. The feature to discard only the L factor (unsymmetric
matrices, ICNTL(31)=2 or ICNTL(32)=1) now also discards the in-core L factor, whereas it
was only effective for out-of-core factorizations in previous versions.
• Concerning the analysis by blocks, ICNTL(15) < 0 is now authorized even if BLKPTR (or
BLKVAR) is allocated (BLKPTR is then not accessed). As a consequence, error -57 with
INFO(2)=4 no longer occurs.

2.5 Upgrading from MUMPS 5.4.1 to MUMPS 5.5.0

• Since the availability of the analysis by blocks feature, the symmetry ratio is no longer computed
in case the analysis by blocks (ICNTL(15)) is activated. See the description of INFOG(8).
• The default value of ICNTL(38) has changed from 333 to 600. When not set by the user, this
can thus make the memory estimates for the block low-rank factorization slighly more pessimistic.
• We advise using SCOTCH version 7.0.1 or later to benefit from a faster analysis phase
(multithreading during SCOTCH).
• During the installation of MUMPS, the construction of the file mumps int def.h no longer needs
a C compiler (this was required by some users for cross-compilation)

2.6 Upgrading from MUMPS 5.3.5 to MUMPS 5.4.0

• In case of null pivot detection (ICNTL(24)̸= 0), CNTL(3) default value √in previous releases
(ϵ × 10−5 × ∥Apre ∥) was too small and has been updated to ϵ × ∥Apre ∥ × Nh

2.7 Upgrading from MUMPS 5.2.1 to MUMPS 5.3.5

• RINFO(5), RINFO(6), RINFOG(15), RINFOG(16) were incorrect in the previous version
and their values have been fixed.
• INFO(30) and INFO(31) on processes with MPI rank > 0 are fixed.
• In case the block low-rank feature (ICNTL(35)>0), when numerical pivoting can be restricted
(CNTL(1) = 0.0), then the use of ICNTL(36)=1 has improved, leading to more compression and
better performance during factorization.
• Concerning the right-hand side format, it can now be provided distributed over the MPI processes,
with either an arbitrary distribution or a distribution suggested by the package. This has been done
thanks to an extension of the ICNTL(20) control parameter. We refer the reader to the description
of ICNTL(20), and to Subsection 5.14.3 for more details on this feature.
• When using the C interface and 64-bit default integers, it is still necessary to compile MUMPS C
files with -DINTSIZE64, but no longer necessary to use this flag to compile your application
including MUMPS headers. The size of MUMPS INT is defined in the MUMPS headers and it is now
possible to test from an application how it was set during MUMPS installation (see the “INSTALL”
file in the package and Subsection 9.5).

8
2.8 Upgrading from MUMPS 5.1.2 to MUMPS 5.2.1
Although MUMPS 5.2.1 contains new features compared to MUMPS 5.1.2 (one of them being the
possibility to store factors in low-rank format, leading to significant memory gains on some classes of
matrices), the installation procedure has not changed and the interface of existing features is almost fully
backward compatible, subject to the remarks below. Still, your code should be recompiled.
A minor modification concerns the combination of the BLR feature (ICNTL(35)) with elemental
input (ICNTL(5)) and with the forward elimination during factorization (ICNTL(32)). In 5.1.x
versions of MUMPS, BLR was automatically switched off in case of elemental input or forward elimination
during factorization where as an error is now raised in such cases (see error codes -43 and -800).
The scope of ICNTL(35) has been extended: ICNTL(35)=2 exploits low-rank factors during
solve, ICNTL(35)=3 is a backward-compatibility option to only exploit low-rank factors during
factorization but perform a full-rank solve, as was doing ICNTL(35)=1 in MUMPS 5.1.2.
ICNTL(35)=1 now does an automatic choice of the “best” BLR option (currently BLR factorization
and BLR solve).
Thanks to a more flexible and improved memory management, the ICNTL(23) parameter to limit
the amount of memory allocation by MUMPS to a given amount is now compatible with the block low-
rank feature (ICNTL(35)). Therefore, the error characterized by INFO(1)=-800 and INFO(2)=23
from previous versions has disappeared. Furthermore, when ICNTL(23) is provided, MUMPS will no
longer try to allocate all the memory authorized in virtual memmory. This is due to the new capacity of
dynamically allocating some working memory when the static workspace is not large enough.
In case a maximum amount of allowed memory ICNTL(23) is provided and MUMPS stops with an
error because ICNTL(23) cannot be respected, a new specific error code -19 is now raised, whereas
previously only the error code -9 was raised (in full-rank). This allows to distinguish between the
two types of errors. The error code -9 may still occur from time to time to indicate that the main
internal workarray allocated at the beginning of the factorization is too small (and as before, the parameter
ICNTL(14) defining the relaxation of the main internal working space should be relaxed). However, this
error should now occur much less often than before, e.g. in case of extreme numerical pivoting difficulties.
This is because this version includes a first step towards a more flexible memory management that allows
part of the working memory to be stored in dynamically allocated blocks when needed.
Because of dynamic allocations, remark that the parameter ICNTL(14) used to relax the main
internal working space does not lead to a bound on the total memory allocated: dynamic data may in
some cases be allocated when this allows to avoid -9 errors and low-rank matrices will also be allocated
dynamically and possibly kept during the solve phase. If the user would like to provide a strict bound on
the total allocated memory, we recommend the use of ICNTL(23).
Related to the low-rank feature, the new control parameter ICNTL(38) was introduced for the user
to provide an estimate compression rate for the factors. It is used by the analysis phase to provide memory
estimates for the block low-rank factorization.
Out-of-range values of ICNTL(12) are now treated as 1 (usual ordering) instead of 0 (automatic
choice).
The range of statistics provided to the users was extended and the lengths of the arrays INFO and
INFOG is now 80 instead of 40.

2.9 Upgrading from MUMPS 5.0.2 to MUMPS 5.1.2

2.9.1 Changes on installation issues
Installation issues are described in the file “INSTALL” and in Makefile.inc examples from the “Make.inc/”
directory.
1. Since MUMPS 5.1.0, the variables IMETIS and ISCOTCH should always be defined in your
Makefile.inc in case (par)Metis or (pt-)SCOTCH are installed (recommended). This is because
some functions relying on Metis and/or SCOTCH now require header files from those ordering
packages whereas this was not the case in MUMPS 5.0.x versions.
2. Since MUMPS 5.1.0, LAPACK is required at installation time, even in the case of the MPI-
free version. This is because the BLR feature uses some LAPACK routines. In MUMPS 5.0.x

9
versions, LAPACK was only necessary as a dependency on ScaLAPACK, in case the ScaLAPACK
library used did not come with all its LAPACK dependencies.

2.9.2 Selective 64-bit integer feature

Since MUMPS 5.1.0, in order to process matrices with a large number of entries and/or with a large
number of right-hand sides, 64-bit integers have been introduced where needed (and only where needed).
With large number we mean a number greater than 231 − 1 (32-bit integer limitation). This has the
following implications.
• Large number of matrix entries (Subsection 5.2.2). There are both 32 and 64 bit integers in the
MUMPS user interface for the number of entries in the matrix:
– centralized/distributed input matrix format: the variables NZ/NZ loc are still 32-bit for
backward compatibility and will become obsolete in some future release;
– centralized/distributed input matrix format: two new variables have been introduced,
NNZ/NNZ loc, which are 64-bit integers (recommended choice).
• External orderings. Either the 32 or 64 bit integer versions of external orderings (Metis/ParMetis,
SCOTCH/pt-SCOTCH, PORD) can be used (see “INSTALL”).
• External libraries. 32-bit integers are still used for BLAS, LAPACK, MPI, ScaLAPACK (MPI
message doesn’t have a count larger than 231 − 1).
• -51 error message. Error -51 which was previously raised in case of integer overflow during analysis
is now only raised when a 32-bit external ordering is invoked on a graph with more than 231 − 1
edges.
• Large number of right-hand sides. Large numbers of right-hand sides (NRHS) can be processed in
a single block (i.e. ICNTL(27)×N is now allowed to be larger than 231 − 1). However, in order to
avoid integer overflows, ICNTL(27) should still be small enough for ICNTL(27)×INFOG(11)
to be smaller than 231 , where INFOG(11) ≤ N is the maximum frontal size.
With the selective 64-bit integer feature, most MUMPS integer arrays will remain 32-bit integers,
since MUMPS data (e.g. matrix input IRN/JCN, IRN loc/JCN loc, internal graph and frontal matrices)
mainly rely on matrix indices. All MPI communication will also rely on 32-bit integers. This can lead
to significant memory and performance gains with respect to a full 64-bit integer MUMPS version. A
full 64-bit integer version can be obtained compiling MUMPS with C preprocessing flag -DINTSIZE64
and Fortran compiler option -i8, -fdefault-integer-8 or something equivalent depending on your compiler,
and compiling all libraries including MPI, BLACS, ScaLAPACK, LAPACK and BLAS also with 64-bit
integers. We refer the reader to the “INSTALL” file provided with the package for details and explanations
of the compilation flags controlling integer sizes.

2.10 Upgrading from MUMPS 4.10.0 to MUMPS 5.0.2

2.10.1 Interface with the Metis and ParMetis orderings
Since the release of MUMPS 4.10.0, the Metis API has changed. MUMPS 5.6.0 now assumes that
Metis ≥ 5.1.0 or ParMetis ≥ 4.0.3 are installed, and that the newer versions of Metis/ParMetis are
backward compatible with Metis 5.1.0/ParMetis 4.0.3.
It is however possible to continue using Metis versions ≤ 4.0.3 by forcing the compilation
flag -Dmetis4, and to continue using ParMetis versions ≤ 3.2.0 by forcing the compilation flag
-Dparmetis3.
Note that Metis 5.0.1/5.0.2/5.0.3 and ParMetis 4.0.1/4.0.2 have never been supported in MUMPS.

2.10.2 Interface with the SCOTCH and PT-SCOTCH orderings

MUMPS 4.10.0 was not compatible with SCOTCH 6.x. MUMPS 5.6.0 is compatible with both
SCOTCH 5.1.x and SCOTCH 6.0.x and we recommend using the latest version of SCOTCH1 .
1 See http://gforge.inria.fr/projects/scotch/

10
Since SCOTCH version 6.0.0, the PT-SCOTCH library does not include the SCOTCH library. So
during the link phase, the SCOTCH library must be provided to MUMPS. It can be easily done by adding
”-lscotch” to the LSCOTCH variable in your Makefile.inc file.
Unfortunately, there is a problem in the SCOTCH 6.0.0 package which is making it unusable with
MUMPS. You should update your version of SCOTCH to 6.0.1 or later.

2.10.3 ICNTL(10): iterative refinement

In both MUMPS 4.10.0 and MUMPS 5.6.0, ICNTL(10) indicates the maximum number of iterative
refinement steps during the solve phase, with a default value of 0 meaning no iterative refinement.
In MUMPS 4.10.0, if the user sets ICNTL(10) to a negative value, then the value was treated as
0, no iterative refinement. This is not the case with MUMPS 5.6.0, where negative values are interpreted
differently (fixed number of iterative refinement steps).
This modification should not affect normal users, since it was not natural to reset ICNTL(10) to
a negative value (instead of the default value of 0) in order to forbid iterative refinement in MUMPS
4.10.0.

2.10.4 ICNTL(11): error analysis

Whereas all positive values of ICNTL(11) were producing the same statistics in MUMPS 4.10.0,
ICNTL(11)=1, ICNTL(11)=2 and ICNTL(11)> 2 now have a different behaviour in MUMPS
5.6.0, as shown in Table 1.

ICNTL(11) MUMPS 4.10.0 MUMPS 5.6.0

value meaning meaning
<0 No error analysis No error analysis
0 No error analysis (default) No error analysis(default)
1 Full statistics Full statistics
2 Full statistics Main statistics (recommended)
>2 Full statistics Defaults to 0, no error analysis

Table 1: Backward compatibility issues between MUMPS 4.10.0 and MUMPS 5.6.0 for ICNTL(11). Full statistics include
condition numbers estimates and forward error estimate, which are very expensive to compute.

The main reason for this change is that, because backward error analysis already provides a good
indication of the quality of the computed solution, most users might not want to compute forward error
analysis and condition numbers estimates in all cases.

2.10.5 ICNTL(20): sparse right-hand sides

The range of values for ICNTL(20) has been extended to better control an internal feature that exploits
sparsity of the right-hand sides during the solution phase. This extension should be transparent in practice,
since the authorized values for ICNTL(20) were 0 (dense right-hand side expected) and 1 (sparse right-
hand sides expected) in MUMPS 4.10.0, which will be correctly interpreted in MUMPS 5.6.0.
One minor difference is in the interpretation of ICNTL(20)=2 or 3 in MUMPS 4.10.0. Whereas
both values (2 and 3) are out-of-range values in MUMPS 4.10.0 and treated as 0 (dense right-hand side
expected), they now also mean in MUMPS 5.6.0 that the right-hand sides should be provided in sparse
form.
Another difference is in the treatment of duplicate entries in the sparse right-hand sides. In MUMPS
4.10.0 the last entry is used whereas in MUMPS 5.6.0 duplicate entries are summed.
We refer the reader to the description of ICNTL(20) for a precise description of sparse right-hand
sides and an explanation of the differences between values 1, 2, and 3.

11
2.10.6 ICNTL(4): Control of the level of printing
By default, some printings that were appearing in MUMPS 4.10.0 no longer appear in MUMPS 5.6.0.
This is because, when ICNTL(4)< 2, some diagnostic messages were printed in MUMPS 4.10.0
whereas they should not have been printed. This change of behaviour should thus be considered as a bug
correction between MUMPS 4.10.0 and the new version, rather than a backward compatibility issue.
In order to have such messages printed as in MUMPS 4.10.0 with the latest version, please set
the value of ICNTL(4) to 2. Please also refer to the detailed descriptions of ICNTL(4), ICNTL(1),
ICNTL(2), and ICNTL(3).

3 Main functionalities of MUMPS 5.6.0

We describe here the main functionalities of the MUMPS solver. These functionalities are controlled by
the components of the arrays mumps par%ICNTL and mumps par%CNTL. The user should refer to
Section 5 and Section 6 for a complete description of the parameters that must be set or that are referred
to in this section. The variables mentioned in this section are components of a structure mumps par of
type MUMPS STRUC (see Section 4). For the sake of clarity and when no confusion is possible, we refer
to them only by their component name. For example, we use ICNTL to refer to mumps par%ICNTL.

3.1 Input matrix format

MUMPS provides several possibilities to input the matrix. The selection is controlled by the parameters
ICNTL(5) and ICNTL(18).
The input matrix can be supplied as assembled in coordinate format either on a single processor or
distributed over the processors. Otherwise, it can be supplied in elemental format, but in this case it
can be input only centrally on the host.
For full details on how these formats are handled by MUMPS, see Subsection 5.2.2.1 and
Subsection 5.2.2.2, respectively for the assembled centralized and assembled distributed formats, and
see Subsection 5.2.2.3 for the elemental format .
By default, the input matrix is assumed to be provided in assembled format and centralised on the
host processor.

3.2 Preprocessing
During the analysis phase, it is possible to preprocess the matrix to make easier/cheaper the numerical
factorization. The package offers a range of symmetric orderings to preserve sparsity, but also other
preprocessing facilities: permuting to zero-free diagonal and prescaling. When all preprocessing options
are activated, the preprocessed matrix Apre that will be effectively factored is :

Apre = P Dr A Qc Dc PT , (5)
where P is a permutation matrix applied symmetrically, Qc is a (column) permutation and Dr and Dc
are diagonal matrices for (respectively row and column) scaling. Note that when the matrix is symmetric,
preprocessing is designed to preserve symmetry.
Preprocessing highly influences the performance (memory and time) of the factorization and solution
steps. The default values correspond to an automatic setting performed by the package which depends on
the ordering packages installed, the type of the matrix (symmetric or unsymmetric), the size of the matrix
and the number of processors available. We thus strongly recommend the user to install all ordering
packages to offer maximum choice to the automatic decision process.
• Symmetric permutation : P
The symmetric permutation can be computed either sequentially, or in parallel. This option is
controlled by ICNTL(28). The sequential computation is controlled by ICNTL(7) whereas the
parallel computation is controlled by ICNTL(29).
− In the case where the symmetric permutation is computed sequentially, an important range of
ordering options is offered including the approximate minimum degree ordering (AMD, [7]), an

12
approximate minimum degree ordering with automatic quasi-dense row detection (QAMD, [3]),
an approximate minimum fill-in ordering (AMF), an ordering where bottom-up strategies are used
to build separators by Jürgen Schulze from University of Paderborn (PORD, [46]), the SCOTCH
package [42] from the University of Bordeaux 1, and the Metis package from Univ. of Minnesota
[34]. A user-supplied permutation can also be provided and the pivot order must be set by the user
on the host in the array PERM IN (see Subsection 5.4.2).
− When the symmetric permutation is computed in parallel, possible orderings are computed by
PT-SCOTCH or ParMetis, which must have been installed by the user.
• Permutations to a zero-free diagonal : Qc
Controlled by ICNTL(6), this permutation is recommended for very unsymmetric matrices to
reduce fill-in and arithmetic cost, see [24, 25]. For symmetric matrices this permutation can
also be used to constrain the symmetric permutation P (see ICNTL(12)). Furthermore, when
numerical values are provided on entry to the analysis phase, ICNTL(6) may also build scaling
vectors during the analysis, that will be either used or discarded depending on the scaling option
ICNTL(8) (see next paragraph).
• Row and column scalings : Dr and Dc
Controlled by ICNTL(8), this preprocessing improves the numerical accuracy and makes all
estimations performed during analysis more reliable. A range of classical scalings are provided
and can be automatically performed within the package either during the analysis phase or at the
beginning of the factorization phase.
Furthermore, preprocessing strategies for symmetric indefinite matrices, as described in [26], can
be applied and also lead to scaling arrays; they are controlled by ICNTL(12).

13
3.3 Post-processing facilities
3.3.1 Iterative refinement
A well known and simple technique to improve the accuracy of the solution of linear systems is the use
of iterative refinement. It consists in refining an initial solution obtained after solution phase as described
in Algorithm 1.

repeat
Solve A∆x = r using the computed factorization
x = x + ∆x
r = b − Ax
The computed backward error ω (see Subsection 3.3.2)
until ω < α

Algorithm 1: Iterative refinement. At each step, backward errors are computed and compared to α, the
stopping criterion.

It has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. For this reason, alternatively to Algorithm 1, a simple variant of
iterative refinement can be used with a fixed number of steps and thus without convergence test (see
Subsection 5.6).
The use of iterative refinement can be particularly useful if static pivoting has been used during
factorization (see Subsection 3.9).
In MUMPS, iterative refinement can be optionally performed after the solution step using the parameter
ICNTL(10).

3.3.2 Error analysis and statistics

MUMPS enables the user to perform classical error analysis based on the residuals. We calculate an
estimate of the sparse backward error using the theory and metrics developed in [20]. We use the notation
x̄ for the computed solution and a modulus sign on a vector or a matrix to indicate the vector or matrix
obtained by replacing all entries by their moduli. The scaled residual

|b − Ax̄|i
(6)
(|b| + |A| |x̄|)i

is computed for all equations except those for which the numerator is nonzero and the denominator is
small. For all these exceptional equations,

|b − Ax̄|i
(7)
(|A| |x̄|)i + ∥Ai ∥∞ ∥x̄∥∞

is computed instead, where Ai is row i of A. In [20], the largest scaled residual in Equation (6), is defined
as ω1 and the largest scaled residual in Equation (7) as ω2 . If all equations are in category (1), ω2 is zero.
ω1 and ω2 are the two backward errors.
Then, the computed solution x̄ is the exact solution of the equation
(A + δA)x = (b + δb),
where
|δAij | ≤ max(ω1 , ω2 )|A|ij ,
and |δbi | ≤ max(ω1 |b|i , ω2 ∥Ai ∥∞ ∥x̄∥∞ ). Note that δA respects the sparsity of A in the sense that
δAij is zero for structural zeros in A, i.e., when Aij =0.
Finally, if the user can afford a significantly more costly error analysis, condition numbers cond1 and
cond2 for the linear system (not just the matrix) can also be returned together with an upper bound of the
forward error of the computed solution :

14
∥δx∥∞
≤ ω1 cond1 + ω2 cond2
∥x∥∞
This option is controlled by ICNTL(11).

3.4 Null pivot row detection

MUMPS gives the possibility to detect null pivot rows of a matrix during factorization. This option is
controlled by ICNTL(24) (see Section Subsection 5.10). The number of null pivot rows provides an
estimate of the rank deficiency. The computation of a solution of a deficient matrix and of a null space
basis is then possible using the control parameter ICNTL(25) (see Subsection 3.5).

3.5 Computation of a solution of a deficient matrix and of a null space

basis
Using the ICNTL(25) parameter, MUMPS can compute one of the possible solutions of AX = B if
during factorization the matrix A has been found to be deficient if a zero-pivot detection option was
requested (Subsection 3.4).
The control parameter (ICNTL(25)) can be used also to compute a part or the complete null space
basis (that is AX = 0) if the matrix A has been found deficient during factorization.

3.6 Solving the transposed system

Given a sparse matrix A, the system AX = B or AT X = B can be solved during the solve phase,
where A is a square matrix of order n and X and B are matrices of order n by nrhs. This is controlled
by ICNTL(9). Note that in the case where the forward elimination (see Subsection 3.7) is performed
during the factorization, solving the transposed system is not allowed.

3.7 Forward elimination during factorization

It is possible to perform the forward elimination (Subsection 5.13) of the right-hand sides (Equation (3))
during the factorization.
If the forward elimination is performed during the factorization, some factors may be discarded
(Subsection 5.11) because in this case only the backward substitution (Equation (4)) needs to be
performed during the solution phase. This option makes much sense in an out-of-core context
(Subsection 3.14), where the solution phase involves loading factors from disk during both the forward
(Equation (3)) and backward (Equation (4)) substitutions and can be particularly costly, or when the
factors have to be used only once. Note that in the case where the forward elimination is performed
during the factorization, solving the transposed system is not allowed.

3.8 Arithmetic versions

Several versions of the package MUMPS are available: REAL, DOUBLE PRECISION, COMPLEX, and
DOUBLE COMPLEX.
To compile all or any particular version, please refer to the file named “INSTALL” at the root of the
MUMPS distribution.
This document applies to all four arithmetics. In the following we use the conventions below:
1. the term real is used for REAL or DOUBLE PRECISION,
2. the term complex is used for COMPLEX or DOUBLE COMPLEX,

15
3.9 Numerical pivoting
The goal of pivoting is to ensure a good numerical accuracy during Gaussian elimination. A widely used
technique is known as partial pivoting. Considering an unsymmetric matrix, at step i of the factorization
we first determine k such that |ak,i | = maxl=i:n |al,i |. Rows i and k are swapped in A (and the
permutation information is stored in order to apply it to the right-hand side b) before dividing the column
by the pivot and performing the rank-one update. The advantage of this approach is that it bounds the
growth factor and improves the numerical stability.
Unfortunately, in the case of sparse matrices, numerical pivoting prevents a full static prediction of
the structure of the factors: it dynamically modifies the structure of the factors, thus forcing the use of
dynamic data structures. Numerical pivoting can thus have a significant impact on the fill-in and on the
amount of floating-point operations. To limit the amount of numerical pivoting, and stick better to the
sparsity predictions done during the symbolic factorization, partial pivoting can be relaxed, leading to the
partial threshold pivoting strategy:
In the partial threshold pivoting strategy, a pivot ai,i is accepted if it satisfies:

|ai,i | ≥ u × max |ak,i |, (8)

k=i:n

for a given value of u, 0 ≤ u ≤ 1. This ensures a growth factor limited to 1 + 1/u for the corresponding
step of Gaussian elimination. In practice, one often chooses u = 0.1 or u = 0.01 as a default threshold
and this generally leads to a stable factorization. The threshold u can be set using CNTL(1).
It is possible to perform the pivot search on the row rather than on the column with similar stability.
In the multifrontal method, once a frontal matrix is formed, we cannot choose a pivot outside the
fully-summed block, because the corresponding rows are not fully-summed. Once all possible pivots in
the block of candidate pivots have been eliminated, if no other pivot satisfies the partial pivoting threshold,
some rows and columns remain unfactored in the front. Those are then delayed to the frontal matrix of the
parent, as part of the contribution block (delayed pivots). Note that because of the delayed pivots fill-in
the parent node will occur.
The same type of approach is applied to the symmetric case, but with the constrain that we want to
maintain the symmetry of the frontal matrices.
In order to avoid the complications due to numerical pivoting, perturbation techniques can be applied
(static pivoting): a pivot smaller than a threshold in absolute value is replaced by this threshold. In this
case it is recommended do use iterative refinement (see Subsection 3.3.1) to improve the approximate
solution.
In MUMPS, static pivoting (CNTL(4)) and numerical pivoting (CNTL(1)) are combined at runtime.
A comparison of approaches based on static pivoting with approaches based on numerical pivoting in
the context of high-performance distributed solvers can be found in [13].

3.10 The working host processor

The host processor is the one with rank 0 in the communicator provided to MUMPS. By setting the
variable PAR to 1 (see Subsection 5.1.4), MUMPS allows the host to participate in computations during
the factorization and solve phases, just like any other processor. This allows MUMPS to run on a single
processor and prevents the host processor being idle during the factorization and solve phases (as would
be the case for PAR=0). We thus generally recommend using a working host processor (PAR=1).
The only case where it may be worth using PAR=0 is with a large centralized matrix on a purely
distributed architecture with relatively small local memory: PAR=1 will lead to a memory imbalance
because of the storage related to the initial matrix on the host.

3.11 MPI-free version

It is possible to use MUMPS with a single MPI process by limiting the number of processors to one, or by
passing to the solver a communicator consisting of a single MPI process. However, the link phase still
requires the MPI, BLACS, and ScaLAPACK libraries in this case.
An MPI-free version of MUMPS is also available. For this, a special library is distributed that provides
all external references needed by MUMPS for a sequential environment. MUMPS can thus be used in a

16
simple sequential program, ignoring everything related to MPI. Note that in this case parallel ordering
packages such as ParMetis or PT-SCOTCH must be disabled during installation. Details on how to
build a purely sequential version of MUMPS are available in the INSTALL file available in the MUMPS
distribution.
Remark that for the sequential version, the component PAR must be set to 1 (see Subsection 5.1.4).
Furthermore, the calling program should not make use of MPI: if the calling program is a parallel MPI
code which requires sequential MUMPS, a parallel version of MUMPS must then be installed, to which a
communicator consisting of a single process should be provided. Finally, the MPI-free version can make
use of several cores by relying on multithreading (see Section 3.12).

3.12 Multithreaded parallelism (possibly combined with MPI

parallelism)
MUMPS supports shared memory, multithreaded parallelism mostly through the use of multithreaded
BLAS libraries such MKL, ACML or OpenBLAS. In case MPI is used, multithreading is used within
each MPI process. Parts of the MUMPS code, other than BLAS operations, have been parallelized through
the use of OpenMP directives which can be activated by adding the appropriate OpenMP compiler and
linker option in the OPTF, OPTC and OPTL variables of the MUMPS Makefile.inc file; this option is, for
example, -fopenmp for GNU gfortran and -qopenmp for Intel ifort but equivalent options exist for
other commonly used compilers.
The number of threads within the OpenMP parallel regions of MUMPS should be set through
the OMP NUM THREADS environment variable. Note that, in most cases, the OMP NUM THREADS
environment variable conveniently and efficiently controls both the number of threads in the BLAS library
and in the MUMPS OpenMP parallel regions. Furthermore, it is worth to have an installation allowing
BLAS calls inside OpenMP regions (see Subsection 5.16.1).
Modern high performance parallel computers are commonly made of multiple nodes, each equipped
with multiple processors and cores; in order to make the best out of their performance, we strongly
recommend to use both MPI and multithreading (both parallel BLAS and the MUMPS OpenMP directives).
A typical setting uses one MPI process per socket each with as many threads as the number of cores on
the socket.

3.13 Using the BLAS extension GEMMT for symmetric matrix-matrix

products
MUMPS is able to use the GEMMT extension of the matrix-matrix product to update only the upper
or the lower triangular part of the result matrix. To be compatible with MUMPS, the routine signature
should be the same as the one described here: https://www.intel.com/content/www/
us/en/develop/documentation/onemkl-developer-reference-fortran/top/
blas-and-sparse-blas-routines/blas-like-extensions/gemmt.html.
We strongly recommend to use this ability if your BLAS library enables it. See the ”Using BLAS
extension GEMMT” section of the INSTALL to activate it.

3.14 Out-of-core facility

An out-of-core (disk is used as an extension to main memory) facility is available in both sequential and
parallel environments. This option is controlled by ICNTL(22).
In this version only the factors are written to disk during the factorization phase and will be read each
time a solution phase is requested. Our experience is that on a reasonably small number of processors
this can significantly reduce the memory requirement while not increasing much the factorization time.
The extra cost of the out-of-core feature is thus mainly during the solve phase, where factors have to be
read from disk for both the forward elimination and the backward substitution. To significantly reduce
the cost of the solve phase in an out-of-core context, we advise performing the forward elimination (see
Equation (3)) during the factorization, if this is compatible with your usage of MUMPS (limited amount of
dense right-hand sides, no iterative refinement or error analysis (see Subsection 3.7)).

17
3.15 Determinant
MUMPS has an option to compute the determinant of the input matrix. It is available for symmetric and
unsymmetric matrices for all arithmetics (single, double, real, complex), and for all matrix input formats.
This option is controlled by ICNTL(33).
Let n be the order of the matrix A, if A = LU (unsymmetric matrices), then
n
Y
det(A) = det(L) × det(U ) = Uii
i=1

If A = LDLT (symmetric matrices), then

n
Y
det(A) = Dii
i=1

The sign of the determinant is maintained by keeping track of all internal permutations. Scaling arrays
are taken into account too, in case the matrix is scaled. To avoid overflows and guarantee an accurate
computation, the mantissa and exponent are computed separately and renormalized when needed.
The determinant is computed when requested by the user. If the user is only interested in the
determinant, he/she may tell MUMPS that the factor matrices can be discarded (see Subsection 5.11),
significantly reducing the storage requirements.

3.16 Computing selected entries of A−1

Several applications require the explicit computation of selected entries of the inverse of large sparse
matrices. In most cases, many entries are requested, for example all diagonal entries. To compute column
j of the inverse, the equation Ax = ej can be used, where ej is the jth column of the identity matrix.
One can obtain major savings if the structural zeros of ej are exploited or if only few entries of the jth
column of the inverse are requested [48, 44, 14]. If an L U factorization of A had been computed, a−1 ij ,
the (i, j) entry of A−1 , is obtained by solving successively the two triangular systems:

y = L−1 ej (9)

a−1
ij = (U
−1
y)i (10)
MUMPS provides a functionality, controlled by ICNTL(30), to compute a set of entries of A−1 ,
while avoiding most of the computations on explicit zeros in Equations (9) and (10). The list of entries of
A−1 to be computed and the memory for those entries should be provided as a sparse right-hand side (see
Subsection 5.14.2). In a parallel environment it is not so natural to combine parallelism and exploiting
sparsity. Recent work based on [44, 15] to exploit parallelism is provided in this release.

3.17 Reduce/condense a problem on an interface (Schur complement and

reduced/condensed right-hand side)
MUMPS provides the possibility to perform a partial factorization of the complete input matrix and to
return the corresponding Schur matrix, that is the part of the matrix that has been updated but that is still
to be factorized. This option is controlled by ICNTL(19).
Let us consider a partitioned matrix (here with an unsymmetric matrix) where the variables of A2,2
correspond to the Schur variables and on which a partial factorization has been performed. In the
following, and only for the sake of clearness, we have ordered all the variables belonging to the Schur
last.

A1,1 A1,2 L1,1 0 U1,1 U1,2
A= = (11)
A2,1 A2,2 L2,1 I 0 S

Thus the Schur complement, as returned by MUMPS, is such that S = A2,2 − A2,1 A−1
1,1 A1,2 .

18
The user must specify on entry to the analysis phase the list of indices of the Schur matrix,
corresponding to the variables of A2,2 . MUMPS returns to the user, on exit of the factorization phase, the
Schur complement matrix S, as a full matrix but with different type of distribution (see Subsection 5.15
for more details.)
This partial factorization can be used to solve Ax = b in different ways using the ICNTL(26)
parameter. It can be used to solve the linear system associated with the “interior” variables or to handle a
reduced/condensed right-hand-side as described in the following discussion.
• Compute a partial solution (ICNTL(26) = 0):
The solve is performed on the internal problem:

A1,1 x1 = b1 .

Entries in the right-hand side corresponding to indices from the Schur matrix need not be set on
entry and they are explicitly set to zero on output.
• Solve the complete system in three steps:

L1,1 0 U1,1 U1,2 x1 b1
= (12)
L2,1 I 0 S x2 b2

First solving:

L1,1 0 y1 b1
= (13)
L2,1 I y2 b2

And thereafter:

U1,1 U1,2 x1 y1
= (14)
0 S x2 y2

1. Reduction/condensation phase (ICNTL(26) = 1) :

The Equation (13) can be solved setting ICNTL(26)=1, and the intermediate y vector will
be computed, but only the y2 vector is returned to the user. Note that y2 is often referred to as
the reduced/condensed right-hand-side.
2. Using the Schur matrix :
Thenafter the solution Sx2 = y2 has to be computed. It is the responsibility of the user to
compute x2 using the Schur matrix S given on output of the factorization phase.
3. Expansion phase (ICNTL(26) = 2) :
Given x2 and y1 , it is possible to compute x1 , such that U11 x1 = y1 − U12 x2 , using the
option ICNTL(26)=2 . Note that the package uses y1 computed (and stored in the main
MUMPS structure) during the first step (ICNTL(26)=1) and provides the complete solution x
on output.
Note that the Schur complement could be considered as an element contribution to the interface block
in a domain decomposition approach. MUMPS could then be used to solve this interface problem using
the element entry functionality.

3.18 Block Low-Rank (BLR) multifrontal factorization

A low-rank feature that allows decreasing the time and memory complexity of sparse direct solvers on
some problems arising from partial differential equations is provided.
Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-
rank property: well defined off-diagonal blocks of their Schur complements can be approximated by
low-rank products. Given a suitable ordering of the matrix which gives to the blocks a geometrical
meaning, such approximations can be computed using for example a truncated QR factorization. The
resulting representation offers a substantial reduction of the memory requirement and gives efficient ways

19
to perform many of the basic dense algebra operations. Several strategies have been proposed in the
literature to exploit this property.
We have designed a so-called Block Low-Rank (BLR) format (see [4, 5]) to reduce the memory
footprint and the computational complexity of our multifrontal solver. The compression of BLR blocks
is based on a truncated QR factorization with column pivoting.
At each node of the multifrontal computational graph, a partial factorization (part corresponding
to the fully summed variables, see Figure 1(a)) of a dense frontal matrix is performed. The so-called
contribution block (CB in the figure) will correspond to the local Schur matrix built at the end of the
partial factorization of the front. As illustrated in Figure 1, the BLR factorization is performed by
panels of size the size of a BLR block. On the left-hand side (a) of the figure, a standard full-rank
(FR) factorization of the panel is first performed leading to dense FR L and U factor blocks (colored
in (a)). The block structure follows the flat BLR structure of the front. A truncated QR factorization,
controlled by a numerical threshold (referred to as ε in Subsection 5.16), is then performed on each off-
diagonal block (as indicated in Figure 1(b)). Blocks in low-rank form are then used to update the trailing
submatrix of the front (grey in (b)) at a lower cost than a full-rank standard update.
BLR block size

U U U U U U U U U U

Fully Summed Var.

L L
L L U U U U L L U U U U
L L L L

L L L L
CB CB
L L L L
(a) Full-Rank (FR) panel factorization (b) Low-rank panel compression and update

Figure 1: Block Low-Rank factorization of one panel of a the partial factorization of a frontal matrix.

This functionality is controlled by a numerical parameter, called the low-rank threshold and noted
ε, defining the accuracy of the low-rank approximations (see Subsection 5.16). We have observed in
practice that the backward error of the final solution is closely related to this numerical parameter. With
this parameter, MUMPS can be used to provide either a direct solution at an accuracy controlled by the
low-rank threshold or an approximate factorization that can be used as a preconditioner (see [41]).

3.18.1 BLR factorization variants and their complexity

The theoretical analysis in [5] proves that the BLR approach can reduce the asymptotic complexity of the
multifrontal factorization. The work presented in [5, 41] also introduces novel variants that can further
reduce the complexity of the BLR factorization by using different strategies to perform the updates in the
frontal matrices and by compressing at an earlier stage of the factorization.
The default choice of BLR variant in MUMPS is the left-looking standard UFSC factorization
(described in [41]). It relies on the accumulation of the low-rank updates (so-called Low-rank Updates
Accumulation, or LUA) to improve the performance of the factorization by increasing the granularity and
efficiency (Gflops/s) of the operations, as analyzed in [6].
The UCFS variant, described in [41], consists in performing the compression earlier to achieve
a higher reduction of the number of operations. It is not used by default in the current version of
MUMPS but is activable (see Subsection 5.16). While its increased compression rate may slightly degrade
the backward error, the currently implemented variant remains able to handle numerical pivoting and
therefore insures numerical stability.

20
3.18.2 Reducing the memory consumption using BLR
The LU factors are compressed during the factorization. To reduce the memory footprint, factors are
stored in low-rank form (see Subsection 5.16) . The solution phase is then performed using the low-rank
factors and may thus also be accelerated. This is the automatic choice when BLR is activated.
It is also possible to keep the full-rank factors and discard the low-rank factors that have only thus be
used to accelerate the factorization. The standard, full-rank solution phase is performed and no memory
gains can be expected in this case.
The memory consumption can be further reduced by also compressing intermediate working space,
the so-called contribution blocks (CB) of the frontal matrices, as described in [41]. However, contrarily
to the LU factors compression, the CB compression does not contribute to reducing the global number
of operations and therefore represents an overhead cost. For this reason, in the current version
of MUMPS when BLR is activated the CB compression is not activated by default but is activable
(see Subsection 5.16). Note that in a parallel context, CB compression can also reduce the volume of
communications.
The memory footprint can be further reduced using a preliminary version of mixed precision Block
Low-Rank feature. Mixed precision is used to reduce storage of the matrices of factors and possibly the
contribution blocks when stored with BLR representation. This work (see [2]), supported by EDF R&D,
is performed in the context of the PhD thesis of Matthieu Gerest (LIP6-Sorbonne University and EDF
R&D).

3.19 Save / Restore feature

This version of MUMPS allows users to save MUMPS internal data before or after any of the main steps
of MUMPS (analysis, factorization, solve, see Subsection 5.1.1) and then restart, possibly from another
process (or set of MPI processes, in case of MPI), by restoring MUMPS internal data from disk and be at
the same state as at the moment the data was saved to disk. After saving the data to disk, it is possible
to continue working with the existing instance or terminate it. When the data are not needed anymore,
MUMPS can be called to delete the data saved to disk. Examples of use are as follows:
• save all MUMPS internal data after factorization, terminate the instance, stop the process(es); later,
start new process(es), initialize a new instance, and restore internal data from disk in order to
continue from the point where data was saved to disk.
• similarly, save all MUMPS internal data after analysis; terminate the instance, stop (or not) the MPI
processes; later, initialize a new instance, restore internal data and continue with the factorization
phase.
More details on the save and restore feature (JOB=7 and 8) and on the files deletion (JOB=-3) are
provided in Subsection 5.1.1 and Subsection 5.17.

4 User interface and available routines

In the Fortran 95 interface (see Section 9 for the C interface), there is a single user callable subroutine
per arithmetic, called [SDCZ]MUMPS, that has a single parameter mumps par of Fortran 95 derived
datatype [SDCZ]MUMPS STRUC defined in [sdcz]mumps struc.h.2
The interface is the same for the MPI-free version (see Subsection 3.11), only the compilation process
and libraries need be changed. In the case of the parallel version, MPI must be initialized by the user
before the first call to [SDCZ]MUMPS is made.
The calling sequence for the DOUBLE PRECISION version may look as follows:
2 We use the notation [SDCZ]MUMPS to refer to DMUMPS, SMUMPS, ZMUMPS or CMUMPS, corresponding to the REAL, DOUBLE

PRECISION, COMPLEX and DOUBLE COMPLEX versions, respectively. Similarly [SDCZ]MUMPS STRUC refers to either
SMUMPS STRUC, DMUMPS STRUC, CMUMPS STRUC, or ZMUMPS STRUC, and [sdcz]mumps struc.h to smumps struc.h,
dmumps struc.h, cmumps struc.h or zmumps struc.h.

21
INCLUDE ’ m p i f . h ’
INCLUDE ’ d m u m p s s t r u c . h ’
...
INTEGER : : IERR
TYPE (DMUMPS STRUC) : : mumps par
...
CALL MPI INIT ( IERR )
...
mumps par%JOB = . . . ! S e t some a r g u m e n t s t o t h e p a c k a g e : t h o s e
mumps par%ICNTL ( 3 ) = 6 ! a r e c o m p o n e n t s o f t h e mumps par s t r u c t u r e
...
CALL DMUMPS( mumps par )
...
CALL MPI FINALIZE ( IERR )

For other arithmetics, dmumps struc.h should be replaced by smumps struc.h,

cmumps struc.h, or zmumps struc.h, and the ’D’ in DMUMPS and DMUMPS STRUC by
’S’, ’C’ or ’Z’.
The variable mumps par of datatype [SDCZ]MUMPS STRUC holds all the data for the problem. It
has many components, only some of which are of interest to the user. The other components are internal
to the package. Some of the components must only be defined on the host. Others must be defined
on all processors. The file [sdcz]mumps struc.h defines the derived datatype and must always be
included in the program that calls MUMPS.
The interface to MUMPS consists in calling the subroutine [SDCZ]MUMPS with the appropriate
parameters set in mumps par.
Components of the structure [SDCZ]MUMPS STRUC that are of interest to the user are shown in
Figure 2.
In the following, real/complex qualifies parameters that are real in the real version and complex in the
complex version, whereas real is used for parameters that are always real, even in the complex version of
MUMPS.

22
INCLUDE ’ [ s d c z ] mumps root . h ’
TYPE [ SDCZ ]MUMPS STRUC
SEQUENCE
C INPUT PARAMETERS
C ----------------
C Problem definition
C ------------------
C Solver (SYM=0 Unsymmetric, SYM=1 Sym. Positive Definite, SYM=2 General Symmetric)
C Type of parallelism (PAR=1 host working, PAR=0 host not working)
INTEGER SYM, PAR , JOB
C Control parameters
C ------------------
INTEGER ICNTL ( 6 0 )
r e a l CNTL ( 1 5 )
INTEGER N ! Order o f i n p u t m a t r i x
C Assembled input matrix : User interface
C ----------------------------------------
INTEGER : : NZ ! S t a n d a r d i n t e g e r i n p u t + bwd . compat .
INTEGER ( 8 ) : : NNZ ! 64− b i t i n t e g e r i n p u t
r e a l / complex , DIMENSION ( : ) , POINTER : : A
INTEGER, DIMENSION ( : ) , POINTER : : IRN , JCN
C Case of distributed matrix entry
C --------------------------------
INTEGER : : NZ loc ! S t a n d a r d i n t e g e r i n p u t + bwd . compat .
INTEGER ( 8 ) : : NNZ loc ! 64− b i t i n t e g e r i n p u t
INTEGER, DIMENSION ( : ) , POINTER : : IR N l oc , J C N l o c

r e a l / complex , DIMENSION ( : ) , POINTER : : A l o c

C Unassembled input matrix: User interface
C ----------------------------------------
INTEGER NELT
INTEGER, DIMENSION ( : ) , POINTER : : ELTPTR , ELTVAR
r e a l / complex , DIMENSION ( : ) , POINTER : : A ELT
C MPI Communicator and identifier
C -------------------------------
INTEGER COMM, MYID
C Ordering and scaling, if given by user (optional)
C -------------------------------------------------
INTEGER, DIMENSION ( : ) , POINTER : : PERM IN
r e a l , DIMENSION ( : ) , POINTER : : COLSCA, ROWSCA
C INPUT/OUTPUT data : right-hand side and solution
C -----------------
r e a l / complex , DIMENSION ( : ) , POINTER : : RHS , REDRHS
r e a l / complex , DIMENSION ( : ) , POINTER : : RHS SPARSE
INTEGER, DIMENSION ( : ) , POINTER : : IRHS SPARSE , IRHS PTR
INTEGER NRHS, LRHS , NZ RHS , LSOL loc , LRHS loc , Nloc RHS , LREDRHS
r e a l / complex , DIMENSION ( : ) , POINTER : : SOL loc , RHS loc
INTEGER, DIMENSION ( : ) , POINTER : : I SO L l oc , I R H S l o c
C Metis options, possibly modified by user
INTEGER : : METIS OPTIONS ( 4 0 )
C OUTPUT data and Statistics
C --------------------------
INTEGER, DIMENSION ( : ) , POINTER : : SYM PERM, UNS PERM
INTEGER INFO ( 8 0 )
INTEGER INFOG ( 8 0 ) ! G l o b a l i n f o r m a t i o n ( h o s t o n l y )
r e a l RINFO ( 4 0 )
r e a l RINFOG ( 4 0 ) ! G l o b a l i n f o r m a t i o n ( h o s t o n l y )
C Schur
INTEGER SIZE SCHUR , NPROW, NPCOL, MBLOCK, NBLOCK
INTEGER SCHUR MLOC, SCHUR NLOC , SCHUR LLD
INTEGER, DIMENSION ( : ) , POINTER : : LISTVAR SCHUR
r e a l / complex , DIMENSION ( : ) , POINTER : : SCHUR
C Mapping if provided by MUMPS
INTEGER, DIMENSION ( : ) , POINTER : : MAPPING
C Null pivots
INTEGER, DIMENSION ( : ) , POINTER : : PIVNUL LIST
C Version number
CHARACTER(LEN= 4 6 ) VERSION NUMBER
C Name of file to dump a matrix/rhs to disk
CHARACTER(LEN= 2 5 5 ) WRITE PROBLEM
C Out-of-core}
CHARACTER(LEN= 6 3 ) : : OOC PREFIX
CHARACTER(LEN= 2 5 5 ) : : OOC TMPDIR
END TYPE [ SDCZ ]MUMPS STRUC

Figure 2: Main components of the structure [SDCZ]MUMPS STRUC defined in

[sdcz]mumps struc.h.
23
5 Application Program Interface
In this section, we describe the main components of the variable mumps par of datatype
[SDCZ]MUMPS STRUC (see Section 4), that must be set by the user, or that are returned to the user.
The parameters controlling the activation of the main functionalities of the package are provided on input
in the arrays ICNTL and CNTL. In each section, we describe in detail the entry of the ICNTL and CNTL
related to the section. The entire list is given in Section 6. Statistics and various information parameters on
output to the solver are then described in Section 7. Information parameters returned by the package might
correspond to values local to each processor (mumps par%RINFO and mumps par%INFO) or might be
global information and available on all processors (mumps par%RINFOG and mumps par%INFOG).
For the sake of clarity (and when no confusion is possible), we refer to components of the structure
only by their component name; for example, we use ICNTL to refer to mumps par%ICNTL.

5.1 General
5.1.1 Initialization, Analysis, Factorization, Solve, Save, Restore, Termination (JOB)
JOB (integer) must be initialized by the user on all processors before a call to MUMPS. It controls the
main actions taken by MUMPS. It is not altered by MUMPS. Possible values of JOB are:
–1 initializes an instance of the package. A call with JOB = –1 must be performed before any
other call to the package on the same instance. It sets default values for other components of
[SDCZ]MUMPS STRUC (such as ICNTL), which may then be altered before subsequent calls
to MUMPS. Note that three components of the structure must always be set by the user (on all
processors) before a call with JOB = –1. These are
• COMM,
• SYM, and
• PAR.
Note that if the user wants to modify one of those three components then he/she must terminate the
instance (call with JOB = –2) then reinitialize the instance (call with JOB = –1).
Furthermore, after a call with JOB = –1, the internal component MYID contains the rank of the
calling processor in the communicator provided to MUMPS. Thus, the test “(MYID == 0)” may be
used to identify the host processor (see Subsection 3.10).
Finally, the version number is returned in VERSION NUMBER (see Subsection 5.1.3).
–2 terminates an instance of the package. All data structures associated in mumps par with the instance,
except those provided by the user, are deallocated. It should be called by the user only when no
further calls to MUMPS with this instance are required. In order to avoid memory leaks, it must also
be called before a further JOB = –1 call with the same argument mumps par. When out-of-core
was used (ICNTL(22)=1), the out-of-core factor files are deleted except if the save/restore feature
(Subsection 5.17) has been used and the files are associated to a saved instance. In this case, the
out-of-cores files are kept because they will be exploited to restore the saved instance. See also
Subsection 5.17.
–3 save / restore feature: removes data saved to disk. The files that were used to save an instance of
MUMPS are deleted. It should be called by the user when no further restore of MUMPS with the files
is required. See also Subsection 5.17.4.
1 performs the analysis. In this phase, MUMPS chooses pivots from the diagonal using a selection criterion
to preserve sparsity, using the pattern of the matrix A input by the user. Several formats and
distributions onto the processors are available to input matrix (see Subsection 5.2.2). It subsequently
constructs subsidiary information for the numerical factorization (a JOB=2 call).
An option exists for the user to input the pivotal sequence (ICNTL(7)=1, see Subsection 5.4) in
which case only the necessary information for a JOB=2 call will be generated.
If a preprocessing based on the numerical values is requested (see Subsection 5.3 and ICNTL(6)),
then the numerical values of the original matrix A must also be provided by the user during the
analysis phase, and scaling vectors are optionally computed.

24
Note that a call to MUMPS with JOB=1 must be preceded by a call with JOB = –1 on the same
instance.
2 performs the factorization. It uses the numerical values of the matrix A provided by the user (see
Subsection 5.2.2) and the information from the analysis phase (JOB=1) to factorize the matrix A.
The actual pivot sequence used during the factorization may slightly differ from the sequence
returned by the analysis (see Subsection 5.4.1) if the matrix A is not diagonally dominant.
An option exists for the user to input scaling vectors or let MUMPS compute automatically such
vectors (see Subsection 5.3.2 and ICNTL(8)) just before the numerical factorization.
A call to MUMPS with JOB=2 must be preceded by a call with JOB=1 on the same instance.
3 computes the solution. It uses the right-hand side(s) B provided by the user and the factors generated
by the factorization (JOB=2) to solve a system of equations AX = B or AT X = B. The pattern
and values of the matrix should be passed unchanged since the last call to the factorization phase
(see JOB=2). Several possibilities are given to input the right-hand side matrix B and to ouput the
solution matrix X. The structure component mumps par%RHS must be set by the user (on the
host only) before a call with JOB=3 (see Subsection 5.14). This solution phase can also be used to
compute the null space basis of singular matrices (see ICNTL(25)), provided that null pivot row
detection (ICNTL(24)) was on and that the deficiency obtained INFOG(28) was different from
0.
A call to MUMPS with JOB=3 must be preceded by a call with JOB=2 (or JOB=4) on the same
instance.
4 combines the actions of JOB=1 with those of JOB=2. It must be preceded by a call to MUMPS with
JOB = –1 on the same instance.
5 combines the actions of JOB=2 and JOB=3. It must be preceded by a call to MUMPS with JOB=1 on
the same instance.
6 combines the actions of calls with JOB=1, 2, and 3. It must be preceded by a call to MUMPS with JOB
= –1 on the same instance.
7 saves MUMPS internal data to disk. It must be preceded by a call to MUMPS with JOB=-1 on the same
instance. A call to MUMPS with JOB=7 should be followed, at some point, by a call to MUMPS
with JOB=-2 in order to free MUMPS internal data. The calling processes may then be stopped.
It is possible to delete the files of a saved instance using a call to MUMPS with JOB=-3. See
Subsection 5.17 for more details, in particular how to specify where to store the files of the saved
instance.
8 restores MUMPS internal data from disk. It must be preceded by a call to MUMPS with JOB=-1 on the
same instance. The values of PAR, SYM and COMM should be compatible with the values used at the
moment the data have been written down to disk (JOB=7). See Subsection 5.17 for more details, in
particular where to find the files containing the saved instance.
-200 (experimental, subject to change in the future) suppresses MUMPS out-of-core factor files associated
to the calling MPI process and returns. This feature was designed in case out-of-core was used
(ICNTL(22)=1) and a serious crash occurred, otherwise it should not be used. For example,
if MPI crashes with an error that cannot be recovered, or if some processes encounter some
unrecoverable error that will lead to the termination of the processes, an error handler may call
MUMPS with JOB=-200 in order to free the disk space associated with the out-of-core factor
files before terminating the application. Internal data in memory are not freed and no MPI
communications are performed during this call.

5.1.2 Special use of JOB

MUMPS can be called with a special value of JOB between two phases of the computation. Such special
values allow the user to get information to prepare the next steps. As for now, the only such cases are
JOB=9, which returns to the user information on specific RHS distributions before the solve phase, and
JOB=-4, which frees data from a factorization without freeing data from the corresponding analysis.

25
5.1.2.1 JOB=9 before solution with distributed right-hand sides: although the user is free to
provide his/her own distribution of the right-hand sides, to target performance he/she could be interested
in getting distributions with specific properties before calling the solve phase. (For example to exploit the
distribution of the factor matrices to limit data redistribution of the RHS). A call to MUMPS with JOB=9
can be done to obtain such information after a successful factorization (JOB=2 or 4 with INFOG(1) ≥ 0
on exit). On entry, IRHS loc should be allocated on each MPI process, of size at least INFO(23). The
result of this JOB=9 call may also depend on the values of ICNTL(20) and ICNTL(9) on entry. More
information is provided in Subsection 5.14.3, which describes the distributed RHS interface.

5.1.2.2 JOB=-4 after factorization or solve phases: a call to MUMPS with JOB=-4 frees all
MUMPS internal data structures except the ones from analysis. It can be used to free internal data allocated
at factorization (factor matrices, . . . ) when the user wants to free some memory associated to an instance
but still be able to perform a new factorization, possibly with new numerical values.

5.1.3 Version number

VERSION NUMBER (string) is set by MUMPS to the version number of MUMPS after a call to the
initialization phase (JOB=-1).
For C users (see Section 9 for more general information), a macro MUMPS VERSION is also defined
in the include files [sdcz]mumps c.h; it contains a string defining the version number. Typically, it
is defined by: #define MUMPS VERSION "5.6.0". This may be useful for users who wish to
get the version number associated to the header file they include in their application (the component
VERSION NUMBER of the structure may be badly initialized in case of incompatible alignment options
or incorrect version of the header file).

5.1.4 Control of parallelism (COMM, PAR)

COMM (integer) must be set by the user on all processors before the initialization phase (JOB = –1) and
must not be changed in further calls. It must be set to a valid MPI communicator that will be used
for message passing inside MUMPS. It is not altered by MUMPS and a copy of the communicator is
kept internally by the package until a call to the termination phase (JOB = –2). The processor with
rank 0 in this communicator is used by MUMPS as the host processor. Note that only the processors
belonging to the communicator should call MUMPS.
PAR (integer) must be initialized by the user on all processors before the initialization phase (JOB =
–1) and is accessed by MUMPS only during this phase. It is not altered by MUMPS and its value is
communicated internally to the other phases as required. Possible values for PAR are:
0 : The host is not involved in the parallel steps of the factorization and solve phases. The host
will only hold the initial problem, perform symbolic computations during the analysis phase,
distribute data, and collect results from other processors.
1 : The host is also involved in the parallel steps of the factorization and solve phases.
Other values are treated as 1.
Note that the value of PAR should be identical on all processors; if this is not the case, the value on
processor 0 (the host) is used by the package.
If the initial problem is large and memory is an issue, PAR=1 is not recommended if the matrix is
centralized on processor 0 because this can lead to memory imbalance, with processor 0 having a
larger memory load than the other processors.

5.2 Input Matrix

5.2.1 Matrix type (SYM)
The user must set the variable SYM to indicate which kind of matrix has to be factorize and consequently,
which factorization has to be performed.

26
SYM (integer) must be initialized by the user on all processors before the initialization phase (JOB =
–1) and is accessed by MUMPS only during this phase. It is not altered by MUMPS. Its value is
communicated internally to the other phases as required. Possible values for SYM are:
0 : A is unsymmetric.
1 : A is assumed to be symmetric positive definite so that pivots are taken from the diagonal
without numerical pivoting during the factorization. With this option, non-positive definite
matrices that do not require pivoting can also be treated in certain cases (see remark below).
2 : A is general symmetric
Other values are treated as 0.
Note that the value of SYM should be identical on all processors; if this is not the case, the value on
processor 0 is used by the package. For the complex version, the value SYM=1 is currently treated
as SYM=2. We do not have a version for Hermitian matrices in this release of MUMPS.
Remark for symmetric matrices (SYM=1). When SYM=1 is indicated by the user, an LDLT
factorization (in opposition to Cholesky factorization which requires positive diagonal pivots) of matrix
A is performed internally by the package, and numerical pivoting is switched off. Therefore, this
setting works for classes of matrices more general than positive definite matrices, including matrices with
negative pivots. However, this feature depends on the use of the ScaLAPACK library (see ICNTL(13))
to factorize the last dense block in the factorization of A associated to the root node of the elimination
tree. More precisely,
• if ScaLAPACK is allowed for the last dense block (default in parallel, ICNTL(13)=0), then the
presence of negative pivots in the part of the factorization processed with ScaLAPACK (subroutine
P POTRF) will raise an error and the code -40 is then returned in INFOG(1);
• if ScaLAPACK is not used (ICNTL(13)>0, or sequential execution, or last dense block detected
to be too small), then negative pivots are allowed and the factorization will work for some classes of
non-positive definite matrices where numerical pivoting is not necessary, e.g., symmetric negative
matrices.
The successful factorization of a symmetric matrix with SYM=1 is thus not an indication that the
matrix provided was symmetric positive definite. In order to verify that a matrix is positive definite,
the user can check that the number of negative pivots or inertia (INFOG(12)) is 0 on exit from the
factorization phase. Another approach to suppress numerical pivoting on symmetric matrices which is
compatible with the use of ScaLAPACK (see ICNTL(13)) consists in setting SYM=2 (general symmetric
matrices) with the relative threshold for pivoting CNTL(1) set to 0 (recommended strategy).

5.2.2 Matrix format

The formats of the input matrix and its distribution onto the processors are controlled by ICNTL(5) and
ICNTL(18), respectively.
ICNTL(5) controls the matrix input format
Phase: accessed by the host and only during the analysis phase
Possible variables/arrays involved: N, NNZ (or NZ for backward compatibility), IRN, JCN,
NNZ loc (or NZ loc for backward compatibility), IRN loc, JCN loc, A loc, NELT, ELTPTR,
ELTVAR, and A ELT
Possible values :
0 : assembled format. The matrix must be input in the structure components N, NNZ (or NZ),
IRN, JCN, and A if the matrix is centralized on the host (see Subsection 5.2.2.1) or in the
structure components N, NNZ loc (or NZ loc), IRN loc, JCN loc, A loc if the matrix is
distributed (see Subsection 5.2.2.2).
1 : elemental format. The matrix must be input in the structure components N, NELT, ELTPTR,
ELTVAR, and A ELT (see Subsection 5.2.2.3).
Any other values will be treated as 0.

27
Default value: 0 (assembled format)
Related parameters: ICNTL(18)
Incompatibility: If the matrix is in elemental format (ICNTL(5)=1), the BLR feature
(ICNTL(35)≥ 1) is currently not available, see error -800.
Remarks: NNZ and NNZ loc are 64-bit integers (NZ and NZ loc are 32-bit integers kept for
backward compatibility and will be obsolete in future releases).
Parallel analysis (ICNTL(28) =2) is only available for matrices in assembled format and, thus, an
error will be raised for elemental matrices (ICNTL(5)=1).
Elemental matrices can be input only centralized on the host (ICNTL(18)=0).

ICNTL(18) defines the strategy for the distribution of the input matrix (only for assembled matrix).
Phase: accessed by the host during the analysis phase.
Possible values :
0 : the input matrix is centralized on the host (see Subsection 5.2.2.1).
1 : the user provides the structure of the matrix on the host at analysis, MUMPS returns a mapping
and the user should then provide the matrix entries distributed according to the mapping on
entry to the numerical factorization phase (see Subsection 5.2.2.2).
2 : the user provides the structure of the matrix on the host at analysis, and the distributed
matrix entries on all slave processors at factorization. Any distribution is allowed (see
Subsection 5.2.2.2).
3 : user directly provides the distributed matrix, pattern and entries, input both for analysis and
factorization (see Subsection 5.2.2.2).
Other values are treated as 0.
Default value: 0 (input matrix centralized on the host)
Related parameters: ICNTL(5)
Remarks: In case of distributed matrix, we recommand options 2 or 3. Among them, we
recommand option 3 which is easier to use. Option 1 is kept for backward compatibility but is
deprecated and we plan to suppress it in a future release.

5.2.2.1 Centralized assembled matrix (ICNTL(5)=0 and ICNTL(18)=0). In the following,

an example of an unsymmetric 3x3 assembled matrix with 5 nonzeros is given.

a11
!
A= a22 a23
a31 a33

Order of the matrix N =3

Nonzeros in the matrix NNZ =5
array of row indices IRN [1 : N N Z] = 2 3 2 1 3
array of col indices JCN [1 : N N Z] = 3 1 2 1 3
array of values A [1 : N N Z] = a23 a31 a22 a11 a33
Note that the elements of the matrix can be input in any order.

The following components of [SDCZ]MUMPS STRUC hold the matrix in centralized assembled
format:
mumps par%N (integer) is the order of the matrix A, N > 0. It must be set by the user on the host
before analysis. It is not altered by MUMPS.
mumps par%NNZ (integer(8)) is the number of nonzero entries being input, NNZ > 0. It must be set
by the user on the host before analysis. (Note that mumps par%NZ (integer) is also available for
backward compatibility.) It is not altered by MUMPS.

28
mumps par%IRN and mumps par%JCN (integer pointer arrays, dimension NNZ) contain the row and
column indices, respectively, for the matrix entries. They must be set by the user on the host before
analysis. They are not altered by MUMPS.
mumps par%A (real/complex pointer array, dimension NNZ) must be set by the user in such a way that
A(k) is the value of the entry in row IRN(k) and column JCN(k) of the matrix. It must be set before
the factorization phase (JOB=2) or before analysis (JOB=1) if a numerical preprocessing option is
requested (1 < ICNTL(6) < 7). A is not altered by MUMPS. Duplicate entries are summed and
all entries with IRN(k) or JCN(k) out-of-range are ignored.
Note that, in the case of symmetric matrices (SYM=1 or 2), only half of the matrix should be provided.
For example, only the lower triangular part of the matrix (including the diagonal) or only the upper
triangular part of the matrix (including the diagonal) can be provided in IRN, JCN, and A. More precisely,
a diagonal nonzero aii must be provided as A(k)=aii , IRN(k)=JCN(k)=i, and a pair of off-diagonal
nonzeros aij = aji must be provided either as A(k)=aij and IRN(k)=i, JCN(k)=j or vice-versa. Again,
out-of-range entries are ignored and duplicate entries are summed. In particular, this means that if both
aij and aji are provided, they will be summed.

5.2.2.2 Distributed assembled matrix (ICNTL(5)=0 and ICNTL(18)=1,2,3). When the

matrix is in assembled format (ICNTL(5)=0), we offer several options to distribute the matrix, defined
by the control parameter ICNTL(18).
⋄ only the matrix structure is provided on the host for the analysis phase and the matrix entries are
provided for the numerical factorization, distributed across the processors
• either according to a mapping supplied by the analysis (ICNTL(18)=1),
• or according to a user determined mapping (ICNTL(18)=2);
⋄ it is also possible to distribute the matrix pattern and the entries in any distribution in local triplets
(ICNTL(18)=3) for both analysis and factorization (recommended option for distributed entry).
The following components of the structure define the distributed assembled matrix input. They are
valid for ICNTL(18)=1,2,3, otherwise the user should refer to Subsection 5.2.2.1 for the centralized
assembled matrix input.
The following components of [SDCZ]MUMPS STRUC hold the matrix in distributed assembled
format:
mumps par%N (integer) is the order of the matrix A, N > 0. It must be set by the user on the host
before analysis. It is not altered by MUMPS.
mumps par%NNZ (integer(8)) is the number of entries being input in the definition of A, NNZ > 0.
(mumps par%NZ (integer) is also available for backward compatibility.) It must be set by the user
on the host before analysis if ICNTL(18) = 1 or 2.
mumps par%IRN and mumps par%JCN (integer pointer array, dimension NNZ) contain the row and
column indices, respectively, for the matrix entries. They must be set by the user on the host before
analysis if ICNTL(18) = 1, or 2. In this case (ICNTL(18) = 1 or 2), they can be deallocated by
the user just after the analysis phase.
mumps par%NNZ loc (integer(8)) is the number of entries local to a processor. It must be defined on
all processors in the case of the working host model of parallelism (PAR=1), and on all processors
except the host in the case of the non-working host model of parallelism (PAR=0), before analysis if
ICNTL(18) = 3, and before factorization if ICNTL(18) = 1 or 2. (mumps par%NZ loc (integer)
is also available for backward compatibility.)
mumps par%IRN loc and mumps par%JCN loc (integer pointer array, dimension NNZ loc) contain
the global3 row and column indices, respectively, for the matrix entries. They must be defined
on all processors if PAR=1, and on all processors except the host if PAR=0, before analysis if
ICNTL(18) = 3, and before factorization if ICNTL(18) = 1 or 2.
3 If the calling application manages both local and global indices, the global indices must be provided.

29
mumps par%A loc (real/complex pointer array, dimension NNZ loc) must be defined before the
factorization phase (JOB=2) on all processors if PAR = 1, and on all processors except the host
if PAR = 0. The user must set A loc(k) to the value in row IRN loc(k) and column JCN loc(k).
mumps par%MAPPING (integer array, dimension NNZ) is returned by MUMPS on the host after
the analysis phase as an indication of a preferred mapping if ICNTL(18) = 1. In that case,
MAPPING(i) = IPROC means that entry IRN(i), JCN(i) should be provided on processor with
rank IPROC in the MUMPS communicator. Remark that MAPPING is allocated by MUMPS, and not
by the user. It will be freed during a call to MUMPS with JOB = -2. This parameter and the option
ICNTL(18) = 1 are kept for backward compatibility with previous versions but are deprecated
and will be suppressed in a future release.
We recommend the use of options ICNTL(18)= 2 or 3 because they are the most flexible options.
Furthermore, these options (2 or 3) are in general as efficient as the more complicated (and deprecated)
option ICNTL(18)=1. Among those two options, ICNTL(18)=3 is the simplest and most natural one
to use. ICNTL(18)=2 should only be used if the application has a centralized version of the entire
matrix already available on the host processor.
Again, out-of-range entries are ignored and duplicate entries are summed. In particular, if an entry
aij is provided both as (IRN loc(k1), JCN loc(k1), A loc(k1)) on a process P1 and as (IRN loc(k2),
JCN loc(k2), A loc(k2)) on a process P2, the corresponding numerical value considered for aij is the
sum of A loc(k1) on P1 and A loc(k2) on P2. This also means that it is possible to only perform local
assemblies inside each MPI process and that entries that are common to several MPI processes (which
may typically correspond to interface variables) will be summed internally by the MUMPS package without
the user having to take care of communications to assemble those entries.

5.2.2.3 Elemental matrix (ICNTL(5)=1 and ICNTL(18)=0). In the following, an example of

elemental matrix with two elements is given.

1 −1 2 3 3 2 −1 3
! !
A1 = 2 2 1 1 , A2 = 4 1 2 −1
3 1 1 1 5 3 2 1

 
−1 2 3 0 0
 2 1 1 0 0 
A=  1 1 3 −1 3  = A1 + A2
 
 0 0 1 2 −1 
0 0 3 2 1
PN ELT
• N=5 NELT=2 NVAR=6 A= i=1
Ai
ELTPTR [1:NELT+1] = 147
• ELTVAR [1:NVAR] = 123345
A ELT [1:NVAL] = -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1
• Remarks:
– NVAR = ELTPTR(NELT+1)-1
– Order of element i: Si = ELT P T R(i + 1) − ELT P T R(i)
P 2 P
– NVAL = Si (unsymmetric) or Si (Si + 1)/2 (symmetric),
– storage of elements in ELTVAL: by columns

In the current release of the package, a matrix in elemental format must be input centrally on the host
(ICNTL(5)=1 and ICNTL(18)=0). The distributed elemental format is not currently available.
mumps par%N (integer), mumps par%NELT (integer), mumps par%ELTPTR (integer pointer
array, dimension NELT+1), mumps par%ELTVAR (integer pointer array, dimension ELTPTR(NELT+1)
– 1), and mumps par%A ELT (real/complex pointer array) hold the matrix in elemental format. The
following components of the MUMPS STRUC hold the matrix in elemental format:

30
mumps par%N (integer) is the order of the matrix A, N > 0. It is not altered by MUMPS.
mumps par%NELT (integer) is the number of elements being input, NELT > 0. It is not altered by
MUMPS.
mumps par%ELTPTR (integer pointer array, dimension NELT+1) is such that ELTPTR(j) points to
the position in ELTVAR of the first variable in element j, and ELTPTR(NELT+1) must be set to
the position after the last variable of the last element. Note that ELTPTR(1) should be equal to 1.
ELPTR is not altered by MUMPS.
mumps par%ELTVAR (integer pointer array, dimension ELTPTR(NELT+1) – 1) must be set to the lists
of variables of the elements. It is not altered by MUMPS. The variables for element j are stored in
positions ELTPTR(j), . . . , ELTPTR(j+1)–1. Out-of-range variables are ignored.
mumps par%A ELT (real/complex pointer array) If Np denotes ELTPTR(p+1)–ELTPTR(p), then the
values for element j are stored in positions Kj + 1, . . . , Kj + Lj , where
Pj−1
→ Kj = p=1
Np 2 , and Lj = Nj 2 in the unsymmetric case (SYM = 0)
Pj−1
→ Kj = p=1 (Np · (Np + 1))/2, and Lj = (Nj · (Nj + 1))/2 in the symmetric case (SYM=1
or 2). Only the lower triangular part is stored.
Values within each element are stored column-wise. Values corresponding to out-of-range variables
are ignored and values corresponding to duplicate variables within an element are summed. A ELT
is not accessed at the analysis phase (JOB = 1). Note that, although the elemental matrix may be
symmetric or unsymmetric in value, its structure is always symmetric.
The components N, NELT, ELTPTR, and ELTVAR describe the pattern of the matrix and must be set
by the user before the analysis phase (JOB=1) and should be passed unchanged when later calling the
factorization (JOB=2) and solve (JOB=3) phases. Component A ELT must be set before the factorization
phase (JOB=2).

5.2.3 Writing the input matrix to a file

If the input matrix is in assembled format (centralized or distributed), it is possible to write the matrix at
the analysis phase into a file whose name is given in the string “WRITE PROBLEM” from the MUMPS
structure. Either the “matrix market” format4 or a binary format will be used to write the matrix,
depending on the extension of “WRITE PROBLEM” (see below). If a right-hand side is provided, it
will also be written.
mumps par%WRITE PROBLEM (string) must be set by the user before the analysis phase (JOB=1)
in order to write the matrix/right-hand side into a file.
If the matrix is distributed, each MPI process must initialize the string WRITE PROBLEM. Each
MPI process then writes its share of the matrix in a file whose name is defined by the string
“WRITE PROBLEM” appended by its MPI rank.
Note that WRITE PROBLEM should include both the path and the file name.
• If the last four characters of the string WRITE PROBLEM are “.bin” (using either lowercase
or uppercase letters), then a binary format is used. The write operations are performed
using unformatted stream I/O, with no record boundaries. For a centralized matrix
(ICNTL(18)=0), it contains:
– N: four bytes (one 32-bit integer)
– NNZ: eight bytes (one 64-bit integer)
– IRN(1:NNZ): 4×NNZ bytes (NNZ 32-bit integers)
– JCN(1:NNZ): 4×NNZ bytes (NNZ 32-bit integers)
– A(1:NNZ) – if numerical values are provided at analysis: NNZ scalars, each of size 4
(resp. 8, 8, 16) bytes for s (resp. d, c, z) arithmetics.
4 See http://math.nist.gov/MatrixMarket/

31
For a distributed matrix, the binary format is the same except that NNZ loc, IRN loc,
JCN loc, A loc are printed on each MPI process (instead of NNZ, IRN, JCN, and A as
above).
A small text file which contains the corresponding matrix-market header and some comments
describing the binary file is also written. The name of this text file is constructed by replacing
“.bin” by “.header” in the string WRITE PROBLEM.
• Otherwise, that is, if the last four characters of the string WRITE PROBLEM different from
“.bin”, then the matrix (matrices in case of distributed format – ICNTL(18)=3) is written in
matrix-market format.
Furthermore, if a dense and centralized right-hand side (see Subsection 5.14.1) is provided on
the host before the analysis phase, it is also written in a file whose name is the matrix file name
(WRITE PROBLEM) appended by “.rhs”. In case a binary format is used, the file for the RHS
contains N×NRHS scalars, of size 4, 8, 8, 16 for s, d, c, z arithmetics, respectively, and the header
file mentioned above also provides the values of N and NRHS.
Finally, in case of analysis by blocks (see ICNTL(15)), BLKPTR and/or BLKVAR are also written
if they are provided. If BLKPTR is provided, NBLK and BLKPTR(1:NBLK+1) are written (one
integer per line, in text form) in a file whose name is the matrix file name appended by “.blkptr”.
If BLKVAR is provided, BLKVAR(1:N) is also written (one integer per line, in text form) in a file
whose name is the matrix file name appended by “.blkvar”.

5.3 Preprocessing: permutation to zero-free diagonal and scaling

The permutation to a zero-free diagonal and the scaling strategies are controlled by ICNTL(6) and
ICNTL(8), respectively.
ICNTL(6) computes a permutation to permute the matrix to a zero-free diagonal and/or computes a
matrix scaling.
Phase: accessed by the host and only during sequential analysis (ICNTL(28)=1)
Possible variables/arrays involved: optionally UNS PERM, mumps par%A, COLSCA and ROWSCA
Possible values :
0 : No column permutation is computed.
1 : The permuted matrix has as many entries on its diagonal as possible. The values on the
diagonal are of arbitrary size.
2 : The permutation is such that the smallest value on the diagonal of the permuted matrix is
maximized. The numerical values of the original matrix, (mumps par%A), must be provided
by the user during the analysis phase.
3 : Variant of option 2 with different performance. The numerical values of the original matrix
(mumps par%A) must be provided by the user during the analysis phase.
4 : The sum of the diagonal entries of the permuted matrix is maximized. The numerical values of
the original matrix (mumps par%A) must be provided by the user during the analysis phase.
5 : The product of the diagonal entries of the permuted matrix is maximized. Scaling vectors
are also computed and stored in COLSCA and ROWSCA, if ICNTL(8) is set to -2 or 77.
With these scaling vectors, the nonzero diagonal entries in the permuted matrix are one in
absolute value and all the off-diagonal entries less than or equal to one in absolute value.
For unsymmetric matrices, COLSCA and ROWSCA are meaningful on the permuted matrix
A Qc (see Equation (5)). For symmetric matrices, COLSCA and ROWSCA are meaningful on
the original matrix A. The numerical values of the original matrix, mumps par%A, must be
provided by the user during the analysis phase.
6 : Similar to 5 but with a more costly (time and memory footprint) algorithm. The numerical
values of the original matrix, mumps par%A, must be provided by the user during the analysis
phase.
7 : Based on the structural symmetry of the input matrix and on the availability of the numerical
values, the value of ICNTL(6) is automatically chosen by the software.

32
Other values are treated as 0. On output from the analysis phase, INFOG(23) holds the value of
ICNTL(6) that was effectively used.
Default value: 7 (automatic choice done by the package)
Incompatibility: If the matrix is symmetric positive definite (SYM = 1), or in elemental format
(ICNTL(5)=1), or the parallel analysis is requested (ICNTL(28)=2) or the ordering is provided
by the user (ICNTL(7)=1), or the Schur option (ICNTL(19) = 1, 2, or 3) is required, or the
matrix is initially distributed (ICNTL(18)=1,2,3), then ICNTL(6) is treated as 0.
Related parameters: ICNTL(8), ICNTL(12)
Remarks: On assembled centralized unsymmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM
= 0), if ICNTL(6)=1, 2, 3, 4, 5, 6 a column permutation (based on weighted bipartite matching
algorithms described in [24, 25]) is applied to the original matrix to get a zero-free diagonal.
The user is advised to set ICNTL(6) to a nonzero value when the matrix is very unsymmetric
in structure. On output to the analysis phase, when the column permutation is not the identity, the
pointer UNS PERM (internal data valid until a call to MUMPS with JOB=-2) provides access to the
permutation on the host processor (see Subsection 5.3.1). Otherwise, the pointer is not associated.
The column permutation is such that entry ai,perm(i) is on the diagonal of the permuted matrix.
On general assembled centralized symmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM =
2), if ICNTL(6)=1, 2, 3, 4, 5, 6, the column permutation is internally used to determine a set of
recommended 1×1 and 2×2 pivots (see [26] and the description of ICNTL(12) in Subsection 6.1
for more details). We advise either to let MUMPS select the strategy (ICNTL(6) = 7) or to set
ICNTL(6) = 5 if the user knows that the matrix is for example an augmented system (which is a
system with a large zero diagonal block). On output from the analysis the pointer UNS PERM is not
associated.

ICNTL(8) describes the scaling strategy

33
Other values are treated as 77.
Default value: 77 (automatic choice done by the package)
Related parameters: ICNTL(6), ICNTL(12)
Remarks: If ICNTL(8) = 77, then an automatic choice of the scaling option may be performed,
either during the analysis or the factorization. The effective value used for ICNTL(8) is returned in
INFOG(33). If the scaling arrays are computed during the analysis, then they are ready to be used
by the factorization phase. Note that scalings can be efficiently computed during analysis when
requested (see ICNTL(6) and ICNTL(12)).
If the input matrix is real and symmetric with SYM= 1 then automatic choice is no scaling. However,
the user may want to scale the matrix when BLR feature is activated (see ICNTL(35)).
Incompatibility: If the input matrix is symmetric (SYM= 1 or 2), then only options –2, –1, 0, 1, 7,
8 and 77 are allowed and other options are treated as 0. If the input matrix is in elemental format
(ICNTL(5) = 1), then only options –1 and 0 are allowed and other options are treated as 0. If
the input matrix is assembled and distributed (ICNTL(18)=1,2,3 and ICNTL(5) = 0), then only
options 7, 8 and 77 are allowed, otherwise no scaling is applied.
If block format is exploited (ICNTL(15)̸= 0) then scaling is not applied.

5.3.1 Permutation to a zero-free diagonal (ICNTL(6))

On assembled centralized unsymmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM = 0), if
ICNTL(6)=1, 2, 3, 4, 5, 6 a column permutation (based on weighted bipartite matching algorithms
described in [24, 25]) is applied to the original matrix to get a zero-free diagonal. The user is advised to
set ICNTL(6) to a nonzero value when the matrix is very unsymmetric in structure, or to leave it to its
default (automatic) value.
mumps par%UNS PERM (integer pointer array, dimension N) is returned on the host processor on
output to the analysis phase for a centralized unsymmetric matrix, when a column permutation
is requested (ICNTL(6) ̸=0) and if it not the identity. For all other cases, the pointer is not
associated. It is allocated internally by MUMPS and provides access to the permutation, and is such
that entry ai,perm(i) is on the diagonal of the permuted matrix.

5.3.2 Scaling (ICNTL(6) or ICNTL(8))

mumps par%COLSCA, mumps par%ROWSCA (real pointer arrays, dimension N) are optional,
column and row scaling arrays, respectively, required only by the host. Note that this arrays are
real also in the complex version.
On input: If a scaling is provided by the user (ICNTL(8) = –1), these arrays must be allocated
and initialized by the user on the host, before a call to the factorization phase (JOB=2).
On output:
If ICNTL(6)=5 or 6, and ICNTL(8)=-2 or 77, they are automatically allocated and
computed by the package during the analysis phase.
Otherwise, they are automatically allocated and computed by the package during the
factorization phase.

5.4 Preprocessing: symmetric permutations

The choice of the symmetric permutation P, the so called ordering, from Equation (5) is managed by the
control parameters ICNTL(28), ICNTL(7) and ICNTL(29) defined below:
ICNTL(28) determines whether a sequential or a parallel analysis is performed.
Phase: accessed by the host process during the analysis phase.
Possible values :
0: automatic choice.

34
1: sequential computation. In this case the ordering method is set by ICNTL(7) and the
ICNTL(29) parameter is meaningless (choice of the parallel ordering tool).
2: parallel computation. A parallel ordering and parallel symbolic factorization is requested by
the user. For that, one of the parallel ordering tools (or all) must be available, and the matrix
should not be too small. The ordering method is set by ICNTL(29) and the ICNTL(7)
parameter is meaningless.
Any other values will be treated as 0.
Default value: 0 (automatic choice)
Incompatibility: The parallel analysis is not available when the Schur complement feature is
requested (ICNTL(19)=1,2 or 3), when a maximum transversal is requested on the input matrix
(i.e., ICNTL(6)=1, 2, 3, 4, 5 or 6) or when the input matrix is an unassembled matrices
(ICNTL(5)=1). When the number of processes available for parallel analysis is equal to 1,
or when the initial matrix is extremely small, a sequential analysis is indeed performed, even if
ICNTL(28)=2 (no error is raised in that case).
Related parameters: ICNTL(7), ICNTL(29), INFOG(32)
Remarks: Performing the analysis in parallel (ICNTL(28)= 2) will enable saving both time and
memory. Note that then the quality of the ordering depends on the number of processors used.
The number of processors for parallel analysis may be smaller than the number of MPI processes
available for MUMPS, in order to satisfy internal constraints of parallel ordering tools. On output,
INFOG(32) is set to the type of analysis (sequential or parallel) that was effectively chosen
internally.

ICNTL(7) computes a symmetric permutation (ordering) to determine the pivot order to be used for the
factorization (see Subsection 3.2)
Phase: accessed by the host and only during the sequential analysis phase (ICNTL(28) = 1).
Possible variables/arrays involved: PERM IN, SYM PERM
Possible values :
0 : Approximate Minimum Degree (AMD) [7] is used,
1 : The pivot order should be set by the user in PERM IN, on the host processor. In that case,
PERM IN must be allocated on the host by the user and PERM IN(i), (i=1, ... N) must hold the
position of variable i in the pivot order. In other words, row/column i in the original matrix
corresponds to row/column PERM IN(i) in the reordered matrix.
2 : Approximate Minimum Fill (AMF) is used,
3 : SCOTCH5 [42] package is used if previously installed by the user otherwise treated as 7.
4 : PORD6 [46] is used if previously installed by the user otherwise treated as 7.
5 : the Metis7 [34] package is used if previously installed by the user otherwise treated as 7.
It is possible to modify some components of the internal options array of Metis (see
Metis manual) in order to fine-tune and modify various aspects of the internal algorithms
used by Metis. This can be done by setting some elements (see the file metis.h in the
Metis installation to check the position of each option in the array) of the MUMPS array
mumps par%METIS OPTIONS after the MUMPS initialization phase (JOB=-1) and before
the analysis phase. Note that the METIS OPTIONS array of the MUMPS structure is of size 40,
which is large enough for both Metis 4.x and Metis 5.x verions. It is passed by MUMPS as the
argument “options” to the METIS ordering routine METIS NodeND (METIS NodeWND is
sometimes also called in case MUMPS was installed with Metis 4.x) during the analysis phase.
6 : Approximate Minimum Degree with automatic quasi-dense row detection (QAMD) is used.
5 See http://gforge.inria.fr/projects/scotch/ to obtain a copy.
6 Distributed
within MUMPS by permission of J. Schulze (University of Paderborn).
7 See http://glaros.dtc.umn.edu/gkhome/metis/metis/overview to obtain a copy.

35
7 : Automatic choice by the software during analysis phase. This choice will depend on the
ordering packages made available, on the matrix (type and size), and on the number of
processors.
Other values are treated as 7.
Default value: 7 (automatic choice)
Incompatibility: ICNTL(7) is meaningless if the parallel analysis is chosen (ICNTL(28)=2).
Related parameters: ICNTL(28)
Remarks: Even when the ordering is provided by the user, the analysis must be performed before
numerical factorization.
For assembled matrices (centralized or distributed) (ICNTL(5)=0) all the options are available.
For elemental matrices (ICNTL(5)=1), only options 0, 1, 5 and 7 are available, with option 7
leading to an automatic choice between AMD and Metis (options 0 or 5); other values are treated
as 7.
If the user asks for a Schur complement matrix (ICNTL(19)= 1, 2, 3) and
– the matrix is assembled (ICNTL(5)=0) then only options 0, 1, 5 and 7 are currently available.
Other options are treated as 7.
– the matrix is elemental (ICNTL(5)=1) only options 0, 1 and 7 are currently available. Other
options are treated as 7 which will (currently) be treated as 0 (AMD).
– in both cases (assembled or elemental matrix) if the pivot order is given by the user
(ICNTL(7)=1) then the following property should hold: PERM IN(LISTVAR SCHUR(i)) =
N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
For matrices with relatively dense rows, we highly recommend option 6 which may significantly
reduce the time for analysis.
On output, the pointer array SYM PERM provides access, on the host processor, to the symmetric
permutation that is effectively computed during the analysis phase by the MUMPS package, and
INFOG(7) to the ordering option that was effectively chosen. In fact, the option corresponding to
ICNTL(7) may be forced by MUMPS when for example the ordering option chosen by the user is
not compatible with the value of ICNTL(12) or the necessary package is not installed.
SYM PERM(i), i=1, ... N, holds the position of variable i in the pivot order. In other words,
row/column i in the original matrix corresponds to row/column SYM PERM(i) in the reordered
matrix. See also Subsection 5.4.1.

ICNTL(29) defines the parallel ordering tool to be used to compute the fill-in reducing permutation.
Phase: accessed by host process only during the parallel analysis phase (ICNTL(28)=2).
Possible variables/arrays involved: SYM PERM
Possible values :
0: automatic choice.
1: PT-SCOTCH is used to reorder the input matrix, if available.
2: ParMetis is used to reorder the input matrix, if available.
Other values are treated as 0.
Default value: 0 (automatic choice)
Related parameters: ICNTL(28)
Remarks: On output, the pointer array SYM PERM provides access, on the host processor, to the
symmetric permutation that is effectively considered during the analysis phase, and INFOG(7)
to the ordering option that was effectively used. SYM PERM(i), (i=1, ... N) holds the position of
variable i in the pivot order, see Subsection 5.4.1 for a full description.

36
5.4.1 Symmetric permutation vector (ICNTL(7) and ICNTL(29))
When the ordering is not provided by the user, the choice of the ordering strategy is controlled by
ICNTL(7) in case of sequential analysis (ICNTL(28)=1), and by ICNTL(29) in case of parallel
analysis (ICNTL(28)=2). In all cases (serial or parallel analysis, ordering computed internally or
provided by the user, Schur complement, assembled or elemental matrix), the symmetric permutation
of the variables that MUMPS relies on is returned to the user in the mumps par%SYM PERM array.
mumps par%SYM PERM (integer pointer array, dimension N) is allocated internally and returned
on the host processor on output to the analysis phase. It contains the permutation that was
effectively computed during the analysis phase and that will serve as a basis for the numerical
factorization. It is such that SYM PERM(i) holds the position of variable i in the pivot order.
For example, SYM PERM(12)=2 means that variable 12 in the original matrix is the second
variable to be eliminated in the pivot order. In case a Schur complement was requested (see
ICNTL(19)), the returned permutation also includes the variables from the Schur complement,
so that: SYM PERM(LISTVAR SCHUR(i))=N-SIZE SCHUR+i, for 1 ≤ i ≤SIZE SCHUR (see
also Subsection 5.15).

5.4.2 Given ordering (ICNTL(7)=1 and ICNTL(28)=1)

An ordering can optionally be provided by the user on input to the package. Even in this case, the analysis
phase (JOB=1) must be performed before the numerical factorization. Note that this functionality is only
available with the sequential analysis (ICNTL(28)=1).
mumps par%PERM IN (integer pointer array, dimension N) must be allocated and initialized by
the user on the host before the sequential analysis phase (JOB=1, ICNTL(28)=1) when
ICNTL(7)=1. Although the input matrix can be provided in assembled or elemental format (see
ICNTL(5)), PERM IN always defines the elimination order of the variables, not the elimination
order of the elements. The user should define PERM IN such that PERM IN(i), i=1, . . . , N holds
the position of variable i in the pivot order. For example, PERM IN(12)=2 indicates that variable
12 in the original matrix is the second variable to be eliminated in the pivot order. In case a Schur
complement is requested (ICNTL(19)=1,2,3), the permutation should also include the variables
from the Schur complement, so that: PERM IN(LISTVAR SCHUR(i))=N-SIZE SCHUR+i, for
1 ≤ i ≤SIZE SCHUR (see Subsection 5.15). In case of parallel analysis (ICNTL(28)=2),
PERM IN is ignored. The input matrix can be centralized or distributed (see ICNTL(18)).
Remark that, in case of given ordering, although PERM IN is used, SYM PERM (see Subsection 5.4.1)
will generally differ from PERM IN because MUMPS takes some freedom to reorganize the order of the
computations for locality and efficiency. However, PERM IN and SYM PERM are equivalent orderings in
terms of estimated factor size and estimated number of operations for the factorization.

5.5 Preprocessing: exploit compression of the input matrix resulting

from a block format (ICNTL(15)̸= 0)
The user can provide block information (so called block format) related to the input matrix, that will
be exploited internally to work on a compressed graph during the ordering and symbolic factorization
phases. This can significantly decrease the time and memory consumption for the analysis phase. Note
that only the integer description of the block format is required.
If ICNTL(15)̸= 0 then the block format will be described on the host process in the following
components of [SDCZ]MUMPS STRUC:
mumps par%NBLK (integer) corresponds to the number of blocks and should be set by the user only if
ICNTL(15)= 1
mumps par%BLKPTR (integer pointer array, dimension NBLK+1) is such that BLKPTR(iblk) points
to the position in BLKVAR of the first variable in block iblk. It must be set by the user only if
ICNTL(15)= 1. If ICNTL(15)< 0 it will be computed internally.

37
mumps par%BLKVAR (integer pointer array, dimension N). If BLKVAR is not provided,
it is internally treated as the identity BLKVAR(i)=i, (i=1, ..., N). Otherwise,
BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds the variables associated
to block iblk.

ICNTL(15) exploits compression of the input matrix resulting from a block format
Phase: accessed by the host process during the analysis phase.
Possible variables/arrays involved: NBLK, BLKPTR, BLKVAR
Possible values :
0: no compression
-k: all blocks are of fixed size k> 0. N (the order of the matrix A) must be a multiple of k. NBLK
and BLKPTR should not be provided by the user and will be computed internally. Concerning
BLKVAR, please refer to the Remarks below.
1: block format provided by the user. NBLK need be provided on the host by the user and
holds the number of blocks. BLKPTR(1:NBLK+1) must be provided by the user on the host.
Concerning BLKVAR, please refer to the Remarks below.
Any other values will be treated as 0.

Default value: 0
Remarks: If BLKVAR is not provided by the user then BLKVAR is internally treated as the identity
(BLKVAR(i)=i, (i=1, ..., N)). It corresponds to contiguous variables in blocks.
– If ICNTL(15)=1 then BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds
the variables associated to block iblk.
– If ICNTL(15) < 0 then BLKPTR need not be provided by the user and NBLK = N/k where
N must be a multiple of k.
In case the pivot order is provided on entry by the user at the analysis phase (ICNTL(7)= 1) then
PERM IN should be compatible with the compression. This means that PERM IN, of size N, should
result from an expansion of a pivot order on the compressed matrix, i.e., variables in a block should
be consecutive in the pivot order.
Incompatibility: With element entry format ICNTL(5)= 1, with Schur complement
ICNTL(19)̸= 0 and with permutation to a zero-free diagonal and related compressed/constrained
ordering for symmetric matrices (ICNTL(6)̸= 0, ICNTL(12)̸= 1).

5.6 Post-processing: iterative refinement

It is possible to improve the accuracy of the solution using an iterative refinement procedure, thanks to
the control parameter ICNTL(10). The iterative refinement procedure can stop either when a stopping
criterion is satisfied, or after a fixed number of steps. Algorithm 2 provides the iterative refinement
procedure with a stopping criterion, as implemented in MUMPS. In that case, the stopping criterion is
based on the backward errors ω1 and ω2 , as defined in Section 3.3.2.
Algorithm 3 can also be used in order to perform a fixed number of steps of iterative refinement,
without convergence test. In fact, it has been shown [20] that with only two to three steps of iterative
refinement the solution can often be significantly improved.
Note that the iterative refinement method may diverge. In case of iterative refinement with a fixed
number of steps, the final solution may then be worse than the initial solution. On the other hand, in
case of divergence with Algorithm 2 at a given iteration, iterative refinement stops and the solution x is
overwritten by the previous iterate.
ICNTL(10) applies iterative refinement to improve the computed solution.
Phase: accessed by the host during the solve phase.
Possible variables/arrays involved: NRHS

38
Let x be the initial solution of Ax = b
Compute residual r = b − Ax
Compute the associated backward errors ω1 and ω2 (see Subsection 3.3.2)
i=0
while ω1 + ω2 ≥ α and convergence is not too slow and i ≤ IR steps do
Solve A∆x = r using the computed factorization
x = x + ∆x
r = b − Ax
Compute backward errors ω1 and ω2
i=i+1
end while

Algorithm 2: Iterative refinement. At each step, backward errors are computed and compared to
α, the stopping criterion (see CNTL(2)). The number of steps performed is limited to IR steps (=
ICNTL(10)).

Let x be the initial solution of Ax = b

Compute residual r = b − Ax
for i = 1 to IR Steps do
Solve A∆x = r using the computed factorization
x = x + ∆x
r = b − Ax
end for

Algorithm 3: Iterative refinement with a fixed number of steps IR Steps (= |ICNTL(10)|).

Possible values :
< 0 : Fixed number of steps of iterative refinement. No stopping criterion is used.
0 : No iterative refinement.
> 0 : Maximum number of steps of iterative refinement. A stopping criterion is used, therefore a
test for convergence is done at each step of the iterative refinement algorithm.
Default value: 0 (no iterative refinement)
Related parameters: CNTL(2)
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), then iterative refinement is disabled and ICNTL(10) is treated
as 0.
Remarks: Note that if ICNTL(10)< 0, |ICNTL(10)| steps of iterative refinement are performed,
without any test of convergence (see Algorithm 3). This means that the iterative refinement may
diverge, that is the solution instead of being improved may be worse from an accuracy point of view.
But it has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. So if the convergence test should not be done we recommend to
set ICNTL(10) to -2 or -3.
Note also that it is not necessary to activate the error analysis option (ICNTL(11)= 1,2) to be
able to run the iterative refinement with stopping criterium (ICNTL(10) > 0). However, since
the backward errors ω1 and ω2 have been computed, they are still returned in RINFOG(7) and
RINFOG(8), respectively.
It must be noticed that iterative refinement with stopping criterium (ICNTL(10) > 0) will stop
when
1. either the requested accuracy is reached (ω1 + ω2 < CNTL(2))

39
2. or when the convergence rate is too slow (ω1 + ω2 does not decrease by at least a factor of 5)
3. or when exactly ICNTL(10) steps have been performed.
In the first two cases the number of iterative refinement steps (INFOG(15)) may be lower than
ICNTL(10).

5.7 Post-processing: error analysis

MUMPS enables the user to perform classical error analysis based on the residuals. We calculate
an estimate of the sparse backward error using the theory and metrics developed in [20] (see
Subsection 3.3.2).
If ICNTL(11) = 2, main statistics are computed:
• the infinite norm of the input matrix: ∥A∥∞ or ∥AT ∥∞ = RINFOG(4)
• the infinite norm of the computed solution x̄: ∥x̄∥∞ = RINFOG(5)
∥Ax̄−b∥∞
• the scaled residual: ∥A∥∞ ∥x̄∥∞
= RINFOG(6)
• ω1 = RINFOG(7)
• ω2 = RINFOG(8)
If ICNTL(11) = 1, in addition to the above statistics, the condition numbers for the linear system
(not just the matrix) and an upper bound of the forward error of the computed solution are also returned
(see Subsection 3.3.2):
• cond1 = RINFOG(10)
• cond2 = RINFOG(11)
∥δx̄∥∞
• ∥x̄∥∞
≤ ω1 cond1 + ω2 cond2 = RINFOG(9)
Note that the error analysis in the case of ICNTL(11) = 1 is significantly more costly than the solve
phase itself.
ICNTL(11) computes statistics related to an error analysis of the linear system solved (Ax = b or
AT x = b (see ICNTL(9))).
Phase: accessed by the host and only during the solve phase.
Possible variables/arrays involved: NRHS
Possible values :
0 : no error analysis is performed (no statistics).
1 : compute all the statistics (very expensive).
2 : compute main statistics (norms, residuals, componentwise backward errors), but not the most
expensive ones like (condition number and forward error estimates).
Values different from 0, 1, and 2 are treated as 0.
Default value: 0 (no statistics).
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), or if ICNTL(25)=-1 (computation of the null space basis),
then error analysis is not performed and ICNTL(11) is treated as 0.
Related parameters: ICNTL(9)
Remarks: The computed statistics are returned in various informational parameters, see also
Subsection 3.3:
– If ICNTL(11)= 2, then the infinite norm of the input matrix (∥A∥∞ or ∥AT ∥∞ in
RINFOG(4)), the infinite norm of the computed solution (∥x̄∥∞ in RINFOG(5)), and the
∥Ax̄−b∥∞
scaled residual ∥A∥∞ ∥x̄∥∞
in RINFOG(6), a componentwise backward error estimate in
RINFOG(7) and RINFOG(8) are computed.

40
– If ICNTL(11)= 1, then in addition to the above statistics also an estimate for the error in
the solution in RINFOG(9), and condition numbers for the linear system in RINFOG(10) and
RINFOG(11) are also returned.
If performance is critical, ICNTL(11) should be set to 0. If both performance is critical and statistics
are requested, then ICNTL(11) should be set to 2. If ICNTL(11)=1, the error analysis is very costly
(typically significantly more costly than the solve phase itself).

5.8 Out-of-core (ICNTL(22))

The decision to use the disk to store the matrix of factors is controlled by ICNTL(22). Only the value
on the host node is significant.
ICNTL(22) controls the in-core/out-of-core (OOC) factorization and solve.
Phase: accessed by the host during the factorization phase.
Possible variables/arrays involved: OOC TMPDIR and OOC PREFIX
Possible values :
0: In-core factorization and solution phases (default standard version).
1: Out-of-core factorization and solve phases. The complete matrix of factors is written to disk
(see Subsection 3.14).
Other values are treated as 0 (in-core factorization).
Default value: 0 (in-core factorization)
Related parameters: ICNTL(35) since factors in low-rank form are not written to disk
Remarks: The variables OOC TMPDIR and OOC PREFIX are used to indicate the directory and
the prefix, respectively, where to store the factors. They must be set after the initialization phase
(JOB = -1) and before the factorization phase (JOB=2,4,5 or 6). Otherwise, MUMPS will use the
/tmp directory and arbitrary file names. Note MUMPS accesses to the variables OOC TMPDIR and
OOC PREFIX only during the factorization phase. Several files under the same directory and with
the same prefix are created to store the factors. Their names contain a unique hash and MUMPS is
in charge of keeping trace of them.
The files containing the factors will be deleted if a new factorization starts or when a termination
phase (JOB=-2) is called, except if the save/restore feature has been used and the files containing
the factors are associated to a saved instance. See Section Subsection 5.17.4).
Note that, in case of abnormal termination of an application calling MUMPS (for example, a
termination of the calling process with a segmentation fault, or, more generally, a termination of
the calling process without a call to MUMPS with JOB=-2), the files containing the factors are not
deleted. It is then the user’s responsibility to delete them, as shown in bold in the example below,
where the application calling MUMPS is launched from a bash script and environment variables are
used to define the OOC environment:
#!/bin/bash
export MUMPS OOC TMPDIR="/local/mumps data/"
export MUMPS OOC PREFIX="job myapp "
mpirun -np 128 ./myapplication
# Suppress MUMPS OOC files in case of bad application termination
rm -f ${MUMPS OOC TMPDIR}/${MUMPS OOC PREFIX}*
mumps par%OOC TMPDIR (string) can be provided by the user (on each processor) to control the
directory where the out-of-core files will be stored.
Note that it is also possible to provide the directory through environment variables. If OOC TMPDIR
is not defined, then MUMPS checks for the environment variable MUMPS OOC TMPDIR. If neither
OOC TMPDIR nor MUMPS OOC TMPDIR are defined, then the directory /tmp is attempted.

41
mumps par%OOC PREFIX (string) can be provided by the user (on each processor) to prefix the out-
of-core files.
Note that it is also possible to provide the files prefix through environment variables.
If OOC PREFIX is not defined, then MUMPS checks for the environment variable
MUMPS OOC PREFIX. If neither OOC PREFIX nor MUMPS OOC PREFIX are defined, then
MUMPS chooses the file names automatically.

5.9 Workspace parameters (ICNTL(14) and ICNTL(23)) and user workspace

The memory required to run the numerical phases (factorization and solve) is estimated during the
analysis. The size of the workspace actually required during numerical factorization depends on the
numerical characteristics of the matrix, and therefore on the numerical pivoting that may lead to extra
storage, but also on algorithmic parameters such as the in-core/out-of-core strategies (ICNTL(22)),
the memory relaxation parameter (ICNTL(14)) and the fact to discard the factor matrices during the
factorization (ICNTL(31)). It also depends on the way factors of large frontal matrices are stored (full-
rank or low-rank, see ICNTL(35) and ICNTL(37)).
Two main workarrays are allocated internally: IS and S (integer and real/complex workarray,
respectively), that hold full-rank factors, active frontal matrices, and full-rank contribution blocks.
Dynamic storage may also be used when the size of S was underestimated (ICNTL(14) too small).
In case of low-rank storage of the factors (ICNTL(35)=2), dynamic allocation is used to store
corresponding low-rank data. ICNTL(38) holds an estimate of the compression rate of the factors of
BLR fronts on entry to the analysis phase and is be used to provide estimations of memory usage. In case
of low-rank storage of CB blocks (ICNTL(37)=1) dynamic allocation is used to store corresponding
low-rank data. ICNTL(39) holds an estimate of the compression rate of the CB blocks of BLR fronts
on entry to the analysis phase and is used to provide estimations of memory usage.
In addition to these two large workarrays and to dynamic allocation for low-rank structures, other
internal workarrays are used: for example, internal communication buffers in the parallel case, or integer
arrays holding the structure of the elimination tree.
At the end of the analysis phase, the following estimations of the memory required to run the
numerical phases are provided (note that these estimations depends on the memory relaxation parameter
ICNTL(14) and in the case of low-rank storage it depends also of the average compression rate of the
factors ICNTL(38) and of the average compression rate of the CB ICNTL(39). In case of out-of-
core strategy, only uncompressed factors (in full-rank format) are written to disk and that MegaBytes
corresponds to millions (106 ) of Bytes.
• Full-rank factors (ICNTL(35)=0 or 3):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(15) for in-core strategy;
INFO(17) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(16) and INFOG(17), respectively for in-core strategy;
INFOG(26) and INFOG(27), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(8) for in-core strategy;
INFO(20) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).
• Low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0) (estimates depends on both
ICNTL(14) and ICNTL(38)):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(30) for in-core strategy;
INFO(31) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(36) and INFOG(37), respectively for in-core strategy;
INFOG(38) and INFOG(39), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(29) for in-core strategy;
INFO(20) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).

42
• Low-rank CB only (ICNTL(35)=0,3 and ICNTL(37)=1) (estimates depends on both
ICNTL(14) and ICNTL(39)):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(37) for in-core strategy;
INFO(38) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(44) and INFOG(45), respectively for in-core strategy;
INFOG(46) and INFOG(47), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(36) for in-core strategy;
INFO(33) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).
• Low-rank factors and CB (ICNTL(35)=1,2 and ICNTL(37)=1) (estimates depends on
ICNTL(14), ICNTL(38) and ICNTL(39)):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(34) for in-core strategy;
INFO(35) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(40) and INFOG(41), respectively for in-core strategy;
INFOG(42) and INFOG(43), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(32) for in-core strategy;
INFO(33) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).
As a first general approach, we advise the user to rely on the estimations provided during the analysis
phase. If the user wants to/must increase the allocated workspace (typically, because of numerical pivoting
that leads to extra storage, or previous call to MUMPS that failed because of a lack of allocated memory),
we describe in the following how the size of the workspace can be controlled.
• The user can modify the value of the memory relaxation parameter, ICNTL(14), that is designed
to control the increase with respect to the estimations performed during analysis, in the size of all
(integer and real/complex) workspace allocated during the numerical phase.
• The user can explicitly control the memory used by the package by providing in ICNTL(23) the
size of the total memory that is allowed to be used internally.
We provide the definitions of ICNTL(14) and ICNTL(23) below:
ICNTL(14) corresponds to the percentage increase in the estimated working space.
Phase: accessed by the host both during the analysis and the factorization phases.
Default value: between 20 and 35 (which corresponds to at most 35 % increase) and depends on
the number of MPI processes. It is set to 5 % with SYM=1 and one MPI process.
Related parameters: ICNTL(23)
Remarks: When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14)
may help.

ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS can
allocate per working processor. It covers all internal integer and real (complex in the complex version)
workspace allocated by MUMPS.
Phase: accessed by all processes at the beginning of the factorization phase. If the value is greater
than 0 only on the host, then the value on the host is used for all processes, otherwise ICNTL(23)
is interpreted locally on each MPI process.
Possible values :
0 : each processor will allocate workspace based on the estimates computed during the analysis
>0 : maximum size of the working memory in MegaBytes per working processor to be allocated
Default value: 0

43
Related parameters: ICNTL(14), ICNTL(38), ICNTL(39)
Remarks: If ICNTL(23) is greater than 0 then MUMPS automatically computes the size of
the internal workarrays such that the storage for all MUMPS internal data does not exceed
ICNTL(23). The relaxation ICNTL(14) is first applied to the internal integer workarray IS and to
communication and I/O buffers; the remaining available space is then shared between the main (and
often most critical) real/complex internal workarray S holding the factors, the stack of contribution
blocks and dynamic workarrays that are used either to expand the S array or to store low-rank
dynamic structures.
Lower bounds for ICNTL(23), in case ICNTL(23) is provided only on the host:
– In case of full-rank factors only (ICNTL(35)=0 or 3), a lower bound for ICNTL(23) (if ICNTL(14),
has not been modified since the analysis) is given by INFOG(16) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(26) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), a lower bound for ICNTL(23)
(if ICNTL(14), has not been modified since the analysis and ICNTL(38) is a good approximation
of the average compression rate of the factors) is given by INFOG(36) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(38) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank contribution blocks (CB) only (ICNTL(35)=0,3 and ICNTL(37)=1), a lower bound
for ICNTL(23) (if ICNTL(14), has not been modified since the analysis and ICNTL(39) is a good
approximation of the average compression rate of the CB) is given by INFOG(44) if the factorization is
in-core (ICNTL(22)=0), and by INFOG(46) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), a lower
bound for ICNTL(23) (if ICNTL(14), has not been modified since the analysis, and ICNTL(38) and
ICNTL(39) are good approximations of the average compression rate of respectively the factors and the
CB) is given by INFOG(40) if the factorization is in-core (ICNTL(22)=0), and by INFOG(42) if the
factorization is out-of-core (ICNTL(22)=1).
Lower bounds for ICNTL(23), in case ICNTL(23) is provided locally to each MPI process:
– Full-rank factors only (ICNTL(35)=0 or 3) ⇒ INFO(15) if the factorization is in-core
(ICNTL(22)=0), INFO(17) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0) ⇒ INFO(30) if the factorization is
in-core (ICNTL(22)=0), INFO(31) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) ⇒ is given by
INFO(34) if the factorization is in-core (ICNTL(22)=0), INFO(35) if the factorization is out-of-
core (ICNTL(22)=1).
The above lower bounds include memory for the real/complex internal workarray S holding the
factors and stack of contribution blocks. In case WK USER is provided, the above quantities should
be diminished by the estimated memory for S/WK USER. This estimated memory can be obtained
from INFO(8), INFO(9), or INFO(20) (depending on MUMPS settings) by taking their absolute
value, if negative, or by dividing them by 106 , if positive. See also the paragraph Recommended
values of LWK USER below.
If ICNTL(23) is left to its default value 0 then MUMPS will allocate for the factorization phase
a workspace based on the estimates computed during the analysis if ICNTL(14) has not been
modified since analysis, or larger if ICNTL(14) was increased. Note that even with full-rank
factorization, these estimates are only accurate in the sequential version of MUMPS but they can
be inaccurate in the parallel case, especially for the out-of-core version. Therefore, in parallel, we
recommend to use ICNTL(23) and provide a value larger than the provided estimations.
Another possibility is that the user provides the real/complex workarray instead of using the internal
main real/complex workarray S. Note that in case factors of large frontal matrices are stored in low-rank
form (ICNTL(35)=2), they will use a separate dynamic storage allocation, but providing a workarray to
store frontal matrices, contribution blocks and factors that stay full-rank fronts is still possible. A pointer
array that points to that workspace must be provided. In this case, the value of ICNTL(23) excludes the
workspace corresponding to the user workspace.

We describe below the two parameters associated to this functionality:

mumps par%LWK USER (integer) is local to each processor if PAR=1, and on all processors except the
host if PAR=0. It is accessed at the beginning of the numerical phases of MUMPS. Its default value

44
is 0. At the beginning of the numerical phases, if the user sets LWK USER to a nonzero value
then LWK USER will define the size of the pointer array WK USER. If negative, -LWK USER
is a lower bound for the number of entries in millions of the pointer array WK USER so that
abs(LWK USER) × 106 ≤ size(WK USER) must hold.
Recommended values of LWK USER (otherwise an error with code -9 may occur):
• In case of full-rank factors (ICNTL(35)=0,3), we recommend the user to set LWK USER to a value larger
than than INFO(8) (in-core factorization) or INFO(20) (out-of-core factorization).
• In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), we recommend the user to
set LWK USER to a value larger than INFO(29) (in-core factorization) or INFO(20) (out-of-core
factorization).
• In case of low-rank contribution blocks only (ICNTL(35)=0,3 and ICNTL(37)=1), we recommend the
user to set LWK USER to a value larger than INFO(36) (in-core factorization) or INFO(33) (out-of-
core factorization).
• In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), we
recommend the user to set LWK USER to a value larger than INFO(32) (in-core factorization) or
INFO(33) (out-of-core factorization).
Moreover, if the factorization is in-core, the value of LWK USER must not be modified between
factorization and subsequent solution phases.
If the numerical phases are out-of-core (ICNTL(22)=1), we recommend LWK USER to be
larger than INFO(20). In this case, the user can reduce the value for LWK USER between the
factorization phase and the solve phase.
mumps par%WK USER is a real/complex pointer array that can point to the workspace provided by
the user. It is only accessed by MUMPS when LWK USER has been set by the user to a non-zero
value. In that case, MUMPS will avoid the internal allocation of the main real/complex workarray S
and use WK USER instead.
Note that the type of WK USER should follow the arithmetic: single precision for SMUMPS, double
precision for DMUMPS, single complex for CMUMPS, and double complex for ZMUMPS.
If the factorization is in-core (ICNTL(22)=0), then WK USER should not be modified between
factorization and solution phases of MUMPS.

5.10 Null pivot row detection (ICNTL(24))

It is possible to detect null pivot rows during the factorization by setting ICNTL(24)= 1. A pivot row
is considered as null if its infinite norm is smaller than a threshold whose value relies on CNTL(3).
At the end of the factorization mumps par%INFOG(28) will contain the deficiency of the initial
matrix and if INFOG(28) ̸= 0, the array PIVNUL LIST(1:INFOG(28)) will hold, on the host, the
row indices corresponding to the null pivot rows .
Null pivot rows are modified to enable the solution phase to provide one solution among the possible
solutions of the numerically deficient matrix (see ICNTL(25) = 0) . CNTL(5) is used to define the
value of the fixation. A vector or the complete basis of the null space can be returned to the user during
the solve phase (see ICNTL(25)̸=0) using backward substitutions.
ICNTL(24) controls the detection of “null pivot rows”.
Phase: accessed by the host during the factorization phase
Possible variables/arrays involved: PIVNUL LIST
Possible values :
0: Nothing done. A null pivot row will result in error INFO(1)=-10.
1: Null pivot row detection.
Other values are treated as 0.
Default value: 0 (no null pivot row detection)
Related parameters: CNTL(3), CNTL(5), ICNTL(13), ICNTL(25)

45
Remarks: CNTL(3) is used to compute the threshold to decide if a pivot row is “null”.
Note that when ScaLAPACK is applied on the root node (see ICNTL(13) = 0), then exact null
pivots on the root will stop the factorization (INFO(1)=-10) while if tiny pivots are present on
the root node the ScaLAPACK routine will factorize the root matrix. Computing the root node
factorization sequentially (this can be forced by setting ICNTL(13) to 1) will help with the correct
detection of null pivots but may degrade performance.
Related data:
mumps par%INFOG(28) (integer): INFOG(28) is set to the number of null pivot rows detected
during the factorization step. .
mumps par%PIVNUL LIST (integer array, dimension N):
If INFOG(28) ̸= 0 then PIVNUL LIST(1:INFOG(28)) will hold, on the host, the row indices
corresponding to the null pivot rows (ICNTL(24)= 1) .

5.11 Discard matrix factors (ICNTL(31))

In some cases the user may want to discard one or both factors during factorization. This can be done
using the ICNTL(31) control parameter.
ICNTL(31) indicates which factors may be discarded during the factorization.
Phase: accessed by the host during the analysis phase.
Possible values :
0 : the factors are kept during the factorization, phase except in the case of unsymmetric matrices
when the forward elimination is performed during factorization (ICNTL(32) = 1). In this
case, since it will not be used during the solve phase, the L factor is discarded.
1: all factors are discarded during the factorization phase. The user is not interested in solving
the linear system (Equations (3) or (4)) and will not call MUMPS solution phase (JOB=3).
This option is meaningful when only a Schur complement is needed (see ICNTL(19)), or
when only statistics from the factorization, such as (for example) definiteness, value of the
determinant, number of entries in factors after numerical pivoting, number of negative or null
pivots are required. In this case, the memory allocated for the factorization will rely on the
out-of-core estimates (and factors will not be written to disk).
2: this setting is meaningful only for unsymmetric matrices and has no impact on symmetric
matrices: only the U factor is kept after factorization so that exclusively a backward
substitution is possible during the solve phase (JOB=3). This can be useful when:
−the user is only interested in the computation of a null space basis (see ICNTL(25))
during the solve phase, or
−the forward elimination is performed during the factorization (ICNTL(32)=1). Note that
for unsymmetric matrices, if the forward elimination is performed during the factorization
(ICNTL(32) = 1) then the L factor is always discarded during factorization. In this case
(ICNTL(32) = 1), both ICNTL(31) = 0 and ICNTL(31) = 2 have the same behaviour.

Other values are treated as 0.

Default value: 0 (the factors are kept during the factorization phase in order to be able to solve the
system).
Incompatibility: ICNTL(31) = 2 is not meaningful for symmetric matrices.
Related parameters: ICNTL(32), forward elimination during factorization, ICNTL(33),
computation of the determinant, ICNTL(25) computation of a null space basis, ICNTL(22)
out-of-core factors.
Remarks: For unsymmetric matrices and ICNTL(32)=2, MUMPS currently discards the L
factors corresponding to full-rank frontal matrices but not of low-rank frontal matrices (except
if ICNTL(35)=3). In a future version, discarding all the L factors in case of BLR factorization
and ICNTL(32)=2 may lead to a further memory reduction.

46
5.12 Computation of the determinant (ICNTL(33))
The user interested in computing the determinant of the matrix A can use the control parameter
ICNTL(33). See Subsection 3.15 for details on how it will be computed.
ICNTL(33) computes the determinant of the input matrix.
Phase: accessed by the host during the factorization phase.
Possible values :
0 : the determinant of the input matrix is not computed.
̸= 0: computes the determinant of the input matrix. The determinant is obtained by computing
(a + ib) × 2c where a =RINFOG(12), b =RINFOG(13) and c = INFOG(34). In real
arithmetic b=RINFOG(13) is equal to 0.
Default value: 0 (determinant is not computed)
Related parameters: ICNTL(31)
Remarks: In case a Schur complement was requested (see ICNTL(19)), elements of the Schur
complement are excluded from the computation of the determinant so that the determinant is that
one of matrix A1,1 (using notations of Subsection 3.17).
Although we recommend to compute the determinant on non-singular matrices, null pivot rows
(ICNTL(24)) and static pivots (CNTL(4)) are excluded from the determinant so that a non-zero
determinant is still returned on singular or near-singular matrices. This determinant is then not
unique and will depend on which equations were excluded.
Furthermore, we recommend to switch off scaling (ICNTL(8)) in such cases. If not (ICNTL(8)
̸= 0), we describe in the following the current behaviour of the package:
– if static pivoting (CNTL(4)) is activated: all entries of the scaling arrays ROWSCA and
COLSCA are currently taken into account in the computation of the determinant.
– if the null pivot row detection (ICNTL(24)) is activated, then entries of ROWSCA and
COLSCA corresponding to pivots in PIVNUL LIST are excluded from the determinant so
that
* for symmetric matrices (SYM=1 or 2), the returned determinant correctly corresponds to
the matrix excluding rows and columns of PIVNUL LIST.
* for unsymmetric matrices (SYM=0), scaling may perturb the value of the determinant in
case off-diagonal pivoting has occurred (INFOG(12)̸=0).
Note that if the user is interested in computing only the determinant, we recommend to discard the
factors during factorization ICNTL(31).

5.13 Forward elimination during factorization (ICNTL(32))

This option makes much sense when the factors have to be used only once or in an out-of-core context
(ICNTL(22)=1), where loading the factors from disk during the solution phase (JOB=3), both during
the forward (Equation (3)) and backward (Equation (4)) substitutions, can be particularly costly.
Factorization Phase Without Forward With forward
type during factorization during factorization
ICNTL(32) = 0 ICNTL(32) = 1
Factorization phase A = LU A = LU
Solve Ly = b
LU
Solve phase Solve Ly = b
Solve U x = y Solve U x = y
Factorization phase A = LDLT A = LDLT
Solve LDy = b
LDLT
Solve phase Solve LDy = b
Solve LT x = y Solve LT x = y

47
Note that for unsymmetric matrices, if the forward elimination is performed during factorization, the
U factor may be discarded (see ICNTL(31)). In the symmetric LDLT case, the L factor must always
be kept in order to be able to perform the backward substitution, i.e., solve LT x = y.
ICNTL(32) performs the forward elimination of the right-hand sides (Equation (3)) during the
factorization (JOB=2).
Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: RHS, NRHS, LRHS, and possibly REDRHS, LREDRHS when
ICNTL(26)=1
Possible values :
0: standard factorization not involving right-hand sides.
1: forward elimination (Equation (3)) of the right-hand side vectors is performed during
factorization (JOB=2). The solve phase (JOB=3) will then only involve backward substitution
(Equation (4)).
Other values are treated as 0.
Default value: 0 (standard factorization)
Related parameters: ICNTL(31),ICNTL(26)
Incompatibility: This option is incompatible with sparse right-hand sides (ICNTL(20)=1,2,3),
with the solution of the transposed system (ICNTL(9) ̸= 1), with the computation of entries of
the inverse (ICNTL(30)=1), and with BLR factorizations (ICNTL(35)=1,2,3). In such cases,
error -43 is raised.
Furthermore, iterative refinement (ICNTL(10)) and error analysis (ICNTL(11)) are disabled.
Finally, the current implementation imposes that all right-hand sides are processed in one pass
during the backward step. Therefore, the blocking size (ICNTL(27)) is ignored.
Remarks: The right-hand sides must be dense to use this functionality: RHS, NRHS, and LRHS
should be provided as described in Subsection 5.14.1. They should be provided at the beginning of
the factorization phase (JOB=2) rather than at the beginning of the solve phase (JOB=3).
For unsymmetric matrices, if the forward elimination is performed during factorization
(ICNTL(32) = 1), the L factor (see ICNTL(31)) may be discarded to save space. In fact,
the L factor will then always be discarded (even when ICNTL(31)=0) in the case of a full-rank
factorization (ICNTL(35)=0) or BLR factorization with full-rank solve (ICNTL(35)=3). In the
case of a BLR factorization with ICNTL(35)=1 or 2, only the L factor corresponding to full-rank
frontal matrices are discarded in the current version.
We advise to use this option only for a reasonably small number of dense right-hand side vectors
because of the additional associated storage required when this option is activated and the number
of right-hand sides is large compared to ICNTL(27).

5.14 Right-hand side and solution vectors/matrices

n×nrhs
MUMPS can solve the systems AX = B or AT X = B where X, B ∈ R . The B matrix is
referred to as the right-hand side and the X matrix to as the solution.
MUMPS gives the option to input the right-hand side matrix B in dense or sparse format, centralized
or distributed, and to output the solution matrix X centralized or distributed, but always in dense format.
Sparsity of the right-hand side can be exploited to accelerate the solution phase [14, 15, 44, 37]. Note
that the first step of the solution phase involves the distribution (scatter step) of the right-hand side onto
the processors. The cost of scatter can be highly reduce in when right-hand sides are sparse and provided
in a sparse format.
Moreover, MUMPS can optionally compute some entries of the inverse A−1 solving the system with
a particular sparse right-hand side B (see Subsection 5.14.4).
The formats of the right-hand side and of the solution vectors are controlled by ICNTL(20) and
ICNTL(21), respectively.
ICNTL(20) determines the format (dense, sparse, or distributed) of the right-hand side.

48
Phase: accessed by the host during the solve phase and before a JOB=9 call.
Possible variables/arrays involved: RHS, NRHS, LRHS, IRHS SPARSE, RHS SPARSE,
IRHS PTR, NZ RHS, Nloc RHS, LRHS loc, IRHS loc and RHS loc.
Possible values :
0 : the right-hand side is in dense format in the structure component RHS, NRHS, LRHS (see
Subsection 5.14.1)
1,2,3 : the right-hand side is in sparse format in the structure components IRHS SPARSE,
RHS SPARSE, IRHS PTR and NZ RHS.
1 : The decision of exploiting sparsity of the right-hand side to accelerate the solution phase
is done automatically.
2 : Sparsity of the right-hand side is NOT exploited to improve solution phase.
3 : Sparsity of the right-hand side is exploited during solution phase.
10, 11 : When provided before the solve phase, values 10 and 11 have the same meaning. The
right-hand side is provided distributed in the structure components Nloc RHS, LRHS loc,
IRHS loc, RHS loc (see Subsection 5.14.3).
When provided before a JOB=9 call, values 10 and 11 indicate which distribution MUMPS
should build and return to the user in IRHS loc. In this case, the user should provide a
workarray IRHS loc on each MPI process of size at least INFO(23), where INFO(23) is
returned after the factorization phase.
10 : fill IRHS loc to minimize internal communications of right-hand side data during the
solve phase.
11 : fill IRHS loc to match the distribution of the solution (imposed by MUMPS), in case of
distributed solution (ICNTL(21)=1).
Values different from 0, 1, 2, 3, 10, 11 are treated as 0. For a sparse right-hand side, the
recommended value is 1.
Default value: 0 (dense right-hand sides)
Incompatibility: When NRHS > 1 (multiple right-hand side), the functionalities related to iterative
refinement ( ICNTL(10)) and error analysis (ICNTL(11)) are currently disabled.
With sparse right-hand sides (ICNTL(20)=1,2,3), the forward elimination during the factorization
(ICNTL(32)=1) is not currently available.
Remarks: For details on how to set the input parameters see Subsection 5.14.1, Subsection 5.14.2
and Subsection 5.14.3. Please note that duplicate entries in the sparse or distributed right-hand sides
are summed. A JOB=9 call can only be done after a successful factorization phase and its results
depends on the transpose option ICNTL(9), which should not be modified between a JOB=9 and
JOB=3 call. The distributed right-hand side feature enables the user to provide a sparse structured
RHS (i.e., a RHS with some empty rows).

ICNTL(21) determines the distribution (centralized or distributed) of the solution vectors.

Phase: accessed by the host during the solve phase.
Possible variables/arrays involved: RHS, ISOL loc and SOL loc, LSOL loc
Possible values :
0 : the solution vector is assembled and stored in the structure component RHS (gather phase),
that must have been allocated earlier by the user (see Subsection 5.14.5).
1 : the solution vector is kept distributed on each slave processor in the structure components
ISOL loc and SOL loc. ISOL loc and SOL loc must then have been allocated by the
user and must be of size at least INFO(23), where INFO(23) has been returned by MUMPS
at the end of the factorization phase (see Subsection 5.14.6).
Values different from 0 and 1 are currently treated as 0.
Default value: 0 (assembled centralized format)
Incompatibility: If the solution is kept distributed, error analysis and iterative refinement (controlled
by ICNTL(10) and ICNTL(11)) are not applied.

49
5.14.1 Dense right-hand side (ICNTL(20)=0)
In this case, the matrix B of size n x nrhs is input in a one dimensional array of size lrhs x nrhs where
the leading dimension lrhs must be ≥ n, the dimension of the matrix A.
The following components of the MUMPS structure should be allocated by the user on the host before a
call to MUMPS with JOB= 3, 5, or 6 (call including the solve) if forward and backward elimination are both
computed during the solve (ICNTL(32)=0), or before a call to MUMPS with JOB= 2, 4 (call including
the factorization) if the forward elimination is computed during the factorization (ICNTL(32)=1). In
case of forward elimination performed during factorization, NRHS should also be provided before analysis
phase with JOB=1 and should be kept unchanged for the subsequent numerical phases.
mumps par%RHS (real/complex pointer array, dimension LRHS×NRHS) is a real (complex in the
complex version) array.
On entry RHS(i+(k-1)× LRHS) must hold the i-th component of the kth column of the right-hand
side matrix (1 ≤ k ≤ NRHS) of the equations being solved.
On exit, if the solution matrix has to be centralized (ICNTL(21)=0), then RHS(i+(k-1)×LRHS)
will hold the i-th component of the kth column of the solution matrix, 1 ≤ k ≤ NRHS.
Otherwise, if the solution matrix has to be distributed (ICNTL(21)=1), on exit to the package,
RHS will not contain any significant data for the user, even if it may have been modified.
mumps par%NRHS (integer) is an optional parameter that should be set by the user, on the host
processor, to the number of right-hand side vectors. Otherwise, the value 1 is assumed.
mumps par%LRHS (integer) is an optional parameter that should be set by the user, in the case where
NRHS is set by the user. In this case, it must hold the leading dimension of the array RHS and
should be greater than or equal to N (the matrix dimension). Otherwise, a single-column right-hand
side is assumed and LRHS is not accessed.

5.14.2 Sparse right-hand side (ICNTL(20)=1,2,3)

If the user wants to compute the solution with sparse right-hand sides, the right-hand side matrix should
be input as a sparse matrix in column format. Sparsity of the right-hand side can then be exploited
to accelerate the solution phase (see [44, 37]) but at the cost of some extra preprocessing that can be
switched off by the user setting ICNTL(20) to 2.
In the following, an example of a 4x2 matrix B with 5 nonzeros is provided.
 
a11
a22 
B= 

a31 a32 
a41

total nonzeros in B NZ RHS =5

number of columns B
NRHS =2
(n. of rhs vectors)
pointers to the columns IRHS PTR [1 : N RHS + 1] = 1 4 6
array of row indices IRHS SPARSE [1 : N Z RHS] = 1 3 4 2 3
array of values RHS SPARSE [1 : N Z RHS] = a11 a31 a41 a22 a32

The following input parameters should be defined on the host only before a call to MUMPS including
the solve phase (JOB=3, 5, or 6):
mumps par%NZ RHS (integer) should hold the total number of non-zeros in all the right-hand side
vectors.
mumps par%NRHS (integer) is an optional parameter that should be set by the user on the host
processor, to the number of right-hand side vectors. Otherwise, the value 1 is assumed.

50
mumps par%RHS SPARSE (real/complex pointer array, dimension NZ RHS) should hold the
numerical values of the non-zero entries of each right-hand side vector. This means that the B
matrix should be input by columns.
mumps par%IRHS SPARSE (integer pointer array, dimension NZ RHS) should hold the indices of the
variables of the non-zero inputs of each right-hand side vector.
mumps par%IRHS PTR (integer pointer array, dimension NRHS+1) is such that the i-th right-hand
side vector is defined by its non-zero row indices IRHS SPARSE(IRHS PTR(i)...IRHS PTR(i+1)-
1) and the corresponding numerical values RHS SPARSE(IRHS PTR(i)...IRHS PTR(i+1)-1). Note
that IRHS PTR(1)=1 and IRHS PTR(NRHS+1)=NZ RHS+1.
mumps par%RHS (real/complex pointer array, dimension LRHS×NRHS) must be allocated by the user
on the host if the output solution should be centralized (ICNTL(21)=0). On exit from a call to
MUMPS it will hold the centralized solution (ICNTL(21) =0).

5.14.3 Distributed right-hand side (ICNTL(20)=10,11)

In the context of dense multiple right-hand sides, the memory footprint on the host processor can be an
issue. The associated communications can also be responsible for a non-negligible part of the cost of the
solve phase. Furthermore, from a user’s point of view it may be more natural in an MPI environment to
provide the right-hand side already distributed on the MPI processors. The distributed memory feature
has been developed to address these issues.
When the distributed right-hand side feature is requested (ICNTL(20)=10 or ICNTL(20)=11), the
following parameters should be allocated by the user before the solve phase (JOB=3) on all processors
in the case of the working host model of parallelism (PAR=1), and on all processors except the host in
the case of the non-working host model of parallelism (PAR=0). Note that the solve phase of MUMPS has
exactly the same behaviour whether ICNTL(20)=10 or ICNTL(20)=11. It is interesting to distinguish
between these two values only in the unsymmetric case when retrieving special RHS distributions related
to MUMPS internals (JOB=9, see paragraph “Special distributions returned by MUMPS” below).
mumps par%NRHS (integer) is an optional parameter that should be set by the user, on the host
processor, to the number of right-hand side vectors. Otherwise, the value 1 is assumed.
mumps par%Nloc RHS (integer) is the number of local right-hand side rows. It must be defined by
the user on all processors if PAR=1, and on all processors except the host if PAR=0. Note that
Nloc RHS=0 is allowed, indicating that the local processor does not contribute to the right-hand
side.
mumps par%LRHS loc (integer) is the leading dimension of RHS loc. It must be such that LRHS loc
≥ Nloc RHS.
mumps par%RHS loc (real/complex pointer array, dimension LRHS loc×NRHS, where NRHS is
either equal to 1 or corresponds to the value provided by the user in the structure component NRHS
on the host) must be allocated by the user. Its leading dimension LRHS loc should be larger than
or equal to Nloc RHS, the number of right-hand side rows provided locally (see above).
mumps par%IRHS loc (integer pointer array, of dimension Nloc RHS) must be allocated and defined
on all processors if PAR=1, and on all processors except the host if PAR=0. The user must set
IRHS loc(k) to the global index (between 1 and N ) of the right-hand side rows that will be provided
locally in RHS loc. For example, if IRHS loc(5)=47, row 5 of of the local RHS loc corresponds
to row 47 in the global system. Note that if IRHS loc(k1 )=IRHS loc(k2 ), for k1 ̸= k2 , or if the
same index appears on several MPI processes, the corresponding rows in RHS loc are summed.
Furthermore, entries such that IRHS loc(k) greater than N or smaller than 0 result in row k of
RHS loc to be ignored.
mumps par%RHS (real/complex pointer array, dimension LRHS×NRHS) must be allocated by the user
on the host if the output solution should be centralized (ICNTL(21)=0). On exit from a call to
MUMPS it will hold the centralized solution (ICNTL(21) =0).
Special distributions returned by MUMPS.

51
We now describe how to obtain special IRHS loc distributions from MUMPS, that can guide the
distribution of the right-hand sides provided by the user. This can be useful in case the user does not
have specific constraints on the RHS distribution. For this, MUMPS should be called with JOB=9 after a
successful factorization, with the following parameters.
mumps par%IRHS loc (integer pointer array, of dimension at least INFO(23), where INFO(23)
was computed during the factorization phase) must be allocated by the user after the factorization
phase. On exit from a call to MUMPS with JOB=9, IRHS loc(1:INFO(23)) contains the list of
global RHS indices on the local MPI process corresponding to the internal MUMPS distributions.
mumps par%ICNTL(20)=10 if the user asks for a distribution that avoids RHS communication during
MUMPS solve phase, or 11 if the user asks for a distribution that is identical to the distribution of
the solution (the distribution of the solution is imposed by MUMPS, see ICNTL(21)). Other values
are treated as 10.
mumps par%ICNTL(9) indicates if the distribution provided should target the standard solve phase
(AX = B) or the solve phase for the transposed system (AT X = B). Note that if ICNTL(9)
changes between a JOB=9 call and a JOB=3 call, the distribution computed with ICNTL(20)=10
will be the one corresponding to the description of ICNTL(20)=11 and vice versa.
In the symmetric case (SYM=1 or 2), and in the unsymmetric case (SYM=0) when pivoting and
maximum weighted matching options are switched off (CNTL(1)=0 and ICNTL(6)=0), all possible
values of ICNTL(9) and ICNTL(20) lead to the same distribution. Therefore, these control parameters
need not be set on entry to the call to MUMPS with JOB=9. In case an unsymmetric factorization (SYM=0)
did not use an unsymmetric permutation of the matrix (see ICNTL(6)) and there were no off-diagonal
pivots (INFOG(12)=0), the same distribution will be returned for all values of ICNTL(9) and for
ICNTL(20)=10 or 11.
Special case of distributed right-hand side and distributed solution.
In case of both distributed RHS (ICNTL(20)=10,11) and distributed solution (ICNTL(21)=1),
IRHS loc and ISOL loc may point to the same workarrays. This should only be done when the
contents are known to be identical (symmetric case, unsymmetric case under some circumstances, see
above). Otherwise, the solve phase (JOB=3) with distributed solution will overwrite entries of IRHS loc
while building ISOL loc.
Furthermore, RHS loc and SOL loc may point to the same memory location, as long as LRHS loc
> LSOL loc.

5.14.4 A particular case of sparse right-hand side: computing entries of A−1

(ICNTL(30)=1)
It is possible to compute some selected entries of the inverse matrix A−1 (see Subsection 3.16) using the
control parameter ICNTL(30).
Let us consider the example below, in which

a−1 a−1 a−1 a−1

 
11 12 13 14
 a−1 a−1 a−1 a−1
A−1 =  21 22 23 24 
a−1
31 a−1
32 a−1
33 a−1
34

a−1
41 a−1
42 a−1
43 a−1
44

denotes the inverse of the matrix A.

We would like to compute the boldface elements:

a−1 a−1 a−1 a−1

 
11 12 13 14
 a−1 a−1 a−1 a−1
A−1 21
=  −1 22 23 24 
a31 a−1
32 a−1
33 a−1
34

a−1
41 a−1
42 a−1
43
−1
a44

On input, the following parameters should be allocated and initialized:

52
total entries A−1 to be computed NZ RHS =4
number of columns A−1 NRHS =N =4
pointers to the columns IRHS PTR [1 : N RHS + 1] = 1 3 4 4 5
array of row indices IRHS SPARSE [1 : N Z RHS] = 1 3 3 4

Note that column 3 will be considered as empty, because no elements have to be computed.
The following parameter should be allocated, but not initialized:

array of values RHS SPARSE [1 : N Z RHS]

On output, the following parameters will hold the requested entries:

total entries A−1 to be computed NZ RHS =4

number of columns A−1 NRHS =N =4
pointers to the columns IRHS PTR [1 : N RHS + 1] = 1 3 4 4 5
array of row indices IRHS SPARSE [1 : N Z RHS] = 1 3 3 4
array of values RHS SPARSE [1 : N Z RHS] = a−1
11 a−1
31 a−1
32 a−1
44

ICNTL(30) computes a user-specified set of entries in the inverse A−1 of the original matrix.
Phase: accessed during the solution phase.
Possible variables/arrays involved: NZ RHS, NRHS, RHS SPARSE, IRHS SPARSE, IRHS PTR
Possible values :
0: no entries in A−1 are computed.
1: computes entries in A−1 .
Other values are treated as 0.
Default value: 0 (no entries in A−1 are computed)
Incompatibility: Error analysis and iterative refinement will not be performed, even if the
corresponding options are set (ICNTL(10) and ICNTL(11)). Because the entries of A−1 are
returned in RHS SPARSE on the host, this functionality is incompatible with the distributed solution
option (ICNTL(21)). Furthermore, computing entries of A−1 is not possible in the case of partial
factorizations with a Schur complement (ICNTL(19)). Option to compute solution using A or
AT (ICNTL(9)) is meaningless and thus ignored.
Related parameters: ICNTL(27)
Remarks: When a set of entries of A−1 is requested, the associated set of columns will be
computed in blocks of size ICNTL(27). Larger ICNTL(27) values will most likely decrease the
amount of factor accesses, enable more parallelism and thus reduce the solution time [48, 44, 14].
The user must specify on input to a call of the solve phase in the arrays IRHS PTR and
IRHS SPARSE the target entries. The array RHS SPARSE should be allocated but not initialized.
Note that since selected entries of the inverse of the matrix are requested, NRHS must be set to N. On
output the arrays IRHS PTR, IRHS SPARSE and RHS SPARSE will hold the requested entries. If
duplicate target entries are provided then duplicate solutions will be returned.
When entries of A−1 are requested (ICNTL(30) = 1), mumps par%RHS needs not be allocated.

5.14.5 Centralized solution (ICNTL(21)=0)

The solution vector X can be returned centralized on the host in both cases: dense or sparse right-hand
side vectors. The matrix X will be returned as a dense vector in the array mumps par%RHS. For this
reason the mumps par%RHS should be allocated even when the right-hand side matrix B is input in
sparse format.

53
5.14.6 Distributed solution (ICNTL(21)=1)
On some networks with low bandwidth, and especially when there are many right-hand side vectors,
centralizing the solution on the host processor might be a costly part of the solution phase. If this
is critical to the user, this functionality allows the solution to be left distributed over the processors
(ICNTL(21)=1). The solution should then be exploited in its distributed form by the user application.
Note that this option can be used only with JOB=3 and should not be used with JOB= 5 or 6, because
some parameters needed for this option must be set using information output by the factorization.
The following input parameters should be allocated by the user before the solve phase (JOB=3) on
all processors in the case of the working host model of parallelism (PAR=1), and on all processors except
the host in the case of the non-working host model of parallelism (PAR=0).
mumps par%SOL loc (real/complex pointer array, dimension LSOL loc× NRHS where NRHS is
either equal to 1 or corresponds to the value provided by the user in NRHS on the host) must be
allocated by the user between the factorization and solve steps. Its leading dimension LSOL loc
should be larger than or equal to INFO(23), that is returned by the factorization phase.
On exit from the solve phase, SOL loc(i+(k-1)×LSOL loc) will contain the value corresponding
to variable ISOL loc(i) in the kth solution vector.
mumps par%LSOL loc (integer). LSOL loc must be set to the leading dimension of SOL loc (see
above) and should be larger than or equal to INFO(23), that is returned by the factorization phase.
mumps par%ISOL loc (integer pointer array, of dimension at least INFO(23), that is returned by the
factorization phase) must be allocated by the user between the factorization and solve steps.
On exit from the solve phase, ISOL loc(i) contains the index of the variables for which the solution
(in SOL loc) is available on the local processor.
If successive calls to the solve phase (JOB=3) are performed for a given matrix, ISOL loc will have
the same contents for each of these calls.
Note that if the solution is kept distributed, then functionalities related to error analysis and iterative
refinement (ICNTL(10) and ICNTL(11)) are currently not available.

5.15 Schur complement with reduced or condensed right-hand side

(ICNTL(19) and ICNTL(26))
MUMPS gives the possibility to perform the partial factorization of the complete matrix and to return the
Schur matrix, that is the part of the matrix still to be factorized. The Schur matrix will be returned as a full
matrix, distributed in different ways (see Subsection 5.15.1, Subsection 5.15.3 and Subsection 5.15.2).
ICNTL(19) computes the Schur complement matrix.
Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: SIZE SCHUR, LISTVAR SCHUR, NPROW, NPCOL, MBLOCK,
NBLOCK, SCHUR, SCHUR MLOC, SCHUR NLOC, and SCHUR LLD
Possible values :
0 : complete factorization. No Schur complement is returned.
1 : the Schur complement matrix will be returned centralized by rows on the host after the
factorization phase. On the host before the analysis phase, the user must set the integer variable
SIZE SCHUR to the size of the Schur matrix, the integer pointer array LISTVAR SCHUR to
the list of indices of the Schur matrix.
2 or 3 : the Schur complement matrix will be returned distributed by columns: the Schur will
be returned on the slave processors in the form of a 2D block cyclic distributed matrix
(ScaLAPACK style) after factorization. Workspace should be allocated by the user before
the factorization phase in order for MUMPS to store the Schur complement (see SCHUR,
SCHUR MLOC, SCHUR NLOC, and SCHUR LLD in Subsection 5.15). On the host before the
analysis phase, the user must set the integer variable SIZE SCHUR to the size of the Schur
matrix, the integer pointer array LISTVAR SCHUR to the list of indices of the Schur matrix.

54
The integer variables NPROW, NPCOL, MBLOCK, NBLOCK may also be defined (default values
will otherwise be provided).
Values not equal to 1, 2 or 3 are treated as 0.
Default value: 0 (complete factorization)
Incompatibility: Since the Schur complement is a partial factorization of the global matrix (with
partial ordering of the variables provided by the user), the following options of MUMPS are
incompatible with the Schur option: maximum transversal, scaling, iterative refinement, error
analysis and parallel analysis.
Related parameters: ICNTL(7), ICNTL(26)
Remarks: If the ordering is given (ICNTL(7)= 1) then the following property should hold:
PERM IN(LISTVAR SCHUR(i)) = N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
Note that, in order to have a centralized Schur complement matrix by columns (see
Subsection 5.15.3), it is possible (and recommended) to use a particular case of the distributed
Schur complement (ICNTL(19)=2 or 3), where the Schur complement is assigned to only one
processor (NPCOL × NPROW = 1).
If ICNTL(19) = 1,2,3 the user should give on input on the host before the analysis phase the
following parameters:
mumps par%SIZE SCHUR (integer) must be initialized to the number of variables defining the Schur
complement. It is only accessed during the analysis phase and is not altered by MUMPS. Its value is
communicated internally to the other phases as required. SIZE SCHUR should be greater or equal
to 0 and strictly smaller than N.
mumps par%LISTVAR SCHUR (integer pointer array, dimension SIZE SCHUR) must be allocated
and initialized by the user so that LISTVAR SCHUR(i), i=1, . . . , SIZE SCHUR must hold the
ith variable of the Schur complement matrix. It is accessed during analysis (JOB=1) and it is not
altered by MUMPS.
If a given ordering (Subsection 5.4.2, ICNTL(7) =1) is set by the user, the permutation should
also include the variables of the Schur complement, so that: PERM IN(LISTVAR SCHUR(i))=N-
SIZE SCHUR+i, for 1 ≤ i ≤SIZE SCHUR.

5.15.1 Centralized Schur complement stored by rows (ICNTL(19)=1)

Note that this option is becoming obsolete and is not recommended anymore because the memory for the
Schur is doubled and because it requires a copy or message transfer of the Schur computed internally by
MUMPS into the SCHUR argument. If a centralized Schur complement is required, we refer the user to the
Subsection 5.15.3 “Centralized Schur complement stored by columns” instead.
mumps par%SCHUR is a real (complex in the complex version) 1-dimensional pointer array that
should point to SIZE SCHUR × SIZE SCHUR locations in memory. It must be allocated by the
user on the host (independently of the value of PAR) before the factorization phase. On output
from the factorization phase, and on the host node, the 1-dimensional pointer array SCHUR of
length SIZE SCHUR × SIZE SCHUR holds the (dense) Schur matrix of order SIZE SCHUR.
Note that the order of the indices in the Schur matrix is identical to the order provided by the user
in LISTVAR SCHUR and that the Schur matrix is stored by rows. If the matrix is symmetric then
only the lower triangular part of the Schur matrix is provided (by rows) and the upper part is not
significant. This can also be viewed as the upper triangular part stored by columns in which case
the lower part is not defined.

5.15.2 Distributed Schur complement (ICNTL(19)=2 or 3)

MUMPS gives the possibility to output the Schur complement matrix distributed onto the processors
(ICNTL(19) = 2,3) using a 2D block cyclic distribution (please refer to [21] (for example) for the
notion of grid of processors and 2D block cyclic distributions) stored by columns.

55
For symmetric matrices, if ICNTL(19) =2, only the lower part of the Schur matrix is generated,
otherwise, if ICNTL(19) =3, the complete Schur matrix is generated.
For unsymmetric matrices MUMPS always provides the complete Schur matrix, so that
ICNTL(19)=2 and ICNTL(19)=3 have the same effect.
On entry to the analysis phase (JOB = 1), the following parameters should be defined on the host:
mumps par%NPROW, mumps par%NPCOL, mumps par%MBLOCK, and mumps par%NBLOCK
are integers corresponding to the characteristics of a 2D block cyclic grid of processors. If any of
these quantities is smaller than or equal to zero or has not been defined by the user, or if NPROW×
NPCOL is larger than the number of slave processors available (total number of processors if
PAR=1, total number of processors minus 1 if PAR=0), then a grid shape will be computed by
the analysis phase of MUMPS and NPROW, NPCOL, MBLOCK, NBLOCK will be overwritten on
exit from the analysis phase. We briefly describe here the meaning of the four above parameters in
a 2D block cyclic distribution:
• NPROW is the number of rows of the process grid (or the number of processors in a column
of the process grid),
• NPCOL is the number of columns of the process grid (or the number of processors in a row
of the process grid),
• MBLOCK is the blocking factor used to distribute the rows of the Schur complement,
• NBLOCK is the blocking factor used to distribute the columns of the Schur complement.
As in ScaLAPACK, we use a row-major process grid of processors, that is, process ranks (as
provided to MUMPS in the MPI communicator) are consecutive in a row of the process grid.
NPROW, NPCOL, MBLOCK and NBLOCK should be passed unchanged from the analysis phase
to the factorization phase. If the matrix is symmetric (SYM=1 or 2) and ICNTL(19)=3 (see below),
then the values of MBLOCK and NBLOCK should be equal.
On exit from the analysis phase, the following two components are set by MUMPS on the first NPROW
× NPCOL slave processors (the host is excluded if PAR=0 and the processors with largest MPI ranks in
the communicator provided to MUMPS may not be part of the grid of processors).
mumps par%SCHUR MLOC is an integer giving the number of rows of the local Schur complement
matrix on the concerned processor. It is equal to MAX(1,NUMROC(SIZE SCHUR, MBLOCK,
myrow, 0, NPROW)), where
• NUMROC is an integer function defined in most ScaLAPACK implementations (also used
internally by the MUMPS package),
• SIZE SCHUR, MBLOCK, NPROW have been defined earlier, and
• myrow is defined as follows:
Let myid be the rank of the calling process in the communicator COMM provided to MUMPS.
(myid can be returned by the MPI routine MPI COMM RANK.)
– if PAR = 1 myrow is equal to myid / NPCOL,
– if PAR = 0 myrow is equal to (myid − 1) / NPCOL.
Note that an upperbound of the minimum value of leading dimension (SCHUR LLD defined below)
is equal to ((SIZE SCHUR+MBLOCK-1)/MBLOCK+NPROW-1)/NPROW*MBLOCK.
mumps par%SCHUR NLOC is an integer giving the number of columns of the local Schur
complement matrix on the concerned processor. It is equal to NUMROC(SIZE SCHUR, NBLOCK,
mycol, 0, NPCOL), where
• SIZE SCHUR, NBLOCK, NPCOL have been defined earlier, and
• mycol is defined as follows:
Let myid be the rank of the calling process in the communicator COMM provided to MUMPS.
(myid can be returned by the MPI routine MPI COMM RANK.)
– if PAR = 1 mycol is equal to MOD(myid, NPCOL),
– if PAR = 0 mycol is equal to MOD(myid − 1, NPCOL).

56
On entry to the factorization phase (JOB = 2), the user should give on input the following components
of the structure:
mumps par%SCHUR LLD (integer) should be set to the leading dimension of the local Schur
complement matrix. It should be larger or equal to the local number of rows of that matrix,
SCHUR MLOC (as returned by MUMPS on exit from the analysis phase on the processors that
participate in the computation of the Schur). SCHUR LLD is not modified by MUMPS.
mumps par%SCHUR (real/complex one-dimensional pointer array) should be allocated by the user
on the NPROW × NPCOL first slave processors (the host is excluded if PAR=0 and the processors
with largest MPI ranks in the communicator provided to MUMPS may not be part of the grid of
processors). Its size should be at least equal to SCHUR LLD × (SCHUR NLOC - 1) + SCHUR MLOC,
where SCHUR MLOC, SCHUR NLOC, and SCHUR LLD have been defined above.
On exit from the factorization phase, the pointer array SCHUR contains the Schur complement, stored
by columns, in the format corresponding to the 2D cyclic grid of NPROW × NPCOL processors, with
block sizes MBLOCK and NBLOCK, and local leading dimensions SCHUR LLD.
Note that if ICNTL(19)=3 and the Schur is symmetric (SYM=1 or 2), then the constraint
mumps par%MBLOCK = mumps par%NBLOCK should hold.
Note that setting NPCOL × NPROW = 1 will centralize the Schur complement matrix, stored by
columns (instead of by rows as in the ICNTL(19)=1 option). More details on this are presented in
Subsection 5.15.3.

5.15.3 Centralized Schur complement stored by columns (ICNTL(19)=2 or 3)

In order to have a centralized Schur complement matrix by columns, it is possible to use a particular
case of the distributed Schur complement (ICNTL(19)=2 or 3, see Subsection 5.15.2), where the Schur
complement is only assigned to one processor (NPCOL × NPROW = 1). Therefore we refer the reader
to the previous section for a detailed description of the parameters for using this option. This option is
recommended compared to ICNTL(19)=1 (centralized Schur complement by rows).
The Schur complement matrix will be available on the host node if PAR=1, and on the node with MPI
identifier 1 (first working slave processor) if PAR=0.
Let us summarize a simple case of use, where the user wants a centralized Schur complement and
where PAR=1 (working host node).
On top of SIZE SCHUR and LISTVAR SCHUR described earlier, the user should set the following
parameters on the host on entry to the analysis phase:
NPROW = NPCOL = 1, in order to define a distribution that uses only one processor (the host, assuming
that PAR=1);
MBLOCK = NBLOCK = 100. Those arguments must be provided and be strictly positive but their actual
value will not change the distribution since NPROW=NPCOL=1.
ICNTL(19)=2 or 3.
On entry to the factorization phase, the user should provide on the host8 :
mumps par%SCHUR LLD=SIZE SCHUR: we consider here the simple case where the leading
dimension of the Schur is equal to its order.
mumps par%SCHUR, a real (complex in the complex version) one-dimensional pointer array of size
SIZE SCHUR × SIZE SCHUR that should be allocated by the user.
On exit from the factorization phase, the pointer array SCHUR available on the host contains the
Schur complement. If the matrix is unsymmetric (SYM=0), then the settings ICNTL(19)=2 and
ICNTL(19)=3 have an identical behaviour and the unsymmetric Schur complement is returned by
columns (i.e., in column-major format). If the matrix is symmetric (SYM=1 or 2) and ICNTL(19)=2,
then only the lower triangular part of the symmetric Schur is returned, stored by columns, and the upper
triangular part should not be accessed. Note that this is equivalent to say that the upper triangular part
8 As said above, we assume a working host model (PAR=1), otherwise this becomes processor 1 – please refer to the general

description from paragraph “Distributed Schur Complement” above for more information.

57
is returned by rows and the lower triangular part is not accessed. If the matrix is symmetric (SYM=1 or
2) and ICNTL(19)=3, then both the lower and upper triangular parts are returned. Because the Schur
complement is symmetric, this can be seen both as a row-major and as a column-major storage.

5.15.4 Using partial factorization during solution phase (ICNTL(26)= 0, 1 or 2)

As explained in Subsection 3.17, when a Schur complement has been computed during the factorization
phase, either the solution phase computes a solution on the internal problem (ICNTL(26)=0) or
the complete problem can be used to first condensed the right-hand side on the Schur variables
(ICNTL(26)=1). Then the condensed right-hand side is made available to the user for computing the
local solution using the Schur matrix. This local solution corresponding to Schur variables can then be
expanded to compute a global solution (ICNTL(26)=2).
ICNTL(26) drives the solution phase if a Schur complement matrix has been computed (ICNTL(19)
̸= 0, see Subsection 3.17 for details)
Phase: accessed by the host during the solution phase. It will be accessed also during factorization
if the forward elimination is performed during factorization (ICNTL(32)=1)
Possible variables/arrays involved: REDRHS, LREDRHS
Possible values :
0 : standard solution phase on the internal problem; referring to the notations from
Subsection 3.17, only the system A1,1 x1 = b1 is solved and the entries of the right-hand
side corresponding to the Schur are explicitly set to 0 on output.
1 : condense/reduce the right-hand side on the Schur. Only a forward elimination is performed.
The solution corresponding to the ‘internal” (non-Schur) variables is returned together with
the reduced/condensed right-hand-side. The reduced right-hand side is made available on the
host in the pointer array REDRHS, that must be allocated by the user. Its leading dimension
LREDRHS must be provide, too.
2 : expand the Schur local solution on the complete solution variables. REDRHS is considered
to be the solution corresponding to the Schur variables. It must be allocated by the user as
well as its leading dimension LREDRHS must be provided. The backward substitution is then
performed with the given right-hand side to compute the solution associated with the ”internal”
variables. Note that the solution corresponding to the Schur variables is also made available
in the main solution vector/matrix.
Values different from 1 and 2 are treated as 0.
Default value: 0 (normal solution phase)
Incompatibility: If ICNTL(26) = 1 or 2, then error analysis and iterative refinement are disabled
(ICNTL(11) and ICNTL(10))
Related parameters: ICNTL(19), ICNTL(32)
Remarks: If ICNTL(26) ̸= 0, then the user should provide workspace in the pointer array REDRHS,
as well as a leading dimension LREDRHS (see Subsection 5.15). Note that if no Schur complement
was computed, ICNTL(26) = 1 or 2 results in an error.
If the complete system should be solved using the Schur complement matrix (see Subsection 3.17,
ICNTL(26)=1 or 2), then the following parameters must be defined on the host on entry to the solution
step:
The right-hand side matrix [B1 B2 ]T (see Figure 3) must be defined on input of the solve phase. The
user can input the right-hand side matrix as a dense matrix (ICNTL(20) = 0, RHS, LRHS, NRHS,
see Subsection 5.14.1) or as a sparse matrix (ICNTL(20) = 1, NZ RHS, NRHS, RHS SPARSE,
IRHS SPARSE, IRHS PTR, see Subsection 5.14.2).
mumps par%LREDRHS is an optional integer parameter defining the leading dimension of the reduced
right-hand side, REDRHS, that must be set by the user when NRHS is provided and is greater than
1. In that case, it must be larger or equal to SIZE SCHUR, the size of the Schur complement. If
NRHS is not provided (or is equal to 1), LREDRHS needs not be provided.

58
NRHS NRHS

L 11 U 11 U 12 X1 B1
=
L 21 I S X2 B2 SIZE_SCHUR

SIZE_SCHUR

Figure 3: Solving the complete system using the Schur complement matrix

mumps par%REDRHS is a real (complex in the complex version) one-dimensional pointer array that
should be allocated by the user before entering the solution phase. Its size should be at least equal
to LREDRHS ×(NRHS-1)+ SIZE SCHUR.
If the reduction/condensation phase should be performed (ICNTL(26)=1), then on exit from the
solution phase, REDRHS(i+(k-1)*LREDRHS), i=1, . . ., SIZE SCHUR, k=1, . . ., NRHS will hold
the reduced right-hand side (the y2 vector of Equation (13)).
If the expansion phase should be performed (ICNTL(26)=2), then REDRHS(i+(k-1)*LREDRHS),
i=1, . . ., SIZE SCHUR, k=1, . . ., NRHS must be set (on entry to the solution phase) to the solution
on the Schur variables (the x2 vector of Equation (14)). In this case (i.e., ICNTL(26)=2) REDRHS
is not altered by MUMPS.
Note that on exit, the solution matrix [X1 X2 ]T in Figure 3 is stored in the RHS parameter, except in
case of distributed solution where it will be stored in ISOL loc and SOL loc (see ICNTL(21) and
Subsection 5.14).

5.16 Block Low-Rank (BLR) feature (ICNTL(35) and CNTL(7))

The current version of MUMPS implements the Block Low-Rank approach described in Subsection 3.18.
In this section, we first describe how to enable this BLR functionality in a multithreaded environment
(Subsection 5.16.1), we then describe the Application Program Interface (API) to control the use of the
BLR factorization (Subsection 5.16.2), and finally comment on the statistics that are provided on output
(Subsection 5.16.3).

5.16.1 Enabling the BLR functionality at installation

It is strongly advised to interface MUMPS with external ordering packages (Metis [34] or SCOTCH [42])
to enable the BLR representation of frontal matrix to rely on K-way partitionning [50].
On multiprocessor nodes with a shared memory it is strongly advised to use multithreading and
combine multithreaded BLAS with OpenMP directives-based multithreading. For the GNU fortran
compiler -fopenmp should thus be added to OPTF and OPTL in Makefile.inc. For the Intel
compiler -qopenmp should be added. OMP NUM THREADS should then be set at run time.
MUMPS assumes that the BLAS library is compatible with OpenMP and both multithreaded BLAS
and sequential BLAS in appropriate portions of the code will be used: serial BLAS when parallelism
comes from OpenMP parallel regions and multithreaded BLAS when there is not enough parallelism
from OpenMP. In this case, only OMP NUM THREADS should be set and the number of threads for the
BLAS should not be fixed (e.g., with MKL do not set MKL NUM THREADS).
In case you use a multithreaded implementation of the BLAS routines that is not compatible with
OpenMP and cannot adapt the number of threads to the current OpenMP region, then you can still use a
multithreaded BLAS library, but you should compile MUMPS with the option -DBLR NOOPENMP. This
will lead to a performance loss with the BLR feature (see ICNTL(35)), because multithreaded BLAS
will be used on small matrices which would otherwise have been processed in parallel at a higher level
with OpenMP.
Note that, if you mainly use BLR compression on large matrices (see ICNTL(35)), another
possilibility is to use OpenMP with a serial BLAS library.

59
As these two options will lead to suboptimal performance, we recommend to use a BLAS library
compatible with OpenMP when available.
Please note that, low-rank approximations are computed using a truncated QR factorization with
column pivoting, implemented as a variant of the LAPACK GEQP3 and LAQPS routines; linking with
a LAPACK library is necessary to satisfy the dependencies of this feature. On many systems, BLAS and
LAPACK routines are provided by a single library (e.g., the Intel MKL library) but, if it’s not the case,
the LAPACK library can simply be added in the LAPACK variable of the MUMPS Makefile.inc file.

5.16.2 BLR API

In order to activate the BLR feature and define the dropping parameter ε used for the approximations and
choose the BLR variant, the following parameters have been introduced.
ICNTL(35) controls the activation of the BLR feature
Phase: accessed by the host during the analysis and during the factorization phases
Possible values :
0 : Standard analysis and factorization (BLR feature is not activated).
1 : BLR feature is activated and automatic choice of BLR option is performed by the software.
2 : BLR feature is activated during both the factorization and solution phases, which allows for
memory gains by storing the factors in low-rank.
3 : BLR feature is activated during the factorization phase but not the solution phase, which is still
performed in full-rank. As a consequence, the full-rank factors must be kept and no memory
gains can be obtained. In an OOC context, (ICNTL(22)=1) this option enables the user to
write all factors to disk which is not the case with ICNTL(35)=2 since factors in low-rank
form are not written to disk.
Other values are treated as 0.
Default value: 0 (standard multifrontal factorization).
Related parameters: CNTL(7) (BLR approximations accuracy), ICNTL(36) (BLR factorization
variant), ICNTL(37) (compression of the contribution blocks), ICNTL(38) (estimation of the
compression rate of the factors) and ICNTL(39) (estimation of the compression rate of the
contribution blocks).
Incompatibility: Note that the activation of the BLR feature is currently incompatible with
elemental matrices (ICNTL(5) = 1) (see error -800, subject to change in the future), and when
the forward elimination during the factorization is requested (ICNTL(32) = 1), see error -43.
Remarks: If ICNTL(35)=1, then the automatic choice of BLR option is to activate BLR
feature during both factorization and solution phases (ICNTL(35)=2). In order to activate the
BLR factorization, ICNTL(35) must be equal to 1, 2 or 3 before the analysis, where some
preprocessing on the graph of the matrix is needed to prepare the low-rank factorization. The value
of ICNTL(35) can then be set to any of the above values on entry to the factorization (e.g., taking
into account the values returned by the analysis). On the other hand, if ICNTL(35)=0 at analysis,
only ICNTL(35)=0 is allowed for the factorization (full-rank factorization). When activating
BLR, it is recommended to set ICNTL(35) to 1 or 2 rather than 3 to benefit from memory gains.

CNTL(7) is the dropping parameter ε (double precision real value) controlling the accuracy of the Block
Low-Rank approximations.
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0.0 : full precision approximation.
> 0.0 : the dropping parameter is CNTL(7).
Default value: 0.0 (full precision (i.e., no approximation)).

60
Related parameters: ICNTL(35)
Remarks: The value of CNTL(7) is used as a stopping criterion for the compression of BLR
blocks which is achieved through a truncated Rank Revealing QR factorization. More precisely, to
compute the low-rank form of a block, we perform a QR factorization with column pivoting which
is stopped as soon as a diagonal coefficient of the R factor falls below the threshold, i.e., when
∥rkk ∥ < ε. This is implemented as a variant of the LAPACK [19] GEQP3 routine. Larger values
of this parameter lead to more compression at the price of a lower accuracy. Note that ϵ is used as
an absolute tolerance, i.e., not relative to the input matrix, or the frontal matrix or the block norms;
for this reason we recommend to scale the matrix or or let the solver automatically preprocess (e.g.,
scale) the input matrix.
Note that, depending on the application, gains can be expected even with small values (close to
machine precision) of CNTL(7).

ICNTL(36) controls the choice of the BLR factorization variant

Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : Standard UFSC variant with low-rank updates accumulation (LUA)
1 : UCFS variant with low-rank updates accumulation (LUA). This variant consists in performing
the compression earlier in order to further reduce the number of operations. Although it
may have a numerical impact, the current implementation is still compatible with numerical
pivoting.
Other values are treated as 0.
Default value: 0 (UFSC variant).
Related parameters: ICNTL(35) and CNTL(1)
Remarks: If numerical pivoting is not required and thus CNTL(1) can be set to 0.0, further
performance gains can be expected with the UCFS version.

ICNTL(37) controls the compression of the contribution blocks (CB)

Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : contribution blocks are not compressed
1 : contribution blocks are compressed, reducing the memory consumption at the cost of some
additional operations
Other values are treated as 0.
Default value: 0 (contribution blocks not compressed).
Related parameters: ICNTL(35), CNTL(7)

Estimating the memory footprint during analysis is difficult because the compression rate of the BLR
factors is only known after factorization. (When ICNTL(35)=2, factors are kept in compressed form
and the memory footprint can be reduced.) To enable the user to estimate during analysis the effect
of BLR compression on the memory footprint, ICNTL(38) has been introduced. ICNTL(38) only
influences the statistics printed during the analysis phase (only the INFO/INFOG arrays are affected).
Furthermore, Out-Of-Core can also be combined with BLR compression. In this case, the factors that are
kept full-rank (all of them if ICNTL(35)=3, or the ones of the frontal matrices not considered for BLR
compression if ICNTL(35)=2) will be written onto the disk during the factorization. BLR estimations
of the peak of memory for all possible situations are provided during analysis.

61
After factorization, statistics on the effective low-rank compression and on the memory effectively
allocated/used are then provided to the user (see Section 5.16.3) and can then be used to adjust the value
of ICNTL(38).
Similarly, with ICNTL(39) the user can provide an estimation of the compression rate of the
contributions blocks.
ICNTL(38) estimated compression rate of LU factors
Phase: accessed by the host during the analysis and the factorization phases when ICNTL(35)=1,
2 or 3
Possible values : between 0 and 1000 (1000 is no compression and 0 is full compression); other
values are treated as 0; ICNTL(38)/10 is a percentage representing the typical compression of the
compressed factors
factor matrices in BLR fronts: ICNTL(38)/10 = uncompressed factors × 100.
Default value: 600 (when factors of BLR fronts are compressed, their size is 60.0% of their full-
rank size).
Related parameters: ICNTL(35), CNTL(7)
Remarks: Influences statistics provided in INFO(29), INFO(30), INFO(31), INFOG(36),
INFOG(37), INFOG(38), INFOG(39) , but also INFO(32-35) and INFOG(40-43)

ICNTL(39) estimated compression rate of contribution blocks

Phase: accessed by the host during the analysis and the factorization phases when ICNTL(35)=1,
2 or 3, and ICNTL(37)= 1
Possible values : between 0 and 1000 (1000 is no compression and 0 is full compression); other
values are treated as 0; ICNTL(39)/10 is a percentage representing the typical compression of the
compressed CB
CB in BLR fronts: ICNTL(39)/10 = uncompressed CB × 100.
Default value: 500 (when CB of BLR fronts are compressed, their size is 50% of their full-rank
size).
Related parameters: ICNTL(35), ICNTL(37)
Remarks: Influences statistics provided in INFO(32), INFO(33), INFO(34), INFO(35),
INFO(36), INFO(37), INFO(38), INFOG(40), INFOG(41), INFOG(42), INFOG(43)
INFOG(44) INFOG(45) INFOG(46) INFOG(47)

5.16.3 BLR output: statistics

In the following, output statistics produced after the BLR factorization (see Section 3.18) are indicated.
The effect of BLR compression on both the number of operations and the number of entries in the factors
is reported.
Please note that when ICNTL(35)=2, factors are stored in compressed form and will be used in
compressed form during the solve phase. The effective size used to store the factors (INFOG(9)) thus
depends on the fact that ICNTL(35) is set to 2. The number of operations during the solve phase is
related to the size of the factors and will also benefit from the compression.

-------------- Beginning of BLR statistics --------------------------------

ICNTL(36) BLR variant = 1
CNTL(7) Dropping parameter controlling accuray = 1.0E-06
Statistics after BLR factorization
Number of BLR fronts = 847
Fraction of factors in BLR fronts = 90.8%
Statistics on the number of entries in factors:
INFOG(29) Theoretical nb of entries in factors = 4.881E+06 (100.0%)
INFOG(35) Effective nb of entries (% of INFOG(29))= 1.599E+06 ( 32.7%)

62
Statistics on operation counts (OPC):
RINFOG(3) Total theoretical operations = 1.985E+09 (100.0%)
RINFOG(14) Total effective OPC (% of RINFOG(3)) = 2.243E+08 ( 11.3%)
---------- End of BLR statistics -----------------------------------------

This output reports the following information:

• Number of BLR fronts: not all frontal matrices are considered for compression. Only those
that fulfill a minimum size requirement are considered.
• Fraction of factors in BLR fronts: corresponds to the fraction of full-rank factors
that will be be considered for compression (i.e. the ratio of the number of entries in the factors of
all frontal matrices on which BLR compression will be performed over the total number of entries
(INFOG(29)).
• INFOG(29) Theoretical nb of entries in factors: number of entries in the
factors (sum over all processors), if a standard (full-rank) factorization had been performed.
• INFOG(35) Effective nb of entries: effective number of entries in the factors (sum
over all processors) taking into account BLR factor compression. The value reported in parentheses
is the fraction of the theoretical number of entries in full-rank factors INFOG(29).
• RINFOG(3) Total theoretical full-rank OPC: number of (real or complex) floating-
point operations (sum over all processors), if a standard standard (full-rank) factorization had been
performed. It is provided here as a reference.
• RINFOG(14) Total effective OPC: actual number of floating-point operations done by
the factorization phase with BLR feature. The value reported in parentheses is the fraction of the
theoretical full-rank operation count RINFOG(3). Using the BLR feature, the reduction in time
for factorization should be related to the percentage reduction of full-rank operations.

63
5.17 Save (JOB=7) / Restore (JOB=8) feature
To save to disk MUMPS internal data associated to a given instance, MUMPS should be called with JOB=7
(see Subsection 5.1.1). These MUMPS internal data are saved in binary files. It is possible to use the save
feature (JOB=7) before or after any of the main phases (analysis, factorization, solve, JOB=1,2,3,4,5,6).
After that, it is possible to continue working with the existing instance until, at some point, the instance
should be terminated (JOB=-2). In order to restart MUMPS with the saved data, the user should first create
a new instance (JOB=-1, see Subsection 5.1.1) and then restore into that instance the saved data with
a call to MUMPS with JOB=8. Note that arrays that are allocated and freed by the user are not saved
(JOB=7) nor restored (JOB=8), although some of them might be requested for further calls to MUMPS.
For example, the arrays associated to the input matrix are not saved. See important remarks on how to
use this feature in Subsection 5.17.3.

5.17.1 Location and names of the save files

The save files are written in the directory provided by the user. Their names may start with an user defined
prefix. The following variables are involved:
mumps par%SAVE DIR (string, maximum length of 255 characters) must be provided by the user (on
each processor) to control the directory where MUMPS internal data will be stored.
It is also possible to provide the directory through environment variables. If SAVE DIR is
not defined, then MUMPS checks for the environment variable MUMPS SAVE DIR. If neither
SAVE DIR nor the environment variable MUMPS SAVE DIR are defined, then error -77 will be
raised.
mumps par%SAVE PREFIX (string, maximum length of 255 characters)
can be provided by the user (on each processor) to prefix the files. It is also possible to provide the
file prefix through environment variables. If SAVE PREFIX is not defined, then MUMPS checks for
the environment variable MUMPS SAVE PREFIX. If neither SAVE PREFIX nor the environment
variable MUMPS SAVE PREFIX are defined, then MUMPS will use save.
To compound the file names, the prefix strings (eventually defined with SAVE PREFIX) will be
appended at least with myid in case of MPI, where myid is the MPI rank of the current processor. The
file name extension is .mumps, so the files can easily be detected afterwards.
A text file containing a recap of the saved instance is also created with the same basename, on every
processor. The file name extension is .info and the files are meant to be read by the user.

5.17.2 Deletion of the save files (JOB=-3)

Once the binary files are not needed anymore, i.e. no other restart is planned with these data, it is
possible to delete the files by calling MUMPS with JOB=-3. MUMPS will access the structure components
SAVE DIR and SAVE PREFIX to know which files should be deleted. If the user doesn’t specify
the directory in the structure components mumps par%SAVE DIR and/or the prefix of the files in
mumps par%SAVE PREFIX, MUMPS will check for the environment variable MUMPS SAVE DIR for
the directory, and for the environment variable MUMPS SAVE PREFIX or save for the prefix name.
During the deletion phase (JOB=-3), MUMPS does not restore the files in SAVE DIR with
SAVE PREFIX in the current MUMPS instance. Thus it is possible to remove some previously saved
files and to pursue the computations with the current instance.
In case of out-of-core (ICNTL(22)=1), special care should be taken to manage associated out-of-
core files. See Subsection 5.17.4 for more details.

5.17.3 Important remarks for the restore feature (JOB=8)

A call to MUMPS with JOB= 8 must be preceded by a call with JOB=-1 on the same instance.
Requested parameters: the parameters COMM, SYM and PAR are not restored and must be given back
by the user before the call to MUMPS with JOB=-1. The values SYM and PAR must be identical to
the values of the saved instance. Also, the size of the communicator COMM should be the same as
the size of the one that was used at the moment of the save (JOB= 7).

64
Control parameters: during the restoration of MUMPS data, the values of ICNTL and CNTL are set back
to the saved values. So any intended change of these values should occur after the call of MUMPS
with JOB=8 to be taken into account during the following calls of MUMPS.
Constraint with binary files: because binary files are used to save the data, the binary storage should
have identical format (for example, little or big endian) on the machine that saves the data and the
one that restores it.
Arrays allocated and freed by the user As mentioned above, the arrays that are allocated and freed by
the user are not saved and cannot be restored by this feature, although some of them might be
requested for further calls to MUMPS. For example, the arrays associated to the input matrix are not
stored. The user is thus responsible of providing them again, if they are going to be needed. Data
from the user that are not saved but that might be required internally by MUMPS after the restoration
include:
Input matrix: the arrays associated to the input matrix (see Subsection 5.2.2) are not saved during
a call of MUMPS with JOB=7. The possible arrays concerned are: IRN, JCN, A, IRN loc,
JCN loc, A loc, ELTPTR, ELTVAR, A ELT. Consequently, the same matrix data must be
provided again by the user before a (possibly new) factorization or before a solve if iterative
refinement (ICNTL(10)) or error analysis (ICNTL(11)) are requested.
Scaling arrays: the scaling arrays ROWSCA and COLSCA are not saved by the save phase in case
they had been provided by the user before the factorization and should then be provided again
before the solve phase.
Right-hand side vector: the Right-hand side (or Solution) vector RHS is not saved by the save
phase.
MPI context: the only constraint is that before a call to MUMPS with JOB=8 using the communicator
COMM provided in the JOB=-1 call, the process with rank<myid> should have access to the file
<SAVE DIR>/<SAVE PREFIX> myid.mumps. There are 6 possibilities :
<SAVE DIR>/<SAVE PREFIX> myid.mumps
<SAVE DIR>/<MUMPS SAVE PREFIX> myid.mumps
<SAVE DIR>/save myid.mumps
< MUMPS SAVE DIR>/<SAVE PREFIX> myid.mumps
< MUMPS SAVE DIR>/<MUMPS SAVE PREFIX> myid.mumps
< MUMPS SAVE DIR>/save myid.mumps

5.17.4 Combining the save/restore feature with out-of-core

The save/restore feature is fully compatible with the out-of-core computations (ICNTL(22) = 1). After
a call to MUMPS with JOB=7, the files storing the factors are associated to a saved instance. They will
be needed if the saved instance is restored. Thus they are not deleted, neither during the termination
phase (JOB=-2), nor at the beginning of a new factorization (JOB=2), unlike what usually happens with
out-of-core files (see Subsection 5.8).
After a save (JOB=7) or a restore (JOB=8), it is still possible to perform a new factorization (JOB=2)
using the out-of-core feature. In this case, new out-of-core files are produced. The old out-of-core files
are not deleted since they are associated to a saved instance. On the contrary, the new files storing the new
factors are not associated to a saved instance. They will be treated as usual out-of-cores files, i.e. they
will be deleted during the termination phase (JOB=-2) or at the beginning of an other new factorization.
When MUMPS is called with JOB=-3, the out-of-core factor files associated to the saved instance are
marked out for deletion. This means the current out-of-core files can be used again during solve phases.
But because they are not associated to a saved instance, they will be deleted during the termination phase
or at the beginning of a new factorization.
In the lifetime of a classical MUMPS instance, the out-of-core files are kept up to the termination phase
or a new factorization. If several saves (JOB=7) are performed after the same out-of-core factorization,
they will all refer to the same out-of-core files. In such a situation, the user might need advanced controls
on the out-of-core files management available using ICNTL(34).
ICNTL(34) controls the conservation of the OOC files during JOB=-3.

65
Phase: accessed by the host during the save/restore files deletion phase (JOB=-3) in case of out-of-
core (ICNTL(22)=1).
Possible values :
0: the out-of-core files are marked out for deletion
1: the out-of-core files should not be deleted because another saved instance references them.
Other values are treated as 0.
Default value: 0 (out-of-core files associated to a saved instance are marked out for deletion at the
end of the out-of-core file lifetime)
Remarks: MUMPS will delete only the out-of-core files that are referenced in the saved data
identified by the value of SAVE DIR and SAVE PREFIX. Extra out-of-core files with the same
OOC TMPDIR and OOC PREFIX are not deleted.

5.17.5 Combining the save/restore feature with WK USER

In case an optional workspace is provided by the user and a call to MUMPS with JOB=7 is executed,
MUMPS will expect the user to provide a new workspace to load the data before restore (LWK USER (and
WK USER) (see Subsection 5.9) must be set (and allocated) after the call to MUMPS with JOB=-1 but
before the call with JOB=8).
MUMPS will check that the workspace provided is large enough or has the exact same size as the one
provided in an earlier call. If not, MUMPS will raise an error with the error code −11.

5.17.6 Error management

In case of error while saving or restoring an instance, an error code will be returned in INFO(1) and
INFOG(1) (see Section 7) and no files are created.
In case the user wants to save (JOB=7)) an instance after a call to MUMPS that returned with an error
(INFO(1)<0 and INFOG(1)<0), this is possibile, although it is not recommended. After the save, the
values of INFO(1), INFO(2), INFOG(1), INFOG(2) on exit will represent the success or failure of
the save phase. Similarly, after restoring a saved instance (JOB=8), INFO(1), INFO(2), INFOG(1),
INFOG(2) are associated to the success or failure of the restore operation, not of the last call to MUMPS
done before saving the instance.
It is still possible to pursue the computations on the restored instance after the last successful step. For
example, one can perform a factorization (JOB=2) on a restored instance for which the analysis (JOB=1)
was successful but the factorization failed before saving the instance.

5.18 Setting the number of OpenMP threads (ICNTL(16))

For normal and safe setting of multithreaded parallelism (in combination with MPI or not) please refer to
Section 3.12.
When, for some reason, setting the number of OpenMP threads is not possible, then one can ask
MUMPS to set the number of OpenMP threads. This is done internally by a call to the OpenMP
function omp set num threads. If this possibility is activated, during the whole current call to MUMPS,
ICNTL(16) OpenMP threads will be used. Before the call to MUMPS exits, the number of OpenMP
threads is set back to its previous value.
ICNTL(16) controls the setting of the number of OpenMP threads
Phase: accessed by the host at the beginning of all phases
Possible values :
0 : nothing is done, MUMPS uses the number of OpenMP threads configured by the calling
application.
> 0 : MUMPS sets the number of OpenMP threads on entry and reset the previous value on exit.
Other values are treated as 0.
Default value: 0 (no setting of the number of OpenMP threads done by MUMPS internally)

66
5.19 Compact workarray id%S at the end of factorization phase
When memory for the solve phase is critical (case of large number of right-hand sides) or when the
memory footprint at the end of the factorization phase needs to be reduced, setting ICNTL(49) to 1 or
2 enables to compact the internal workarray id%S at the end of the factorization so that id%S only holds
factor matrices needed for the solution phase.
ICNTL(49) compact workarray id%S at the end of factorization phase
Phase: accessed by the host during factorization phase
Possible values :
0 : nothing is done.
1 : compact workarray id%S(MAXS) at the end of the factorization phase while satisfying the
memory contraint that might have been provided with ICNTL(23) feature.
2 : compact workarray id%S(MAXS) at the end of the factorization phase. The memory
constraint that might have been provided with ICNTL(23) feature does not apply to this
process.
Other values are treated as 0.
Default value: 0
Incompatibility: With the use of LWK USER / WK USER feature.
Remarks: ICNTL(49)=1,2 might require intermediate memory allocation to reallocate id%S of
minimal size. If the memory allocation fails, then a warning is returned and nothing is done. If
ICNTL(49)=1 and the memory constraint provided with ICNTL(23)> 0 would not be satisfied
then a warning is raised and nothing is done.

6 Control parameters
On exit from the initialization call (JOB = –1), the control parameters are set to default values. If the
user wishes to use values other than the defaults, the corresponding entries in mumps par%ICNTL and
mumps par%CNTL should be reset after this initial call and before the call in which they are used.

6.1 Integer control parameters

mumps par%ICNTL is an integer array of dimension 60.
• ICNTL(1) is the output stream for error messages
• ICNTL(2) is the output stream for diagnostic printing, statistics, and warning message
• ICNTL(3) is the output stream for global information, collected on the host
• ICNTL(4) is the level of printing for error, warning, and diagnostic messages
• ICNTL(5) controls the matrix input format
• ICNTL(6) permutes the matrix to a zero-free diagonal and/or scale the matrix
• ICNTL(7) computes a symmetric permutation in case of sequential analysis
• ICNTL(8) describes the scaling strategy
• ICNTL(9) computes the solution using A or AT
• ICNTL(10) applies the iterative refinement to the computed solution
• ICNTL(11) computes statistics related to an error analysis
• ICNTL(12) defines an ordering strategy for symmetric matrices
• ICNTL(13) controls the parallelism of the root node
• ICNTL(14) controls the percentage increase in the estimated working space
• ICNTL(15) exploits compression of the input matrix resulting from a block format

67
• ICNTL(16) controls the setting of the number of OpenMP threads
• ICNTL(18) defines the strategy for the distributed input matrix
• ICNTL(19) computes the Schur complement matrix
• ICNTL(20) determines the format (dense, sparse, or distributed) of the right-hand sides
• ICNTL(21) determines the distribution (centralized or distributed) of the solution vectors
• ICNTL(22) controls the in-core/out-of-core (OOC) factorization and solve
• ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS
can allocate per working processor
• ICNTL(24) controls the detection of “null pivot rows”
• ICNTL(25) allows the computation of a solution of a deficient matrix and also of a null space basis
• ICNTL(26) drives the solution phase if a Schur complement matrix
• ICNTL(27) controls the blocking size for multiple right-hand sides
• ICNTL(28) determines whether a sequential or parallel computation of the ordering is performed
• ICNTL(29) defines the parallel ordering tool to be used to compute the fill-in reducing permutation
• ICNTL(30) computes a user-specified set of entries in the inverse A−1 of the original matrix
• ICNTL(31) indicates which factors may be discarded during the factorization
• ICNTL(32) performs the forward elimination of the right-hand sides during the factorization
• ICNTL(33) computes the determinant of the input matrix
• ICNTL(34) controls the deletion of the files in case of save/restore
• ICNTL(35) controls the activation of the Block Low-Rank (BLR) feature
• ICNTL(36) controls the choice of BLR factorization variant
• ICNTL(37) controls the BLR compression of the contribution blocks
• ICNTL(38) estimates compression rate of LU factors
• ICNTL(39) estimates compression rate of contribution blocks
• ICNTL(40) reserved in current version
• ICNTL(41-48) reserved in current version
• ICNTL(49) compact workarray id%S at the end of factorization phase
• ICNTL(50) reserved in current version
• ICNTL(51) reserved in current version
• ICNTL(52-57) reserved in current version
• ICNTL(58) defines options for symbolic factorization
• ICNTL(59-60) not used in current version

ICNTL(1) is the output stream for error messages.

Possible values :
≤ 0: these messages will be suppressed.
> 0 : is the output stream.
Default value: 6 (standard output stream)

ICNTL(2) is the output stream for diagnostic printing and statistics local to each MPI process.
Possible values :
≤ 0: these messages will be suppressed.
> 0 : is the output stream.

68
Default value: 0
Remarks: If ICNTL(2) > 0 and ICNTL(4) ≥ 2, then information on advancement (flops done)
is also printed.

ICNTL(3) is the output stream for global information, collected on the host.
Possible values :
≤ 0: these messages will be suppressed.
> 0 : is the output stream.
Default value: 6 (standard output stream)

ICNTL(4) is the level of printing for error, warning, and diagnostic messages.
Possible values :
≤ 0: No messages output.
1: Only error messages printed.
2: Errors, warnings, and main statistics printed.
3: Errors and warnings and terse diagnostics (only first ten entries of arrays) printed.
≥4: Errors, warnings and information on input, output parameters printed.
Default value: 2 (errors, warnings and main statistics printed)

ICNTL(5) controls the matrix input format (see Subsection 5.2.2).

Phase: accessed by the host and only during the analysis phase
Possible variables/arrays involved: N, NNZ (or NZ for backward compatibility), IRN, JCN,
NNZ loc (or NZ loc for backward compatibility), IRN loc, JCN loc, A loc, NELT, ELTPTR,
ELTVAR, and A ELT
Possible values :
0 : assembled format. The matrix must be input in the structure components N, NNZ (or NZ),
IRN, JCN, and A if the matrix is centralized on the host (see Subsection 5.2.2.1) or in the
structure components N, NNZ loc (or NZ loc), IRN loc, JCN loc, A loc if the matrix is
distributed (see Subsection 5.2.2.2).
1 : elemental format. The matrix must be input in the structure components N, NELT, ELTPTR,
ELTVAR, and A ELT (see Subsection 5.2.2.3).
Any other values will be treated as 0.
Default value: 0 (assembled format)
Related parameters: ICNTL(18)
Incompatibility: If the matrix is in elemental format (ICNTL(5)=1), the BLR feature
(ICNTL(35)≥ 1) is currently not available, see error -800.
Remarks: NNZ and NNZ loc are 64-bit integers (NZ and NZ loc are 32-bit integers kept for
backward compatibility and will be obsolete in future releases).
Parallel analysis (ICNTL(28) =2) is only available for matrices in assembled format and, thus, an
error will be raised for elemental matrices (ICNTL(5)=1).
Elemental matrices can be input only centralized on the host (ICNTL(18)=0).

ICNTL(6) permutes the matrix to a zero-free diagonal and/or scale the matrix (see Subsection 3.2 and
Subsection 5.3.2).
Phase: accessed by the host and only during sequential analysis (ICNTL(28)=1)
Possible variables/arrays involved: optionally UNS PERM, mumps par%A, COLSCA and ROWSCA
Possible values :

69
0 : No column permutation is computed.
1 : The permuted matrix has as many entries on its diagonal as possible. The values on the
diagonal are of arbitrary size.
2 : The permutation is such that the smallest value on the diagonal of the permuted matrix is
maximized. The numerical values of the original matrix, (mumps par%A), must be provided
by the user during the analysis phase.
3 : Variant of option 2 with different performance. The numerical values of the original matrix
(mumps par%A) must be provided by the user during the analysis phase.
4 : The sum of the diagonal entries of the permuted matrix is maximized. The numerical values of
the original matrix (mumps par%A) must be provided by the user during the analysis phase.
5 : The product of the diagonal entries of the permuted matrix is maximized. Scaling vectors
are also computed and stored in COLSCA and ROWSCA, if ICNTL(8) is set to -2 or 77.
With these scaling vectors, the nonzero diagonal entries in the permuted matrix are one in
absolute value and all the off-diagonal entries less than or equal to one in absolute value.
For unsymmetric matrices, COLSCA and ROWSCA are meaningful on the permuted matrix
A Qc (see Equation (5)). For symmetric matrices, COLSCA and ROWSCA are meaningful on
the original matrix A. The numerical values of the original matrix, mumps par%A, must be
provided by the user during the analysis phase.
6 : Similar to 5 but with a more costly (time and memory footprint) algorithm. The numerical
values of the original matrix, mumps par%A, must be provided by the user during the analysis
phase.
7 : Based on the structural symmetry of the input matrix and on the availability of the numerical
values, the value of ICNTL(6) is automatically chosen by the software.
Other values are treated as 0. On output from the analysis phase, INFOG(23) holds the value of
ICNTL(6) that was effectively used.
Default value: 7 (automatic choice done by the package)
Incompatibility: If the matrix is symmetric positive definite (SYM = 1), or in elemental format
(ICNTL(5)=1), or the parallel analysis is requested (ICNTL(28)=2) or the ordering is provided
by the user (ICNTL(7)=1), or the Schur option (ICNTL(19) = 1, 2, or 3) is required, or the
matrix is initially distributed (ICNTL(18)=1,2,3), then ICNTL(6) is treated as 0.
Related parameters: ICNTL(8), ICNTL(12)
Remarks: On assembled centralized unsymmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM
= 0), if ICNTL(6)=1, 2, 3, 4, 5, 6 a column permutation (based on weighted bipartite matching
algorithms described in [24, 25]) is applied to the original matrix to get a zero-free diagonal.
The user is advised to set ICNTL(6) to a nonzero value when the matrix is very unsymmetric
in structure. On output to the analysis phase, when the column permutation is not the identity, the
pointer UNS PERM (internal data valid until a call to MUMPS with JOB=-2) provides access to the
permutation on the host processor (see Subsection 5.3.1). Otherwise, the pointer is not associated.
The column permutation is such that entry ai,perm(i) is on the diagonal of the permuted matrix.
On general assembled centralized symmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM =
2), if ICNTL(6)=1, 2, 3, 4, 5, 6, the column permutation is internally used to determine a set of
recommended 1×1 and 2×2 pivots (see [26] and the description of ICNTL(12) in Subsection 6.1
for more details). We advise either to let MUMPS select the strategy (ICNTL(6) = 7) or to set
ICNTL(6) = 5 if the user knows that the matrix is for example an augmented system (which is a
system with a large zero diagonal block). On output from the analysis the pointer UNS PERM is not
associated.

ICNTL(7) computes a symmetric permutation (ordering) to determine the pivot order to be used for the
factorization in case of sequential analysis (ICNTL(28)=1). See Subsection 3.2 and Subsection 5.4.
Phase: accessed by the host and only during the sequential analysis phase (ICNTL(28) = 1).
Possible variables/arrays involved: PERM IN, SYM PERM
Possible values :

70
0 : Approximate Minimum Degree (AMD) [7] is used,
1 : The pivot order should be set by the user in PERM IN, on the host processor. In that case,
PERM IN must be allocated on the host by the user and PERM IN(i), (i=1, ... N) must hold the
position of variable i in the pivot order. In other words, row/column i in the original matrix
corresponds to row/column PERM IN(i) in the reordered matrix.
2 : Approximate Minimum Fill (AMF) is used,
3 : SCOTCH9 [42] package is used if previously installed by the user otherwise treated as 7.
4 : PORD10 [46] is used if previously installed by the user otherwise treated as 7.
5 : the Metis11 [34] package is used if previously installed by the user otherwise treated as 7.
It is possible to modify some components of the internal options array of Metis (see
Metis manual) in order to fine-tune and modify various aspects of the internal algorithms
used by Metis. This can be done by setting some elements (see the file metis.h in the
Metis installation to check the position of each option in the array) of the MUMPS array
mumps par%METIS OPTIONS after the MUMPS initialization phase (JOB=-1) and before
the analysis phase. Note that the METIS OPTIONS array of the MUMPS structure is of size 40,
which is large enough for both Metis 4.x and Metis 5.x verions. It is passed by MUMPS as the
argument “options” to the METIS ordering routine METIS NodeND (METIS NodeWND is
sometimes also called in case MUMPS was installed with Metis 4.x) during the analysis phase.
6 : Approximate Minimum Degree with automatic quasi-dense row detection (QAMD) is used.
7 : Automatic choice by the software during analysis phase. This choice will depend on the
ordering packages made available, on the matrix (type and size), and on the number of
processors.
Other values are treated as 7.
Default value: 7 (automatic choice)
Incompatibility: ICNTL(7) is meaningless if the parallel analysis is chosen (ICNTL(28)=2).
Related parameters: ICNTL(28)
Remarks: Even when the ordering is provided by the user, the analysis must be performed before
numerical factorization.
For assembled matrices (centralized or distributed) (ICNTL(5)=0) all the options are available.
For elemental matrices (ICNTL(5)=1), only options 0, 1, 5 and 7 are available, with option 7
leading to an automatic choice between AMD and Metis (options 0 or 5); other values are treated
as 7.
If the user asks for a Schur complement matrix (ICNTL(19)= 1, 2, 3) and
– the matrix is assembled (ICNTL(5)=0) then only options 0, 1, 5 and 7 are currently available.
Other options are treated as 7.
– the matrix is elemental (ICNTL(5)=1) only options 0, 1 and 7 are currently available. Other
options are treated as 7 which will (currently) be treated as 0 (AMD).
– in both cases (assembled or elemental matrix) if the pivot order is given by the user
(ICNTL(7)=1) then the following property should hold: PERM IN(LISTVAR SCHUR(i)) =
N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
For matrices with relatively dense rows, we highly recommend option 6 which may significantly
reduce the time for analysis.
On output, the pointer array SYM PERM provides access, on the host processor, to the symmetric
permutation that is effectively computed during the analysis phase by the MUMPS package, and
INFOG(7) to the ordering option that was effectively chosen. In fact, the option corresponding to
ICNTL(7) may be forced by MUMPS when for example the ordering option chosen by the user is
not compatible with the value of ICNTL(12) or the necessary package is not installed.
9 See http://gforge.inria.fr/projects/scotch/ to obtain a copy.
10 Distributed
within MUMPS by permission of J. Schulze (University of Paderborn).
11 See http://glaros.dtc.umn.edu/gkhome/metis/metis/overview to obtain a copy.

71
SYM PERM(i), i=1, ... N, holds the position of variable i in the pivot order. In other words,
row/column i in the original matrix corresponds to row/column SYM PERM(i) in the reordered
matrix. See also Subsection 5.4.1.

ICNTL(8) describes the scaling strategy (see Subsection 5.3).

Phase: accessed by the host during analysis phase (that need be sequential ICNTL(28)=1) or on
entry to numerical factorization phase
Possible variables/arrays involved: COLSCA,ROWSCA
Possible values :
-2: Scaling done during analysis (see [24, 25] for the unsymmetric case and [26] for the symmetric
case), under certain conditions. The original matrix has to be centralized (ICNTL(18)=0)
and the user has to provide its numerical values (mumps par%A) on entry to the analysis,
otherwise the scaling will not be computed. Also, ICNTL(6) must be activated for the scaling
to be computed.
The effective value of the scaling applied is reported in INFOG(33). We recommend that the
user checks for INFOG(33) in order to know if the scaling was applied. When not applied,
other scalings can be performed during the factorization.
-1: Scaling provided by the user. Scaling arrays must be provided in COLSCA and ROWSCA on
entry to the numerical factorization phase by the user, who is then responsible for allocating
and freeing them. If the input matrix is symmetric (SYM= 1 or 2), then the user should ensure
that the array ROWSCA is equal to (or points to the same location as) the array COLSCA.
0 : No scaling applied/computed.
1 : Diagonal scaling computed during the numerical factorization phase,
3 : Column scaling computed during the numerical factorization phase,
4 : Row and column scaling based on infinite row/column norms, computed during the numerical
factorization phase,
7 : Simultaneous row and column iterative scaling (based on [45, 16, 36, 35]) computed during
the numerical factorization phase.
8 : Similar to 7 but more rigorous and expensive to compute; computed during the numerical
factorization phase.
77 : Automatic choice of the value of ICNTL(8) done during analysis.
Other values are treated as 77.
Default value: 77 (automatic choice done by the package)
Related parameters: ICNTL(6), ICNTL(12)
Remarks: If ICNTL(8) = 77, then an automatic choice of the scaling option may be performed,
either during the analysis or the factorization. The effective value used for ICNTL(8) is returned in
INFOG(33). If the scaling arrays are computed during the analysis, then they are ready to be used
by the factorization phase. Note that scalings can be efficiently computed during analysis when
requested (see ICNTL(6) and ICNTL(12)).
If the input matrix is real and symmetric with SYM= 1 then automatic choice is no scaling. However,
the user may want to scale the matrix when BLR feature is activated (see ICNTL(35)).
Incompatibility: If the input matrix is symmetric (SYM= 1 or 2), then only options –2, –1, 0, 1, 7,
8 and 77 are allowed and other options are treated as 0. If the input matrix is in elemental format
(ICNTL(5) = 1), then only options –1 and 0 are allowed and other options are treated as 0. If
the input matrix is assembled and distributed (ICNTL(18)=1,2,3 and ICNTL(5) = 0), then only
options 7, 8 and 77 are allowed, otherwise no scaling is applied.
If block format is exploited (ICNTL(15)̸= 0) then scaling is not applied.

ICNTL(9) computes the solution using A or AT

Phase: accessed by the host during the solve phase.

72
Possible values :
1 : AX = B is solved.
̸= 1 : AT X = B is solved.
Default value: 1
Related parameters: ICNTL(10), ICNTL(11), ICNTL(21), ICNTL(32)
Remarks: when a forward elimination is performed during the factorization (see ICNTL(32))
only ICNTL(9)=1 is allowed.

ICNTL(10) applies the iterative refinement to the computed solution (see Subsection 5.6).
Phase: accessed by the host during the solve phase.
Possible variables/arrays involved: NRHS
Possible values :
< 0 : Fixed number of steps of iterative refinement. No stopping criterion is used.
0 : No iterative refinement.
> 0 : Maximum number of steps of iterative refinement. A stopping criterion is used, therefore a
test for convergence is done at each step of the iterative refinement algorithm.
Default value: 0 (no iterative refinement)
Related parameters: CNTL(2)
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), then iterative refinement is disabled and ICNTL(10) is treated
as 0.
Remarks: Note that if ICNTL(10)< 0, |ICNTL(10)| steps of iterative refinement are performed,
without any test of convergence (see Algorithm 3). This means that the iterative refinement may
diverge, that is the solution instead of being improved may be worse from an accuracy point of view.
But it has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. So if the convergence test should not be done we recommend to
set ICNTL(10) to -2 or -3.
Note also that it is not necessary to activate the error analysis option (ICNTL(11)= 1,2) to be
able to run the iterative refinement with stopping criterium (ICNTL(10) > 0). However, since
the backward errors ω1 and ω2 have been computed, they are still returned in RINFOG(7) and
RINFOG(8), respectively.
It must be noticed that iterative refinement with stopping criterium (ICNTL(10) > 0) will stop
when
1. either the requested accuracy is reached (ω1 + ω2 < CNTL(2))
2. or when the convergence rate is too slow (ω1 + ω2 does not decrease by at least a factor of 5)
3. or when exactly ICNTL(10) steps have been performed.
In the first two cases the number of iterative refinement steps (INFOG(15)) may be lower than
ICNTL(10).

ICNTL(11) computes statistics related to an error analysis of the linear system solved (Ax = b or
AT x = b (see ICNTL(9))). See Subsection 5.7.
Phase: accessed by the host and only during the solve phase.
Possible variables/arrays involved: NRHS
Possible values :
0 : no error analysis is performed (no statistics).
1 : compute all the statistics (very expensive).

73
2 : compute main statistics (norms, residuals, componentwise backward errors), but not the most
expensive ones like (condition number and forward error estimates).
Values different from 0, 1, and 2 are treated as 0.
Default value: 0 (no statistics).
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), or if ICNTL(25)=-1 (computation of the null space basis),
then error analysis is not performed and ICNTL(11) is treated as 0.
Related parameters: ICNTL(9)
Remarks: The computed statistics are returned in various informational parameters, see also
Subsection 3.3:
– If ICNTL(11)= 2, then the infinite norm of the input matrix (∥A∥∞ or ∥AT ∥∞ in
RINFOG(4)), the infinite norm of the computed solution (∥x̄∥∞ in RINFOG(5)), and the
∥Ax̄−b∥∞
scaled residual ∥A∥ ∞ ∥x̄∥∞
in RINFOG(6), a componentwise backward error estimate in
RINFOG(7) and RINFOG(8) are computed.
– If ICNTL(11)= 1, then in addition to the above statistics also an estimate for the error in
the solution in RINFOG(9), and condition numbers for the linear system in RINFOG(10) and
RINFOG(11) are also returned.
If performance is critical, ICNTL(11) should be set to 0. If both performance is critical and statistics
are requested, then ICNTL(11) should be set to 2. If ICNTL(11)=1, the error analysis is very costly
(typically significantly more costly than the solve phase itself).

ICNTL(12) defines an ordering strategy for symmetric matrices (SYM = 2) (see [26] for more details)
and is used, in conjunction with ICNTL(6), to add constraints to the ordering algorithm (ICNTL(7)
option).
Phase: accessed by the host and only during the analysis phase.
Possible values :
0: automatic choice
1: usual ordering (nothing done)
2: ordering on the compressed graph associated with the matrix.
3: constrained ordering, only available with AMF (ICNTL(7)=2).
Other values are treated as 1.
Default value: 0 (automatic choice).
Incompatibility: If the matrix is unsymmetric (SYM=0) or symmetric definite positive matrices
(SYM=1), or the matrix is in elemental format (ICNTL(5)=1), or the matrix is initially distributed
(ICNTL(18)=1,2,3) or the ordering is provided by the user (ICNTL(7)=1), or the Schur option
(ICNTL(19) ̸= 0) is required, or the analysis is performed by blocks (ICNTL(15) ̸= 0),
ICNTL(12) is treated as 1 (nothing done).
Related parameters: ICNTL(6), ICNTL(7)
Remarks: If MUMPS detects some incompatibility between control parameters then it uses the
following rules to automatically reset the control parameters. Firstly ICNTL(12) has a lower
priority than ICNTL(7) so that if ICNTL(12) = 3 and the ordering required is not AMF then
ICNTL(12) is internally treated as 2. Secondly ICNTL(12) has a higher priority than ICNTL(6)
and ICNTL(8). Thus if ICNTL(12) = 2 and ICNTL(6) was not active (ICNTL(6)=0) then
ICNTL(6) is treated as 5 if numerical values are provided, or as 1 otherwise. Furthermore, if
ICNTL(12) = 3 then ICNTL(6) is treated as 5 and ICNTL(8) is treated as -2 (scaling computed
during analysis).
On output from the analysis phase, INFOG(24) holds the value of ICNTL(12) that was
effectively used. Note that INFOG(7) and INFOG(23) hold the values of ICNTL(7) and
ICNTL(6) (respectively) that were effectively used.

74
ICNTL(13) controls the parallelism of the root node (enabling or not the use of ScaLAPACK) and also
its splitting.
Phase: accessed by the host during the analysis phase.
Possible values :
< -1 : treated as 0.
-1 : force splitting of the root node in all cases (even sequentially)
0 : parallel factorization of the root node based on ScaLAPACK. If the size of the root frontal node
(last Schur complement to be factored) is larger than an internal threshold, then ScaLAPACK
will be used for factorizing it. Otherwise, the root node will be processed by a single MPI
process.
> 0 : ScaLAPACK is not used (recommended value is 1 to partly recover parallelism of the root
node). It forces a sequential factorization of the root node (ScaLAPACK will not be used). To
recover parallelism lost by the fact of not using ScaLAPACK, splitting of the root node can
be activated: if the number of working processors is strictly larger than ICNTL(13) (always
the case with ICNTL(13)=1) then splitting of the root node is performed to enable node level
parallelism.
Default value: 0 (parallel factorization on the root node)
Remarks: Processing the root sequentially (ICNTL(13) > 0) can be useful when the user is
interested in the inertia of the matrix (see INFO(12) and INFOG(12)), or when the user wants
to detect null pivots (see Subsection 5.10) or to activate BLR compression (Subsection 5.16) on the
root node.
Although ICNTL(13) controls the efficiency of the factorization and solve phases, preprocessing
work is performed during analysis and this option must be set on entry to the analysis phase.
With SYM=1, if ScaLAPACK is allowed (ICNTL(13)≤ 0) then Cholesky factorization will be
performed on the root node and thus negative pivots will raise an error (code -40 is returned in
INFOG(1)).

ICNTL(14) controls the percentage increase in the estimated working space, see Subsection 5.9.
Phase: accessed by the host both during the analysis and the factorization phases.
Default value: between 20 and 35 (which corresponds to at most 35 % increase) and depends on
the number of MPI processes. It is set to 5 % with SYM=1 and one MPI process.
Related parameters: ICNTL(23)
Remarks: When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14)
may help.

ICNTL(15) exploits compression of the input matrix resulting from a block format, see Subsection 5.5.
Phase: accessed by the host process during the analysis phase.
Possible variables/arrays involved: NBLK, BLKPTR, BLKVAR
Possible values :
0: no compression
-k: all blocks are of fixed size k> 0. N (the order of the matrix A) must be a multiple of k. NBLK
and BLKPTR should not be provided by the user and will be computed internally. Concerning
BLKVAR, please refer to the Remarks below.
1: block format provided by the user. NBLK need be provided on the host by the user and
holds the number of blocks. BLKPTR(1:NBLK+1) must be provided by the user on the host.
Concerning BLKVAR, please refer to the Remarks below.
Any other values will be treated as 0.

Default value: 0

75
Remarks: If BLKVAR is not provided by the user then BLKVAR is internally treated as the identity
(BLKVAR(i)=i, (i=1, ..., N)). It corresponds to contiguous variables in blocks.
– If ICNTL(15)=1 then BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds
the variables associated to block iblk.
– If ICNTL(15) < 0 then BLKPTR need not be provided by the user and NBLK = N/k where
N must be a multiple of k.
In case the pivot order is provided on entry by the user at the analysis phase (ICNTL(7)= 1) then
PERM IN should be compatible with the compression. This means that PERM IN, of size N, should
result from an expansion of a pivot order on the compressed matrix, i.e., variables in a block should
be consecutive in the pivot order.
Incompatibility: With element entry format ICNTL(5)= 1, with Schur complement
ICNTL(19)̸= 0 and with permutation to a zero-free diagonal and related compressed/constrained
ordering for symmetric matrices (ICNTL(6)̸= 0, ICNTL(12)̸= 1).

ICNTL(16) controls the setting of the number of OpenMP threads, see Subsection 5.18.
Phase: accessed by the host at the beginning of all phases
Possible values :
0 : nothing is done, MUMPS uses the number of OpenMP threads configured by the calling
application.
> 0 : MUMPS sets the number of OpenMP threads on entry and reset the previous value on exit.
Other values are treated as 0.
Default value: 0 (no setting of the number of OpenMP threads done by MUMPS internally)

ICNTL(18) defines the strategy for the distributed input matrix (only for assembled matrix, see
Subsection 5.2.2).
Phase: accessed by the host during the analysis phase.
Possible values :
0 : the input matrix is centralized on the host (see Subsection 5.2.2.1).
1 : the user provides the structure of the matrix on the host at analysis, MUMPS returns a mapping
and the user should then provide the matrix entries distributed according to the mapping on
entry to the numerical factorization phase (see Subsection 5.2.2.2).
2 : the user provides the structure of the matrix on the host at analysis, and the distributed
matrix entries on all slave processors at factorization. Any distribution is allowed (see
Subsection 5.2.2.2).
3 : user directly provides the distributed matrix, pattern and entries, input both for analysis and
factorization (see Subsection 5.2.2.2).
Other values are treated as 0.
Default value: 0 (input matrix centralized on the host)
Related parameters: ICNTL(5)
Remarks: In case of distributed matrix, we recommand options 2 or 3. Among them, we
recommand option 3 which is easier to use. Option 1 is kept for backward compatibility but is
deprecated and we plan to suppress it in a future release.

ICNTL(19) computes the Schur complement matrix (see Subsection 5.15).

Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: SIZE SCHUR, LISTVAR SCHUR, NPROW, NPCOL, MBLOCK,
NBLOCK, SCHUR, SCHUR MLOC, SCHUR NLOC, and SCHUR LLD
Possible values :

76
0 : complete factorization. No Schur complement is returned.
1 : the Schur complement matrix will be returned centralized by rows on the host after the
factorization phase. On the host before the analysis phase, the user must set the integer variable
SIZE SCHUR to the size of the Schur matrix, the integer pointer array LISTVAR SCHUR to
the list of indices of the Schur matrix.
2 or 3 : the Schur complement matrix will be returned distributed by columns: the Schur will
be returned on the slave processors in the form of a 2D block cyclic distributed matrix
(ScaLAPACK style) after factorization. Workspace should be allocated by the user before
the factorization phase in order for MUMPS to store the Schur complement (see SCHUR,
SCHUR MLOC, SCHUR NLOC, and SCHUR LLD in Subsection 5.15). On the host before the
analysis phase, the user must set the integer variable SIZE SCHUR to the size of the Schur
matrix, the integer pointer array LISTVAR SCHUR to the list of indices of the Schur matrix.
The integer variables NPROW, NPCOL, MBLOCK, NBLOCK may also be defined (default values
will otherwise be provided).
Values not equal to 1, 2 or 3 are treated as 0.
Default value: 0 (complete factorization)
Incompatibility: Since the Schur complement is a partial factorization of the global matrix (with
partial ordering of the variables provided by the user), the following options of MUMPS are
incompatible with the Schur option: maximum transversal, scaling, iterative refinement, error
analysis and parallel analysis.
Related parameters: ICNTL(7), ICNTL(26)
Remarks: If the ordering is given (ICNTL(7)= 1) then the following property should hold:
PERM IN(LISTVAR SCHUR(i)) = N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
Note that, in order to have a centralized Schur complement matrix by columns (see
Subsection 5.15.3), it is possible (and recommended) to use a particular case of the distributed
Schur complement (ICNTL(19)=2 or 3), where the Schur complement is assigned to only one
processor (NPCOL × NPROW = 1).

ICNTL(20) determines the format (dense, sparse, or distributed) of the right-hand sides
Phase: accessed by the host during the solve phase and before a JOB=9 call.
Possible variables/arrays involved: RHS, NRHS, LRHS, IRHS SPARSE, RHS SPARSE,
IRHS PTR, NZ RHS, Nloc RHS, LRHS loc, IRHS loc and RHS loc.
Possible values :
0 : the right-hand side is in dense format in the structure component RHS, NRHS, LRHS (see
Subsection 5.14.1)
1,2,3 : the right-hand side is in sparse format in the structure components IRHS SPARSE,
RHS SPARSE, IRHS PTR and NZ RHS.
1 : The decision of exploiting sparsity of the right-hand side to accelerate the solution phase
is done automatically.
2 : Sparsity of the right-hand side is NOT exploited to improve solution phase.
3 : Sparsity of the right-hand side is exploited during solution phase.
10, 11 : When provided before the solve phase, values 10 and 11 have the same meaning. The
right-hand side is provided distributed in the structure components Nloc RHS, LRHS loc,
IRHS loc, RHS loc (see Subsection 5.14.3).
When provided before a JOB=9 call, values 10 and 11 indicate which distribution MUMPS
should build and return to the user in IRHS loc. In this case, the user should provide a
workarray IRHS loc on each MPI process of size at least INFO(23), where INFO(23) is
returned after the factorization phase.
10 : fill IRHS loc to minimize internal communications of right-hand side data during the
solve phase.

77
11 : fill IRHS loc to match the distribution of the solution (imposed by MUMPS), in case of
distributed solution (ICNTL(21)=1).
Values different from 0, 1, 2, 3, 10, 11 are treated as 0. For a sparse right-hand side, the
recommended value is 1.
Default value: 0 (dense right-hand sides)
Incompatibility: When NRHS > 1 (multiple right-hand side), the functionalities related to iterative
refinement ( ICNTL(10)) and error analysis (ICNTL(11)) are currently disabled.
With sparse right-hand sides (ICNTL(20)=1,2,3), the forward elimination during the factorization
(ICNTL(32)=1) is not currently available.
Remarks: For details on how to set the input parameters see Subsection 5.14.1, Subsection 5.14.2
and Subsection 5.14.3. Please note that duplicate entries in the sparse or distributed right-hand sides
are summed. A JOB=9 call can only be done after a successful factorization phase and its results
depends on the transpose option ICNTL(9), which should not be modified between a JOB=9 and
JOB=3 call. The distributed right-hand side feature enables the user to provide a sparse structured
RHS (i.e., a RHS with some empty rows).

ICNTL(21) determines the distribution (centralized or distributed) of the solution vectors.

ICNTL(22) controls the in-core/out-of-core (OOC) factorization and solve.

Phase: accessed by the host during the factorization phase.
Possible variables/arrays involved: OOC TMPDIR and OOC PREFIX
Possible values :
0: In-core factorization and solution phases (default standard version).
1: Out-of-core factorization and solve phases. The complete matrix of factors is written to disk
(see Subsection 3.14).
Other values are treated as 0 (in-core factorization).
Default value: 0 (in-core factorization)
Related parameters: ICNTL(35) since factors in low-rank form are not written to disk
Remarks: The variables OOC TMPDIR and OOC PREFIX are used to indicate the directory and
the prefix, respectively, where to store the factors. They must be set after the initialization phase
(JOB = -1) and before the factorization phase (JOB=2,4,5 or 6). Otherwise, MUMPS will use the
/tmp directory and arbitrary file names. Note MUMPS accesses to the variables OOC TMPDIR and
OOC PREFIX only during the factorization phase. Several files under the same directory and with
the same prefix are created to store the factors. Their names contain a unique hash and MUMPS is
in charge of keeping trace of them.

78
The files containing the factors will be deleted if a new factorization starts or when a termination
phase (JOB=-2) is called, except if the save/restore feature has been used and the files containing
the factors are associated to a saved instance. See Section Subsection 5.17.4).
Note that, in case of abnormal termination of an application calling MUMPS (for example, a
termination of the calling process with a segmentation fault, or, more generally, a termination of
the calling process without a call to MUMPS with JOB=-2), the files containing the factors are not
deleted. It is then the user’s responsibility to delete them, as shown in bold in the example below,
where the application calling MUMPS is launched from a bash script and environment variables are
used to define the OOC environment:
#!/bin/bash
export MUMPS OOC TMPDIR="/local/mumps data/"
export MUMPS OOC PREFIX="job myapp "
mpirun -np 128 ./myapplication
# Suppress MUMPS OOC files in case of bad application termination
rm -f ${MUMPS OOC TMPDIR}/${MUMPS OOC PREFIX}*

ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS can
allocate per working processor, see Subsection 5.9 for more details.
Phase: accessed by all processes at the beginning of the factorization phase. If the value is greater
than 0 only on the host, then the value on the host is used for all processes, otherwise ICNTL(23)
is interpreted locally on each MPI process.
Possible values :
0 : each processor will allocate workspace based on the estimates computed during the analysis
>0 : maximum size of the working memory in MegaBytes per working processor to be allocated
Default value: 0
Related parameters: ICNTL(14), ICNTL(38), ICNTL(39)
Remarks: If ICNTL(23) is greater than 0 then MUMPS automatically computes the size of
the internal workarrays such that the storage for all MUMPS internal data does not exceed
ICNTL(23). The relaxation ICNTL(14) is first applied to the internal integer workarray IS and to
communication and I/O buffers; the remaining available space is then shared between the main (and
often most critical) real/complex internal workarray S holding the factors, the stack of contribution
blocks and dynamic workarrays that are used either to expand the S array or to store low-rank
dynamic structures.
Lower bounds for ICNTL(23), in case ICNTL(23) is provided only on the host:
– In case of full-rank factors only (ICNTL(35)=0 or 3), a lower bound for ICNTL(23) (if ICNTL(14),
has not been modified since the analysis) is given by INFOG(16) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(26) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), a lower bound for ICNTL(23)
(if ICNTL(14), has not been modified since the analysis and ICNTL(38) is a good approximation
of the average compression rate of the factors) is given by INFOG(36) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(38) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank contribution blocks (CB) only (ICNTL(35)=0,3 and ICNTL(37)=1), a lower bound
for ICNTL(23) (if ICNTL(14), has not been modified since the analysis and ICNTL(39) is a good
approximation of the average compression rate of the CB) is given by INFOG(44) if the factorization is
in-core (ICNTL(22)=0), and by INFOG(46) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), a lower
bound for ICNTL(23) (if ICNTL(14), has not been modified since the analysis, and ICNTL(38) and
ICNTL(39) are good approximations of the average compression rate of respectively the factors and the
CB) is given by INFOG(40) if the factorization is in-core (ICNTL(22)=0), and by INFOG(42) if the
factorization is out-of-core (ICNTL(22)=1).
Lower bounds for ICNTL(23), in case ICNTL(23) is provided locally to each MPI process:
– Full-rank factors only (ICNTL(35)=0 or 3) ⇒ INFO(15) if the factorization is in-core
(ICNTL(22)=0), INFO(17) if the factorization is out-of-core (ICNTL(22)=1).

79
– Low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0) ⇒ INFO(30) if the factorization is
in-core (ICNTL(22)=0), INFO(31) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) ⇒ is given by
INFO(34) if the factorization is in-core (ICNTL(22)=0), INFO(35) if the factorization is out-of-
core (ICNTL(22)=1).
The above lower bounds include memory for the real/complex internal workarray S holding the
factors and stack of contribution blocks. In case WK USER is provided, the above quantities should
be diminished by the estimated memory for S/WK USER. This estimated memory can be obtained
from INFO(8), INFO(9), or INFO(20) (depending on MUMPS settings) by taking their absolute
value, if negative, or by dividing them by 106 , if positive. See also the paragraph Recommended
values of LWK USER below.
If ICNTL(23) is left to its default value 0 then MUMPS will allocate for the factorization phase
a workspace based on the estimates computed during the analysis if ICNTL(14) has not been
modified since analysis, or larger if ICNTL(14) was increased. Note that even with full-rank
factorization, these estimates are only accurate in the sequential version of MUMPS but they can
be inaccurate in the parallel case, especially for the out-of-core version. Therefore, in parallel, we
recommend to use ICNTL(23) and provide a value larger than the provided estimations.

ICNTL(24) controls the detection of “null pivot rows”.

Phase: accessed by the host during the factorization phase
Possible variables/arrays involved: PIVNUL LIST
Possible values :
0: Nothing done. A null pivot row will result in error INFO(1)=-10.
1: Null pivot row detection.
Other values are treated as 0.
Default value: 0 (no null pivot row detection)
Related parameters: CNTL(3), CNTL(5), ICNTL(13), ICNTL(25)
Remarks: CNTL(3) is used to compute the threshold to decide if a pivot row is “null”.
Note that when ScaLAPACK is applied on the root node (see ICNTL(13) = 0), then exact null
pivots on the root will stop the factorization (INFO(1)=-10) while if tiny pivots are present on
the root node the ScaLAPACK routine will factorize the root matrix. Computing the root node
factorization sequentially (this can be forced by setting ICNTL(13) to 1) will help with the correct
detection of null pivots but may degrade performance.

ICNTL(25) allows the computation of a solution of a deficient matrix and also of a null space basis.
Phase: accessed by the host during the solution phase
Possible variables/arrays involved: RHS, PIVNUL LIST, ISOL loc and SOL loc
Possible values :
0: A normal solution step is performed. If the matrix was found singular during factorization
then one of the possible solutions is returned.
i: with 1 ≤ i ≤ INFOG(28). The i-th vector of the null space basis is computed.
-1: The complete null space basis is computed.
Default value: 0 (normal solution step)
Incompatibility: Iterative refinement, error analysis, and the option to solve the transpose system
(ICNTL(9) ̸= 1) are ignored when the solution step is used to return vectors from the null space
(ICNTL(25) ̸= 0).
Related parameters: ICNTL(21), ICNTL(24)

80
Remarks: Null space basis computation can be activated when a zero-pivot detection option was
requested (ICNTL(24) ̸= 0) during the factorization and the matrix was found to be deficient
(INFOG(28) > 0).
Note that when vectors from the null space are requested (ICNTL(25) ̸= 0), both centralized
(ICNTL(21)=0) and distributed (ICNTL(21)=1) solutions options can be used. If the solution
is centralized (ICNTL(21)=0), then the null space vectors are returned to the user in the array
RHS, allocated by the user on the host. If the solution is distributed (ICNTL(21)=1), then the
null space vectors are returned in the array SOL loc, which must be allocated by the user on
all working processors (see Subsection 5.14.6). In both cases, the number of columns of RHS or
SOL loc must be equal to the number of vectors requested, so that NRHS must be equal to:
– 1 if 1 ≤ ICNTL(25) ≤ INFOG(28)
– INFOG(28) if ICNTL(25)=-1.

ICNTL(26) drives the solution phase if a Schur complement matrix has been computed (ICNTL(19) ̸=
0), see Subsection 3.17 for details
Phase: accessed by the host during the solution phase. It will be accessed also during factorization
if the forward elimination is performed during factorization (ICNTL(32)=1)
Possible variables/arrays involved: REDRHS, LREDRHS
Possible values :
0 : standard solution phase on the internal problem; referring to the notations from
Subsection 3.17, only the system A1,1 x1 = b1 is solved and the entries of the right-hand
side corresponding to the Schur are explicitly set to 0 on output.
1 : condense/reduce the right-hand side on the Schur. Only a forward elimination is performed.
The solution corresponding to the ‘internal” (non-Schur) variables is returned together with
the reduced/condensed right-hand-side. The reduced right-hand side is made available on the
host in the pointer array REDRHS, that must be allocated by the user. Its leading dimension
LREDRHS must be provide, too.
2 : expand the Schur local solution on the complete solution variables. REDRHS is considered
to be the solution corresponding to the Schur variables. It must be allocated by the user as
well as its leading dimension LREDRHS must be provided. The backward substitution is then
performed with the given right-hand side to compute the solution associated with the ”internal”
variables. Note that the solution corresponding to the Schur variables is also made available
in the main solution vector/matrix.
Values different from 1 and 2 are treated as 0.
Default value: 0 (normal solution phase)
Incompatibility: If ICNTL(26) = 1 or 2, then error analysis and iterative refinement are disabled
(ICNTL(11) and ICNTL(10))
Related parameters: ICNTL(19), ICNTL(32)
Remarks: If ICNTL(26) ̸= 0, then the user should provide workspace in the pointer array REDRHS,
as well as a leading dimension LREDRHS (see Subsection 5.15). Note that if no Schur complement
was computed, ICNTL(26) = 1 or 2 results in an error.

ICNTL(27) controls the blocking size for multiple right-hand sides.

Phase: accessed by the host during the solution phase
Possible variables/arrays involved: id%NRHS
Possible values :
< 0 : an automatic setting is performed by the solver:
(i) the blocksize = min(id%NRHS, −2×ICNTL(27)) if the factors are on disk
(ICNTL(22)=1);
(ii) the blocksize = min(id%NRHS,−ICNTL(27)) if the factors are in-core (ICNTL(22)=0)

81
0 : no blocking, it is treated as 1.
> 0 : blocksize = min(id%NRHS,ICNTL(27))
Default value: -32
Remarks: It influences both the memory usage (see INFOG(30) and INFOG(31)) and the
solution time. Larger values of ICNTL(27) lead to larger memory requirements and a better
performance (except if the larger memory requirements induce swapping effects). Tuning
ICNTL(27) is critical, especially when factors are on disk (ICNTL(22)=1 at the factorization
stage) because factors must be accessed once for each block of right-hand sides.

ICNTL(28) determines whether a sequential or parallel computation of the ordering is performed

(see Subsection 3.2 and Subsection 5.4).
Phase: accessed by the host process during the analysis phase.
Possible values :
0: automatic choice.
1: sequential computation. In this case the ordering method is set by ICNTL(7) and the
ICNTL(29) parameter is meaningless (choice of the parallel ordering tool).
2: parallel computation. A parallel ordering and parallel symbolic factorization is requested by
the user. For that, one of the parallel ordering tools (or all) must be available, and the matrix
should not be too small. The ordering method is set by ICNTL(29) and the ICNTL(7)
parameter is meaningless.
Any other values will be treated as 0.
Default value: 0 (automatic choice)
Incompatibility: The parallel analysis is not available when the Schur complement feature is
requested (ICNTL(19)=1,2 or 3), when a maximum transversal is requested on the input matrix
(i.e., ICNTL(6)=1, 2, 3, 4, 5 or 6) or when the input matrix is an unassembled matrices
(ICNTL(5)=1). When the number of processes available for parallel analysis is equal to 1,
or when the initial matrix is extremely small, a sequential analysis is indeed performed, even if
ICNTL(28)=2 (no error is raised in that case).
Related parameters: ICNTL(7), ICNTL(29), INFOG(32)
Remarks: Performing the analysis in parallel (ICNTL(28)= 2) will enable saving both time and
memory. Note that then the quality of the ordering depends on the number of processors used.
The number of processors for parallel analysis may be smaller than the number of MPI processes
available for MUMPS, in order to satisfy internal constraints of parallel ordering tools. On output,
INFOG(32) is set to the type of analysis (sequential or parallel) that was effectively chosen
internally.

ICNTL(29) defines the parallel ordering tool (when ICNTL(28)=1) to be used to compute the fill-in
reducing permutation. See Subsection 3.2 and Subsection 5.4.
Phase: accessed by host process only during the parallel analysis phase (ICNTL(28)=2).
Possible variables/arrays involved: SYM PERM
Possible values :
0: automatic choice.
1: PT-SCOTCH is used to reorder the input matrix, if available.
2: ParMetis is used to reorder the input matrix, if available.
Other values are treated as 0.
Default value: 0 (automatic choice)
Related parameters: ICNTL(28)

82
Remarks: On output, the pointer array SYM PERM provides access, on the host processor, to the
symmetric permutation that is effectively considered during the analysis phase, and INFOG(7)
to the ordering option that was effectively used. SYM PERM(i), (i=1, ... N) holds the position of
variable i in the pivot order, see Subsection 5.4.1 for a full description.

ICNTL(30) computes a user-specified set of entries in the inverse A−1 of the original matrix (see
Subsection 5.14.4).
Phase: accessed during the solution phase.
Possible variables/arrays involved: NZ RHS, NRHS, RHS SPARSE, IRHS SPARSE, IRHS PTR
Possible values :
0: no entries in A−1 are computed.
1: computes entries in A−1 .
Other values are treated as 0.
Default value: 0 (no entries in A−1 are computed)
Incompatibility: Error analysis and iterative refinement will not be performed, even if the
corresponding options are set (ICNTL(10) and ICNTL(11)). Because the entries of A−1 are
returned in RHS SPARSE on the host, this functionality is incompatible with the distributed solution
option (ICNTL(21)). Furthermore, computing entries of A−1 is not possible in the case of partial
factorizations with a Schur complement (ICNTL(19)). Option to compute solution using A or
AT (ICNTL(9)) is meaningless and thus ignored.
Related parameters: ICNTL(27)
Remarks: When a set of entries of A−1 is requested, the associated set of columns will be
computed in blocks of size ICNTL(27). Larger ICNTL(27) values will most likely decrease the
amount of factor accesses, enable more parallelism and thus reduce the solution time [48, 44, 14].
The user must specify on input to a call of the solve phase in the arrays IRHS PTR and
IRHS SPARSE the target entries. The array RHS SPARSE should be allocated but not initialized.
Note that since selected entries of the inverse of the matrix are requested, NRHS must be set to N. On
output the arrays IRHS PTR, IRHS SPARSE and RHS SPARSE will hold the requested entries. If
duplicate target entries are provided then duplicate solutions will be returned.
When entries of A−1 are requested (ICNTL(30) = 1), mumps par%RHS needs not be allocated.

ICNTL(31) indicates which factors may be discarded during the factorization.

Phase: accessed by the host during the analysis phase.
Possible values :
0 : the factors are kept during the factorization, phase except in the case of unsymmetric matrices
when the forward elimination is performed during factorization (ICNTL(32) = 1). In this
case, since it will not be used during the solve phase, the L factor is discarded.
1: all factors are discarded during the factorization phase. The user is not interested in solving
the linear system (Equations (3) or (4)) and will not call MUMPS solution phase (JOB=3).
This option is meaningful when only a Schur complement is needed (see ICNTL(19)), or
when only statistics from the factorization, such as (for example) definiteness, value of the
determinant, number of entries in factors after numerical pivoting, number of negative or null
pivots are required. In this case, the memory allocated for the factorization will rely on the
out-of-core estimates (and factors will not be written to disk).
2: this setting is meaningful only for unsymmetric matrices and has no impact on symmetric
matrices: only the U factor is kept after factorization so that exclusively a backward
substitution is possible during the solve phase (JOB=3). This can be useful when:
−the user is only interested in the computation of a null space basis (see ICNTL(25))
during the solve phase, or
−the forward elimination is performed during the factorization (ICNTL(32)=1). Note that

83
for unsymmetric matrices, if the forward elimination is performed during the factorization
(ICNTL(32) = 1) then the L factor is always discarded during factorization. In this case
(ICNTL(32) = 1), both ICNTL(31) = 0 and ICNTL(31) = 2 have the same behaviour.

Other values are treated as 0.

ICNTL(32) performs the forward elimination of the right-hand sides (Equation (3)) during the
factorization (JOB=2). (see Subsection 5.13).
Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: RHS, NRHS, LRHS, and possibly REDRHS, LREDRHS when
ICNTL(26)=1
Possible values :
0: standard factorization not involving right-hand sides.
1: forward elimination (Equation (3)) of the right-hand side vectors is performed during
factorization (JOB=2). The solve phase (JOB=3) will then only involve backward substitution
(Equation (4)).
Other values are treated as 0.
Default value: 0 (standard factorization)
Related parameters: ICNTL(31),ICNTL(26)
Incompatibility: This option is incompatible with sparse right-hand sides (ICNTL(20)=1,2,3),
with the solution of the transposed system (ICNTL(9) ̸= 1), with the computation of entries of
the inverse (ICNTL(30)=1), and with BLR factorizations (ICNTL(35)=1,2,3). In such cases,
error -43 is raised.
Furthermore, iterative refinement (ICNTL(10)) and error analysis (ICNTL(11)) are disabled.
Finally, the current implementation imposes that all right-hand sides are processed in one pass
during the backward step. Therefore, the blocking size (ICNTL(27)) is ignored.
Remarks: The right-hand sides must be dense to use this functionality: RHS, NRHS, and LRHS
should be provided as described in Subsection 5.14.1. They should be provided at the beginning of
the factorization phase (JOB=2) rather than at the beginning of the solve phase (JOB=3).
For unsymmetric matrices, if the forward elimination is performed during factorization
(ICNTL(32) = 1), the L factor (see ICNTL(31)) may be discarded to save space. In fact,
the L factor will then always be discarded (even when ICNTL(31)=0) in the case of a full-rank
factorization (ICNTL(35)=0) or BLR factorization with full-rank solve (ICNTL(35)=3). In the
case of a BLR factorization with ICNTL(35)=1 or 2, only the L factor corresponding to full-rank
frontal matrices are discarded in the current version.
We advise to use this option only for a reasonably small number of dense right-hand side vectors
because of the additional associated storage required when this option is activated and the number
of right-hand sides is large compared to ICNTL(27).

ICNTL(33) computes the determinant of the input matrix.

84
Phase: accessed by the host during the factorization phase.
Possible values :
0 : the determinant of the input matrix is not computed.
̸ 0: computes the determinant of the input matrix. The determinant is obtained by computing
=
(a + ib) × 2c where a =RINFOG(12), b =RINFOG(13) and c = INFOG(34). In real
arithmetic b=RINFOG(13) is equal to 0.
Default value: 0 (determinant is not computed)
Related parameters: ICNTL(31)
Remarks: In case a Schur complement was requested (see ICNTL(19)), elements of the Schur
complement are excluded from the computation of the determinant so that the determinant is that
one of matrix A1,1 (using notations of Subsection 3.17).
Although we recommend to compute the determinant on non-singular matrices, null pivot rows
(ICNTL(24)) and static pivots (CNTL(4)) are excluded from the determinant so that a non-zero
determinant is still returned on singular or near-singular matrices. This determinant is then not
unique and will depend on which equations were excluded.
Furthermore, we recommend to switch off scaling (ICNTL(8)) in such cases. If not (ICNTL(8)
̸= 0), we describe in the following the current behaviour of the package:
– if static pivoting (CNTL(4)) is activated: all entries of the scaling arrays ROWSCA and
COLSCA are currently taken into account in the computation of the determinant.
– if the null pivot row detection (ICNTL(24)) is activated, then entries of ROWSCA and
COLSCA corresponding to pivots in PIVNUL LIST are excluded from the determinant so
that
* for symmetric matrices (SYM=1 or 2), the returned determinant correctly corresponds to
the matrix excluding rows and columns of PIVNUL LIST.
* for unsymmetric matrices (SYM=0), scaling may perturb the value of the determinant in
case off-diagonal pivoting has occurred (INFOG(12)̸=0).
Note that if the user is interested in computing only the determinant, we recommend to discard the
factors during factorization ICNTL(31).

ICNTL(34) controls the conservation of the OOC files during JOB=-3 (See Subsection 5.17).
Phase: accessed by the host during the save/restore files deletion phase (JOB=-3) in case of out-of-
core (ICNTL(22)=1).
Possible values :
0: the out-of-core files are marked out for deletion
1: the out-of-core files should not be deleted because another saved instance references them.
Other values are treated as 0.
Default value: 0 (out-of-core files associated to a saved instance are marked out for deletion at the
end of the out-of-core file lifetime)
Remarks: MUMPS will delete only the out-of-core files that are referenced in the saved data
identified by the value of SAVE DIR and SAVE PREFIX. Extra out-of-core files with the same
OOC TMPDIR and OOC PREFIX are not deleted.

ICNTL(35) controls the activation of the BLR feature (see Subsection 5.16).
Phase: accessed by the host during the analysis and during the factorization phases
Possible values :
0 : Standard analysis and factorization (BLR feature is not activated).
1 : BLR feature is activated and automatic choice of BLR option is performed by the software.
2 : BLR feature is activated during both the factorization and solution phases, which allows for
memory gains by storing the factors in low-rank.

85
3 : BLR feature is activated during the factorization phase but not the solution phase, which is still
performed in full-rank. As a consequence, the full-rank factors must be kept and no memory
gains can be obtained. In an OOC context, (ICNTL(22)=1) this option enables the user to
write all factors to disk which is not the case with ICNTL(35)=2 since factors in low-rank
form are not written to disk.
Other values are treated as 0.
Default value: 0 (standard multifrontal factorization).
Related parameters: CNTL(7) (BLR approximations accuracy), ICNTL(36) (BLR factorization
variant), ICNTL(37) (compression of the contribution blocks), ICNTL(38) (estimation of the
compression rate of the factors) and ICNTL(39) (estimation of the compression rate of the
contribution blocks).
Incompatibility: Note that the activation of the BLR feature is currently incompatible with
elemental matrices (ICNTL(5) = 1) (see error -800, subject to change in the future), and when
the forward elimination during the factorization is requested (ICNTL(32) = 1), see error -43.
Remarks: If ICNTL(35)=1, then the automatic choice of BLR option is to activate BLR
feature during both factorization and solution phases (ICNTL(35)=2). In order to activate the
BLR factorization, ICNTL(35) must be equal to 1, 2 or 3 before the analysis, where some
preprocessing on the graph of the matrix is needed to prepare the low-rank factorization. The value
of ICNTL(35) can then be set to any of the above values on entry to the factorization (e.g., taking
into account the values returned by the analysis). On the other hand, if ICNTL(35)=0 at analysis,
only ICNTL(35)=0 is allowed for the factorization (full-rank factorization). When activating
BLR, it is recommended to set ICNTL(35) to 1 or 2 rather than 3 to benefit from memory gains.

ICNTL(36) controls the choice of BLR factorization variant (see Subsection 5.16).
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : Standard UFSC variant with low-rank updates accumulation (LUA)
1 : UCFS variant with low-rank updates accumulation (LUA). This variant consists in performing
the compression earlier in order to further reduce the number of operations. Although it
may have a numerical impact, the current implementation is still compatible with numerical
pivoting.
Other values are treated as 0.
Default value: 0 (UFSC variant).
Related parameters: ICNTL(35) and CNTL(1)
Remarks: If numerical pivoting is not required and thus CNTL(1) can be set to 0.0, further
performance gains can be expected with the UCFS version.

ICNTL(37) controls the BLR compression of the contribution blocks (see Subsection 5.16).
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : contribution blocks are not compressed
1 : contribution blocks are compressed, reducing the memory consumption at the cost of some
additional operations
Other values are treated as 0.
Default value: 0 (contribution blocks not compressed).
Related parameters: ICNTL(35), CNTL(7)

ICNTL(38) estimates compression rate of LU factors (see Subsection 5.16).

Phase: accessed by the host during the analysis and the factorization phases when ICNTL(35)=1,
2 or 3

86
Possible values : between 0 and 1000 (1000 is no compression and 0 is full compression); other
values are treated as 0; ICNTL(38)/10 is a percentage representing the typical compression of the
compressed factors
factor matrices in BLR fronts: ICNTL(38)/10 = uncompressed factors × 100.
Default value: 600 (when factors of BLR fronts are compressed, their size is 60.0% of their full-
rank size).
Related parameters: ICNTL(35), CNTL(7)
Remarks: Influences statistics provided in INFO(29), INFO(30), INFO(31), INFOG(36),
INFOG(37), INFOG(38), INFOG(39) , but also INFO(32-35) and INFOG(40-43)

ICNTL(39) estimates compression rate of contribution blocks (see Subsection 5.16).

ICNTL(40) reserved in current version

ICNTL(41-48) reserved in current version
ICNTL(49) compact workarray id%S at the end of factorization phase (see Subsection 5.19).
Phase: accessed by the host during factorization phase
Possible values :
0 : nothing is done.
1 : compact workarray id%S(MAXS) at the end of the factorization phase while satisfying the
memory contraint that might have been provided with ICNTL(23) feature.
2 : compact workarray id%S(MAXS) at the end of the factorization phase. The memory
constraint that might have been provided with ICNTL(23) feature does not apply to this
process.
Other values are treated as 0.
Default value: 0
Incompatibility: With the use of LWK USER / WK USER feature.
Remarks: ICNTL(49)=1,2 might require intermediate memory allocation to reallocate id%S of
minimal size. If the memory allocation fails, then a warning is returned and nothing is done. If
ICNTL(49)=1 and the memory constraint provided with ICNTL(23)> 0 would not be satisfied
then a warning is raised and nothing is done.

ICNTL(50) reserved in current version

ICNTL(51) reserved in current version

ICNTL(52-57) reserved in current version
ICNTL(58) defines options for symbolic factorization

87
Phase: accessed by the host during the analysis when centralized ordering is performed,
ICNTL(28)= 0, 1
Possible values :
1: Symbolic factorization based on quotient graph, mixing right looking and left looking updates
2: Column count based symbolic factorization based on [30]
Other values are treated as 2.
Default value: 2
Related parameters: ICNTL(7), ICNTL(28)
Remarks: When symbolic factorization is not performed within the ordering (case of ordering
given, ICNTL(7)=1 or centralized Metis ordering, ICNTL(7)=5) then symbolic factorization
will be automatically performed. When SCOTCH is used, ICNTL(7)=3, a fast block
symbolic factorization (exploiting graph separator information) provided within SCOTCH library,
libesmumps.a, is used.

ICNTL(59-60) not used in current version

6.2 Real/complex control parameters

mumps par%CNTL is a real (also real in the complex version) array of dimension 5.
CNTL(1) is the relative threshold for numerical pivoting. See Subsection 3.9
Phase: accessed by the host during the factorization phase.
Possible values :
< 0.0: values less than 0.0 are treated as 0.0
= 0.0: no numerical pivoting performed and the subroutine will fail if a zero pivot is encountered.
> 0.0: numerical pivoting performed.
For unsymmetric matrices values greater than 1.0 are treated as 1.0
For symmetric matrices values greater than 0.5 are treated as 0.5
Default value:
0.01: for unsymmetric or general symmetric matrices
0.0: for symmetric positive definite matrices
Related parameters: CNTL(4)
Remarks: It forms a trade-off between preserving sparsity and ensuring numerical stability during
the factorization. In general, a larger value of CNTL(1) increases fill-in but leads to a more accurate
factorization.
Note that for diagonally dominant matrix, setting CNTL(1) to zero will decrease the factorization
time while still providing a stable decomposition.

CNTL(2) is the stopping criterion for iterative refinement

Phase: accessed by the host during the solve phase.
Possible values :
√
< 0.0: values < 0 are treated as ϵ, where ϵ holds the machine precision and depends on the
arithmetic version.
≥ 0.0: stopping criterion
√
Default value: ϵ
Related parameters: ICNTL(10), RINFOG(7), RINFOG(8)
Remarks: Let ω1 and ω2 be the backward errors as defined in Subsection 3.3.2. Iterative refinement
(Subsection 5.6) will stop when either the requested accuracy is reached (ω1 + ω2 < CNTL(2))
or when the convergence rate is too slow (ω1 + ω2 does not decrease by at least a factor of 5).

88
CNTL(3) it is used to determine null pivot rows when the null pivot row detection option is enabled
(ICNTL(24) = 1) .
Phase: accessed by the host during the numerical factorization phase.
Possible values : we define the threshold thres as follows
> 0.0: thres = CN T L(3) × ∥Apre ∥
√
= 0.0: thres = ϵ × ∥Apre ∥ × Nh
< 0.0: thres = |CN T L(3)|
where Apre is the preprocessed matrix to be factorized (see Equation (5)), Nh number of variables
on the deepest branch of the elimination tree, ϵ is the machine precision and ∥.∥ is the infinite norm.
Default value: 0.0
Related parameters: ICNTL(24)
Remarks:
– When null pivot row detection is enabled (ICNTL(24)=1). A pivot is considered to be null
if the infinite norm of its row/column is smaller than thres.

CNTL(4) determines the threshold for static pivoting. See Subsection 3.9
Related parameters: CNTL(1), INFOG(25)
Phase: accessed by the host, and must be set either before the factorization phase, or before the
analysis phase.
Possible values :
< 0.0: static pivoting is not activated.
> 0.0: static pivoting is activated and the pivots whose magnitude is smaller than CNTL(4) will be
set to CNTL(4).
= 0.0: static pivoting is activated and the threshold value to define a small pivot is determined
√
automatically. In the current version, this threshold is equal to ϵ × ∥Apre ∥, where Apre is
the preprocessed matrix to be factored (see Equation (5)).
Default value: -1.0 (no static pivoting)
Related parameters: CNTL(1)
Incompatibility: This option is incompatible with null pivot row detection (ICNTL(24)= 1) ) and
will be ignored.
Remarks: By static pivoting (as in [39]) we mean replacing small pivots whose elimination should
be postponed because of partial threshold pivoting and would thus result in an increase of our
estimations (memory and operations), by a small perturbation of the original matrix controlled by
CNTL(4). The number of modified pivots is returned in INFOG(25).

CNTL(5) defines the fixation for null pivots and is effective only when null pivot row detection is active
(ICNTL(24) = 1).
Phase: accessed by the host during the numerical factorization phase.
Possible values :
≤ 0.0: In the symmetric case (SYM = 2), the pivot column of the L factors is set to zero and the pivot
entry in matrix D is set to one.
In the unsymmetric case (SYM = 0), the fixation is automatically set to a large positive value
and the pivot row of the U factors is set to zero.

> 0.0: when a pivot piv is detected as null, in order to limit the impact of this pivot on the rest of
the matrix, it is set to sign(piv) CNTL(5) ×∥Apre ∥, where Apre is the preprocessed matrix
to be factored (see Equation (5)). We recommend setting CNTL(5) to a large floating-point
value (e.g. 1020 ).

89
Default value: 0.0
Related parameters: ICNTL(24)

CNTL(6) is not used in the current version.

CNTL(7) defines the precision of the dropping parameter used during BLR compression (see
Subsection 5.16).
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0.0 : full precision approximation.
> 0.0 : the dropping parameter is CNTL(7).
Default value: 0.0 (full precision (i.e., no approximation)).
Related parameters: ICNTL(35)
Remarks: The value of CNTL(7) is used as a stopping criterion for the compression of BLR
blocks which is achieved through a truncated Rank Revealing QR factorization. More precisely, to
compute the low-rank form of a block, we perform a QR factorization with column pivoting which
is stopped as soon as a diagonal coefficient of the R factor falls below the threshold, i.e., when
∥rkk ∥ < ε. This is implemented as a variant of the LAPACK [19] GEQP3 routine. Larger values
of this parameter lead to more compression at the price of a lower accuracy. Note that ϵ is used as
an absolute tolerance, i.e., not relative to the input matrix, or the frontal matrix or the block norms;
for this reason we recommend to scale the matrix or or let the solver automatically preprocess (e.g.,
scale) the input matrix.
Note that, depending on the application, gains can be expected even with small values (close to
machine precision) of CNTL(7).

CNTL(8-15) are not used in the current version.

6.3 Compatibility between options

As shown above, the package has a lot of options and this gives an exponential amount of combinations
of options. Almost all options are indeed compatible with each other but obviously a few of them are not,
either because the implementation of some options is more complicated in some context, or because some
algorithms cannot be applied or do not make sense under certain conditions. For each option and ICNTL
parameter, the list of incompatibilities is normally given in the description of the option. The objective of
this section is to provide to the user a more global view of the main incompatibilities.
Table 2 highlights the incompatibilities between functionalities and matrix input formats
(functionalities which do not appear in this table are compatible with all matrix input formats).
In Table 3, we present the numerical limitations of the solver when ScaLAPACK is used on the final
dense Schur complement of MUMPS.
Regarding the solve phase (JOB=3), iterative refinement and error analysis are incompatible with
some options, as reported in Table 4. Although iterative refinement and error analysis could be performed
externally to the package, they are provided by the package for convenience for all matrix formats but not
for all situations. For example, they do not really make sense when computing something different from
the solution of Ax = b (e.g. entries of the inverse, null space basis, only forward substitution performed),
or when the factors have been discarded during factorization.
Finally, note that orderings available for the sequential and the parallel analysis phase (see
ICNTL(28)) are controlled by two different parameters (ICNTL(7) and ICNTL(29)), which have
a different range of allowed values, so there is no incompatibility as such. But orderings based on
minimum-degree (for example) are only available with the sequential analysis.

90
Functionality (Control) Matrix input format (ICNTL(18) and ICNTL(5))
Centralised Distributed assembled
Assembled Elemental (distr. elemental not avail.)
Unsymmetric (ICNTL(6)) All options Not available Not available
permutations (ICNTL(6)=0) (ICNTL(6)=0)
Scalings (ICNTL(8)) All options Only option 1 Only options 7, 8, or 1
(user-provided) (user-provided)
Constrained/com- (ICNTL(12)) All options Not available Not available
pressed orderings (ICNTL(12)=0) (ICNTL(12)=0)
Type of analysis (ICNTL(28)) Seq. or parallel Sequential only Sequential or parallel
Schur complement (ICNTL(19)) All options except All options except
parallel analysis parallel analysis
Block Low-Rank (ICNTL(35)=1,2,3) All options except Not available All options except
ICNTL(32)=1 ICNTL(32)=1
(fwd in facto) (fwd in facto)

Table 2: Compatibilities between MUMPS functionalities and matrix-input formats.

SCALAPACK
OFF ON

Null pivot list ok null pivots on root node not available

(ICNTL(24)) and failure if exact null pivot on root

LDLT factorization ok ok but LU /PDGETRF performed on root

(SYM=2) ok node (no Scalapack LDLT kernel)

Number of negative ok lowerbound (negative pivots

pivots (INFOG(12)) not counted on root node)

Table 3: MUMPS relies on ScaLAPACK to factorize the last dense Schur complement. If exact inertia
(number of negative pivots) or null pivot list is critical, ScaLAPACK can be switched off, see ICNTL(13)
although this might imply a small performance degradation.

91
iterative refinement error analysis
Functionality Control ICNTL(10) ICNTL(11)

Multiple right-hand sides NRHS > 1 Incomp. Incomp.

Distributed solution ICNTL(21) Incomp. Incomp.

Forward during factorization ICNTL(32) Incomp. Incomp.

Reduced right-hand sides/ ICNTL(26)=1 Incomp. Incomp.

partial solution ICNTL(26)=2 Incomp. Incomp.

Discard some factors ICNTL(31) Incomp. Incomp.

Compute null space basis ICNTL(25) Incomp. Incomp.

Entries of A−1 ICNTL(30) Incomp. Incomp.

Table 4: List of incompatibilities with postprocessing options at the end of the solve phase.

7 Information parameters
The parameters described in this section are returned by MUMPS and hold information that may be of
interest to the user. Some of the information is local to each processor and some only on the host. If an
error is detected (see Section 8), the information may be incomplete.

7.1 Information local to each processor

The arrays mumps par%RINFO and mumps par%INFO are local to each process.

mumps par%RINFO is a double precision array of dimension 20. It contains the following local
information on the execution of MUMPS:
RINFO(1) - after analysis: The estimated number of floating-point operations on the processor for the
elimination process.
RINFO(2) - after factorization: The number of floating-point operations on the processor for the
assembly process.
RINFO(3) - after factorization: The number of floating-point operations on the processor for the
elimination process. In case the BLR feature is activated (ICNTL(35)=1, 2 or 3), RINFO(3)
represents the theoretical number of operations for the standard full-rank factorization.
RINFO(4) - after factorization: The effective number of floating-point operations on the processor
for the elimination process. It is equal to RINFO(3) when the BLR feature is not activated
(ICNTL(35)=0) and will typically be smaller than RINFO(3) when the BLR feature is activated
and leads to compression.
RINFO(5) - after analysis: if the user decides to perform an out-of-core factorization
(ICNTL(22)=1), then a rough estimation of the size of the disk space in MegaBytes of the
files written by the concerned processor is provided in RINFO(5). If the analysis is full-
rank (ICNTL(35)=0 for the analysis step), then the factorization is necessarily full-rank so that
RINFO(5) is computed for a full-rank factorization (ICNTL(35)=0 also for the factorization).
If ICNTL(35)=1, 2 or 3 at analysis, then RINFO(5) is computed assuming a low-rank (in-
core) storage of the factors of the BLR fronts during the factorization (ICNTL(35)=1 or 2 during
factorization). In case ICNTL(35)=1, 2 or 3 at analysis and the factors are stored in full-rank
format (ICNTL(35)=0 or 3 for the factorization), we refer the user to INFO(3) in order to obtain
a rough estimate of the necessary disk space for the concerned processor.

92
The effective size in MegaBytes of the files written by the current processor will be returned in
RINFO(6), but only after the factorization. The total estimated disk space (sum of the values of
RINFO(5) over all processors) is returned in RINFOG(15).
RINFO(6) - after factorization: in the case of an out-of-core execution (ICNTL(22)=1), the size in
MegaBytes of the disk space used by the files written by the concerned processor is provided. The
total disk space (for all processors) is returned in RINFOG(16).
RINFO(7) - after each job: The size (in MegaBytes) of the file used to save the data on the processor
(See Subsection 5.17).
RINFO(8) - after each job: The size (in MegaBytes) of the MUMPS strucuture.
RINFO(9) - RINFO(40) are not used in the current version.

mumps par%INFO is an integer array of dimension 80. It contains the following local information on
the execution of MUMPS:
INFO(1) is 0 if the call to MUMPS was successful, negative if an error occurred (see Section 8), or
positive if a warning is returned. In particular, after successfully saving or restoring an instance
(call to MUMPS with JOB=7 or JOB=8), INFO(1) will be 0 even if INFO(1) was different from
0 at the moment of saving the MUMPS instance to disk.
INFO(2) holds additional information about the error or the warning. If INFO(1) = -1, INFO(2) is
the processor number (in communicator COMM) on which the error was detected.
INFO(3) - after analysis: Estimated size of the real/complex space needed on the processor to
store the factors, assuming the factors are stored in full-rank format (ICNTL(35)=0 or 3 during
factorization). If INFO(3) is negative, then its absolute value corresponds to millions of
real/complex entries used to store the factor matrices. Assuming that the factors will be stored
in full-rank format during the factorization (ICNTL(35)=0 or 3), a rough estimation of the size of
the disk space in bytes of the files written by the concerned processor can be obtained by multiplying
INFO(3) (or its absolute value multiplied by 1 million when negative) by 4, 8, 8, or 16 for single
precision, double precision, single complex, and double complex arithmetics, respectively. See also
RINFO(5).
Note that, when all factors are discarded (ICNTL(31)=1), INFO(3) corresponds to the factors
storage if factors were not discarded (rather than 0). However, if only the L factor is discarded
(case of forward substitution during factorization, ICNTL(32)=1, or case of ICNTL(31)=2),
then INFO(3) corresponds to the factor storage excluding L.
The effective size of the real/complex space needed to store the factors will be returned in INFO(9)
(see below), but only after the factorization. Furthermore, after an out-of-core factorization
(ICNTL(22)=1), the size of the disk space for the files written by the local processor is returned
in RINFO(6). Finally, the total estimated size of the full-rank factors for all processors (sum of
the INFO(3) values over all processors) is returned in INFOG(3).
INFO(4) - after analysis: Estimated integer space needed on the processor for factors (assuming a
full-rank storage for the factors)
INFO(5) - after analysis: Estimated maximum front size on the processor.
INFO(6) - after analysis: Number of nodes in the complete tree. The same value is returned on all
processors.
INFO(7) - after analysis: Minimum estimated size of the main internal integer workarray IS to run the
numerical factorization in-core .
INFO(8) - after analysis: Minimum estimated size of the main internal real/complex workarray S to
run the numerical factorization in-core when factors are stored full-rank (ICNTL(35)=0 or 3).
If negative, then the absolute value corresponds to millions of real/complex entries needed in this
workarray. It is also the estimated minimum size of LWK USER in that case, if the user provides
WK USER.

93
INFO(9) - after factorization: Size of the real/complex space used on the processor to store the factor
matrices, possibly including low-rank factor matrices (ICNTL(35)=1 or 2). If negative, then the
absolute value corresponds to millions of real/complex entries used to store the factor matrices.
Finally, the total size of the factor matrices for all processors (sum of the INFO(9) values over all
processors) is returned in INFOG(9).
INFO(10) - after factorization: Size of the integer space used on the processor to store the factor
matrices.
INFO(11) - after factorization: Order of the largest frontal matrix processed on the processor.
INFO(12) - after factorization: Number of off-diagonal pivots selected on the processor if SYM=0
or number of negative pivots on the processor if SYM=1 or 2. If ICNTL(13)=0 (the default),
this excludes pivots from the parallel root node treated by ScaLAPACK. (This means that the user
should set ICNTL(13)=1 or use a single processor in order to get the exact number of off-diagonal
or negative pivots rather than a lower bound.) Furthermore, when ICNTL(24) is set to 1 and
SYM=1 or 2, INFOG(12) excludes the null12 pivots, even if their sign is negative. In other words,
a pivot cannot be both null and negative. Note that for complex symmetric matrices (SYM=1 or 2),
INFO(12) will be 0. See also INFOG(12), which provides the total number of off-diagonal
or negative pivots over all processors. For real symmetric matrices, see also INFO(40) and
INFOG(50), which provide the local (resp. global) number of negative pivots among the null
pivots detected when ICNTL(24) is activated.
INFO(13) - after factorization: The number of postponed elimination because of numerical issues.
INFO(14) - after factorization: Number of memory compresses.
INFO(15) - after analysis: estimated size in MegaBytes (millions of bytes) of all working space
to perform full-rank numerical phases (factorization/solve) in-core (ICNTL(22)=0 for the
factorization). The maximum and sum over all processors are returned respectively in INFOG(16)
and INFOG(17). See also INFO(22) which provides the actual memory that was needed but
only after factorization.
INFO(16) - after factorization: total size (in millions of bytes) of all MUMPS internal data allocated
during the numerical factorization. This excludes the memory for WK USER, in the case where
WK USER is provided. The maximum and sum over all processors are returned respectively in
INFOG(18) and INFOG(19).
INFO(17) - after analysis: estimated size in MegaBytes (millions of bytes) of all working space to
run the numerical phases out-of-core (ICNTL(22)̸=0) with the default strategy. The maximum
and sum over all processors are returned respectively in INFOG(26) and INFOG(27). See also
INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(18) - after factorization: local number of null pivot rows detected locally when ICNTL(24)=1
or ICNTL(56)= 1.
INFO(19) - after analysis: Estimated size of the main internal integer workarray IS to run the
numerical factorization out-of-core .
INFO(20) - after analysis: Estimated size of the main internal real/complex workarray S to run the
numerical factorization out-of-core . If negative, then the absolute value corresponds to millions of
real/complex entries needed in this workarray. It is also the estimated minimum size of LWK USER
in that case, if the user provides WK USER.
INFO(21) - after factorization: Effective space used in the main real/complex workarray S– or in the
workarray WK USER, in the case where WK USER is provided. If negative, then the absolute value
corresponds to millions of real/complex entries needed in this workarray.
INFO(22) - after factorization: Size in millions of bytes of memory effectively used during
factorization. This includes the part of the memory effectively used from the workarray WK USER,
in the case where WK USER is provided. The maximum and sum over all processors are
returned respectively in INFOG(21) and INFOG(22). The difference between estimated and
effective memory may results from numerical pivoting difficulties, parallelism and BLR effective
compression rates.
12 i.e., whose magnitude is smaller than the tolerance defined by CNTL(3).

94
INFO(23) - after factorization: total number of pivots eliminated on the processor. In the case of a
distributed solution (see ICNTL(21)), this should be used by the user to allocate solution vectors
ISOL loc and SOL loc of appropriate dimensions (ISOL loc of size INFO(23), SOL loc
of size LSOL loc × NRHS where LSOL loc ≥ INFO(23)) on that processor, between the
factorization and solve steps.
INFO(24) - after analysis: estimated number of entries in the factor matrices on the processor. If
negative, then the absolute value corresponds to millions of entries in the factors. Note that in the
unsymmetric case, INFO(24)=INFO(3). In the symmetric case, however, INFO(24) < INFO(3).
The total number of entries in the factor matrices for all processors (sum of the INFO(24) values
over all processors) is returned in INFOG(20)
INFO(25) - after factorization: number of tiny pivots (number of pivots modified by static pivoting)
detected on the processor (see INFOG(25) for the the total number of tiny pivots).
INFO(26) - after solution: effective size in MegaBytes (millions of bytes) of all working space to run
the solution phase. (The maximum and sum over all processors are returned in INFOG(30) and
INFOG(31), respectively).
INFO(27) - after factorization: effective number of entries in factor matrices assuming full-rank
factorization has been performed. If negative, then the absolute value corresponds to millions of
entries in the factors. Note that in case full-rank storage of factors (ICNTL(35)=0 or 3), we have
INFO(27)=INFO(9) in the unsymmetric case and INFO(27) ≤ INFO(9) in the symmetric case.
The sum of INFO(27) over all processors is available in INFOG(29).
INFO(28) - after factorization: effective number of entries in factors on the processor taking into
account BLR compression. If negative, then the absolute value corresponds to millions of entries in
the factors. It is equal to INFO(27) when BLR functionality (see ICNTL(35)) is not activated
or leads to no compression.
INFO(29) - after analysis: minimum estimated size of the main internal real/complex workarray S
to run the numerical factorization in-core when factors are stored low-rank (ICNTL(35)=1,2).
If negative, then the absolute value corresponds to millions of real/complex entries needed in this
workarray. It is also the estimated minimum size of LWK USER in that case, if the user provides
WK USER.
INFO(30) and INFO(31) - after analysis: estimated size in MegaBytes (millions of bytes) of all
working space to perform low-rank numerical phases (factorization/solve) with low-rank factors
(ICNTL(35)=1,2) and estimated compression rate given by ICNTL(38).
• —– (30) in-core factorization and solve The maximum and sum over all processors are
returned respectively in INFOG(36) and INFOG(37).
• —– (31) out-of-core factorization and solve The maximum and sum over all processors are
returned respectively in INFOG(38) and INFOG(39).
See also INFO(22) which provides the actual memory that was needed but only after
factorization. Numerical pivoting difficulties and the effective compression of the factors (in case
ICNTL(35)=1,2) typically impact the difference between estimated and effective memory.
INFO(32) - after analysis: minimum estimated size of the main internal real/complex workarray S
to run the numerical factorization in-core when factors and contribution blocks are stored low-
rank (ICNTL(35)=1,2 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size of
LWK USER in that case, if the user provides WK USER.
INFO(33) - after analysis: minimum estimated size of the main internal real/complex workarray S to
run the numerical factorization out-of-core when factors and contribution blocks are stored low-
rank (ICNTL(35)=1,2 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size of
LWK USER in that case, if the user provides WK USER.
INFO(34) and INFO(35) - after analysis: estimated size in MegaBytes (millions of bytes) of
all working space to perform low-rank numerical phases (factorization/solve) with low-rank
factors and low-rank contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) and estimated
compression rates given by ICNTL(38) and ICNTL(39) relatively.

95
• —– (34) in-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(40) and INFOG(41).
• —– (35) out-of-core factorization and solve The maximum and sum over all processors are
returned respectively in INFOG(42) and INFOG(43).
See also INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(36) - after analysis: minimum estimated size of the main internal real/complex workarray
S to run the numerical factorization out-of-core when contribution blocks are stored low-rank
(ICNTL(35)=0,3 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size
of LWK USER in that case, if the user provides WK USER.
INFO(37) and INFO(38) - after analysis: estimated size in MegaBytes (millions of bytes) of
all working space to perform low-rank numerical phases (factorization/solve) with low-rank
contribution blocks only (ICNTL(35)=0,3 and ICNTL(37)=1) and estimated compression rate
given by ICNTL(39).
• —– (37) in-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(44) and INFOG(45).
• —– (38) out-of-core factorization and solve The maximum and sum over all processors are
returned respectively in INFOG(46) and INFOG(47).
See also INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(39) - after factorization: effective size of the main internal real/complex workarray S (allocated
internally or by the user when WK USER is provided) to run the numerical factorization. If negative,
then the absolute value corresponds to millions of real/complex entries needed in this workarray.
INFO(40) - after factorization: can only be nonzero for real symmetric matrices, in case the null pivot
row detection (see ICNTL(24)) feature is activated. INFO(40) is the number of negative pivots
among the null pivots/deficiency detected. Note that, for singular matrices, INFO(40) may vary
from one run to another due to floating-point rounding effects. A pivot counted in INFO(12),
the number of negative non-null pivots, will not be counted in INFO(40). See also INFOG(28)
which provides the number of null pivots/deficiency over all processors and INFOG(50), which
provides the number of negative null pivots over all processors.
INFO(41) - INFO(80) are not used in the current version.

7.2 Information available on all processors

The arrays mumps par%RINFOG and mumps par%INFOG :

mumps par%RINFOG is a double precision array of dimension 20. It contains the following global
information on the execution of MUMPS:
RINFOG(1) - after analysis: the estimated number of floating-point operations (on all processors) for
the elimination process.
RINFOG(2) - after factorization: the total number of floating-point operations (on all processors) for
the assembly process.
RINFOG(3) - after factorization: the total number of floating-point operations (on all processors) for
the elimination process. In case the BLR feature is activated (ICNTL(35)=1, 2 or 3), RINFOG(3)
represents the theoretical number of operations for the standard full-rank factorization.
RINFOG(4) to RINFOG(8) - after solve with error analysis: Only returned if ICNTL(11) = 1 or 2.
See description in Subsection 5.7 .
RINFOG(9) to RINFOG(11) - after solve with error analysis: Only returned if ICNTL(11) = 2.
See description in Subsection 5.7 .
RINFOG(12) - after factorization: if the computation of the determinant was requested (see
ICNTL(33)), RINFOG(12) contains the real part of the determinant. The determinant may
contain an imaginary part in case of complex arithmetic (see RINFOG(13)). It is obtained by
multiplying (RINFOG(12), RINFOG(13)) by 2 to the power INFOG(34).

96
RINFOG(13) - after factorization: if the computation of the determinant was requested (see
ICNTL(33)), RINFOG(13) contains the imaginary part of the determinant. The determinant
is then obtained by multiplying (RINFOG(12), RINFOG(13)) by 2 to the power INFOG(34).
RINFOG(14) - after factorization: the total effective number of floating-point operations (on all
processors) for the elimination process. It is equal to RINFOG(3) when the BLR feature is
not activated (ICNTL(35)=0) and will typically be smaller than RINFOG(3) when the BLR
functionality is activated and leads to compression.
RINFOG(15) - after analysis: if the user decides to perform an out-of-core factorization
(ICNTL(22)=1), then a rough estimation of the total size of the disk space in MegaBytes of
the files written by all processors is provided in RINFOG(15). If the analysis is full-rank
(ICNTL(35)=0 for the analysis step), then the factorization is necessarily full-rank so that
RINFOG(15) is computed for a full-rank factorization (ICNTL(35)=0 also for the factorization).
If ICNTL(35)=1, 2 or 3 at analysis, then RINFOG(15) is computed assuming a low-rank (in-
core) storage of the factors of the BLR fronts during the factorization (ICNTL(35)=2 during
factorization). In case ICNTL(35)=1, 2 or 3 for the analysis and the factors will be stored in full-
rank format (ICNTL(35)=0 or 3 for the factorization), we refer the user to INFOG(3) in order
to obtain a rough estimate of the necessary disk space for all processors.
The effective size in Megabytes of the files written by all processors will be returned in
RINFOG(16), but only after the factorization.
RINFOG(16) - after factorization: in the case of an out-of-core execution (ICNTL(22)=1), the total
size in MegaBytes of the disk space used by the files written by all processors is provided.
RINFOG(17) - after each job: sum over all processors of the sizes (in MegaBytes) of the files used to
save the instance (See Subsection 5.17).
RINFOG(18) - after each job: sum over all processors of the sizes (in MegaBytes) of the MUMPS
structures.
RINFOG(19) - RINFOG(40) are not used in the current version.

mumps par%INFOG is an integer array of dimension 80. It contains the following global information on
the execution of MUMPS:
INFOG(1) is 0 if the last call to MUMPS was successful, negative if an error occurred (see Section 8),
or positive if a warning is returned. In particular, after successfully saving or restoring an instance
(call to MUMPS with JOB=7 or JOB=8), INFOG(1) will be 0 even if INFOG(1) was different
from 0 at the moment of saving the MUMPS instance to disk.
INFOG(2) holds additional information about the error or the warning.
The difference between INFOG(1:2) and INFO(1:2) is that INFOG(1:2) is identical on all processors. It
has the value of INFO(1:2) of the processor which returned with the most negative INFO(1) value. For
example, if processor p returns with INFO(1)=-13, and INFO(2)=10000, then all other processors will
return with INFOG(1)=-13 and INFOG(2)=10000, and with INFO(1)=-1 and INFO(2)=p.
INFOG(3) - after analysis: total (sum over all processors) estimated real/complex workspace to
store the factors, assuming the factors are stored in full-rank format (ICNTL(35)=0 or 3). If
INFOG(3) is negative, then its absolute value corresponds to millions of real/complex entries
used to store the factor matrices. Assuming that the factors will be stored in full-rank format during
the factorization (ICNTL(35)=0 or 3), a rough estimation of the size of the disk space in bytes
of the files written all processors can be obtained by multiplying INFOG(3) (or its absolute value
multiplied by 1 million when negative) by 4, 8, 8, or 16 for single precision, double precision,
single complex, and double complex arithmetics, respectively. See also RINFOG(15).
Note that, when all factors are discarded (ICNTL(31)=1), INFOG(3) corresponds to the factors
storage if factors were not discarded (rather than 0). However, if only the L factor is discarded
(case of forward substitution during factorization, ICNTL(32)=1, or case of ICNTL(31)=2),
then INFO(3) corresponds to the factor storage excluding L.
The effective size of the real/complex space needed will be returned in INFOG(9) (see below),
but only after the factorization. Furthermore, after an out-of-core factorization, the size of the disk
space for the files written by all processors is returned in RINFOG(16).

97
INFOG(4) - after analysis: total (sum over all processors) estimated integer workspace to store the
factor matrices (assuming a full-rank storage of the factors). If INFOG(4) is negative, then its
absolute value corresponds to millions of integer entries used to store the factor matrices.
INFOG(5) - after analysis: estimated maximum front size in the complete tree.
INFOG(6) - after analysis: number of nodes in the complete tree.
INFOG(7) - after analysis: the ordering method actually used. The returned value will depend on
the type of analysis performed, e.g. sequential or parallel (see INFOG(32)). Please refer to
ICNTL(7) and ICNTL(29) for more details on the ordering methods available in sequential and
parallel analysis respectively.
INFOG(8) - after analysis: structural symmetry in percent (100 : symmetric, 0 : fully unsymmetric)
of the (permuted) matrix, -1 indicates that the structural symmetry was not computed (which
will be the case if the input matrix is in elemental form or if analysis by block was performed
(ICNTL(15))).
INFOG(9) - after factorization: total (sum over all processors) real/complex workspace to store the
factor matrices, possibly including low-rank factor matrices (ICNTL(35)=2). If negative, then
the absolute value corresponds to the size in millions of real/complex entries used to store the factor
matrices.
INFOG(10) - after factorization: total (sum over all processors) integer workspace to store the factor
matrices. If negative the absolute value corresponds to millions of integer entries in the integer
workspace to store the factor matrices.
INFOG(11) - after factorization: order of largest frontal matrix.
INFOG(12) - after factorization: total number of off-diagonal pivots if SYM=0 or total number of
negative pivots (real arithmetic) if SYM=1 or 2. If ICNTL(13)=0 (the default) this excludes
pivots from the parallel root node treated by ScaLAPACK. (This means that the user should set
ICNTL(13) to a positive value, say 1, or use a single processor in order to get the exact number
of off-diagonal or negative pivots rather than a lower bound.) Furthermore, when ICNTL(24) is
set to 1 and SYM=1 or 2, INFOG(12) excludes the null13 pivots, even if their sign is negative. In
other words, a pivot cannot be both null and negative.
Note that if SYM=1 or 2, INFOG(12) will be 0 for complex symmetric matrices. For real
symmetric matrices, see also INFOG(40), provides the number of negative pivots among the
null pivots detected when ICNTL(24) is activated.
INFOG(13) - after factorization: total number of delayed pivots. A variable of the original matrix may
be delayed several times between successive frontal matrices. In that case, it accounts for several
delayed pivots. A large number (more that 10% of the order of the matrix) indicates numerical
problems. Settings related to numerical preprocessing (ICNTL(6),ICNTL(8), ICNTL(12))
might then be modified by the user.
INFOG(14) - after factorization: total number of memory compresses.
INFOG(15) - after solution: number of steps of iterative refinement.
INFOG(16) and INFOG(17) - after analysis: estimated size (in million of bytes) of all MUMPS
internal data for running full-rank factorization in-core for a given value of ICNTL(14).
• —– (16) : max over all processors
• —– (17) : sum over all processors.
INFOG(18) and INFOG(19) - after factorization: size in millions of bytes of all MUMPS internal data
allocated during factorization.
• —– (18) : max over all processors
• —– (19) : sum over all processors.
Note that in the case where WK USER is provided, the memory allocated by the user for the local
arrays WK USER is not counted in INFOG(18) and INFOG(19).
13 i.e., whose magnitude is smaller than the tolerance defined by CNTL(3).

98
INFOG(20) - after analysis: estimated number of entries in the factors assuming full-rank factorization.
If negative the absolute value corresponds to millions of entries in the factors. Note that in the
unsymmetric case, INFOG(20)=INFOG(3). In the symmetric case, however, INFOG(20) <
INFOG(3).
INFOG(21) and INFOG(22) - after factorization: size in millions of bytes of memory effectively
used during factorization.
• —– (21) : max over all processors
• —– (22) : sum over all processors.
This includes the memory effectively used in the local workarrays WK USER, in the case where the
arrays WK USER are provided.
INFOG(23) - after analysis: value of ICNTL(6) effectively used.
INFOG(24) - after analysis: value of ICNTL(12) effectively used.
INFOG(25) - after factorization: number of tiny pivots (number of pivots modified by static pivoting)
INFOG(26) and INFOG(27) - after analysis: estimated size (in millions of bytes) of all MUMPS
internal data for running full-rank factorization out-of-core (ICNTL(22)̸= 0) for a given value
of ICNTL(14).
• —– (26) : max over all processors
• —– (27) : sum over all processors
INFOG(28) - after factorization: number of null pivot rows encountered. See ICNTL(24) and
CNTL(3) for the definition of a null pivot row.
INFOG(29) - after factorization: effective number of entries in the factor matrices (sum over all
processors) assuming that full-rank factorization has been performed. If negative, then the absolute
value corresponds to millions of entries in the factors. Note that in case the factor matrices are
stored full-rank (ICNTL(35)=0 or 3), we have INFOG(29)=INFOG(9) in the unsymmetric case
and INFOG(29) ≤ INFOG(9) in the symmetric case.
INFOG(30) and INFOG(31) - after solution: size in millions of bytes of memory effectively used
during solution phase:
• —– (30) : max over all processors
• —– (31) : sum over all processors
INFOG(32) - after analysis: the type of analysis actually done (see ICNTL(28)). INFOG(32) has
value 1 if sequential analysis was performed, in which case INFOG(7) returns the sequential
ordering option used, as defined by ICNTL(7). INFOG(32) has value 2 if parallel analysis
was performed, in which case INFOG(7) returns the parallel ordering used, as defined by
ICNTL(29).
INFOG(33): effective value used for ICNTL(8). It is set both after the analysis and the factorization
phases. If ICNTL(8)=77 on entry to the analysis and INFOG(33) has value 77 on exit from the
analysis, then no scaling was computed during the analysis and the automatic decision will only be
done during factorization (except if the user modifies ICNTL(8) to set a specific option on entry
to the factorization).
INFOG(34): if the computation of the determinant was requested (see ICNTL(33)), INFOG(34)
contains the exponent of the determinant. See also RINFOG(12) and RINFOG(13): the
determinant is obtained by multiplying (RINFOG(12), RINFOG(13)) by 2 to the power
INFOG(34).
INFOG(35) - after factorization: effective number of entries in the factors (sum over all processors)
taking into account BLR factor compression. If negative, then the absolute value corresponds
to millions of entries in the factors. It is equal to INFOG(29) when BLR functionality (see
ICNTL(35)) is not activated or leads to no compression.
INFOG(36), INFOG(37), INFOG(38), and INFOG(39) - after analysis: estimated size (in million
of bytes) of all MUMPS internal data for running low-rank factorization with low-rank factors for a
given value of ICNTL(14) and ICNTL(38).

99
• in-core
. —– (36) : max over all processors
. —– (37) : sum over all processors.
• out-of-core
. —– (38) : max over all processors
. —– (39) : sum over all processors.
INFOG(40), INFOG(41), INFOG(42), and INFOG(43) - after analysis: estimated size (in million
of bytes) of all MUMPS internal data for running low-rank factorization with low-rank factors and
low-rank contribution blocks for a given value of ICNTL(14), ICNTL(38) and ICNTL(39).
• in-core
. —– (40) : value on the most memory consuming processor.
. —– (41) : sum over all processors.
• out-of-core
. —– (42) : value on the most memory consuming processor.
. —– (43) : sum over all processors.
INFOG(44), INFOG(45), INFOG(46), and INFOG(47) - after analysis: estimated size (in million
of bytes) of all MUMPS internal data for running low-rank factorization with low-rank low-rank
contribution blocks only for a given value of ICNTL(14) and ICNTL(39).
• in-core
. —– (44) : value on the most memory consuming processor.
. —– (45) : sum over all processors.
• out-of-core
. —– (46) : value on the most memory consuming processor.
. —– (47) : sum over all processors.
INFOG(48) - INFOG(49) are reserved.
INFOG(50) - after factorization: can only be nonzero for symmetric matrices, in case the null pivot row
detection (see ICNTL(24)) feature is activated. INFOG(50) is the total, over all MPI processes,
number of negative pivots among the null pivots/deficiency detected. Note that, for singular
matrices, INFOG(50) may vary from one run to another due to floating-point rounding effects.
A pivot counted in INFOG(12), the number of negative non-null pivots, will not be counted in
INFOG(50). See also INFOG(28), the number of null pivots/deficiency, and INFO(40), the
local static for INFOG(50) on a given MPI process.
INFOG(51) - INFOG(80) are not used in the current version.

8 Error and warning diagnostics

MUMPS uses the following mechanism to process errors that may occur during the parallel execution of
the code. If, during a call to MUMPS, an error occurs on a processor, this processor informs all the other
processors before they return from the call. In parts of the code where messages are sent asynchronously
(for example the factorization and solve phases), the processor on which the error occurs sends a message
to the other processors with a specific error tag. On the other hand, if the error occurs in a subroutine that
does not use asynchronous communication, the processor propagates the error to the other processors.
On successful completion, a call to MUMPS will exit with the parameter mumps par%INFOG(1) set
to zero. A negative value for mumps par%INFOG(1) indicates that an error has been detected on one
of the processors. For example, if processor s returns with INFO(1) = -8 and INFO(2)=1000, then
processor s ran out of integer workspace during the factorization and the size of the workspace should be
increased by 1000 at least. The other processors are informed about this error and return with INFO(1)
= -1 (i.e., an error occurred on another processor) and INFO(2)=s (i.e., the error occurred on processor
s). If several processors raised an error, those processors do not overwrite INFO(1), i.e., only processors

100
that did not produce an error will set INFO(1) to -1 and INFO(2) to the rank of the processor having
the most negative error code.
The behaviour is slightly different for the global information parameters INFOG(1) and INFOG(2):
in the previous example, all processors would return with INFOG(1) = -8 and INFOG(2)=1000.
The possible error codes returned in INFO(1) (and INFOG(1)) have the following meaning:
-1 An error occurred on processor INFO(2).
P P
loc or
-2 NNZ/NZ, NNZ loc/NZ P NNZ loc/ NZ loc are out of range.
P INFO(2)=NNZ/NZ,
NNZ loc/NZ loc or NNZ loc/ NZ loc.
-3 MUMPS was called with an invalid value for JOB. This may happen if the analysis (JOB=1) was not
performed (or failed) before the factorization (JOB=2), or the factorization was not performed (or
failed) before the solve (JOB=3), or the initialization phase (JOB=-1) was performed a second time
on an instance not freed (JOB=-2). See description of JOB in Section 4. This error also occurs if
JOB does not contain the same value on all processes on entry to MUMPS. INFO(2) is then set to
the local value of JOB.
-4 Error in user-provided permutation array PERM IN at position INFO(2). This error may only occur
on the host.
-5 Problem of real workspace allocation of size INFO(2) during analysis. The unit for INFO(2)
is the number of real values (single precision for SMUMPS/CMUMPS, double precision for
DMUMPS/ZMUMPS), in the Fortran “ALLOCATE” statement that did not succeed. If INFO(2)
is negative, then its absolute value should be multiplied by 1 million.
-6 Matrix is singular in structure. INFO(2) holds the structural rank.
-7 Problem of integer workspace allocation of size INFO(2) during analysis. The unit for INFO(2) is
the number of integer values that MUMPS tried to allocate in the Fortran ALLOCATE statement that
did not succeed. If INFO(2) is negative, then its absolute value should be multiplied by 1 million.
-8 Main internal integer workarray IS too small for factorization. This may happen, for example, if
numerical pivoting leads to significantly more fill-in than was predicted by the analysis. The user
should increase the value of ICNTL(14) before calling the factorization again (JOB=2).
-9 The main internal real/complex workarray S is too small. If INFO(2) is positive, then the number
of entries that are missing in S at the moment when the error is raised is available in INFO(2).
If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error –9
occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2)
again, except if LWK USER is provided LWK USER should be increased.
-10 Numerically singular matrix. INFO(2) holds the number of eliminated pivots.
-11 Internal real/complex workarray S or LWK USER too small for solution. If INFO(2) is positive,
then the number of entries that are missing in S/LWK USER at the moment when the error is raised
is available in INFO(2). If the numerical phases are out-of-core and LWK USER is provided for
the solution phase and is smaller than the value provided for the factorization, it should be increased
by at least INFO(2). In other cases, please contact us.
-12 Internal real/complex workarray S too small for iterative refinement. Please contact us.
-13 Problem of workspace allocation of size INFO(2) during the factorization or solve steps. The size
that the package tried to allocate with a Fortran ALLOCATE statement is available in INFO(2).
If INFO(2) is negative, then the size that the package requested is obtained by multiplying the
absolute value of INFO(2) by 1 million. In general, the unit for INFO(2) is the number of scalar
entries of the type of the input matrix (real, complex, single or double precision).
-14 Internal integer workarray IS too small for solution. See error INFO(1) = -8.
-15 Integer workarray IS too small for iterative refinement and/or error analysis. See error INFO(1) =
-8.
-16 N is out of range. INFO(2)=N.
-17 The internal send buffer that was allocated dynamically by MUMPS on the processor is too small.
The user should increase the value of ICNTL(14) before calling MUMPS again.

101
-18 The blocking size for multiple RHS (ICNTL(27)) is too large and may lead to an integer overflow.
This error may only occurs for very large matrices with large values of ICNTL(27) (e.g., several
thousands). INFO(2) provides an estimate of the maximum value of ICNTL(27) that should be
used.
-19 The maximum allowed size of working memory ICNTL(23) is too small to run the factorization
phase and should be increased. If INFO(2) is positive, then the number of entries that are missing
at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its
absolute value should be multiplied by 1 million.
-20 The internal reception buffer that was allocated dynamically by MUMPS is too small. Normally, this
error is raised on the sender side when detecting that the message to be sent is too large for the
reception buffer on the receiver. INFO(2) holds the minimum size of the reception buffer required
(in bytes). The user should increase the value of ICNTL(14) before calling MUMPS again.
-21 Value of PAR=0 is not allowed because only one processor is available; Running MUMPS in host-
node mode (the host is not a slave processor itself) requires at least two processors. The user should
either set PAR to 1 or increase the number of processors.
-22 A pointer array is provided by the user that is either
• not associated, or
• has insufficient size, or
• is associated and should not be associated (for example, RHS on non-host processors).
INFO(2) points to the incorrect pointer array in the table below:
INFO(2) array
1 IRN or ELTPTR
2 JCN or ELTVAR
3 PERM IN
4 A or A ELT
5 ROWSCA
6 COLSCA
7 RHS
8 LISTVAR SCHUR
9 SCHUR
10 RHS SPARSE
11 IRHS SPARSE
12 IRHS PTR
13 ISOL loc
14 SOL loc
15 REDRHS
16 IRN loc, JCN loc or A loc
17 IRHS loc
18 RHS loc
-23 MPI was not initialized by the user prior to a call to MUMPS with JOB = –1.
-24 NELT is out of range. INFO(2)=NELT.
-25 A problem has occurred in the initialization of the BLACS. This may be because you are using a
vendor’s BLACS. Try using a BLACS version from netlib instead.
-26 LRHS is out of range. INFO(2)=LRHS.
-27 NZ RHS and IRHS PTR(NRHS+1) do not match. INFO(2) = IRHS PTR(NRHS+1).
-28 IRHS PTR(1) is not equal to 1. INFO(2) = IRHS PTR(1).
-29 LSOL loc is smaller than INFO(23). INFO(2)=LSOL loc.
-30 SCHUR LLD is out of range. INFO(2) = SCHUR LLD.
-31 A 2D block cyclic symmetric (SYM=1 or 2) Schur complement is required with the option
ICNTL(19)=3, but the user has provided a process grid that does not satisfy the constraint
MBLOCK=NBLOCK. INFO(2)=MBLOCK-NBLOCK.
-32 Incompatible values of NRHS and ICNTL(25). Either ICNTL(25) was set to -1 and NRHS is
different from INFOG(28); or ICNTL(25) was set to i, 1 ≤ i ≤ INFOG(28) and NRHS is
different from 1. Value of NRHS is stored in INFO(2).

102
-33 ICNTL(26) was asked for during solve phase (or during the factorization – see ICNTL(32))
but the Schur complement was not asked for at the analysis phase (ICNTL(19)).
INFO(2)=ICNTL(26).
-34 LREDRHS is out of range. INFO(2)=LREDRHS.
-35 This error is raised when the expansion phase is called (ICNTL(26) = 2) but reduction phase
(ICNTL(26)=1) was not called before. This error also occurs in case the reduction phase
(ICNTL(26) = 1) is asked for at the solution phase (JOB=3) but the forward elimination
was already performed during the factorization phase (JOB=2 and ICNTL(32)=1). INFO(2)
contains the value of ICNTL(26).
-36 Incompatible values of ICNTL(25) and INFOG(28). The value of ICNTL(25) is stored in
INFO(2).
-37 Value of ICNTL(25) incompatible with some other parameter. If ICNTL(25) is incompatible
with ICNTL(xx), the index xx is stored in INFO(2).
-38 Parallel analysis was set (i.e., ICNTL(28)=2) but PT-SCOTCH or ParMetis were not provided.
-39 Incompatible values for ICNTL(28) and ICNTL(5) and/or ICNTL(19) and/or ICNTL(6).
Parallel analysis is not possible in the cases where the matrix is unassembled and/or a Schur
complement is requested and/or a maximum transversal is requested on the matrix.
-40 The matrix was indicated to be positive definite (SYM=1) by the user but a negative or null pivot
was encountered during the processing of the root by ScaLAPACK. SYM=2 should be used.
-41 Incompatible value of LWK USER between factorization and solution phases. This error may
only occur when the factorization is in-core (ICNTL(22)=1), in which case both the contents
of WK USER and LWK USER should be passed unchanged between the factorization (JOB=2) and
solution (JOB=3) phases.
-42 ICNTL(32) was set to 1 (forward during factorization), but the value of NRHS on the host
processor is incorrect: either the value of NRHS provided at analysis is negative or zero, or the
value provided at factorization or solve is different from the value provided at analysis. INFO(2)
holds the value of id%NRHS that was provided at analysis.
-43 Incompatible values of ICNTL(32) and ICNTL(xx). The index xx is stored in INFO(2).
-44 The solve phase (JOB=3) cannot be performed because the factors or part of the factors are not
available. INFO(2) contains the value of ICNTL(31).
-45 NRHS ≤ 0. INFO(2) contains the value of NRHS.
-46 NZ RHS ≤ 0. This is currently not allowed in case of reduced right-hand-side (ICNTL(26)=1) and
in case entries of A−1 are requested (ICNTL(30)=1). INFO(2) contains the value of NZ RHS.
-47 Entries of A−1 were requested during the solve phase (JOB=3, ICNTL(30)=1) but the constraint
NRHS=N is not respected. The value of NRHS is provided in INFO(2).
-48 A−1 Incompatible values of ICNTL(30) and ICNTL(xx). xx is stored in INFO(2).
-49 SIZE SCHUR has an incorrect value: SIZE SCHUR < 0 or SIZE SCHUR ≥N, or SIZE SCHUR
was modified on the host since the analysis phase. The value of SIZE SCHUR is provided in
INFO(2).
-50 An error occurred while computing the fill-reducing ordering during the analysis phase. This
commonly happens when an (external) ordering tool returns an error code or a wrong result.
-51 An external ordering (Metis/ParMetis, SCOTCH/PT-SCOTCH, PORD), with 32-bit default
integers, is invoked to processing a graph of size larger than 231 − 1. INFO(2) holds the size
required to store the graph as a number of integer values; it is negative and its absolute value should
be multiplied by 1 million.
-52 When default Fortran integers are 64 bit (e.g. Fortran compiler flag -i8 -fdefault-integer-8 or
something equivalent depending on your compiler) then external ordering libraries (Metis/ParMetis,
SCOTCH/PT-SCOTCH, PORD) should also have 64-bit default integers. INFO(2) = 1, 2, 3
means that respectively Metis/ParMetis, SCOTCH/PT-SCOTCH or PORD were invoked and were
not generated with 64-bit default integers.

103
-53 Internal error that could be due to inconsistent input data between two consecutive calls.
-54 The analysis phase (JOB=1) was called with ICNTL(35)=0 but the factorization phase was
called with ICNTL(35)=1, 2 or 3. In order to perform the factorization with BLR compression,
please perform the analysis phase again using ICNTL(35)=1, 2 or 3 (see the documentation of
ICNTL(35)).
-55 During a call to MUMPS including the solve phase with distributed right-hand side, LRHS loc was
detected to be smaller than Nloc RHS. INFO(2)=LRHS loc.
-56 During a call to MUMPS including the solve phase with distributed right-hand side and distributed
solution, RHS loc and SOL loc point to the same workarray but LRHS loc < LSOL loc.
INFO(2)=LRHS loc.
-57 During a call to MUMPS analysis phase with a block format
(ICNTL(15) ̸= 0), an error in the interface provided by the user
was detected. INFO(2) holds additional information about the issue:
INFO(2) issue
1 NBLK is incorrect (or not compatible with BLKPTR size),
or -ICNTL(15) is not compatible with N
2 BLKPTR is not provided or its content is incorrect
3 BLKVAR if provided should be of size N
-70 During a call to MUMPS with JOB=7, the file specified to save the current instance, as derived from
SAVE DIR and/or SAVE PREFIX, already exists. Before saving an instance into this file, it should
be first suppressed (see JOB=-3). Otherwise, a different file should be specified by changing the
values of SAVE DIR and/or SAVE PREFIX.
-71 An error has occured during the creation of one of the files needed to save MUMPS data (JOB=7).
-72 Error while saving data (JOB=7); a write operation did not succeed (e.g., disk full, I/O error, . . . ).
INFO(2) is the size that should have been written during that operation.
If INFO(2) is negative, then its absolute value should be multiplied by 1 million.
-73 During a call to MUMPS with JOB=8, one parameter of the current instance is not compatible with
the corresponding one in the saved instance.
INFO(2) points to the incorrect parameter in the table below:
INFO(2) parameter
1 fortran version (after/before 2003)
2 integer size(32/64 bit)
3 saved instance not compatible over MPI processes
4 number of MPI processes
5 arithmetic
6 SYM
7 PAR
-74 The file resulting from the setting of SAVE DIR and SAVE PREFIX could not be opened for
restoring data (JOB=8). INFO(2) is the rank of the process (in the communicator COMM) on
which the error was detected.
-75 Error while restoring data (JOB=8); a read operation did not succeed (e.g., end of file reached, I/O
error, . . . ). INFO(2) is the size still to be read. If INFO(2) is negative, then the size that the
package requested is obtained by multiplying the absolute value of INFO(2) by 1 million.
-76 Error while deleting the files (JOB=-3); some files to be erased were not found or could not be
suppressed. INFO(2) is the rank of the process (in the communicator COMM) on which the error
was detected.
-77 Neither SAVE DIR nor the environment variable MUMPS SAVE DIR are defined.
-78 Problem of workspace allocation during the restore step. The size still to be allocated is available
in INFO(2). If INFO(2) is negative, then the size that the package requested is obtained by
multiplying the absolute value of INFO(2) by 1 million.
-79 MUMPS could not find a Fortran file unit to perform I/O’s. INFO(2) provides additional
information on the error:
• INFO(2)=1: the problem occurs in the analysis phase, when attempting to find a free Fortran
unit for the WRITE PROBLEM feature (see Subsection 5.2.3).

104
• INFO(2)=2: the problem occurs during a call to MUMPS with JOB=7 or 8 (save-restore
feature, see Subsection 3.19).
-89 Internal error during SCOTCH kway-partitioning in SCOTCHFGRAPHPART. METIS package
should be made available to MUMPS.
-90 Error in out-of-core management. See the error message returned on output unit ICNTL(1) for
more information.
-800 Temporary error associated to the current MUMPS release, subject to change or disappearance
in the future. If INFO(2)=5, then this error is due to the fact that the elemental matrix format
(ICNTL(5)=1) is currently incompatible with a BLR factorization (ICNTL(35)̸=0).
A positive value of INFO(1) is associated with a successful MUMPS execution but with a warning.
The corresponding warning message will be output on unit ICNTL(2) when ICNTL(4) ≥ 2.
+1 Index (in IRN or JCN) out of range. Action taken by subroutine is to ignore any such entries and
continue. INFO(2) is set to the number of faulty entries.
+2 During error analysis the max-norm of the computed solution is close to zero. In some cases, this
could cause difficulties in the computation of RINFOG(6).
+4 ICNTL(49)=1,2 and not enough memory to compact id%S at the end of the factorization.
+8 Warning return from the iterative refinement routine. More than ICNTL(10) iterations are required.
+ Combinations of the above warnings will correspond to summing the constituent warnings. For
example, if an MPI process exits the package with INFO(1)=6, this indicates that both warnings
+2 and +4 occurred on this MPI process. In case several warnings occur on an MPI process,
INFO(2) corresponds to the warning that occurred last. Finally, in case of multiple MPI processes,
INFOG(1) combines the warnings raised on the different MPI processes (for example, INFOG(1)
will be equal to 5 if INFO(1)=1, 1 and 4 on MPI processes with ranks 0, 1 and 2, respectively). In
that case, INFOG(2) is simply set to the number of MPI processes on which a warning occurred
and the values of INFO(1) and INFO(2) on each MPI process can be checked for more detailed
information.

9 Calling MUMPS from C

MUMPS is a Fortran 95 library, designed to be used from Fortran 95 rather than C. However a basic C
interface is provided that allows users to call MUMPS directly from C programs. Similarly to the Fortran
95 interface, the C interface uses a structure whose components match those in the MUMPS structure for
Fortran (Figure 2). Thus the description of the parameters in Sections 5 and 6 applies. Figure 4 shows the
C structure [SDCZ]MUMPS STRUC C. This structure is defined in the include file [sdcz]mumps c.h
and there is one main routine per available arithmetic with the following prototype:
void [sdcz]mumps_c([SDCZ]MUMPS_STRUC_C * idptr);
An example of calling MUMPS from C for a complex assembled problem is given in Subsection 11.3.
The following subsections discuss some technical issues that a user should be aware of before using the
C interface to MUMPS.
In the following, we suppose that id has been declared of type [SDCZ]MUMPS STRUC C.

9.1 Array indices

Arrays in C start at index 0 whereas they normally start at 1 in Fortran. Therefore, care must be taken when
providing arrays to the C structure. For example, the row indices of the matrix A, stored in IRN(1:NNZ)
in the Fortran version should be stored in irn[0:nnz-1] in the C version. (Note that the contents of
irn itself is unchanged with values between 1 and N.) One solution to deal with this is to define macros:
#define ICNTL( i ) icntl[ (i) - 1 ]
#define A( i ) a[ (i) -1 ]
#define IRN( i ) irn[ (i) -1 ]
...

105
typedef struct
{
int sym, par, job;
int comm fortran; /* Fortran communicator */
int icntl[60];
real cntl[15];
int n;
/* Assembled entry */
int nz; int64 t nnz; int *irn; int *jcn; real/complex *a;
/* Distributed entry */
int nz loc; int *irn loc; int *jcn loc; real/complex *a loc;
/* Element entry */
int nelt; int *eltptr; int *eltvar; real/complex *a elt;
/* Ordering, if given by user, Metis options */
int *perm in;
int metis options[40];
/* Scaling */
real/complex *colsca; real/complex *rowsca;
/* RHS, solution, output data and statistics */
real/complex *rhs, *redrhs, *rhs sparse, *sol loc, *rhs loc;
int *irhs sparse, *irhs ptr, *isol loc, *irhs loc;
int nrhs, lrhs, lredrhs, nz rhs, lsol loc, lrhs loc, nloc rhs;
int info[80],infog[80];
real rinfo[40], rinfog[40];
int *sym perm, *uns perm;
/* mapping, null pivots */
int *mapping, *pivnul list;
/* Schur */
int size schur; int *listvar schur; real/complex *schur;
int nprow, npcol, mblock, nblock, schur lld, schur mloc,schur nloc;
/* Version number */
char version number[80];
char ooc tmpdir[256], ooc prefix[64]; char write problem[256];
/* Internal parameters */
int instance number;
} [SDCZ]MUMPS STRUC C;

Figure 4: Definition of the C structure [SDCZ]MUMPS STRUC C. real/complex is used for data that can
be either real or complex, real for data that stays real (float or double) in the complex version.

106
and then use the uppercase notation with parenthesis (instead of lowercase/brackets). In that case, the
notation id.IRN(I), where I is in { 1, 2, ... NNZ} can be used instead of id.irn[I-1]; this
notation then matches exactly with the description in Sections 5 and 6, where arrays are supposed to start
at 1.
This can be slightly more confusing for elemental matrix input (see Subsection 5.2.2.3), where some
arrays are used to index other arrays. For instance, the first value in eltptr, eltptr[0], pointing
into the list of variables of the first element in eltvar, should be equal to 1. Effectively, using the
notation above, the list of variables for element j = 1 starts at location ELTVAR(ELTPTR(j)) =
ELTVAR(eltptr[j-1]) = eltvar[eltptr[j-1]-1].

9.2 Issues related to the C and Fortran communicators

In general, C and Fortran communicators have a different datatype and are not directly compatible.
For the C interface, MUMPS requires a Fortran communicator to be provided in id.comm fortran.
If, however, this field is initialized to the special value -987654, the Fortran communicator
MPI COMM WORLD is used by default. If you need to call MUMPS based on a smaller number of processors
defined by a C subcommunicator, then you should convert your C communicator to a Fortran one. This
has not been included in MUMPS because it is dependent on the MPI implementation and thus not portable.
For MPI2, and most MPI implementations, you may just do
id.comm_fortran = (MUMPS_INT) MPI_Comm_c2f(comm_c);
(Note that MUMPS INT is defined in [sdcz]mumps c.h and is normally an int.) For MPI
implementations where the Fortran and the C communicators have the same integer representation
id.comm_fortran = (MUMPS_INT) comm_c;
should work.
For some MPI implementations, check if id.comm fortran =
MPIR FromPointer(comm c) can be used.

9.3 Fortran I/O

Diagnostic, warning and error messages (controlled by ICNTL(1:4) / icntl[0..3]) are based on
Fortran file units. Use the value 6 for the Fortran unit 6 which corresponds to stdout. For a more
general usage with specific file names from C, passing a C file handler is not currently possible. One
solution would be to use a Fortran subroutine along the lines of the model below:
SUBROUTINE OPENFILE( UNIT, NAME )
INTEGER UNIT
CHARACTER*(*) NAME
OPEN(UNIT, file=NAME)
RETURN
END
and have (in the C user code) a statement like
openfile ( &mumps par.ICNTL(1), name, name length byval)
(or slightly different depending on the C-Fortran calling conventions); something similar could be done
to close the file.

9.4 Runtime libraries

The Fortran 95 runtime library corresponding to the compiler used to compile MUMPS is required at the
link stage. One way to provide it is to perform the link phase with the Fortran compiler (instead of the C
compiler or ld).

107
9.5 Integer, real and complex datatypes in C and Fortran
We assume that the int, int64 t, float and double types are compatible with the Fortran
INTEGER, INTEGER(KIND=8), REAL and DOUBLE PRECISION datatypes. Those assumptions are
used in the files [dscz]mumps c types.h.
Remark that Fortran compilers often provide an option to make all default INTEGER datatypes 64-bit
integers. In that case, one should add the option -DINTSIZE64 during the installation of MUMPS to
indicate that the default Fortran INTEGER should match a 64-bit integer of type int64 t.
When including MUMPS headers files from a C application, one can then check at compilation time
the preprocessing constants MUMPS INTSIZE32 and MUMPS INTSIZE64 to know how MUMPS INT
was defined. At runtime, one can simply check the value of sizeof(MUMPS INT).
Since not all C compilers define the complex datatype (this appeared in the C99 standard), we define
the following, compatible with the Fortran COMPLEX and DOUBLE COMPLEX types:
typedef struct {float r,i;} mumps complex; for single precision (cmumps), and
typedef struct {double r,i;} mumps double complex; for double precision
(zmumps).
Types for complex data from the user program should be compatible with those above.

9.6 Sequential version

The C interface to MUMPS is compatible with the sequential version; see Subsection 3.11.

10 Scilab and MATLAB/Octave interfaces

Thanks to Octave MEX compatibility, an Octave interface can be generated based on the MATLAB one.
The documentation provided in this section also applies to the Octave case.
The main callable functions are
id = initmumps;
id = dmumps(id [,mat] );
id = zmumps(id [,mat] );
We have designed these interfaces such that their usage is as similar as possible to the existing C
and Fortran interfaces to MUMPS. Only an interface to the sequential version of MUMPS is provided, thus
only the parameters related to the sequential version of MUMPS are available. The main differences and
characteristics are:
• The existence of a function initmumps (usage: id=initmumps) that builds an initial structure
id in which id.JOB is set to -1 and id.SYM is set to 0 (unsymmetric solver by default).
• Only the double precision and double complex versions of MUMPS are interfaced, since they
correspond to the arithmetics used in MATLAB/Scilab.
• the sparse matrix A is passed to the interface functions dmumps and zmumps as a Scilab/MATLAB
object (parameters ICNTL(5), N, NNZ (or NZ), NELT, . . . are thus irrelevant).
• the right-hand side vector or matrix, possibly sparse, is passed to the interface functions dmumps
and/or zmumps in the argument id.RHS, as a Scilab/MATLAB object (parameters ICNTL(20),
NRHS, NZ RHS, . . . are thus irrelevant).
• The Schur complement matrix, if required, is allocated within the interface and returned as a
Scilab/MATLAB dense matrix. Furthermore, the parameters SIZE SCHUR and ICNTL(19) need
not be set by the user; they are set automatically depending on the availability and size of the list of
Schur variables, id.VAR SCHUR.
• We have chosen to use a new variable id.SOL to store the solution, instead of overwriting
id.RHS.
• In the out-of-core case, functionalities allowing to control the directory and name of temporary
files, can only be controlled through the environment variables MUMPS OOC TMPDIR and
MUMPS OOC PREFIX – see Subsection 5.8.

108
Please refer to the report [29] for a more detailed description of these interfaces. Please also refer to
the README file in directories MATLAB or Scilab of the main MUMPS distribution for more information
on installation. For example, one important thing to note is that at installation, the user must provide the
Fortran 95 runtime libraries corresponding to the compiled MUMPS package. This can be done in the
makefile for the MATLAB interface (file make.inc) and in the builder for the Scilab interface (file
builder.sce).
Finally, note that examples of usage of the MATLAB and the Scilab interfaces are provided in
directories MATLAB and SCILAB/examples, respectively. In the following, we describe the input
and output parameters of the function [dz]mumps, that are relevant in the context of this interface to the
sequential version of MUMPS.

Input Parameters
• mat : sparse matrix which has to be provided as the second argument of dmumps if id.JOB is
strictly larger than 0.
• id.SYM : controls the matrix type (symmetric positive definite, symmetric indefinite or
unsymmetric) and it has do be initialized by the user before the initialization phase of MUMPS
(see id.JOB). Its value is set to 0 after the call of initmumps.
• id.JOB : defines the action that will be realized by MUMPS: initialize, analyze and/or factorize
and/or solve and release MUMPS internal C/Fortran data. It has to be set by the user before any call
to MUMPS (except after a call to initmumps, which sets its value to -1).
• id.ICNTL and id.CNTL : define control parameters that can be set after the initialization call
(id.JOB = -1). See Section “Control parameters” for more details. If the user does not modify
an entry in id.ICNTL then MUMPS uses the default parameter. For example, if the user wants to
use the AMD ordering, he/she should set id.ICNTL(7) = 0. Note that the following parameters
are inhibited because they are automatically set within the interface: id.ICNTL(19) which controls
the Schur complement option and id.ICNTL(20) which controls the format of the right-hand side.
Note that parameters id.ICNTL(1:4) may not work properly depending on your compiler and your
environment. In case of problem, we recommand to swith printing off by setting id.ICNL(1:4)=-1.
• id.PERM IN : corresponds to the given ordering option (see Section “Input and output parameters”
for more details). Note that this permutation is only accessed if the parameter id.ICNTL(7) is set to
1.
• id.COLSCA and id.ROWSCA : are optional scaling arrays (see Section “Input and output
parameters” for more details)
• id.RHS : defines the right-hand side. The parameter id.ICNTL(20) related to its format (sparse or
dense) is automatically set within the interface. Note that id.RHS is not modified (as in MUMPS),
the solution is returned in id.SOL.
• id.VAR SCHUR : corresponds to the list of variables that appear in the Schur complement matrix
(see Section “Input and output parameters” for more details).
• id.REDRHS (input parameter only if id.VAR SCHUR was provided during the factorization and
if ICNTL(26)=2 on entry to the solve phase): partial solution on the variables corresponding
to the Schur complement. It is provided by the user and normally results from both the Schur
complement and the reduced right-hand side that were returned by MUMPS in a previous call. When
ICNTL(26)=2, MUMPS uses this information to build the solution id.SOL on the complete problem.
See Section “Schur complement” for more details.

Output Parameters
• id.SCHUR : if id.VAR SCHUR is provided of size SIZE SCHUR, then id.SCHUR corresponds to
a dense array of size (SIZE SCHUR,SIZE SCHUR) that holds the Schur complement matrix (see
Section “Input and output parameters” for more details). The user does not have to initialize it.

109
• id.REDRHS (output parameter only if ICNTL(26)=1 and id.VAR SCHUR was defined): Reduced
right-hand side (or condensed right-hand side on the variables associated to the Schur complement).
It is computed by MUMPS during the solve stage if ICNTL(26)=1. It can then be used outside
MUMPS, together with the Schur complement, to build a solution on the interface. See Section
“Schur complement” for more details.
• id.INFOG and id.RINFOG : information parameters (see Section “Information parameters” ).
• id.SYM PERM : corresponds to a symmetric permutation of the variables (see discussion
regarding ICNTL(7) in Section “Control parameters” ). This permutation is computed during the
analysis and is followed by the numerical factorization except when numerical pivoting occurs.
• id.UNS PERM : column permutation (if any) on exit from the analysis phase of MUMPS (see
discussion regarding ICNTL(6) in Section “Control parameters” ).
• id.SOL : dense vector or matrix containing the solution after MUMPS solution phase. Also contains
the nullspace in case of null space computation, or entries of the inverse, in case of computation of
inverse entries.

Internal Parameters
• id.INST: (MUMPS reserved component) MUMPS internal parameter.
• id.TYPE: (MUMPS reserved component) defines the arithmetic (complex or double precision).

11 Examples of use of MUMPS

11.1 An assembled problem
An example program illustrating a possible use of MUMPS on assembled DOUBLE PRECISION
problems is given Figure 5.
Two files must be included in the program: mpif.h for MPI and mumps struc.h for MUMPS.
The file mumps root.h must also be available because it is included in mumps struc.h. The
initialization and termination of MPI are performed in the user program via the calls to MPI INIT and
MPI FINALIZE.
The MUMPS package is initialized by calling MUMPS with JOB = –1, the problem is read in by the
host (in the components N, NNZ, IRN, JCN, A, and RHS), and the solution is computed in RHS with a
call on all processors to MUMPS with JOB=6. Finally, a call to MUMPS with JOB = –2 is performed to
deallocate the data structures used by the instance of the package.
Thus for the assembled 5 × 5 matrix and right-hand side
 2 3 4
  20 
3 −3 6 24
 −1 1 2 ,  9 
2 6
4 1 13
we could have as input
5 : N
12 : NNZ
1 2 3.0
2 3 -3.0
4 3 2.0
5 5 1.0
2 1 3.0
1 1 2.0
5 2 4.0
3 4 2.0
2 5 6.0
3 2 -1.0
1 3 4.0
3 3 1.0 : A
20.0
24.0
9.0
6.0
13.0 :RHS

110
PROGRAM MUMPS EXAMPLE
IMPLICIT NONE
INCLUDE ’ m p i f . h ’
INCLUDE ’ d m u m p s s t r u c . h ’
TYPE (DMUMPS STRUC) mumps par
INTEGER IERR , I , I 8
CALL MPI INIT ( IERR )
C Define a communicator for the package.
mumps par%COMM = MPI COMM WORLD
C Initialize an instance of the package
C for L U factorization (sym = 0, with working host)
mumps par%JOB = −1
mumps par%SYM = 0
mumps par%PAR = 1
CALL DMUMPS( mumps par )
C Define problem on the host (processor 0)
IF ( mumps par%MYID . eq . 0 ) THEN
READ( 5 , * ) mumps par%N
READ( 5 , * ) mumps par%NNZ
ALLOCATE( mumps par%IRN ( mumps par%NNZ ) )
ALLOCATE( mumps par%JCN ( mumps par%NNZ ) )
ALLOCATE( mumps par%A( mumps par%NNZ ) )
ALLOCATE( mumps par%RHS ( mumps par%N ) )
DO I 8 = 1 8 , mumps par%NNZ
READ( 5 , * ) mumps par%IRN ( I 8 ) , mumps par%JCN ( I 8 ) , mumps par%A( I 8 )
END DO
READ( 5 , * ) ( mumps par%RHS( I ) , I =1 , mumps par%N )
END IF
C Call package for solution
mumps par%JOB = 6
CALL DMUMPS( mumps par )
IF ( mumps par%INFOG ( 1 ) . LT . 0 ) THEN
WRITE( 6 , ’ (A, A, I6 , A, I 9 ) ’ ) ” ERROR RETURN : ” ,
& ” mumps par%INFOG ( 1 ) = ” , mumps par%INFOG ( 1 ) ,
& ” mumps par%INFOG ( 2 ) = ” , mumps par%INFOG ( 2 )
GOTO 500
END IF
C Solution has been assembled on the host
IF ( mumps par%MYID . eq . 0 ) THEN
WRITE( 6 , * ) ’ S o l u t i o n i s ’ , ( mumps par%RHS( I ) , I =1 , mumps par%N)
END IF
C Deallocate user data
IF ( mumps par%MYID . eq . 0 )THEN
DEALLOCATE( mumps par%IRN )
DEALLOCATE( mumps par%JCN )
DEALLOCATE( mumps par%A )
DEALLOCATE( mumps par%RHS )
END IF
C Destroy the instance (deallocate internal data structures)
mumps par%JOB = −2
CALL DMUMPS( mumps par )
500 CALL MPI FINALIZE ( IERR )
STOP
END

Figure 5: Example program using MUMPS on an assembled DOUBLE PRECISION problem

111
and we obtain the solution RHS(i) = i, i = 1, . . . , 5.

11.2 An elemental problem

An example of a driver to use MUMPS for element DOUBLE PRECISION problems is given in Figure 6.

The calling sequence is similar to that for the assembled problem in Subsection 11.1 but now the host
reads the problem in components N, NELT, ELTPTR, ELTVAR, A ELT, and RHS. Note that for elemental
problems ICNTL(5) must be set to 1 and that elemental matrices always have a symmetric structure. For
the two-element matrix and right-hand side
 12 
−1 −1

1 2 3 3 2 3 7
2 2 1 1 , 4 1 2 −1 ,  23 
3 1 1 1 5 3 2 1 6
22
we could have as input
5
2
6
18
1 4 7
1 2 3 3 4 5
-1.0 2.0 1.0 2.0 1.0 1.0 3.0 1.0 1.0 2.0 1.0 3.0 -1.0 2.0 2.0 3.0 -1.0 1.0
12.0 7.0 23.0 6.0 22.0

and we obtain the solution RHS(i) = i, i = 1, . . . , 5.

112
PROGRAM MUMPS EXAMPLE
IMPLICIT NONE
INCLUDE ’ m p i f . h ’
INCLUDE ’ d m u m p s s t r u c . h ’
TYPE (DMUMPS STRUC) mumps par
INTEGER I , IERR , LELTVAR, NA ELT
CALL MPI INIT ( IERR )
C Define a communicator for the package
mumps par%COMM = MPI COMM WORLD
C Ask for unsymmetric code
mumps par%SYM = 0
C Host working
mumps par%PAR = 1
C Initialize an instance of the package
mumps par%JOB = −1
CALL DMUMPS( mumps par )
C Define the problem on the host (processor 0)
IF ( mumps par%MYID . eq . 0 ) THEN
READ( 5 , * ) mumps par%N
READ( 5 , * ) mumps par%NELT
READ( 5 , * ) LELTVAR
READ( 5 , * ) NA ELT
ALLOCATE( mumps par%ELTPTR ( mumps par%NELT+1 ) )
ALLOCATE( mumps par%ELTVAR ( LELTVAR ) )
ALLOCATE( mumps par%A ELT ( NA ELT ) )
ALLOCATE( mumps par%RHS ( mumps par%N ) )
READ( 5 , * ) ( mumps par%ELTPTR ( I ) , I =1 , mumps par%NELT+1 )
READ( 5 , * ) ( mumps par%ELTVAR( I ) , I =1 , LELTVAR )
READ( 5 , * ) ( mumps par%A ELT ( I ) , I =1 , NA ELT )
READ( 5 , * ) ( mumps par%RHS( I ) , I =1 , mumps par%N )
END IF
C Specify element entry
mumps par%ICNTL ( 5 ) = 1
C Call package for solution
mumps par%JOB = 6
CALL DMUMPS( mumps par )
IF ( mumps par%INFOG ( 1 ) . LT . 0 ) THEN
WRITE( 6 , ’ (A, A, I6 , A, I 9 ) ’ ) ” ERROR RETURN : ” ,
& ” mumps par%INFOG ( 1 ) = ” , mumps par%INFOG ( 1 ) ,
& ” mumps par%INFOG ( 2 ) = ” , mumps par%INFOG ( 2 )
GOTO 500
END IF
C Solution has been assembled on the host
IF ( mumps par%MYID . eq . 0 ) THEN
WRITE( 6 , * ) ’ S o l u t i o n i s ’ , ( mumps par%RHS( I ) , I =1 , mumps par%N)
C Deallocate user data
DEALLOCATE( mumps par%ELTPTR )
DEALLOCATE( mumps par%ELTVAR )
DEALLOCATE( mumps par%A ELT )
DEALLOCATE( mumps par%RHS )
END IF
C Destroy the instance (deallocate internal data structures)
mumps par%JOB = −2
CALL DMUMPS( mumps par )
500 CALL MPI FINALIZE ( IERR )
STOP
END

Figure 6: Example program using MUMPS on an elemental DOUBLE PRECISION problem.

113
11.3 An example of calling MUMPS from C
An example of a driver to use MUMPS from C is given in Figure 7.

/ * Example program u s i n g t h e C i n t e r f a c e t o t h e
* d o u b l e p r e c i s i o n v e r s i o n o f MUMPS, dmumps c .
* We s o l v e t h e s y s t e m A x = RHS w i t h
* A = d i a g ( 1 2 ) and RHS = [ 1 4 ] ˆ T
* S o l u t i o n i s [1 2 ] ˆ T * /
# i n c l u d e <s t d i o . h>
# i n c l u d e ” mpi . h ”
# i n c l u d e ” dmumps c . h ”
# d e f i n e JOB INIT −1
# d e f i n e JOB END −2
# d e f i n e USE COMM WORLD −987654
i n t main ( i n t a r g c , char ** a r g v ) {
DMUMPS STRUC C i d ;
int n = 2;
i n t 6 4 t nnz = 2 ;
i n t i r n [ ] = {1 ,2};
i n t jcn [ ] = {1 ,2};
double a [ 2 ] ;
double r h s [ 2 ] ;

i n t myid , i e r r ;
i e r r = M P I I n i t (& a r g c , &a r g v ) ;
i e r r = MPI Comm rank (MPI COMM WORLD, &myid ) ;
/ * D e f i n e A and r h s * /
rhs [0]=1.0; rhs [1]=4.0;
a [0]=1.0; a [1]=2.0;

/ * I n i t i a l i z e a MUMPS i n s t a n c e . Use MPI COMM WORLD . */

i d . j o b =JOB INIT ; i d . p a r = 1 ; i d . sym = 0 ; i d . c o m m f o r t r a n =USE COMM WORLD;
dmumps c(& i d ) ;
/ * D e f i n e t h e p r o b l e m on t h e h o s t * /
i f ( myid == 0 ) {
i d . n = n ; i d . nnz = nnz ; i d . i r n = i r n ; i d . j c n = j c n ;
id . a = a ; id . rhs = rhs ;
}
# d e f i n e ICNTL ( I ) i c n t l [ ( I ) − 1 ] / * macro s . t . i n d i c e s match d o c u m e n t a t i o n * /
/ * No o u t p u t s * /
i d . ICNTL ( 1 ) = − 1 ; i d . ICNTL ( 2 ) = − 1 ; i d . ICNTL ( 3 ) = − 1 ; i d . ICNTL ( 4 ) = 0 ;
/ * C a l l t h e MUMPS p a c k a g e . * /
id . job =6;
dmumps c(& i d ) ;
i d . j o b =JOB END ; dmumps c(& i d ) ; / * T e r m i n a t e i n s t a n c e * /
i f ( myid == 0 ) {
p r i n t f ( ” S o l u t i o n i s : (%8.2 f %8.2 f )\ n ” , r h s [ 0 ] , r h s [ 1 ] ) ;
}
i e r r = MPI Finalize ( ) ;
return 0;
}

Figure 7: Example program using MUMPS from C on an assembled problem.

114
11.4 An example of calling MUMPS from fortran using the Save/Restore
feature and Out Of Core
An example program illustrating a possible use of the Save/restore feature combined with Out Of Core:
1 PROGRAM MUMPS_TEST_SAVE_RESTORE
2 IMPLICIT NONE
3 INCLUDE ’mpif.h’
4 INCLUDE ’mumps_struc.h’
5 TYPE (CMUMPS_STRUC) mumps_par_save, mumps_par_restore
6 INTEGER IERR, I
7 CALL MPI_INIT(IERR)
8 C Define a communicator for the package.
9 mumps_par_save%COMM = MPI_COMM_WORLD
10 C Initialize an instance of the package
11 C for L U factorization (sym = 0, with working host)
12 mumps_par_save%JOB = -1
13 mumps_par_save%SYM = 0
14 mumps_par_save%PAR = 1
15 CALL CMUMPS(mumps_par_save)
16 IF (mumps_par_save%INFOG(1).LT.0) THEN
17 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",
18 & " mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
19 & " mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
20 GOTO 500
21 END IF
22 C Define problem on the host (processor 0)
23 IF ( mumps_par_save%MYID .eq. 0 ) THEN
24 READ(5,*) mumps_par_save%N
25 READ(5,*) mumps_par_save%NZ
26 ALLOCATE( mumps_par_save%IRN ( mumps_par_save%NZ ) )
27 ALLOCATE( mumps_par_save%JCN ( mumps_par_save%NZ ) )
28 ALLOCATE( mumps_par_save%A( mumps_par_save%NZ ) )
29 DO I = 1, mumps_par_save%NZ
30 READ(5,*) mumps_par_save%IRN(I),mumps_par_save%JCN(I)
31 & ,mumps_par_save%A(I)
32 END DO
33 END IF
34 C Activate OOC
35 mumps_par_save%ICNTL(22)=1
36 C Call package for factorization
37 mumps_par_save%JOB = 4
38 CALL CMUMPS(mumps_par_save)
39 IF (mumps_par_save%INFOG(1).LT.0) THEN
40 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",
41 & " mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
42 & " mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
43 GOTO 500
44 END IF
45 C Call package for save
46 mumps_par_save%JOB = 7
47 mumps_par_save%SAVE_DIR="/tmp"
48 mumps_par_save%SAVE_PREFIX="mumps_simpletest_save"
49 CALL CMUMPS(mumps_par_save)
50 IF (mumps_par_save%INFOG(1).LT.0) THEN
51 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",

115
52 & " mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
53 & " mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
54 GOTO 500
55 END IF
56 C Deallocate user data
57 IF ( mumps_par_save%MYID .eq. 0 )THEN
58 DEALLOCATE( mumps_par_save%IRN )
59 DEALLOCATE( mumps_par_save%JCN )
60 DEALLOCATE( mumps_par_save%A )
61 END IF
62 C Destroy the instance (deallocate internal data structures)
63 mumps_par_save%JOB = -2
64 CALL CMUMPS(mumps_par_save)
65 C Now mumps_par_save has be destroyed
66 C We use a new instance mumps_par_restore to finish the computation
67
68 C Define a communicator for the package on the new instace.
69 mumps_par_restore%COMM = MPI_COMM_WORLD
70 C Initialize a new instance of the package
71 C for L U factorization (sym = 0, with working host)
72 mumps_par_restore%JOB = -1
73 mumps_par_restore%SYM = 0
74 mumps_par_restore%PAR = 1
75 CALL CMUMPS(mumps_par_restore)
76 IF (mumps_par_restore%INFOG(1).LT.0) THEN
77 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",
78 & " mumps_par_restore%INFOG(1)= ",
79 & mumps_par_restore%INFOG(1),
80 & " mumps_par_restore%INFOG(2)= ",
81 & mumps_par_restore%INFOG(2)
82 GOTO 500
83 END IF
84 C Call package for restore with OOC feature
85 mumps_par_restore%JOB = 8
86 mumps_par_restore%SAVE_DIR="/tmp"
87 mumps_par_restore%SAVE_PREFIX="mumps_simpletest_save"
88 CALL CMUMPS(mumps_par_restore)
89 IF (mumps_par_restore%INFOG(1).LT.0) THEN
90 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",
91 & " mumps_par_restore%INFOG(1)= ",
92 & mumps_par_restore%INFOG(1),
93 & " mumps_par_restore%INFOG(2)= ",
94 & mumps_par_restore%INFOG(2)
95 GOTO 500
96 END IF
97 C Define rhs on the host (processor 0)
98 IF ( mumps_par_restore%MYID .eq. 0 ) THEN
99 ALLOCATE( mumps_par_restore%RHS ( mumps_par_restore%N ) )
100 DO I = 1, mumps_par_restore%N
101 READ(5,*) mumps_par_restore%RHS(I)
102 END DO
103 END IF
104 C Call package for solution
105 mumps_par_restore%JOB = 3
106 CALL CMUMPS(mumps_par_restore)

116
107 IF (mumps_par_restore%INFOG(1).LT.0) THEN
108 WRITE(6,’(A,A,I6,A,I9)’) " ERROR RETURN: ",
109 & " mumps_par_restore%INFOG(1)= ",
110 & mumps_par_restore%INFOG(1),
111 & " mumps_par_restore%INFOG(2)= ",
112 & mumps_par_restore%INFOG(2)
113 GOTO 500
114 END IF
115 C Solution has been assembled on the host
116 IF ( mumps_par_restore%MYID .eq. 0 ) THEN
117 WRITE( 6, * ) ’ Solution is ’,
118 & (mumps_par_restore%RHS(I),I=1,mumps_par_restore%N)
119 END IF
120 C Deallocate user data
121 IF ( mumps_par_restore%MYID .eq. 0 )THEN
122 DEALLOCATE( mumps_par_restore%RHS )
123 END IF
124 C Delete the saved files
125 C Note mumps_par_restore%ICNTL(34) is kept to default (0) to suppress
126 C also the OOC files.
127 mumps_par_restore%JOB = -3
128 CALL CMUMPS(mumps_par_restore)
129 C Destroy the instance (deallocate internal data structures)
130 mumps_par_restore%JOB = -2
131 CALL CMUMPS(mumps_par_restore)
132 500 CALL MPI_FINALIZE(IERR)
133 STOP
134 END
The MUMPS instance mumps par save is initialized by calling MUMPS with JOB = –1, the problem is
read in by the host (in the components N, NNZ, IRN, JCN, A), and the factorization is done using Out
Of Core (ICNTL(22) = 1) on all processors to MUMPS with JOB=4. The instance mumps par save is
saved by calling MUMPS with JOB=7, a call to MUMPS with JOB = –2 is performed to deallocate the data
structures used by the instance mumps par save.
The MUMPS instance mumps par restore is initialized by calling MUMPS with JOB = –1. The instance
mumps par restore is restore at the same state as mumps par save was by calling MUMPS with JOB=8.
The rest of the problem is read in by the host (in the component RHS), and the solution is computed in
RHS with a call on all processors to MUMPS with JOB=3. Finally, a call to MUMPS with JOB = –3 is
performed to deallocate the data structures used by the instance mumps par restore and all files used for
restarting (OOC and Save/Restore) are suppressed because ICNTL(34) = 0.

117
11.5 An example of calling MUMPS from C using the Save/Restore feature
An example of a driver to use MUMPS from C :
1 /* Example program using the C interface to the
2 * double real arithmetic version of MUMPS, dmumps_c.
3 * We solve the system A x = RHS with
4 * A = diag(1 2) and RHS = [1 4]ˆT
5 * Solution is [1 2]ˆT */
6 #include <stdio.h>
7 #include <string.h>
8 #include "mpi.h"
9 #include "dmumps_c.h"
10 #define JOB_INIT -1
11 #define JOB_END -2
12 #define USE_COMM_WORLD -987654
13
14 #if defined(MAIN_COMP)
15 /*
16 * Some Fortran compilers (COMPAQ fort) define "main" in
17 * their runtime library while a Fortran program translates
18 * to MAIN_ or MAIN__ which is then called from "main".
19 * We defined argc/argv arbitrarily in that case.
20 */
21 int MAIN__();
22 int MAIN_()
23 {
24 return MAIN__();
25 }
26
27 int MAIN__()
28 {
29 int argc=1;
30 char * name = "c_example_save_restore";
31 char ** argv ;
32 #else
33 int main(int argc, char ** argv)
34 {
35 #endif
36 DMUMPS_STRUC_C id_save,id_restore;
37 MUMPS_INT n = 2;
38 MUMPS_INT8 nnz = 2;
39 MUMPS_INT irn[] = {1,2};
40 MUMPS_INT jcn[] = {1,2};
41 double a[2];
42 double rhs[2];
43
44 int error = 0;
45 /* When compiling with -DINTSIZE64, MUMPS_INT is 64-bit but MPI
46 ilp64 versions may still require standard int for C interface. */
47 /* MUMPS_INT myid, ierr; */
48 int myid, ierr;
49 #if defined(MAIN_COMP)
50 argv = &name;
51 #endif
52 ierr = MPI_Init(&argc, &argv);
53 ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

118
54 /* Define A and rhs */
55 rhs[0]=1.0;rhs[1]=4.0;
56 a[0]=1.0;a[1]=2.0;
57
58 /* Initialize MUMPS save instance. Use MPI_COMM_WORLD */
59 id_save.comm_fortran=USE_COMM_WORLD;
60 id_save.par=1; id_save.sym=0;
61 id_save.job=JOB_INIT;
62 dmumps_c(&id_save);
63 /* Define the problem on the host */
64 if (myid == 0) {
65 id_save.n = n; id_save.nnz =nnz; id_save.irn=irn; id_save.jcn=jcn;
66 id_save.a = a;
67 }
68 #define ICNTL(I) icntl[(I)-1] /* macro s.t. indices match documentation */
69 /* No outputs */
70 id_save.ICNTL(1)=-1; id_save.ICNTL(2)=-1;
71 id_save.ICNTL(3)=-1; id_save.ICNTL(4)=0;
72 /* Call the MUMPS package on the save instance (analyse and factorization). */
73 id_save.job=4;
74 dmumps_c(&id_save);
75
76 /* MUMPS save feature on the save instance. */
77 strcpy(id_save.save_prefix,"csave_restore");
78 strcpy(id_save.save_dir,"/tmp");
79 if (myid == 0) {
80 printf("Saving MUMPS instance in %s with prefix %s.\n",
81 id_save.save_dir, id_save.save_prefix);
82 }
83 id_save.job=7;
84 dmumps_c(&id_save);
85 if (id_save.infog[0]<0) {
86 printf("\n (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
87 myid, id_save.infog[0], id_save.infog[1]);
88 error = 1;
89 } else if (myid == 0) {
90 printf(" DONE\n\n");
91 }
92
93 /* Terminate the save instance. */
94 id_save.job=JOB_END;
95 dmumps_c(&id_save);
96
97
98
99 if (!error) {
100 /* Initialize MUMPS restore instance. Use MPI_COMM_WORLD */
101 id_restore.comm_fortran=USE_COMM_WORLD;
102 id_restore.par=1; id_restore.sym=0;
103 id_restore.job=JOB_INIT;
104 dmumps_c(&id_restore);
105 /* Define the rhs on the host */
106 if (myid == 0) {
107 id_restore.rhs = rhs;
108 }

119
109
110 /* No outputs */
111 id_save.ICNTL(1)=-1; id_save.ICNTL(2)=-1;
112 id_save.ICNTL(3)=-1; id_save.ICNTL(4)=0;
113
114 /* MUMPS restore feature on restore instance. */
115 if (myid == 0) {
116 printf("Restoring MUMPS instance in %s with prefix %s.\n",
117 id_save.save_dir, id_save.save_prefix);
118 }
119 strcpy(id_restore.save_prefix,"csave_restore");
120 strcpy(id_restore.save_dir,"/tmp");
121 id_restore.job=8;
122 dmumps_c(&id_restore);
123 if (id_save.infog[0]<0) {
124 printf("\n (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
125 myid, id_save.infog[0], id_save.infog[1]);
126 error = 1;
127 } else if (myid == 0) {
128 printf(" DONE\n\n");
129 }
130 }
131
132 if (!error) {
133 /* Call the MUMPS package on restore instance (solve). */
134 if (myid == 0) {
135 printf("Calling MUMPS package (solve).\n");
136 }
137 id_restore.job=3;
138 dmumps_c(&id_restore);
139 if (id_save.infog[0]<0) {
140 printf("=> (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
141 myid, id_save.infog[0], id_save.infog[1]);
142 error = 1;
143 } else if (myid == 0) {
144 printf(" DONE\n\n");
145 }
146
147 /* Deletes the saved and the OOC files. */
148 if (myid == 0) {
149 printf("Removing save files.\n");
150 }
151 id_restore.job=-3;
152 dmumps_c(&id_restore);
153 if (id_save.infog[0]<0) {
154 printf("=> (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
155 myid, id_save.infog[0], id_save.infog[1]);
156 error = 1;
157 } else if (myid == 0) {
158 printf(" DONE\n\n");
159 }
160
161 /* Terminate the restore instance. */
162 id_restore.job=JOB_END;
163 dmumps_c(&id_restore);

120
164 }
165
166 if (myid == 0) {
167 if (!error) {
168 printf("Solution is : (%8.2f %8.2f)\n", rhs[0],rhs[1]);
169 } else {
170 printf("An error has occured, please check error code returned by MUMPS.\n");
171 }
172 }
173 ierr = MPI_Finalize();
174 return 0;
175 }

121
12 License
Copyright 1991-2023 CERFACS, CNRS, ENS Lyon, INP Toulouse, Inria,
Mumps Technologies, University of Bordeaux.

This version of MUMPS is provided to you free of charge. It is

released under the CeCILL-C license
(see doc/CeCILL-C_V1-en.txt, doc/CeCILL-C_V1-fr.txt, and
https://cecill.info/licences/Licence_CeCILL-C_V1-en.html),
except for variants of AMD ordering and [sdcz]MUMPS_TRUNCATED_RRQR
derived from the LAPACK package distributed under BSD 3-clause
license (see headers of ana_orderings.F and [sdcz]lr_core.F),
and except for the external and optional ordering PORD provided
in a separate directory PORD (see PORD/README for License information).

You can acknowledge (using references [1] and [2]) the contribution
of this package in any scientific publication dependent upon the use
of the package. Please use reasonable endeavours to notify the authors
of the package of this publication.

[1] P. R. Amestoy, I. S. Duff, J. Koster and J.-Y. L’Excellent,

A fully asynchronous multifrontal solver using distributed dynamic
scheduling, SIAM Journal on Matrix Analysis and Applications,
Vol 23, No 1, pp 15-41 (2001).

[2] P. R. Amestoy, A. Buttari, J.-Y. L’Excellent and T. Mary,

Performance and scalability of the block low-rank multifrontal
factorization on multicore architectures,
ACM Transactions on Mathematical Software,
Vol 45, Issue 1, pp 2:1-2:26 (2019)

As a counterpart to the access to the source code and rights to copy,

modify and redistribute granted by the license, users are provided only
with a limited warranty and the software’s author, the holder of the
economic rights, and the successive licensors have only limited
liability.

In this respect, the user’s attention is drawn to the risks associated

with loading, using, modifying and/or developing or reproducing the
software by the user in light of its specific status of free software,
that may mean that it is complicated to manipulate, and that also
therefore means that it is reserved for developers and experienced
professionals having in-depth computer knowledge. Users are therefore
encouraged to load and test the software’s suitability as regards their
requirements in conditions enabling the security of their systems
and/or data to be ensured and, more generally, to use and operate it
in the same conditions as regards security.

The fact that you are presently reading this means that you have had
knowledge of the CeCILL-C license and that you accept its terms.

13 Credits
This version of MUMPS has been developed by employees of CERFACS, ENS Lyon,

122
INPT(ENSEEIHT)-IRIT, Inria, Mumps Technologies and University of Bordeaux:
Emmanuel Agullo, Patrick Amestoy, Maurice Bremond, Alfredo Buttari, Philippe
Combes, Marie Durand, Aurelia Fevre, Abdou Guermouche, Guillaume Joslin,
Jacko Koster, Jean-Yves L’Excellent, Theo Mary, Stephane Pralet, Chiara
Puglisi, Francois-Henry Rouet, Wissam Sid-Lakhdar, Tzvetomila Slavova,
Bora Ucar and Clement Weisbecker.

Since January 2019, the MUMPS solver is maintained by Mumps Technologies

(http://mumps-tech.com).

We are grateful to Caroline Bousquet, Indranil Chowdhury, Christophe Daniel,

Iain Duff, Vincent Espirat, Gilles Moreau, Gregoire Richard, Alexis Salzman,
Miroslav Tuma and Christophe Voemel who have been contributing to this project.

We are also grateful to Juergen Schulze for letting us distribute PORD developed
at the University of Paderborn. We thank Eddy Caron for the administration of a
server used on a daily basis for MUMPS.

We want to thank the French ANR programme, the European community,

Airbus Group-IW, Altair, ANSYS, CINES, EDF, EMGS, ESI Group, FFT/Hexagon, LBNL, LSTC, Miche
We also thank LBNL, LSTC, PARALLAB and the Rutherford Appleton Laboratory
for research discussions that have certainly influenced this work.

Finally we want to thank the institutions that have provided access to their
parallel machines: Centre Informatique National de l’Enseignement Superieur
(CINES), CERFACS, CALMIP ("Centre Interuniversitaire de Calcul" located in Toulouse),
Federation Lyonnaise de Calcul Haute-Performance, Institut du Developpement et
des Ressources en Informatique Scientifique (IDRIS), Lawrence Berkeley National
Laboratory, Laboratoire de l’Informatique du Parallelisme, Inria, and PARALLAB.

References
[1] E. Agullo. On the Out-of-core Factorization of Large Sparse Matrices. PhD thesis, École Normale
Supérieure de Lyon, Nov. 2008.
[2] P. Amestoy, O. Boiteau, A. Buttari, M. Gerest, F. Jézéquel, J.-Y. L’Excellent, and T. Mary. Mixed
Precision Low Rank Approximations and their Application to Block Low Rank LU Factorization.
working paper or preprint, June 2021.
[3] P. R. Amestoy. Recent progress in parallel multifrontal solvers for unsymmetric sparse matrices.
In Proceedings of the 15th World Congress on Scientific Computation, Modelling and Applied
Mathematics, IMACS 97, Berlin, 1997.
[4] P. R. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L’Excellent, and C. Weisbecker. Improving
multifrontal methods by means of block low-rank representations. SIAM Journal on Scientific
Computing, 37(3):A1451–A1474, 2015.
[5] P. R. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. On the complexity of the Block Low-Rank
multifrontal factorization. SIAM Journal on Scientific Computing, 39(4):A1710–A1740, 2017.
[6] P. R. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. Performance and Scalability of the
Block Low-Rank Multifrontal Factorization on Multicore Architectures. ACM Transactions on
Mathematical Software, 45:2:1–2:26, 2019.
[7] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm.
SIAM Journal on Matrix Analysis and Applications, 17(4):886–905, 1996.
[8] P. R. Amestoy and I. S. Duff. Vectorization of a multiprocessor multifrontal code. International
Journal of Supercomputer Applications, 3:41–59, 1989.

123
[9] P. R. Amestoy, I. S. Duff, J. Koster, and J.-Y. L’Excellent. A fully asynchronous multifrontal
solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications,
23(1):15–41, 2001.
[10] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Multifrontal solvers within the PARASOL
environment. In B. Kågström, J. Dongarra, E. Elmroth, and J. Waśniewski, editors, Applied Parallel
Computing, PARA’98, Lecture Notes in Computer Science, No. 1541, pages 7–11, Berlin, 1998.
Springer-Verlag.
[11] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Parallélisation de la factorisation LU de matrices
creuses non-symétriques pour des architectures à mémoire distribuée. Calculateurs Parallèles
Réseaux et Systèmes Répartis, 10(5):509–520, 1998.
[12] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Multifrontal parallel distributed symmetric and
unsymmetric solvers. Comput. Methods Appl. Mech. Eng., 184:501–520, 2000.
[13] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and X. S. Li. Analysis and comparison of two general
sparse solvers for distributed memory computers. ACM Transactions on Mathematical Software,
27(4):388–421, 2001.
[14] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, Y. Robert, F.-H. Rouet, and B. Uçar. On computing
inverse entries of a sparse matrix in an out-of-core environment. SIAM Journal on Scientific
Computing, 34(4):A1975–A1999, 2012.
[15] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and F.-H. Rouet. Parallel computation of entries of A-1 .
SIAM Journal on Scientific Computing, 37(2):C268–C284, 2015.
[16] P. R. Amestoy, I. S. Duff, D. Ruiz, and B. Uçar. A parallel matrix scaling algorithm. In J. M.
L. M. Palma, P. R. Amestoy, M. J. Daydé, M. Mattoso, and J. C. Lopes, editors, High Performance
Computing for Computational Science, VECPAR’08, number 5336 in Lecture Notes in Computer
Science, pages 309–321. Springer-Verlag, 2008.
[17] P. R. Amestoy, A. Guermouche, J.-Y. L’Excellent, and S. Pralet. Hybrid scheduling for the parallel
solution of linear systems. Parallel Computing, 32(2):136–156, 2006.
[18] P. R. Amestoy, J.-Y. L’Excellent, F.-H. Rouet, and W. M. Sid-Lakhdar. Modeling 1D distributed-
memory dense kernels for an asynchronous multifrontal sparse solver. In High Performance
Computing for Computational Science, VECPAR 2014 - 11th International Conference, Eugene,
Oregon, USA, June 30 - July 3, 2014, Revised Selected Papers, pages 156–169, 2014.
[19] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum,
S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. SIAM Press, Philadelphia,
PA, third edition, 1995.
[20] M. Arioli, J. Demmel, and I. S. Duff. Solving sparse linear systems with sparse backward error.
SIAM Journal on Matrix Analysis and Applications, 10(2):165–190, 1989.
[21] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra,
S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’
Guide. SIAM Press, 1997.
[22] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm 679: A set of Level 3 Basic
Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16:1–17, 1990.
[23] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm 679. A set of Level 3 Basic
Linear Algebra Subprograms: model implementation and test programs. ACM Transactions on
Mathematical Software, 16:18–28, 1990.
[24] I. S. Duff and J. Koster. The design and use of algorithms for permuting large entries to the diagonal
of sparse matrices. SIAM Journal on Matrix Analysis and Applications, 20(4):889–901, 1999.
[25] I. S. Duff and J. Koster. On algorithms for permuting large entries to the diagonal of a sparse matrix.
SIAM Journal on Matrix Analysis and Applications, 22(4):973–996, 2001.
[26] I. S. Duff and S. Pralet. Strategies for scaling and pivoting for sparse symmetric indefinite problems.
SIAM Journal on Matrix Analysis and Applications, 27(2):313–340, 2005.

124
[27] I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear systems.
ACM Transactions on Mathematical Software, 9:302–325, 1983.
[28] I. S. Duff and J. K. Reid. The multifrontal solution of unsymmetric sets of linear systems. SIAM
Journal on Scientific and Statistical Computing, 5:633–641, 1984.
[29] A. Fèvre, J.-Y. L’Excellent, and S. Pralet. Scilab and MATLAB interfaces to MUMPS. Technical
Report RR-5816, INRIA, Jan. 2006. Also appeared as ENSEEIHT-IRIT report TR/TLSE/06/01 and
LIP report RR2006-06.
[30] J. R. Gilbert, E. G. Ng, and B. W. Peyton. An efficient algorithm to compute row and column counts
for sparse cholesky factorization. SIAM Journal on Matrix Analysis and Applications, 15:1075–
1091, 1994.
[31] A. Guermouche. Étude et optimisation du comportement mémoire dans les méthodes parallèles de
factorisation de matrices creuses. PhD thesis, École Normale Supérieure de Lyon, July 2004.
[32] A. Guermouche and J.-Y. L’Excellent. Constructing memory-minimizing schedules for multifrontal
methods. ACM Transactions on Mathematical Software, 32(1):17–32, 2006.
[33] A. Guermouche, J.-Y. L’Excellent, and G. Utard. Impact of reordering on the memory of a
multifrontal solver. Parallel Computing, 29(9):1191–1218, 2003.
[34] G. Karypis and V. Kumar. M ETI S – A Software Package for Partitioning Unstructured Graphs,
Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices – Version 4.0.
University of Minnesota, Sept. 1998.
[35] P. Knight and D. Ruiz. A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis,
33(3):1029–1047, octobre 2012.
[36] P. A. Knight, D. Ruiz, and B. Uçar. A symmetry preserving algorithm for matrix scaling. SIAM
Journal on Matrix Analysis and Applications, 35(3):931–955, 2014.
[37] J.-Y. L’Excellent. Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects.
Habilitation à diriger des recherches, École normale supérieure de Lyon, Sept. 2012.
[38] J.-Y. L’Excellent and M. W. Sid-Lakhdar. A study of shared-memory parallelism in a multifrontal
solver. Parallel Computing, 40(3-4):34–46, 2014.
[39] X. S. Li and J. W. Demmel. SuperLU DIST: A scalable distributed-memory sparse direct solver for
unsymmetric linear systems. ACM Transactions on Mathematical Software, 29(2):110–140, 2003.
[40] J. W. H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis
and Applications, 11:134–172, 1990.
[41] T. Mary. Block Low-Rank multifrontal solvers: complexity, performance, and scalability. PhD
thesis, Université de Toulouse, November 2017.
[42] F. Pellegrini. S COTCH and LIBSCOTCH 5.0 User’s guide. Technical Report, LaBRI, Université
Bordeaux I, 2007.
[43] S. Pralet. Constrained orderings and scheduling for parallel sparse linear algebra. PhD thesis,
Institut National Polytechnique de Toulouse, Sept 2004. Available as CERFACS technical report,
TH/PA/04/105.
[44] F.-H. Rouet. Memory and performance issues in parallel multifrontal factorizations and triangular
solutions with sparse right-hand sides. PhD thesis, Institut National Polytechnique de Toulouse,
Oct. 2012.
[45] D. Ruiz. A scaling algorithm to equilibrate both rows and columns norms in matrices. Technical
Report RT/APO/01/4, ENSEEIHT-IRIT, 2001. Also appeared as RAL report RAL-TR-2001-034.
[46] J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods.
BIT, 41(4):800–841, 2001.
[47] W. M. Sid-Lakhdar. Scaling multifrontal methods for the solution of large sparse linear systems on
hybrid shared-distributed memory architectures. Ph.D. dissertation, ENS Lyon, Dec. 2014.
[48] Tz. Slavova. Parallel triangular solution in the out-of-core multifrontal approach for solving large
sparse linear systems. Ph.D. dissertation, Institut National Polytechnique de Toulouse, Apr. 2009.
Available as CERFACS Report TH/PA/09/59.

125
[49] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete
Reference. The MIT Press, Cambridge, Massachusetts, 1996.
[50] C. Weisbecker. Improving multifrontal solvers by means of algebraic block low-rank
representations. PhD thesis, Institut National Polytechnique de Toulouse, Oct. 2013.

126

Sistem Informasi Geografis
100% (1)
Sistem Informasi Geografis
310 pages
Programming With Java
No ratings yet
Programming With Java
1 page
Proceedings of the 2021 DigitalFUTURES The 3rd International Conference on Computational Design and Robotic Fabrication CDRF 2021 1st Edition Philip F. Yuan - The ebook in PDF format is ready for immediate access
100% (3)
Proceedings of the 2021 DigitalFUTURES The 3rd International Conference on Computational Design and Robotic Fabrication CDRF 2021 1st Edition Philip F. Yuan - The ebook in PDF format is ready for immediate access
73 pages
Interactive Dashboards With Plotly & Dash
No ratings yet
Interactive Dashboards With Plotly & Dash
181 pages
Begins Here.: Clear Vision
No ratings yet
Begins Here.: Clear Vision
16 pages
Jurnal SKRIPSI
No ratings yet
Jurnal SKRIPSI
7 pages
Matrix Algorithms
No ratings yet
Matrix Algorithms
74 pages
Fall Detection
No ratings yet
Fall Detection
54 pages
Multimedia
No ratings yet
Multimedia
4 pages
Game Log
No ratings yet
Game Log
28 pages
Python Unit 5
No ratings yet
Python Unit 5
8 pages
II Rdbms Notes
No ratings yet
II Rdbms Notes
111 pages
Zaid Raad 999999
No ratings yet
Zaid Raad 999999
9 pages
PHP Microproject
No ratings yet
PHP Microproject
26 pages
1 D Krishna Mohan Case Status TS - D
No ratings yet
1 D Krishna Mohan Case Status TS - D
1 page
QM - CW - ENS - GU - 03 ChartWorld ENS Guidelines
No ratings yet
QM - CW - ENS - GU - 03 ChartWorld ENS Guidelines
11 pages
Elmer Solver Manual
No ratings yet
Elmer Solver Manual
159 pages
Summative Test Ict 9
No ratings yet
Summative Test Ict 9
3 pages
MATLAB For Engineers
50% (2)
MATLAB For Engineers
74 pages
Dji Robomaster Ai Challenge Technical Report
No ratings yet
Dji Robomaster Ai Challenge Technical Report
10 pages
User Guide: Easy Duplicate Finder
No ratings yet
User Guide: Easy Duplicate Finder
42 pages
Airline Documentation Final
No ratings yet
Airline Documentation Final
86 pages
Kiliki Font
No ratings yet
Kiliki Font
1 page
First Advertisement For JOOUST - USAID Boresha Jamii 11th July 21
No ratings yet
First Advertisement For JOOUST - USAID Boresha Jamii 11th July 21
20 pages
Elmer Solver Manual
No ratings yet
Elmer Solver Manual
163 pages
Developer Guide v8p4
No ratings yet
Developer Guide v8p4
179 pages
CO Assignment-1: ISA With Reference To MIPS
No ratings yet
CO Assignment-1: ISA With Reference To MIPS
3 pages
Introduction To Matlab For Engineering Students
No ratings yet
Introduction To Matlab For Engineering Students
75 pages
Introduction To Matlab For Engineering Students: David Houcque Northwestern University
No ratings yet
Introduction To Matlab For Engineering Students: David Houcque Northwestern University
76 pages
Dynare: Reference Manual Version 4: Dynare Working Papers Series
No ratings yet
Dynare: Reference Manual Version 4: Dynare Working Papers Series
224 pages
Enercalc
No ratings yet
Enercalc
11 pages
PLTMG:: A Software Package For Solving Elliptic Partial Differential Equations Users' Guide 9.0
No ratings yet
PLTMG:: A Software Package For Solving Elliptic Partial Differential Equations Users' Guide 9.0
140 pages
Photoshop Basics - Doing More With Layers
No ratings yet
Photoshop Basics - Doing More With Layers
14 pages
Sample Paper CS604
No ratings yet
Sample Paper CS604
11 pages
Dynare
No ratings yet
Dynare
74 pages
What Is Arduino?
No ratings yet
What Is Arduino?
3 pages
Matlab Guide 3ed Toc
100% (1)
Matlab Guide 3ed Toc
5 pages
Nokia Service Management Platform Data Sheet EN
No ratings yet
Nokia Service Management Platform Data Sheet EN
4 pages
Windows 7 Platinum Edition File List
0% (1)
Windows 7 Platinum Edition File List
3 pages
Ma50174 Advanced Numerical Methods - Part 1: I.G. Graham (Heavily Based On Original Notes by C.J.Budd)
No ratings yet
Ma50174 Advanced Numerical Methods - Part 1: I.G. Graham (Heavily Based On Original Notes by C.J.Budd)
53 pages
Dynare
No ratings yet
Dynare
74 pages
PPT-Secure Distributed Deduplication Systems With Improved Reliability
No ratings yet
PPT-Secure Distributed Deduplication Systems With Improved Reliability
46 pages
Introduction Matlab
No ratings yet
Introduction Matlab
8 pages
Animation and Movie Making
No ratings yet
Animation and Movie Making
2 pages
Haoudout1 4
0% (1)
Haoudout1 4
243 pages
bookAdomaitisAppliedMath4ChE 19utf81 PDF
No ratings yet
bookAdomaitisAppliedMath4ChE 19utf81 PDF
216 pages
Matlab Finite Element Modeling For Materials Engineers Using MATLAB
No ratings yet
Matlab Finite Element Modeling For Materials Engineers Using MATLAB
74 pages
The Linux Terminal for Beginners - The Command Line Made Easy: First Edition
From Everand
The Linux Terminal for Beginners - The Command Line Made Easy: First Edition
Michael Basler
No ratings yet
The Elements of Quantitative Investing
From Everand
The Elements of Quantitative Investing
Giuseppe A. Paleologo
No ratings yet
Artificial Intelligence Programming with Python: From Zero to Hero
From Everand
Artificial Intelligence Programming with Python: From Zero to Hero
Perry Xiao
4/5 (1)
Generative AI for Trading and Asset Management
From Everand
Generative AI for Trading and Asset Management
Hamlet Jesse Medina Ruiz
No ratings yet
Understanding Checksums and Cyclic Redundancy Checks
From Everand
Understanding Checksums and Cyclic Redundancy Checks
Philip Koopman
No ratings yet
Accelerated Computing With HIP: Second Edition
From Everand
Accelerated Computing With HIP: Second Edition
Yifan Sun
No ratings yet
Modern JavaScript: Coding for Today and Tomorrow
From Everand
Modern JavaScript: Coding for Today and Tomorrow
Jay Nans
No ratings yet
The Linux Shell Scripting Handbook - From Journeyman to Master
From Everand
The Linux Shell Scripting Handbook - From Journeyman to Master
Michael Basler
No ratings yet
Linux Shell Scripting - A Beginner's Guide: First Edition
From Everand
Linux Shell Scripting - A Beginner's Guide: First Edition
Michael Basler
No ratings yet
ChatGPT Hacks; Work Smarter, Not Harder
From Everand
ChatGPT Hacks; Work Smarter, Not Harder
Maria Cowen
No ratings yet
Mastering Python Advanced Concepts and Practical Applications
From Everand
Mastering Python Advanced Concepts and Practical Applications
Aissa Younes
No ratings yet
Grow with Python Programming: From Basics to Advanced
From Everand
Grow with Python Programming: From Basics to Advanced
Mark Fliks
No ratings yet
GUI Magic: Mastering Real Projects in Python
From Everand
GUI Magic: Mastering Real Projects in Python
John Nunez
No ratings yet
Multicore DSP: From Algorithms to Real-time Implementation on the TMS320C66x SoC
From Everand
Multicore DSP: From Algorithms to Real-time Implementation on the TMS320C66x SoC
Naim Dahnoun
No ratings yet
Cutting-Edge Desktop UI Development with Python, PySide6, PyQt6
From Everand
Cutting-Edge Desktop UI Development with Python, PySide6, PyQt6
Jay Nans
No ratings yet
CompTIA A+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Core 2 Exam 220-1102
From Everand
CompTIA A+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Core 2 Exam 220-1102
Mike Chapple
No ratings yet
Advanced college algebra study guide
From Everand
Advanced college algebra study guide
Harrison Cook
No ratings yet
Risk Management and System Safety
From Everand
Risk Management and System Safety
Leonam dos Santos Guimarães
5/5 (1)
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
From Everand
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
Harrison K Cook
No ratings yet
Nintendo 64 Architecture: Architecture of Consoles: A Practical Analysis, #8
From Everand
Nintendo 64 Architecture: Architecture of Consoles: A Practical Analysis, #8
Rodrigo Copetti
No ratings yet
Mastering AUTOSAR: A Comprehensive Guide for Automotive Engineers
From Everand
Mastering AUTOSAR: A Comprehensive Guide for Automotive Engineers
Mohamad Charara
No ratings yet
The Science of Controller Synthesis
From Everand
The Science of Controller Synthesis
Martin Braae
No ratings yet
Plain JavaScript: Learning the Front-End
From Everand
Plain JavaScript: Learning the Front-End
Roger Beans-Rivet
No ratings yet
Mathematics N4: FET College Nated, #6
From Everand
Mathematics N4: FET College Nated, #6
Efetobo Emede
No ratings yet
Decoding Reliability-Centered Maintenance Process for Manufacturing Industries 10th Discipline of World Class Maintenance Management: 1, #7
From Everand
Decoding Reliability-Centered Maintenance Process for Manufacturing Industries 10th Discipline of World Class Maintenance Management: 1, #7
Rolly Angeles
No ratings yet
C Programming for the Pc the Mac and the Arduino Microcontroller System
From Everand
C Programming for the Pc the Mac and the Arduino Microcontroller System
Peter D Minns
No ratings yet
ChatGPT for Business: Strategies for Success
From Everand
ChatGPT for Business: Strategies for Success
Matthew C. Smith
1/5 (1)
Conquering the Competition: Strategies for Standing Out in the Gaming Content Landscape
From Everand
Conquering the Competition: Strategies for Standing Out in the Gaming Content Landscape
Rian McCullen
No ratings yet
Cybersecurity for Executives: A Guide to Protecting Your Business
From Everand
Cybersecurity for Executives: A Guide to Protecting Your Business
Matthew C. Smith
No ratings yet
Unlocking Statistics for the Social Sciences
From Everand
Unlocking Statistics for the Social Sciences
Norma Sinclair
No ratings yet
Advanced Multiplayer Game Development with Ureal Engine 5: A Comprehensive Guide to C++ Scripting
From Everand
Advanced Multiplayer Game Development with Ureal Engine 5: A Comprehensive Guide to C++ Scripting
Vladimir Kiselev
No ratings yet
The Linux Terminal for Advanced Users - The Command Line Made Easy: First Edition
From Everand
The Linux Terminal for Advanced Users - The Command Line Made Easy: First Edition
Michael Basler
No ratings yet
CAN Bus for Beginners: A Practical Guide to Automotive Networking
From Everand
CAN Bus for Beginners: A Practical Guide to Automotive Networking
Mohamad Charara
No ratings yet
Time-dependent Behaviour and Design of Composite Steel-concrete Structures
From Everand
Time-dependent Behaviour and Design of Composite Steel-concrete Structures
Massimiliano Bocciarelli
No ratings yet
Gray Hat Hacking the Ethical Hacker's
From Everand
Gray Hat Hacking the Ethical Hacker's
Çağatay Şanlı
5/5 (1)
Blog Smarter, Not Harder: SEO, Blogging, and AI Strategies to Skyrocket Your Traffic
From Everand
Blog Smarter, Not Harder: SEO, Blogging, and AI Strategies to Skyrocket Your Traffic
Jay Nans
No ratings yet
A To Z of Internet: Everything You Wanted to Know
From Everand
A To Z of Internet: Everything You Wanted to Know
Bittu Kumar
No ratings yet
Content Creation Revolution with chatGPT
From Everand
Content Creation Revolution with chatGPT
Maria Cowen
No ratings yet
Intrusion Detection Honeypots
From Everand
Intrusion Detection Honeypots
Chris Sanders
3/5 (2)
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
From Everand
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
Matthew C. Smith
No ratings yet
Software Patterns Made Easy
From Everand
Software Patterns Made Easy
Justice Nanhou
No ratings yet
Kellory the Warlock
From Everand
Kellory the Warlock
Lin Carter
No ratings yet
Web Video Business
From Everand
Web Video Business
MUHAMMAD NUR WAHID ANUAR
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.