MUMPS 5.6.0 Users’ guide *
Abstract
This document describes the Fortran 95 and C user interfaces to MUMPS 5.6.0, a software package
to solve sparse systems of linear equations, with many features. For some classes of problems, the
complexity of the factorization and the memory footprint of MUMPS can be reduced thanks to the Block
Low-Rank (BLR) feature.
We describe in detail the data structures, parameters, calling sequences, and error diagnostics. Basic
example programs using MUMPS are also provided. Users are allowed to save to disk MUMPS internal
data before or after any of the main steps of MUMPS (analysis, factorization, solve) and then restart.
* Information on how to obtain updated copies of MUMPS is available from the Web page http://mumps-solver.org/
Contents
1 Introduction
5 Application Program Interface
5.1 General
5.1.1 Initialization, Analysis, Factorization, Solve, Save, Restore, Termination (JOB)
5.1.2 Special use of JOB
5.1.2.1 JOB=9 before solution with distributed right-hand sides
5.1.2.2 JOB=-4 after factorization or solve phases
5.1.3 Version number
5.1.4 Control of parallelism (COMM, PAR)
5.2 Input Matrix
5.2.1 Matrix type (SYM)
5.2.2 Matrix format
5.2.2.1 Centralized assembled matrix (ICNTL(5)=0 and ICNTL(18)=0)
5.2.2.2 Distributed assembled matrix (ICNTL(5)=0 and ICNTL(18)=1,2,3)
5.2.2.3 Elemental matrix (ICNTL(5)=1 and ICNTL(18)=0)
5.2.3 Writing the input matrix to a file
5.3 Preprocessing: permutation to zero-free diagonal and scaling
5.3.1 Permutation to a zero-free diagonal (ICNTL(6))
5.3.2 Scaling (ICNTL(6) or ICNTL(8))
5.4 Preprocessing: symmetric permutations
5.4.1 Symmetric permutation vector (ICNTL(7) and ICNTL(29))
5.4.2 Given ordering (ICNTL(7)=1 and ICNTL(28)=1)
5.5 Preprocessing: exploit compression of the input matrix resulting from a block format (ICNTL(15)≠0)
5.6 Post-processing: iterative refinement
5.7 Post-processing: error analysis
5.8 Out-of-core (ICNTL(22))
5.9 Workspace parameters (ICNTL(14) and ICNTL(23)) and user workspace
5.10 Null pivot row detection (ICNTL(24))
5.11 Discard matrix factors (ICNTL(31))
5.12 Computation of the determinant (ICNTL(33))
5.13 Forward elimination during factorization (ICNTL(32))
5.14 Right-hand side and solution vectors/matrices
5.14.1 Dense right-hand side (ICNTL(20)=0)
5.14.2 Sparse right-hand side (ICNTL(20)=1,2,3)
5.14.3 Distributed right-hand side (ICNTL(20)=10,11)
5.14.4 A particular case of sparse right-hand side: computing entries of A^-1 (ICNTL(30)=1)
5.14.5 Centralized solution (ICNTL(21)=0)
5.14.6 Distributed solution (ICNTL(21)=1)
5.15 Schur complement with reduced or condensed right-hand side (ICNTL(19) and ICNTL(26))
5.15.1 Centralized Schur complement stored by rows (ICNTL(19)=1)
5.15.2 Distributed Schur complement (ICNTL(19)=2 or 3)
5.15.3 Centralized Schur complement stored by columns (ICNTL(19)=2 or 3)
5.15.4 Using partial factorization during solution phase (ICNTL(26)=0, 1 or 2)
5.16 Block Low-Rank (BLR) feature (ICNTL(35) and CNTL(7))
5.16.1 Enabling the BLR functionality at installation
5.16.2 BLR API
5.16.3 BLR output: statistics
5.17 Save (JOB=7) / Restore (JOB=8) feature
5.17.1 Location and names of the save files
5.17.2 Deletion of the save files (JOB=-3)
5.17.3 Important remarks for the restore feature (JOB=8)
5.17.4 Combining the save/restore feature with out-of-core
5.17.5 Combining the save/restore feature with WK_USER
5.17.6 Error management
5.18 Setting the number of OpenMP threads (ICNTL(16))
5.19 Compact workarray id%S at the end of factorization phase
6 Control parameters
6.1 Integer control parameters
6.2 Real/complex control parameters
6.3 Compatibility between options
7 Information parameters
7.1 Information local to each processor
7.2 Information available on all processors
12 License
13 Credits
1 Introduction
MUMPS (“MUltifrontal Massively Parallel Solver”) is a package for solving systems of linear equations of
the form Ax = b, where A is a square sparse matrix that can be either unsymmetric, symmetric positive
definite, or general symmetric, on distributed memory computers. MUMPS implements a direct method
based on a multifrontal approach which performs a Gaussian factorization
A = LU (1)
where L is a lower triangular matrix and U an upper triangular matrix. If the matrix is symmetric, then
the factorization
A = LDL^T (2)
where D is a block diagonal matrix with blocks of order 1 or 2 on the diagonal, is performed.
reader to the papers [8, 9, 12, 27, 28, 33, 43, 31, 32, 17, 1, 48, 14, 44, 4, 37, 50, 18, 38, 47] for full details
of the techniques used, algorithms and related research.
The system Ax = b is solved in three main steps:
1. Analysis.
During analysis, preprocessing (see Subsection 3.2), including an ordering based on the
symmetrized pattern A + A^T, and a symbolic factorization are performed. During the symbolic
factorization, a mapping of the multifrontal computational graph, the so-called elimination tree
[40], is computed and used to estimate the number of operations and the memory necessary for
factorization and solution. Both parallel and sequential implementations of the analysis phase are
available. Let Apre denote the preprocessed matrix (further defined in Subsection 3.2).
2. Factorization.
During factorization, Apre = LU or Apre = LDL^T, depending on the symmetry of the
preprocessed matrix, is computed. The original matrix is first distributed (or redistributed)
onto the processors depending on the mapping computed during the analysis. The numerical
factorization is then a sequence of dense factorizations on so-called frontal matrices. In addition to
standard threshold pivoting and two-by-two pivoting (not so standard in distributed memory codes),
there is an option to perform static pivoting. The elimination tree also expresses independence
between tasks and enables multiple fronts to be processed simultaneously. This approach is
called the multifrontal approach. After factorization, the factor matrices are kept distributed (in-core
memory or on disk); they will be used at the solution phase.
3. Solution.
The solution xpre of LU xpre = bpre or LDL^T xpre = bpre, where xpre and bpre are
respectively the transformed solution x and right-hand side b associated to the preprocessed matrix
Apre, is obtained through a forward elimination step followed by a backward elimination step.
The solution xpre is finally postprocessed to obtain the solution x of the original system Ax = b,
where x is either assembled on an identified processor (the host) or kept distributed on the working
processors. Iterative refinement and backward error analysis are also postprocessing options of the
solution phase.
Each of these 3 phases can be called separately (see Subsection 5.1.1). A special case is the one
where the forward elimination step is performed during factorization (see Subsection 3.7), instead of
during the solve phase. This allows accessing the L factors right after they have been computed, with a
better locality, and can avoid writing the L factors to disk in an out-of-core context. In this case (forward
elimination during factorization), only the backward elimination is performed during the solution phase.
The software is mainly written in Fortran 95 although a C interface is available (see Section 9). Scilab
and MATLAB/Octave interfaces are also available in the case of sequential executions. The parallel
version of MUMPS requires MPI [49] for message passing and makes use of the BLAS [22, 23], LAPACK,
BLACS, and ScaLAPACK [21] libraries. The sequential version only relies on BLAS and LAPACK.
MUMPS exploits both parallelism arising from sparsity in the matrix A and from dense factorization
kernels. It distributes the work tasks among the processors, but an identified processor (the host) is
required to perform most of the analysis phase, to distribute the incoming matrix to the other processors
(slaves) in the case where the matrix is centralized, and to collect the solution if it is not kept distributed.
Several instances of MUMPS can be handled simultaneously. MUMPS allows the host processor to
participate in the factorization and solve phases, just like any other processor (see Subsection 3.10).
For both the symmetric and the unsymmetric algorithms used in the code, we have chosen a
fully asynchronous approach with dynamic scheduling of the computational tasks. Asynchronous
communication is used to enable overlapping between communication and computation. Dynamic
scheduling is used to accommodate numerical pivoting in the factorization and to remap work and data
to appropriate processors at execution time. In fact, we combine the main features of static and dynamic
approaches; we use the estimation obtained during the analysis to map some of the main computational
tasks; the other tasks are dynamically scheduled at execution time. The main data structures (the original
matrix and the factors) are similarly partially mapped during the analysis phase.
The main features of the MUMPS package include:
• various arithmetics (real or complex, single or double precision)
• input of the matrix in assembled format (distributed or centralized) or elemental format
• sequential or parallel analysis phase
• use of several built-in ordering algorithms, a tight interface to some external ordering packages
such as PORD [46], SCOTCH [42] or Metis [34] (strongly recommended), and the possibility for
the user to input a given ordering.
• scaling of the original matrix
• out-of-core capability
• save and restore feature
• detection of null pivot rows, basic estimate of rank deficiency and computation of a null space basis
• computation of a Schur complement matrix
• computation of the determinant
• computation of selected entries of the inverse of A
• exploiting sparsity of the right-hand sides
• forward elimination during factorization
• solution of the transposed system
• error analysis
• iterative refinement
• selective 64-bit integer feature
• exploitation of a Block Low-Rank (BLR) format for factorization and solution
MUMPS is downloaded from the web site many times per day and has been run on very many machines,
compilers and operating systems, although our experience is really only with UNIX-based systems. We
have tested it extensively on many parallel computers. Please visit our website for recommendations,
from our users, on how to use the solver on Windows platforms.
We discuss binary compatibility in Subsection 2.2 and how to upgrade between minor versions in
Subsection 2.3. We then discuss in Subsection 2.9 and in Subsection 2.10 some minor modifications
to control parameters or to interfaces with ordering packages. Such control parameters are normally
backward compatible, but it may happen that their range of possible values or their meaning has been
slightly modified or extended. Please read this section if you are using an earlier version of MUMPS and
want to use MUMPS 5.6.0.
The interface is backward compatible so that a code working with the previous version should also
work with the new one (all codes including MUMPS headers should be recompiled). Still, the following
points can be noted.
2.1 ChangeLog
Changes from 5.5.1 to 5.6.0
* Analysis by blocks and out-of-range entries compatible with parallel analysis
* Compact workarray S before solution phase (ICNTL(49)=1,2)
* JOB=-4 frees data from factorization and keeps results from analysis
* NEC vector engine version: tuned block low rank and GEMMT usage
* Reduced memory for symbolic datastructures (order-N arrays, arrowheads)
* Use new symbolic factorization (column counts) by default (+allow forest in case of Schur)
* Improved amalgamation algorithm to reduce the number of tiny nodes
* Discard factors option is now compatible with BLR
* Discard L factors option in case of LU factorization available for in-core
* Count null negative pivots (INFOG(50))
* Compilation with -DBLR_MT is no longer needed (see also -DBLR_NOOPENMP)
* Improve performance of A^-1 entries computation when #MPI > 1
* Fixed risk of hanging in case of OOC error (e.g. disk full)
* Free internal data earlier (e.g., factors in case of failed factorization, etc.)
* Avoid setenv/unsetenv on MinGW environments (Scotch versions >= 7)
* Fix error during ordering when processing matrices of order 1
* Out-Of-Core, values of ICNTL(22) different from 1 are treated as zero (in-core)
* BLKPTR ignored when ICNTL(15) < 0 (INFO(1:2)=-57,4 no longer occurs)
* BLR OpenMP fac_lr.F fix for possible non-increasing scheduling of loop indices
* Fix: id%LRGROUPS was not nullified in CMUMPS_FREE_ONENTRY_ANA_DRIVER
* Fixed parameter error 1202 in PDGETRS after restoring a factorization
* Workaround MPI_SSEND intel MPI 2021.6 issue
* NEC: Fix MUMPS_WRAP_GINP94 offloading to host
* Shared libraries: minor update in Makefiles
* Fixed missing printings of preprocessing constants in mumps_print_defined.F
2.4 Upgrading from MUMPS 5.5.1 to MUMPS 5.6.0
• The -DBLR_MT option, indicating that the BLAS library is compatible with OpenMP, is deprecated.
MUMPS now assumes by default that the BLAS library is compatible with OpenMP and that the number
of threads in the BLAS adapts to the number of OpenMP threads available in the current
OpenMP region. If this is not the case, then MUMPS should be compiled with OPTF containing
-DBLR_NOOPENMP, indicating that the BLAS should not be called inside OpenMP parallel
regions.
• Values of ICNTL(22) different from 1 (Out-Of-Core activated during factorization) are now
treated as 0 (In-Core factorization); in previous versions they were treated as 1 (Out-Of-Core
activated).
• The feature to discard factors (ICNTL(31)=1) now also discards the low-rank factors in case
the block low-rank feature is activated (ICNTL(35)), leading to a smaller memory usage when
ICNTL(31)=1 and ICNTL(35)=1 or 2. The feature to discard only the L factor (unsymmetric
matrices, ICNTL(31)=2 or ICNTL(32)=1) now also discards the in-core L factor, whereas it
was only effective for out-of-core factorizations in previous versions.
• Concerning the analysis by blocks, ICNTL(15) < 0 is now authorized even if BLKPTR (or
BLKVAR) is allocated (BLKPTR is then not accessed). As a consequence, error -57 with
INFO(2)=4 no longer occurs.
2.8 Upgrading from MUMPS 5.1.2 to MUMPS 5.2.1
Although MUMPS 5.2.1 contains new features compared to MUMPS 5.1.2 (one of them being the
possibility to store factors in low-rank format, leading to significant memory gains on some classes of
matrices), the installation procedure has not changed and the interface of existing features is almost fully
backward compatible, subject to the remarks below. Still, your code should be recompiled.
A minor modification concerns the combination of the BLR feature (ICNTL(35)) with elemental
input (ICNTL(5)) and with the forward elimination during factorization (ICNTL(32)). In 5.1.x
versions of MUMPS, BLR was automatically switched off in case of elemental input or forward elimination
during factorization, whereas an error is now raised in such cases (see error codes -43 and -800).
The scope of ICNTL(35) has been extended: ICNTL(35)=2 exploits low-rank factors during
solve, ICNTL(35)=3 is a backward-compatibility option to only exploit low-rank factors during
factorization but perform a full-rank solve, as ICNTL(35)=1 did in MUMPS 5.1.2.
ICNTL(35)=1 now makes an automatic choice of the “best” BLR option (currently BLR factorization
and BLR solve).
Thanks to a more flexible and improved memory management, the ICNTL(23) parameter to limit
the memory allocated by MUMPS to a given amount is now compatible with the block low-
rank feature (ICNTL(35)). Therefore, the error characterized by INFO(1)=-800 and INFO(2)=23
from previous versions has disappeared. Furthermore, when ICNTL(23) is provided, MUMPS will no
longer try to allocate all the authorized memory in virtual memory. This is due to the new capacity of
dynamically allocating some working memory when the static workspace is not large enough.
In case a maximum amount of allowed memory ICNTL(23) is provided and MUMPS stops with an
error because ICNTL(23) cannot be respected, a new specific error code -19 is now raised, whereas
previously only the error code -9 was raised (in full-rank). This makes it possible to distinguish between the
two types of errors. The error code -9 may still occur from time to time to indicate that the main
internal workarray allocated at the beginning of the factorization is too small (and, as before, the parameter
ICNTL(14) defining the relaxation of the main internal working space should be increased). However, this
error should now occur much less often than before, e.g., in case of extreme numerical pivoting difficulties.
This is because this version includes a first step towards a more flexible memory management that allows
part of the working memory to be stored in dynamically allocated blocks when needed.
Because of dynamic allocations, remark that the parameter ICNTL(14) used to relax the main
internal working space does not lead to a bound on the total memory allocated: dynamic data may in
some cases be allocated when this avoids -9 errors, and low-rank matrices will also be allocated
dynamically and possibly kept during the solve phase. If the user would like to enforce a strict bound on
the total allocated memory, we recommend the use of ICNTL(23).
Related to the low-rank feature, the new control parameter ICNTL(38) was introduced for the user
to provide an estimated compression rate for the factors. It is used by the analysis phase to provide memory
estimates for the block low-rank factorization.
Out-of-range values of ICNTL(12) are now treated as 1 (usual ordering) instead of 0 (automatic
choice).
The range of statistics provided to the users was extended and the lengths of the arrays INFO and
INFOG are now 80 instead of 40.
versions, LAPACK was only necessary as a dependency of ScaLAPACK, in case the ScaLAPACK
library used did not come with all its LAPACK dependencies.
Since SCOTCH version 6.0.0, the PT-SCOTCH library does not include the SCOTCH library. So,
during the link phase, the SCOTCH library must be provided to MUMPS. This can easily be done by adding
“-lscotch” to the LSCOTCH variable in your Makefile.inc file.
Unfortunately, there is a problem in the SCOTCH 6.0.0 package that makes it unusable with
MUMPS. You should update your version of SCOTCH to 6.0.1 or later.
Table 1: Backward compatibility issues between MUMPS 4.10.0 and MUMPS 5.6.0 for ICNTL(11). Full statistics include
condition number estimates and a forward error estimate, which are very expensive to compute.
The main reason for this change is that, because backward error analysis already provides a good
indication of the quality of the computed solution, most users might not want to compute forward error
analysis and condition numbers estimates in all cases.
2.10.6 ICNTL(4): Control of the level of printing
By default, some messages that appeared in MUMPS 4.10.0 no longer appear in MUMPS 5.6.0.
This is because, when ICNTL(4) < 2, some diagnostic messages were printed in MUMPS 4.10.0
whereas they should not have been printed. This change of behaviour should thus be considered as a bug
fix between MUMPS 4.10.0 and the new version, rather than a backward compatibility issue.
In order to have such messages printed as in MUMPS 4.10.0 with the latest version, please set
the value of ICNTL(4) to 2. Please also refer to the detailed descriptions of ICNTL(4), ICNTL(1),
ICNTL(2), and ICNTL(3).
3.2 Preprocessing
During the analysis phase, it is possible to preprocess the matrix to make the numerical
factorization easier and cheaper. The package offers a range of symmetric orderings to preserve sparsity, but also other
preprocessing facilities: permuting to a zero-free diagonal and prescaling. When all preprocessing options
are activated, the preprocessed matrix Apre that will be effectively factored is:
Apre = P Dr A Qc Dc P^T, (5)
where P is a permutation matrix applied symmetrically, Qc is a (column) permutation and Dr and Dc
are diagonal matrices for (respectively row and column) scaling. Note that when the matrix is symmetric,
preprocessing is designed to preserve symmetry.
Preprocessing highly influences the performance (memory and time) of the factorization and solution
steps. The default values correspond to an automatic setting performed by the package which depends on
the ordering packages installed, the type of the matrix (symmetric or unsymmetric), the size of the matrix
and the number of processors available. We thus strongly recommend the user to install all ordering
packages to offer maximum choice to the automatic decision process.
• Symmetric permutation : P
The symmetric permutation can be computed either sequentially, or in parallel. This option is
controlled by ICNTL(28). The sequential computation is controlled by ICNTL(7) whereas the
parallel computation is controlled by ICNTL(29).
− In the case where the symmetric permutation is computed sequentially, a wide range of
ordering options is offered, including the approximate minimum degree ordering (AMD, [7]), an
approximate minimum degree ordering with automatic quasi-dense row detection (QAMD, [3]),
an approximate minimum fill-in ordering (AMF), an ordering where bottom-up strategies are used
to build separators by Jürgen Schulze from University of Paderborn (PORD, [46]), the SCOTCH
package [42] from the University of Bordeaux 1, and the Metis package from Univ. of Minnesota
[34]. A user-supplied permutation can also be provided; the pivot order must then be set by the user
on the host in the array PERM_IN (see Subsection 5.4.2).
− When the symmetric permutation is computed in parallel, possible orderings are computed by
PT-SCOTCH or ParMetis, which must have been installed by the user.
• Permutations to a zero-free diagonal : Qc
Controlled by ICNTL(6), this permutation is recommended for very unsymmetric matrices to
reduce fill-in and arithmetic cost, see [24, 25]. For symmetric matrices this permutation can
also be used to constrain the symmetric permutation P (see ICNTL(12)). Furthermore, when
numerical values are provided on entry to the analysis phase, ICNTL(6) may also build scaling
vectors during the analysis, that will be either used or discarded depending on the scaling option
ICNTL(8) (see next paragraph).
• Row and column scalings : Dr and Dc
Controlled by ICNTL(8), this preprocessing improves the numerical accuracy and makes all
estimations performed during analysis more reliable. A range of classical scalings are provided
and can be automatically performed within the package either during the analysis phase or at the
beginning of the factorization phase.
Furthermore, preprocessing strategies for symmetric indefinite matrices, as described in [26], can
be applied and also lead to scaling arrays; they are controlled by ICNTL(12).
3.3 Post-processing facilities
3.3.1 Iterative refinement
A well-known and simple technique to improve the accuracy of the solution of linear systems is the use
of iterative refinement. It consists in refining an initial solution obtained after the solution phase, as described
in Algorithm 1.
repeat
    Solve A∆x = r using the computed factorization
    x = x + ∆x
    r = b − Ax
    Compute the backward error ω (see Subsection 3.3.2)
until ω < α
Algorithm 1: Iterative refinement. At each step, the backward error is computed and compared to α, the
stopping criterion.
It has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. For this reason, as an alternative to Algorithm 1, a simple variant of
iterative refinement can be used with a fixed number of steps and thus without a convergence test (see
Subsection 5.6).
The use of iterative refinement can be particularly useful if static pivoting has been used during
factorization (see Subsection 3.9).
In MUMPS, iterative refinement can be optionally performed after the solution step using the parameter
ICNTL(10).
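For illustration, here is a minimal sketch (assuming a double precision instance mumps_par whose factorization has already been performed, as in the examples of Section 4) of how iterative refinement can be requested; the exact meaning of the ICNTL(10) values is detailed in Subsection 5.6:
      ! Sketch: request iterative refinement at the solve step.
      ! ICNTL(10) < 0: fixed number of steps (here 2), no stopping test;
      ! ICNTL(10) > 0: maximum number of steps, with the stopping
      ! criterion of Algorithm 1 (threshold given by CNTL(2)).
      mumps_par%ICNTL(10) = -2
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)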
3.3.2 Error analysis
For error analysis, the scaled residual
|b − Ax̄|_i / (|b| + |A| |x̄|)_i (6)
is computed for all equations except those for which the numerator is nonzero and the denominator is
small. For all these exceptional equations,
|b − Ax̄|_i / ((|A| |x̄|)_i + ∥A_i∥∞ ∥x̄∥∞) (7)
is computed instead, where A_i is row i of A. In [20], the largest scaled residual in Equation (6) is defined
as ω1 and the largest scaled residual in Equation (7) as ω2. If all equations are in the first category
(Equation (6)), ω2 is zero. ω1 and ω2 are the two backward errors.
Then, the computed solution x̄ is the exact solution of the equation
(A + δA)x = (b + δb),
where
|δA_ij| ≤ max(ω1, ω2) |A_ij|,
and |δb_i| ≤ max(ω1 |b|_i, ω2 ∥A_i∥∞ ∥x̄∥∞). Note that δA respects the sparsity of A in the sense that
δA_ij is zero for structural zeros in A, i.e., when A_ij = 0.
Finally, if the user can afford a significantly more costly error analysis, condition numbers cond1 and
cond2 for the linear system (not just the matrix) can also be returned together with an upper bound of the
forward error of the computed solution:
∥δx∥∞ / ∥x∥∞ ≤ ω1 cond1 + ω2 cond2
This option is controlled by ICNTL(11).
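A minimal sketch (same assumptions as the sketch in Subsection 3.3.1) of how error analysis can be requested at the solution step:
      ! Sketch: request error analysis statistics for the solve step.
      ! ICNTL(11)=2 computes the main statistics (backward errors);
      ! ICNTL(11)=1 additionally computes condition numbers and the
      ! forward error bound, which is significantly more expensive.
      mumps_par%ICNTL(11) = 2
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)
      ! On exit, the statistics are returned in RINFOG (see Section 7).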
3.9 Numerical pivoting
The goal of pivoting is to ensure a good numerical accuracy during Gaussian elimination. A widely used
technique is known as partial pivoting. Considering an unsymmetric matrix, at step i of the factorization
we first determine k such that |a_k,i| = max_{l=i:n} |a_l,i|. Rows i and k are swapped in A (and the
permutation information is stored in order to apply it to the right-hand side b) before dividing the column
by the pivot and performing the rank-one update. The advantage of this approach is that it bounds the
growth factor and improves the numerical stability.
Unfortunately, in the case of sparse matrices, numerical pivoting prevents a full static prediction of
the structure of the factors: it dynamically modifies the structure of the factors, thus forcing the use of
dynamic data structures. Numerical pivoting can thus have a significant impact on the fill-in and on the
amount of floating-point operations. To limit the amount of numerical pivoting, and stick better to the
sparsity predictions done during the symbolic factorization, partial pivoting can be relaxed, leading to the
partial threshold pivoting strategy:
In the partial threshold pivoting strategy, a pivot a_i,i is accepted if it satisfies:
|a_i,i| ≥ u × max_{l=i:n} |a_l,i|
for a given value of u, 0 ≤ u ≤ 1. This ensures a growth factor limited to 1 + 1/u for the corresponding
step of Gaussian elimination. In practice, one often chooses u = 0.1 or u = 0.01 as a default threshold
and this generally leads to a stable factorization. The threshold u can be set using CNTL(1).
It is possible to perform the pivot search on the row rather than on the column with similar stability.
In the multifrontal method, once a frontal matrix is formed, we cannot choose a pivot outside the
fully-summed block, because the corresponding rows are not fully-summed. Once all possible pivots in
the block of candidate pivots have been eliminated, if no other pivot satisfies the partial pivoting threshold,
some rows and columns remain unfactored in the front. Those are then delayed to the frontal matrix of the
parent, as part of the contribution block (delayed pivots). Note that, because of the delayed pivots,
additional fill-in will occur in the parent node.
The same type of approach is applied to the symmetric case, but with the constraint that we want to
maintain the symmetry of the frontal matrices.
In order to avoid the complications due to numerical pivoting, perturbation techniques can be applied
(static pivoting): a pivot smaller than a threshold in absolute value is replaced by this threshold. In this
case it is recommended to use iterative refinement (see Subsection 3.3.1) to improve the approximate
solution.
In MUMPS, static pivoting (CNTL(4)) and numerical pivoting (CNTL(1)) are combined at runtime.
A comparison of approaches based on static pivoting with approaches based on numerical pivoting in
the context of high-performance distributed solvers can be found in [13].
simple sequential program, ignoring everything related to MPI. Note that in this case parallel ordering
packages such as ParMetis or PT-SCOTCH must be disabled during installation. Details on how to
build a purely sequential version of MUMPS are available in the INSTALL file available in the MUMPS
distribution.
Remark that for the sequential version, the component PAR must be set to 1 (see Subsection 5.1.4).
Furthermore, the calling program should not make use of MPI: if the calling program is a parallel MPI
code which requires sequential MUMPS, a parallel version of MUMPS must then be installed, to which a
communicator consisting of a single process should be provided. Finally, the MPI-free version can make
use of several cores by relying on multithreading (see Section 3.12).
3.15 Determinant
MUMPS has an option to compute the determinant of the input matrix. It is available for symmetric and
unsymmetric matrices for all arithmetics (single, double, real, complex), and for all matrix input formats.
This option is controlled by ICNTL(33).
Let n be the order of the matrix A. If A = LU (unsymmetric matrices), then
det(A) = det(L) × det(U) = ∏_{i=1}^{n} U_ii.
The sign of the determinant is maintained by keeping track of all internal permutations. Scaling arrays
are taken into account too, in case the matrix is scaled. To avoid overflows and guarantee an accurate
computation, the mantissa and exponent are computed separately and renormalized when needed.
The determinant is computed when requested by the user. If the user is only interested in the
determinant, he/she may tell MUMPS that the factor matrices can be discarded (see Subsection 5.11),
significantly reducing the storage requirements.
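A minimal sketch (double precision, real arithmetic, instance mumps_par as in the earlier examples) of how the determinant can be requested and retrieved:
      ! Sketch: compute the determinant; the factors may additionally
      ! be discarded (ICNTL(31)) if only the determinant is of interest.
      mumps_par%ICNTL(33) = 1
      mumps_par%ICNTL(31) = 1
      mumps_par%JOB = 4                ! analysis + factorization
      CALL DMUMPS(mumps_par)
      ! On exit: det(A) = RINFOG(12) * 2**INFOG(34) (real arithmetic;
      ! in complex arithmetic, RINFOG(13) holds the imaginary part
      ! of the mantissa).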
Starting from the factorization A = LU, an entry a^-1_ij of the inverse can be obtained through the two
triangular solves
y = L^-1 e_j, (9)
a^-1_ij = (U^-1 y)_i. (10)
MUMPS provides a functionality, controlled by ICNTL(30), to compute a set of entries of A−1 ,
while avoiding most of the computations on explicit zeros in Equations (9) and (10). The list of entries of
A−1 to be computed and the memory for those entries should be provided as a sparse right-hand side (see
Subsection 5.14.2). In a parallel environment it is not so natural to combine parallelism with the
exploitation of sparsity. Recent work based on [44, 15] to exploit parallelism is provided in this release.
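A sketch of the corresponding calling sequence is given below (double precision; the choice of computing the diagonal entries of A^-1 is purely illustrative). The sparse right-hand side components are described in Subsection 5.14.2.
      ! Sketch: compute all diagonal entries of A^-1 (illustrative
      ! target pattern), described as a sparse right-hand side on the
      ! host; one column of A^-1 corresponds to one right-hand side.
      INTEGER :: I
      mumps_par%ICNTL(30) = 1             ! request entries of A^-1
      mumps_par%NRHS      = mumps_par%N
      mumps_par%NZ_RHS    = mumps_par%N   ! one requested entry per column
      ALLOCATE( mumps_par%IRHS_PTR(mumps_par%NRHS+1) )
      ALLOCATE( mumps_par%IRHS_SPARSE(mumps_par%NZ_RHS) )
      ALLOCATE( mumps_par%RHS_SPARSE(mumps_par%NZ_RHS) )
      DO I = 1, mumps_par%N
        mumps_par%IRHS_PTR(I)    = I      ! pointers into IRHS_SPARSE
        mumps_par%IRHS_SPARSE(I) = I      ! row index: diagonal entry
      END DO
      mumps_par%IRHS_PTR(mumps_par%NRHS+1) = mumps_par%NZ_RHS + 1
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)
      ! On exit, RHS_SPARSE holds the requested entries of A^-1.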
Thus the Schur complement, as returned by MUMPS, is such that S = A_2,2 − A_2,1 A_1,1^-1 A_1,2.
The user must specify on entry to the analysis phase the list of indices of the Schur matrix,
corresponding to the variables of A_2,2. MUMPS returns to the user, on exit from the factorization phase, the
Schur complement matrix S, as a full matrix but with different types of distribution (see Subsection 5.15
for more details).
This partial factorization can be used to solve Ax = b in different ways using the ICNTL(26)
parameter. It can be used to solve the linear system associated with the “interior” variables or to handle a
reduced/condensed right-hand-side as described in the following discussion.
• Compute a partial solution (ICNTL(26) = 0):
The solve is performed on the internal problem:
A_1,1 x_1 = b_1.
Entries in the right-hand side corresponding to indices from the Schur matrix need not be set on
entry and they are explicitly set to zero on output.
• Solve the complete system in three steps:
( L_1,1   0 ) ( U_1,1  U_1,2 ) ( x_1 )   ( b_1 )
( L_2,1   I ) (   0      S   ) ( x_2 ) = ( b_2 )    (12)
First solving:
( L_1,1   0 ) ( y_1 )   ( b_1 )
( L_2,1   I ) ( y_2 ) = ( b_2 )    (13)
And thereafter:
( U_1,1  U_1,2 ) ( x_1 )   ( y_1 )
(   0      S   ) ( x_2 ) = ( y_2 )    (14)
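As an illustration, the following sketch (double precision; the size and list of Schur variables are illustrative) requests a centralized Schur complement before the analysis phase:
      ! Sketch: centralized Schur complement on 6 illustrative
      ! variables; to be set on the host before the analysis phase.
      mumps_par%ICNTL(19)  = 1            ! centralized, stored by rows
      mumps_par%SIZE_SCHUR = 6
      ALLOCATE( mumps_par%LISTVAR_SCHUR(mumps_par%SIZE_SCHUR) )
      mumps_par%LISTVAR_SCHUR = (/ 95, 96, 97, 98, 99, 100 /)
      ! The Schur matrix itself must be allocated by the user:
      ALLOCATE( mumps_par%SCHUR(mumps_par%SIZE_SCHUR**2) )
      mumps_par%JOB = 4                   ! analysis + factorization
      CALL DMUMPS(mumps_par)
      ! On exit, SCHUR holds S = A_2,2 - A_2,1 A_1,1^-1 A_1,2.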
to perform many of the basic dense algebra operations. Several strategies have been proposed in the
literature to exploit this property.
We have designed a so-called Block Low-Rank (BLR) format (see [4, 5]) to reduce the memory
footprint and the computational complexity of our multifrontal solver. The compression of BLR blocks
is based on a truncated QR factorization with column pivoting.
At each node of the multifrontal computational graph, a partial factorization (part corresponding
to the fully summed variables, see Figure 1(a)) of a dense frontal matrix is performed. The so-called
contribution block (CB in the figure) will correspond to the local Schur matrix built at the end of the
partial factorization of the front. As illustrated in Figure 1, the BLR factorization is performed by
panels whose size is that of a BLR block. On the left-hand side (a) of the figure, a standard full-rank
(FR) factorization of the panel is first performed, leading to dense FR L and U factor blocks (colored
in (a)). The block structure follows the flat BLR structure of the front. A truncated QR factorization,
controlled by a numerical threshold (referred to as ε in Subsection 5.16), is then performed on each off-
diagonal block (as indicated in Figure 1(b)). Blocks in low-rank form are then used to update the trailing
submatrix of the front (grey in (b)) at a lower cost than a full-rank standard update.
Figure 1: Block Low-Rank factorization of one panel of the partial factorization of a frontal matrix. (a) Full-Rank (FR) panel factorization; (b) low-rank panel compression and update.
This functionality is controlled by a numerical parameter, called the low-rank threshold and noted
ε, defining the accuracy of the low-rank approximations (see Subsection 5.16). We have observed in
practice that the backward error of the final solution is closely related to this numerical parameter. With
this parameter, MUMPS can be used to provide either a direct solution at an accuracy controlled by the
low-rank threshold or an approximate factorization that can be used as a preconditioner (see [41]).
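In practice, the feature is enabled through two parameters, as in the following sketch (double precision; the threshold value is illustrative and should be chosen according to the targeted accuracy):
      ! Sketch: activate BLR with automatic choice of the BLR variant
      ! and set the low-rank threshold epsilon (illustrative value).
      mumps_par%ICNTL(35) = 1         ! to be set before the analysis
      mumps_par%CNTL(7)   = 1.0D-9    ! low-rank threshold epsilon
      mumps_par%JOB = 4               ! analysis + factorization
      CALL DMUMPS(mumps_par)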
3.18.2 Reducing the memory consumption using BLR
The LU factors are compressed during the factorization. To reduce the memory footprint, factors are
stored in low-rank form (see Subsection 5.16). The solution phase is then performed using the low-rank
factors and may thus also be accelerated. This is the automatic choice when BLR is activated.
It is also possible to keep the full-rank factors and discard the low-rank factors, which are then only
used to accelerate the factorization. The standard, full-rank solution phase is performed and no memory
gains can be expected in this case.
The memory consumption can be further reduced by also compressing intermediate working space,
the so-called contribution blocks (CB) of the frontal matrices, as described in [41]. However, contrary
to the compression of the LU factors, the CB compression does not contribute to reducing the global number
of operations and therefore represents an overhead cost. For this reason, in the current version
of MUMPS, when BLR is activated the CB compression is not activated by default but can be activated
(see Subsection 5.16). Note that in a parallel context, CB compression can also reduce the volume of
communications.
The memory footprint can be further reduced using a preliminary version of the mixed precision Block
Low-Rank feature. Mixed precision is used to reduce the storage of the factor matrices and possibly the
contribution blocks when stored with BLR representation. This work (see [2]), supported by EDF R&D,
is performed in the context of the PhD thesis of Matthieu Gerest (LIP6-Sorbonne University and EDF
R&D).
PRECISION, COMPLEX and DOUBLE COMPLEX versions, respectively. Similarly, [SDCZ]MUMPS_STRUC refers to either
SMUMPS_STRUC, DMUMPS_STRUC, CMUMPS_STRUC, or ZMUMPS_STRUC, and [sdcz]mumps_struc.h to smumps_struc.h,
dmumps_struc.h, cmumps_struc.h or zmumps_struc.h.
INCLUDE ’ m p i f . h ’
INCLUDE ’ d m u m p s s t r u c . h ’
...
INTEGER : : IERR
TYPE (DMUMPS STRUC) : : mumps par
...
CALL MPI INIT ( IERR )
...
mumps par%JOB = . . . ! S e t some a r g u m e n t s t o t h e p a c k a g e : t h o s e
mumps par%ICNTL ( 3 ) = 6 ! a r e c o m p o n e n t s o f t h e mumps par s t r u c t u r e
...
CALL DMUMPS( mumps par )
...
CALL MPI FINALIZE ( IERR )
INCLUDE ’ [ s d c z ] mumps root . h ’
TYPE [ SDCZ ]MUMPS STRUC
SEQUENCE
C INPUT PARAMETERS
C ----------------
C Problem definition
C ------------------
C Solver (SYM=0 Unsymmetric, SYM=1 Sym. Positive Definite, SYM=2 General Symmetric)
C Type of parallelism (PAR=1 host working, PAR=0 host not working)
INTEGER SYM, PAR , JOB
C Control parameters
C ------------------
INTEGER ICNTL ( 6 0 )
r e a l CNTL ( 1 5 )
INTEGER N ! Order o f i n p u t m a t r i x
C Assembled input matrix : User interface
C ----------------------------------------
INTEGER : : NZ ! S t a n d a r d i n t e g e r i n p u t + bwd . compat .
INTEGER ( 8 ) : : NNZ ! 64− b i t i n t e g e r i n p u t
r e a l / complex , DIMENSION ( : ) , POINTER : : A
INTEGER, DIMENSION ( : ) , POINTER : : IRN , JCN
C Case of distributed matrix entry
C --------------------------------
INTEGER : : NZ loc ! S t a n d a r d i n t e g e r i n p u t + bwd . compat .
INTEGER ( 8 ) : : NNZ loc ! 64− b i t i n t e g e r i n p u t
INTEGER, DIMENSION ( : ) , POINTER : : IR N l oc , J C N l o c
5.1 General
5.1.1 Initialization, Analysis, Factorization, Solve, Save, Restore, Termination (JOB)
JOB (integer) must be initialized by the user on all processors before a call to MUMPS. It controls the
main actions taken by MUMPS. It is not altered by MUMPS. Possible values of JOB are:
–1 initializes an instance of the package. A call with JOB = –1 must be performed before any
other call to the package on the same instance. It sets default values for other components of
[SDCZ]MUMPS_STRUC (such as ICNTL), which may then be altered before subsequent calls
to MUMPS. Note that three components of the structure must always be set by the user (on all
processors) before a call with JOB = –1. These are
• COMM,
• SYM, and
• PAR.
Note that if the user wants to modify one of those three components then he/she must terminate the
instance (call with JOB = –2) then reinitialize the instance (call with JOB = –1).
Furthermore, after a call with JOB = –1, the internal component MYID contains the rank of the
calling processor in the communicator provided to MUMPS. Thus, the test “(MYID == 0)” may be
used to identify the host processor (see Subsection 3.10).
Finally, the version number is returned in VERSION NUMBER (see Subsection 5.1.3).
–2 terminates an instance of the package. All data structures associated in mumps_par with the instance,
except those provided by the user, are deallocated. It should be called by the user only when no
further calls to MUMPS with this instance are required. In order to avoid memory leaks, it must also
be called before a further JOB = –1 call with the same argument mumps_par. When out-of-core
was used (ICNTL(22)=1), the out-of-core factor files are deleted, except if the save/restore feature
(Subsection 5.17) has been used and the files are associated with a saved instance. In this case, the
out-of-core files are kept because they will be exploited to restore the saved instance. See also
Subsection 5.17.
–3 save / restore feature: removes data saved to disk. The files that were used to save an instance of
MUMPS are deleted. It should be called by the user when no further restore of MUMPS with the files
is required. See also Subsection 5.17.4.
1 performs the analysis. In this phase, MUMPS chooses pivots from the diagonal using a selection criterion
to preserve sparsity, based on the pattern of the matrix A input by the user. Several formats and
distributions onto the processors are available to input the matrix (see Subsection 5.2.2). It subsequently
constructs subsidiary information for the numerical factorization (a JOB=2 call).
An option exists for the user to input the pivotal sequence (ICNTL(7)=1, see Subsection 5.4) in
which case only the necessary information for a JOB=2 call will be generated.
If a preprocessing based on the numerical values is requested (see Subsection 5.3 and ICNTL(6)),
then the numerical values of the original matrix A must also be provided by the user during the
analysis phase, and scaling vectors are optionally computed.
Note that a call to MUMPS with JOB=1 must be preceded by a call with JOB = –1 on the same
instance.
2 performs the factorization. It uses the numerical values of the matrix A provided by the user (see
Subsection 5.2.2) and the information from the analysis phase (JOB=1) to factorize the matrix A.
The actual pivot sequence used during the factorization may slightly differ from the sequence
returned by the analysis (see Subsection 5.4.1) if the matrix A is not diagonally dominant.
An option exists for the user to input scaling vectors or let MUMPS compute automatically such
vectors (see Subsection 5.3.2 and ICNTL(8)) just before the numerical factorization.
A call to MUMPS with JOB=2 must be preceded by a call with JOB=1 on the same instance.
3 computes the solution. It uses the right-hand side(s) B provided by the user and the factors generated
by the factorization (JOB=2) to solve a system of equations AX = B or AT X = B. The pattern
and values of the matrix should be passed unchanged since the last call to the factorization phase
(see JOB=2). Several possibilities are given to input the right-hand side matrix B and to output the
solution matrix X. The structure component mumps_par%RHS must be set by the user (on the
host only) before a call with JOB=3 (see Subsection 5.14). This solution phase can also be used to
compute a null space basis of singular matrices (see ICNTL(25)), provided that null pivot row
detection (ICNTL(24)) was on and that the deficiency obtained (INFOG(28)) was different from
0.
A call to MUMPS with JOB=3 must be preceded by a call with JOB=2 (or JOB=4) on the same
instance.
4 combines the actions of JOB=1 with those of JOB=2. It must be preceded by a call to MUMPS with
JOB = –1 on the same instance.
5 combines the actions of JOB=2 and JOB=3. It must be preceded by a call to MUMPS with JOB=1 on
the same instance.
6 combines the actions of calls with JOB=1, 2, and 3. It must be preceded by a call to MUMPS with JOB
= –1 on the same instance.
7 saves MUMPS internal data to disk. It must be preceded by a call to MUMPS with JOB=-1 on the same
instance. A call to MUMPS with JOB=7 should be followed, at some point, by a call to MUMPS
with JOB=-2 in order to free MUMPS internal data. The calling processes may then be stopped.
It is possible to delete the files of a saved instance using a call to MUMPS with JOB=-3. See
Subsection 5.17 for more details, in particular how to specify where to store the files of the saved
instance.
8 restores MUMPS internal data from disk. It must be preceded by a call to MUMPS with JOB=-1 on the
same instance. The values of PAR, SYM and COMM should be compatible with the values used at the
moment the data were written to disk (JOB=7). See Subsection 5.17 for more details, in
particular where to find the files containing the saved instance.
-200 (experimental, subject to change in the future) deletes the MUMPS out-of-core factor files associated
with the calling MPI process and returns. This feature was designed in case out-of-core was used
(ICNTL(22)=1) and a serious crash occurred; otherwise it should not be used. For example,
if MPI crashes with an error that cannot be recovered, or if some processes encounter some
unrecoverable error that will lead to the termination of the processes, an error handler may call
MUMPS with JOB=-200 in order to free the disk space associated with the out-of-core factor
files before terminating the application. Internal data in memory are not freed and no MPI
communications are performed during this call.
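To summarize the most common usage, a minimal sketch of a typical sequence of calls is given below (double precision; the includes and MPI initialization of the example of Section 4, as well as the matrix and right-hand side set-up, are assumed where indicated):
      mumps_par%COMM = MPI_COMM_WORLD  ! COMM, SYM, PAR: before JOB = -1
      mumps_par%SYM = 0
      mumps_par%PAR = 1
      mumps_par%JOB = -1               ! initialize the instance
      CALL DMUMPS(mumps_par)
      ! ... define the matrix and the right-hand side here ...
      mumps_par%JOB = 6                ! analysis + factorization + solve
      CALL DMUMPS(mumps_par)
      IF (mumps_par%INFOG(1) .LT. 0) THEN
        WRITE(*,*) 'MUMPS error, INFOG(1:2)=', mumps_par%INFOG(1:2)
      END IF
      mumps_par%JOB = -2               ! terminate the instance
      CALL DMUMPS(mumps_par)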
5.1.2.1 JOB=9 before solution with distributed right-hand sides: although the user is free to
provide his/her own distribution of the right-hand sides, for performance he/she may be interested
in obtaining distributions with specific properties before calling the solve phase (for example, to exploit the
distribution of the factor matrices to limit data redistribution of the RHS). A call to MUMPS with JOB=9
can be done to obtain such information after a successful factorization (JOB=2 or 4 with INFOG(1) ≥ 0
on exit). On entry, IRHS_loc should be allocated on each MPI process, of size at least INFO(23). The
result of this JOB=9 call may also depend on the values of ICNTL(20) and ICNTL(9) on entry. More
information is provided in Subsection 5.14.3, which describes the distributed RHS interface.
5.1.2.2 JOB=-4 after factorization or solve phases: a call to MUMPS with JOB=-4 frees all
MUMPS internal data structures except the ones from the analysis. It can be used to free internal data allocated
at factorization (factor matrices, . . . ) when the user wants to free some memory associated with an instance
but still be able to perform a new factorization, possibly with new numerical values.
SYM (integer) must be initialized by the user on all processors before the initialization phase (JOB =
–1) and is accessed by MUMPS only during this phase. It is not altered by MUMPS. Its value is
communicated internally to the other phases as required. Possible values for SYM are:
0 : A is unsymmetric.
1 : A is assumed to be symmetric positive definite so that pivots are taken from the diagonal
without numerical pivoting during the factorization. With this option, non-positive definite
matrices that do not require pivoting can also be treated in certain cases (see remark below).
2 : A is general symmetric.
Other values are treated as 0.
Note that the value of SYM should be identical on all processors; if this is not the case, the value on
processor 0 is used by the package. For the complex version, the value SYM=1 is currently treated
as SYM=2. We do not have a version for Hermitian matrices in this release of MUMPS.
Remark for symmetric matrices (SYM=1). When SYM=1 is indicated by the user, an LDL^T
factorization (as opposed to a Cholesky factorization, which requires positive diagonal pivots) of matrix
A is performed internally by the package, and numerical pivoting is switched off. Therefore, this
setting works for classes of matrices more general than positive definite matrices, including matrices with
negative pivots. However, this feature depends on the use of the ScaLAPACK library (see ICNTL(13))
to factorize the last dense block in the factorization of A associated to the root node of the elimination
tree. More precisely,
• if ScaLAPACK is allowed for the last dense block (default in parallel, ICNTL(13)=0), then the
presence of negative pivots in the part of the factorization processed with ScaLAPACK (subroutine
P_POTRF) will raise an error and the code -40 is then returned in INFOG(1);
• if ScaLAPACK is not used (ICNTL(13)>0, or sequential execution, or last dense block detected
to be too small), then negative pivots are allowed and the factorization will work for some classes of
non-positive definite matrices where numerical pivoting is not necessary, e.g., symmetric negative
definite matrices.
The successful factorization of a symmetric matrix with SYM=1 is thus not an indication that the
matrix provided was symmetric positive definite. In order to verify that a matrix is positive definite,
the user can check that the number of negative pivots or inertia (INFOG(12)) is 0 on exit from the
factorization phase. Another approach to suppress numerical pivoting on symmetric matrices which is
compatible with the use of ScaLAPACK (see ICNTL(13)) consists in setting SYM=2 (general symmetric
matrices) with the relative threshold for pivoting CNTL(1) set to 0 (recommended strategy).
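The recommended strategy just mentioned corresponds to the following sketch (double precision instance, as in the earlier examples):
      ! Sketch: suppress numerical pivoting on a symmetric matrix while
      ! remaining compatible with ScaLAPACK (see ICNTL(13)).
      mumps_par%SYM = 2            ! general symmetric, before JOB = -1
      mumps_par%JOB = -1
      CALL DMUMPS(mumps_par)
      mumps_par%CNTL(1) = 0.0D0    ! relative pivoting threshold u = 0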
Default value: 0 (assembled format)
Related parameters: ICNTL(18)
Incompatibility: If the matrix is in elemental format (ICNTL(5)=1), the BLR feature
(ICNTL(35)≥ 1) is currently not available, see error -800.
Remarks: NNZ and NNZ_loc are 64-bit integers (NZ and NZ_loc are 32-bit integers kept for
backward compatibility and will be obsolete in future releases).
Parallel analysis (ICNTL(28)=2) is only available for matrices in assembled format and, thus, an
error will be raised for elemental matrices (ICNTL(5)=1).
Elemental matrices can be input only centralized on the host (ICNTL(18)=0).
ICNTL(18) defines the strategy for the distribution of the input matrix (only for assembled matrix).
Phase: accessed by the host during the analysis phase.
Possible values :
0 : the input matrix is centralized on the host (see Subsection 5.2.2.1).
1 : the user provides the structure of the matrix on the host at analysis, MUMPS returns a mapping
and the user should then provide the matrix entries distributed according to the mapping on
entry to the numerical factorization phase (see Subsection 5.2.2.2).
2 : the user provides the structure of the matrix on the host at analysis, and the distributed
matrix entries on all slave processors at factorization. Any distribution is allowed (see
Subsection 5.2.2.2).
3 : user directly provides the distributed matrix, pattern and entries, input both for analysis and
factorization (see Subsection 5.2.2.2).
Other values are treated as 0.
Default value: 0 (input matrix centralized on the host)
Related parameters: ICNTL(5)
Remarks: In the case of a distributed matrix, we recommend options 2 or 3. Among them, we
recommend option 3, which is easier to use. Option 1 is kept for backward compatibility but is
deprecated and we plan to suppress it in a future release.
Consider, for example, the matrix
        ( a11   0    0  )
    A = (  0   a22  a23 )
        ( a31   0   a33 )
The following components of [SDCZ]MUMPS_STRUC hold the matrix in centralized assembled
format:
mumps_par%N (integer) is the order of the matrix A, N > 0. It must be set by the user on the host
before analysis. It is not altered by MUMPS.
mumps_par%NNZ (integer(8)) is the number of nonzero entries being input, NNZ > 0. It must be set
by the user on the host before analysis. (Note that mumps_par%NZ (integer) is also available for
backward compatibility.) It is not altered by MUMPS.
mumps_par%IRN and mumps_par%JCN (integer pointer arrays, dimension NNZ) contain the row and
column indices, respectively, for the matrix entries. They must be set by the user on the host before
analysis. They are not altered by MUMPS.
mumps_par%A (real/complex pointer array, dimension NNZ) must be set by the user in such a way that
A(k) is the value of the entry in row IRN(k) and column JCN(k) of the matrix. It must be set before
the factorization phase (JOB=2) or before analysis (JOB=1) if a numerical preprocessing option is
requested (1 < ICNTL(6) < 7). A is not altered by MUMPS. Duplicate entries are summed and
all entries with IRN(k) or JCN(k) out-of-range are ignored.
Note that, in the case of symmetric matrices (SYM=1 or 2), only half of the matrix should be provided.
For example, only the lower triangular part of the matrix (including the diagonal) or only the upper
triangular part of the matrix (including the diagonal) can be provided in IRN, JCN, and A. More precisely,
a diagonal nonzero a_ii must be provided as A(k)=a_ii, IRN(k)=JCN(k)=i, and a pair of off-diagonal
nonzeros a_ij = a_ji must be provided either as A(k)=a_ij and IRN(k)=i, JCN(k)=j, or vice-versa. Again,
out-of-range entries are ignored and duplicate entries are summed. In particular, this means that if both
a_ij and a_ji are provided, they will be summed.
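For the 3×3 example matrix above, the centralized assembled input could be set up as in the following sketch (on the host; a11, . . . , a33 stand for the user's numerical values):
      ! Sketch: centralized assembled input (ICNTL(5)=0, ICNTL(18)=0)
      ! for the 3x3 example matrix above, to be set on the host.
      mumps_par%N   = 3
      mumps_par%NNZ = 5_8
      ALLOCATE( mumps_par%IRN(5), mumps_par%JCN(5), mumps_par%A(5) )
      mumps_par%IRN = (/ 1, 2, 2, 3, 3 /)
      mumps_par%JCN = (/ 1, 2, 3, 1, 3 /)
      mumps_par%A   = (/ a11, a22, a23, a31, a33 /)  ! numerical values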
mumps_par%A_loc (real/complex pointer array, dimension NNZ_loc) must be defined before the
factorization phase (JOB=2) on all processors if PAR = 1, and on all processors except the host
if PAR = 0. The user must set A_loc(k) to the value in row IRN_loc(k) and column JCN_loc(k).
mumps_par%MAPPING (integer array, dimension NNZ) is returned by MUMPS on the host after
the analysis phase as an indication of a preferred mapping if ICNTL(18) = 1. In that case,
MAPPING(i) = IPROC means that entry IRN(i), JCN(i) should be provided on the processor with
rank IPROC in the MUMPS communicator. Remark that MAPPING is allocated by MUMPS, and not
by the user. It will be freed during a call to MUMPS with JOB = -2. This parameter and the option
ICNTL(18) = 1 are kept for backward compatibility with previous versions but are deprecated
and will be suppressed in a future release.
We recommend the use of options ICNTL(18)= 2 or 3 because they are the most flexible options.
Furthermore, these options (2 or 3) are in general as efficient as the more complicated (and deprecated)
option ICNTL(18)=1. Among those two options, ICNTL(18)=3 is the simplest and most natural one
to use. ICNTL(18)=2 should only be used if the application has a centralized version of the entire
matrix already available on the host processor.
Again, out-of-range entries are ignored and duplicate entries are summed. In particular, if an entry
a_ij is provided both as (IRN_loc(k1), JCN_loc(k1), A_loc(k1)) on a process P1 and as (IRN_loc(k2),
JCN_loc(k2), A_loc(k2)) on a process P2, the corresponding numerical value considered for a_ij is the
sum of A_loc(k1) on P1 and A_loc(k2) on P2. This also means that it is possible to only perform local
assemblies inside each MPI process and that entries that are common to several MPI processes (which
may typically correspond to interface variables) will be summed internally by the MUMPS package, without
the user having to take care of communications to assemble those entries.
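A sketch of the recommended fully distributed input (ICNTL(18)=3) is given below; the global order n and the local entry count nloc are assumptions of this example:
      ! Sketch: fully distributed assembled input (ICNTL(5)=0,
      ! ICNTL(18)=3); each MPI process provides its own local triplets.
      mumps_par%ICNTL(18) = 3
      mumps_par%N         = n      ! global order, needed on the host
      mumps_par%NNZ_loc   = nloc   ! local number of entries
      ALLOCATE( mumps_par%IRN_loc(nloc), mumps_par%JCN_loc(nloc) )
      ALLOCATE( mumps_par%A_loc(nloc) )
      ! ... fill IRN_loc, JCN_loc (global indices) and A_loc locally ...
      ! Entries duplicated across processes are summed by MUMPS.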
Consider the two elemental matrices A1 (with variables 1, 2, 3) and A2 (with variables 3, 4, 5):
       1 ( -1  2  3 )        3 (  2 -1  3 )
  A1 = 2 (  2  1  1 ) , A2 = 4 (  1  2 -1 )
       3 (  1  1  1 )        5 (  3  2  1 )
which assemble into the 5×5 matrix
      ( -1  2  3  0  0 )
      (  2  1  1  0  0 )
  A = (  1  1  3 -1  3 ) = A1 + A2
      (  0  0  1  2 -1 )
      (  0  0  3  2  1 )
• N=5, NELT=2, NVAR=6, A = Σ_{i=1}^{NELT} Ai
• ELTPTR [1:NELT+1] = 1 4 7
• ELTVAR [1:NVAR] = 1 2 3 3 4 5
• A_ELT [1:NVAL] = -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1
• Remarks:
– NVAR = ELTPTR(NELT+1)-1
– Order of element i: S_i = ELTPTR(i+1) − ELTPTR(i)
– NVAL = Σ S_i² (unsymmetric) or Σ S_i(S_i+1)/2 (symmetric)
– storage of elements in A_ELT: by columns
In the current release of the package, a matrix in elemental format must be input centrally on the host
(ICNTL(5)=1 and ICNTL(18)=0). The distributed elemental format is not currently available.
The following components of the MUMPS_STRUC hold the matrix in elemental format: mumps_par%N
(integer), mumps_par%NELT (integer), mumps_par%ELTPTR (integer pointer array, dimension
NELT+1), mumps_par%ELTVAR (integer pointer array, dimension ELTPTR(NELT+1) – 1), and
mumps_par%A_ELT (real/complex pointer array).
mumps par%N (integer) is the order of the matrix A, N > 0. It is not altered by MUMPS.
mumps par%NELT (integer) is the number of elements being input, NELT > 0. It is not altered by
MUMPS.
mumps par%ELTPTR (integer pointer array, dimension NELT+1) is such that ELTPTR(j) points to
the position in ELTVAR of the first variable in element j, and ELTPTR(NELT+1) must be set to
the position after the last variable of the last element. Note that ELTPTR(1) should be equal to 1.
ELTPTR is not altered by MUMPS.
mumps par%ELTVAR (integer pointer array, dimension ELTPTR(NELT+1) – 1) must be set to the lists
of variables of the elements. It is not altered by MUMPS. The variables for element j are stored in
positions ELTPTR(j), . . . , ELTPTR(j+1)–1. Out-of-range variables are ignored.
mumps par%A ELT (real/complex pointer array) If Np denotes ELTPTR(p+1)–ELTPTR(p), then the
values for element j are stored in positions Kj + 1, . . . , Kj + Lj, where
→ Kj = Σ_{p=1}^{j−1} Np², and Lj = Nj² in the unsymmetric case (SYM = 0)
→ Kj = Σ_{p=1}^{j−1} Np(Np + 1)/2, and Lj = Nj(Nj + 1)/2 in the symmetric case (SYM=1
or 2). Only the lower triangular part is stored.
Values within each element are stored column-wise. Values corresponding to out-of-range variables
are ignored and values corresponding to duplicate variables within an element are summed. A ELT
is not accessed at the analysis phase (JOB = 1). Note that, although the elemental matrix may be
symmetric or unsymmetric in value, its structure is always symmetric.
The components N, NELT, ELTPTR, and ELTVAR describe the pattern of the matrix and must be set
by the user before the analysis phase (JOB=1) and should be passed unchanged when later calling the
factorization (JOB=2) and solve (JOB=3) phases. Component A ELT must be set before the factorization
phase (JOB=2).
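The two-element example above can thus be input on the host with the following sketch (assuming an initialized structure with SYM=0 and omitting the other phases):

   mumps_par%ICNTL(5) = 1                 ! elemental format
   mumps_par%N    = 5
   mumps_par%NELT = 2
   ALLOCATE( mumps_par%ELTPTR(3), mumps_par%ELTVAR(6), mumps_par%A_ELT(18) )
   mumps_par%ELTPTR = (/ 1, 4, 7 /)
   mumps_par%ELTVAR = (/ 1, 2, 3,  3, 4, 5 /)
   ! values of A1 then A2, each stored by columns
   mumps_par%A_ELT  = (/ -1.D0, 2.D0, 1.D0,  2.D0, 1.D0, 1.D0,  3.D0, 1.D0, 1.D0, &
                          2.D0, 1.D0, 3.D0, -1.D0, 2.D0, 2.D0,  3.D0, -1.D0, 1.D0 /)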
For a distributed matrix, the binary format is the same except that NNZ loc, IRN loc,
JCN loc, A loc are printed on each MPI process (instead of NNZ, IRN, JCN, and A as
above).
A small text file which contains the corresponding matrix-market header and some comments
describing the binary file is also written. The name of this text file is constructed by replacing
“.bin” by “.header” in the string WRITE PROBLEM.
• Otherwise, that is, if the last four characters of the string WRITE PROBLEM differ from
“.bin”, then the matrix (or matrices, in case of distributed format – ICNTL(18)=3) is written in
matrix-market format.
Furthermore, if a dense and centralized right-hand side (see Subsection 5.14.1) is provided on
the host before the analysis phase, it is also written in a file whose name is the matrix file name
(WRITE PROBLEM) appended by “.rhs”. In case a binary format is used, the file for the RHS
contains N×NRHS scalars, of size 4, 8, 8, 16 for s, d, c, z arithmetics, respectively, and the header
file mentioned above also provides the values of N and NRHS.
Finally, in case of analysis by blocks (see ICNTL(15)), BLKPTR and/or BLKVAR are also written
if they are provided. If BLKPTR is provided, NBLK and BLKPTR(1:NBLK+1) are written (one
integer per line, in text form) in a file whose name is the matrix file name appended by “.blkptr”.
If BLKVAR is provided, BLKVAR(1:N) is also written (one integer per line, in text form) in a file
whose name is the matrix file name appended by “.blkvar”.
Other values are treated as 0. On output from the analysis phase, INFOG(23) holds the value of
ICNTL(6) that was effectively used.
Default value: 7 (automatic choice done by the package)
Incompatibility: If the matrix is symmetric positive definite (SYM = 1), or in elemental format
(ICNTL(5)=1), or the parallel analysis is requested (ICNTL(28)=2) or the ordering is provided
by the user (ICNTL(7)=1), or the Schur option (ICNTL(19) = 1, 2, or 3) is required, or the
matrix is initially distributed (ICNTL(18)=1,2,3), then ICNTL(6) is treated as 0.
Related parameters: ICNTL(8), ICNTL(12)
Remarks: On assembled centralized unsymmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM
= 0), if ICNTL(6)=1, 2, 3, 4, 5, 6 a column permutation (based on weighted bipartite matching
algorithms described in [24, 25]) is applied to the original matrix to get a zero-free diagonal.
The user is advised to set ICNTL(6) to a nonzero value when the matrix is very unsymmetric
in structure. On output from the analysis phase, when the column permutation is not the identity, the
pointer UNS PERM (internal data valid until a call to MUMPS with JOB=-2) provides access to the
permutation on the host processor (see Subsection 5.3.1). Otherwise, the pointer is not associated.
The column permutation is such that entry ai,perm(i) is on the diagonal of the permuted matrix.
On general assembled centralized symmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM =
2), if ICNTL(6)=1, 2, 3, 4, 5, 6, the column permutation is internally used to determine a set of
recommended 1×1 and 2×2 pivots (see [26] and the description of ICNTL(12) in Subsection 6.1
for more details). We advise either to let MUMPS select the strategy (ICNTL(6) = 7) or to set
ICNTL(6) = 5 if the user knows that the matrix is for example an augmented system (which is a
system with a large zero diagonal block). On output from the analysis the pointer UNS PERM is not
associated.
Other values are treated as 77.
Default value: 77 (automatic choice done by the package)
Related parameters: ICNTL(6), ICNTL(12)
Remarks: If ICNTL(8) = 77, then an automatic choice of the scaling option may be performed,
either during the analysis or the factorization. The effective value used for ICNTL(8) is returned in
INFOG(33). If the scaling arrays are computed during the analysis, then they are ready to be used
by the factorization phase. Note that scalings can be efficiently computed during analysis when
requested (see ICNTL(6) and ICNTL(12)).
If the input matrix is real and symmetric with SYM= 1, then the automatic choice is to apply no
scaling. However, the user may want to scale the matrix when the BLR feature is activated (see ICNTL(35)).
Incompatibility: If the input matrix is symmetric (SYM= 1 or 2), then only options -2, -1, 0, 1, 7,
8 and 77 are allowed and other options are treated as 0. If the input matrix is in elemental format
(ICNTL(5) = 1), then only options -1 and 0 are allowed and other options are treated as 0. If
the input matrix is assembled and distributed (ICNTL(18)=1,2,3 and ICNTL(5) = 0), then only
options 7, 8 and 77 are allowed; otherwise no scaling is applied.
If block format is exploited (ICNTL(15)̸= 0) then scaling is not applied.
1: sequential computation. In this case the ordering method is set by ICNTL(7) and the
ICNTL(29) parameter (which selects the parallel ordering tool) is meaningless.
2: parallel computation. A parallel ordering and a parallel symbolic factorization are requested by
the user. For that, at least one of the parallel ordering tools must be available, and the matrix
should not be too small. The ordering method is set by ICNTL(29) and the ICNTL(7)
parameter is meaningless.
Any other values will be treated as 0.
Default value: 0 (automatic choice)
Incompatibility: The parallel analysis is not available when the Schur complement feature is
requested (ICNTL(19)=1,2 or 3), when a maximum transversal is requested on the input matrix
(i.e., ICNTL(6)=1, 2, 3, 4, 5 or 6), or when the input matrix is unassembled
(ICNTL(5)=1). When the number of processes available for parallel analysis is equal to 1,
or when the initial matrix is extremely small, a sequential analysis is performed instead, even if
ICNTL(28)=2 (no error is raised in that case).
Related parameters: ICNTL(7), ICNTL(29), INFOG(32)
Remarks: Performing the analysis in parallel (ICNTL(28)= 2) can save both time and
memory. Note, however, that the quality of the ordering then depends on the number of processors used.
The number of processors for parallel analysis may be smaller than the number of MPI processes
available for MUMPS, in order to satisfy internal constraints of parallel ordering tools. On output,
INFOG(32) is set to the type of analysis (sequential or parallel) that was effectively chosen
internally.
ICNTL(7) computes a symmetric permutation (ordering) to determine the pivot order to be used for the
factorization (see Subsection 3.2)
Phase: accessed by the host and only during the sequential analysis phase (ICNTL(28) = 1).
Possible variables/arrays involved: PERM IN, SYM PERM
Possible values :
0 : Approximate Minimum Degree (AMD) [7] is used,
1 : The pivot order should be set by the user in PERM IN, on the host processor. In that case,
PERM IN must be allocated on the host by the user and PERM IN(i), (i=1, ... N) must hold the
position of variable i in the pivot order. In other words, row/column i in the original matrix
corresponds to row/column PERM IN(i) in the reordered matrix (see the sketch at the end of
this ICNTL(7) description).
2 : Approximate Minimum Fill (AMF) is used,
3 : the SCOTCH5 [42] package is used if previously installed by the user; otherwise this option is treated as 7.
4 : PORD6 [46] is used if previously installed by the user; otherwise this option is treated as 7.
5 : the Metis7 [34] package is used if previously installed by the user; otherwise this option is treated as 7.
It is possible to modify some components of the internal options array of Metis (see
Metis manual) in order to fine-tune and modify various aspects of the internal algorithms
used by Metis. This can be done by setting some elements (see the file metis.h in the
Metis installation to check the position of each option in the array) of the MUMPS array
mumps par%METIS OPTIONS after the MUMPS initialization phase (JOB=-1) and before
the analysis phase. Note that the METIS OPTIONS array of the MUMPS structure is of size 40,
which is large enough for both Metis 4.x and Metis 5.x versions. It is passed by MUMPS as the
argument “options” to the METIS ordering routine METIS NodeND (METIS NodeWND is
sometimes also called in case MUMPS was installed with Metis 4.x) during the analysis phase.
6 : Approximate Minimum Degree with automatic quasi-dense row detection (QAMD) is used.
5 See http://gforge.inria.fr/projects/scotch/ to obtain a copy.
6 Distributed within MUMPS by permission of J. Schulze (University of Paderborn).
7 See http://glaros.dtc.umn.edu/gkhome/metis/metis/overview to obtain a copy.
7 : Automatic choice by the software during analysis phase. This choice will depend on the
ordering packages made available, on the matrix (type and size), and on the number of
processors.
Other values are treated as 7.
Default value: 7 (automatic choice)
Incompatibility: ICNTL(7) is meaningless if the parallel analysis is chosen (ICNTL(28)=2).
Related parameters: ICNTL(28)
Remarks: Even when the ordering is provided by the user, the analysis must be performed before
numerical factorization.
For assembled matrices (centralized or distributed) (ICNTL(5)=0) all the options are available.
For elemental matrices (ICNTL(5)=1), only options 0, 1, 5 and 7 are available, with option 7
leading to an automatic choice between AMD and Metis (options 0 or 5); other values are treated
as 7.
If the user asks for a Schur complement matrix (ICNTL(19)= 1, 2, 3) and
– the matrix is assembled (ICNTL(5)=0), then only options 0, 1, 5 and 7 are currently available.
Other options are treated as 7.
– the matrix is elemental (ICNTL(5)=1), then only options 0, 1 and 7 are currently available. Other
options are treated as 7, which is itself (currently) treated as 0 (AMD).
– in both cases (assembled or elemental matrix) if the pivot order is given by the user
(ICNTL(7)=1) then the following property should hold: PERM IN(LISTVAR SCHUR(i)) =
N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
For matrices with relatively dense rows, we highly recommend option 6 which may significantly
reduce the time for analysis.
On output, the pointer array SYM PERM provides access, on the host processor, to the symmetric
permutation that is effectively computed during the analysis phase by the MUMPS package, and
INFOG(7) to the ordering option that was effectively chosen. In fact, the option corresponding to
ICNTL(7) may be forced by MUMPS when for example the ordering option chosen by the user is
not compatible with the value of ICNTL(12) or the necessary package is not installed.
SYM PERM(i), i=1, ... N, holds the position of variable i in the pivot order. In other words,
row/column i in the original matrix corresponds to row/column SYM PERM(i) in the reordered
matrix. See also Subsection 5.4.1.
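As announced for option 1 above, a minimal sketch of a user-provided ordering follows; my_order is an assumed user array holding the position of each variable in the pivot order:

   mumps_par%ICNTL(7) = 1
   ALLOCATE( mumps_par%PERM_IN(mumps_par%N) )
   mumps_par%PERM_IN(1:mumps_par%N) = my_order(1:mumps_par%N)  ! on the host
   mumps_par%JOB = 1
   CALL DMUMPS(mumps_par)            ! sequential analysis (ICNTL(28)=1)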
ICNTL(29) defines the parallel ordering tool to be used to compute the fill-in reducing permutation.
Phase: accessed by host process only during the parallel analysis phase (ICNTL(28)=2).
Possible variables/arrays involved: SYM PERM
Possible values :
0: automatic choice.
1: PT-SCOTCH is used to reorder the input matrix, if available.
2: ParMetis is used to reorder the input matrix, if available.
Other values are treated as 0.
Default value: 0 (automatic choice)
Related parameters: ICNTL(28)
Remarks: On output, the pointer array SYM PERM provides access, on the host processor, to the
symmetric permutation that is effectively considered during the analysis phase, and INFOG(7)
to the ordering option that was effectively used. SYM PERM(i), (i=1, ... N) holds the position of
variable i in the pivot order, see Subsection 5.4.1 for a full description.
5.4.1 Symmetric permutation vector (ICNTL(7) and ICNTL(29))
When the ordering is not provided by the user, the choice of the ordering strategy is controlled by
ICNTL(7) in case of sequential analysis (ICNTL(28)=1), and by ICNTL(29) in case of parallel
analysis (ICNTL(28)=2). In all cases (serial or parallel analysis, ordering computed internally or
provided by the user, Schur complement, assembled or elemental matrix), the symmetric permutation
of the variables that MUMPS relies on is returned to the user in the mumps par%SYM PERM array.
mumps par%SYM PERM (integer pointer array, dimension N) is allocated internally and returned
on the host processor on output to the analysis phase. It contains the permutation that was
effectively computed during the analysis phase and that will serve as a basis for the numerical
factorization. It is such that SYM PERM(i) holds the position of variable i in the pivot order.
For example, SYM PERM(12)=2 means that variable 12 in the original matrix is the second
variable to be eliminated in the pivot order. In case a Schur complement was requested (see
ICNTL(19)), the returned permutation also includes the variables from the Schur complement,
so that: SYM PERM(LISTVAR SCHUR(i))=N-SIZE SCHUR+i, for 1 ≤ i ≤SIZE SCHUR (see
also Subsection 5.15).
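A short sketch of how this permutation can be inspected after a successful analysis (recall that SYM PERM is allocated by MUMPS, not by the user):

   mumps_par%JOB = 1
   CALL DMUMPS(mumps_par)
   IF (mumps_par%MYID == 0) THEN     ! returned on the host only
     PRINT *, 'variable 1 is eliminated at position', mumps_par%SYM_PERM(1)
   END IF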
mumps par%BLKVAR (integer pointer array, dimension N). If BLKVAR is not provided,
it is internally treated as the identity BLKVAR(i)=i, (i=1, ..., N). Otherwise,
BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds the variables associated
to block iblk.
ICNTL(15) exploits compression of the input matrix resulting from a block format
Phase: accessed by the host process during the analysis phase.
Possible variables/arrays involved: NBLK, BLKPTR, BLKVAR
Possible values :
0: no compression
-k: all blocks are of fixed size k> 0. N (the order of the matrix A) must be a multiple of k. NBLK
and BLKPTR should not be provided by the user and will be computed internally. Concerning
BLKVAR, please refer to the Remarks below.
1: block format provided by the user. NBLK must be provided on the host by the user and
holds the number of blocks. BLKPTR(1:NBLK+1) must be provided by the user on the host.
Concerning BLKVAR, please refer to the Remarks below.
Any other values will be treated as 0.
Default value: 0
Remarks: If BLKVAR is not provided by the user then BLKVAR is internally treated as the identity
(BLKVAR(i)=i, (i=1, ..., N)), which corresponds to blocks of contiguous variables.
– If ICNTL(15)=1 then BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds
the variables associated to block iblk.
– If ICNTL(15) < 0 then BLKPTR need not be provided by the user and NBLK = N/k where
N must be a multiple of k.
In case the pivot order is provided on entry by the user at the analysis phase (ICNTL(7)= 1) then
PERM IN should be compatible with the compression. This means that PERM IN, of size N, should
result from an expansion of a pivot order on the compressed matrix, i.e., variables in a block should
be consecutive in the pivot order.
Incompatibility: With element entry format ICNTL(5)= 1, with Schur complement
ICNTL(19)̸= 0 and with permutation to a zero-free diagonal and related compressed/constrained
ordering for symmetric matrices (ICNTL(6)̸= 0, ICNTL(12)̸= 1).
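For instance, a fixed block size can be declared with the following sketch (a hypothetical block size of 2, which requires N to be a multiple of 2):

   mumps_par%ICNTL(15) = -2   ! all blocks of size 2; NBLK and BLKPTR are
                              ! computed internally; if BLKVAR is not provided,
                              ! blocks contain contiguous variables
   mumps_par%JOB = 1
   CALL DMUMPS(mumps_par)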
Let x be the initial solution of Ax = b
Compute residual r = b − Ax
Compute the associated backward errors ω1 and ω2 (see Subsection 3.3.2)
i=0
while ω1 + ω2 ≥ α and convergence is not too slow and i ≤ IR steps do
Solve A∆x = r using the computed factorization
x = x + ∆x
r = b − Ax
Compute backward errors ω1 and ω2
i=i+1
end while
Algorithm 2: Iterative refinement. At each step, backward errors are computed and compared to
α, the stopping criterion (see CNTL(2)). The number of steps performed is limited to IR steps (=
ICNTL(10)).
Possible values :
< 0 : Fixed number of steps of iterative refinement. No stopping criterion is used.
0 : No iterative refinement.
> 0 : Maximum number of steps of iterative refinement. A stopping criterion is used, therefore a
test for convergence is done at each step of the iterative refinement algorithm.
Default value: 0 (no iterative refinement)
Related parameters: CNTL(2)
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), then iterative refinement is disabled and ICNTL(10) is treated
as 0.
Remarks: Note that if ICNTL(10)< 0, |ICNTL(10)| steps of iterative refinement are performed,
without any test of convergence (see Algorithm 3). This means that the iterative refinement may
diverge, that is, instead of being improved, the solution may become less accurate.
However, it has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. Therefore, if the convergence test is not wanted, we recommend
setting ICNTL(10) to -2 or -3.
Note also that it is not necessary to activate the error analysis option (ICNTL(11)= 1,2) to be
able to run the iterative refinement with a stopping criterion (ICNTL(10) > 0). However, since
the backward errors ω1 and ω2 have been computed, they are still returned in RINFOG(7) and
RINFOG(8), respectively.
Note that iterative refinement with a stopping criterion (ICNTL(10) > 0) will stop
when
1. either the requested accuracy is reached (ω1 + ω2 < CNTL(2))
2. or when the convergence rate is too slow (ω1 + ω2 does not decrease by at least a factor of 5)
3. or when exactly ICNTL(10) steps have been performed.
In the first two cases the number of iterative refinement steps (INFOG(15)) may be lower than
ICNTL(10).
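A minimal sketch of a solve with iterative refinement, with illustrative (assumed) values for the maximum number of steps and the stopping criterion:

   mumps_par%ICNTL(10) = 5          ! at most 5 steps, with convergence test
   mumps_par%CNTL(2)   = 1.0D-12    ! stop as soon as w1 + w2 < CNTL(2)
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)
   ! INFOG(15) returns the number of steps performed; RINFOG(7) and
   ! RINFOG(8) return the backward errors w1 and w2.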
– If ICNTL(11)= 1, then in addition to the above statistics, an estimate for the error in
the solution in RINFOG(9), and condition numbers for the linear system in RINFOG(10) and
RINFOG(11) are also returned.
If performance is critical, ICNTL(11) should be set to 0. If performance is critical but statistics
are still wanted, then ICNTL(11) should be set to 2. If ICNTL(11)=1, the error analysis is very costly
(typically significantly more costly than the solve phase itself).
mumps par%OOC PREFIX (string) can be provided by the user (on each processor) to prefix the out-
of-core files.
Note that it is also possible to provide the files prefix through environment variables.
If OOC PREFIX is not defined, then MUMPS checks for the environment variable
MUMPS OOC PREFIX. If neither OOC PREFIX nor MUMPS OOC PREFIX are defined, then
MUMPS chooses the file names automatically.
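A sketch of an out-of-core run with a user-chosen prefix (the prefix string is purely illustrative):

   mumps_par%ICNTL(22) = 1                 ! out-of-core factorization
   mumps_par%OOC_PREFIX = 'mumps_ooc_'     ! set on each processor
   mumps_par%JOB = 2
   CALL DMUMPS(mumps_par)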
• Low-rank CB only (ICNTL(35)=0,3 and ICNTL(37)=1) (estimates depend on both
ICNTL(14) and ICNTL(39)):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(37) for in-core strategy;
INFO(38) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(44) and INFOG(45), respectively for in-core strategy;
INFOG(46) and INFOG(47), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(36) for in-core strategy;
INFO(33) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).
• Low-rank factors and CB (ICNTL(35)=1,2 and ICNTL(37)=1) (estimates depend on
ICNTL(14), ICNTL(38) and ICNTL(39)):
– Size in MegaBytes of the total working space locally requested by each processor:
INFO(34) for in-core strategy;
INFO(35) for out-of-core strategy.
Maximum and sum over all processors:
INFOG(40) and INFOG(41), respectively for in-core strategy;
INFOG(42) and INFOG(43), respectively for out-of-core strategy;
– Size of the main real/complex workarray S:
INFO(32) for in-core strategy;
INFO(33) for out-of-core strategy
(negative value corresponds to millions of real/complex entries needed in this workarray).
As a first general approach, we advise the user to rely on the estimations provided during the analysis
phase. If the user needs to increase the allocated workspace (typically, because numerical pivoting
leads to extra storage, or because a previous call to MUMPS failed due to a lack of allocated memory),
we describe in the following how the size of the workspace can be controlled.
• The user can modify the value of the memory relaxation parameter, ICNTL(14), that is designed
to control the increase with respect to the estimations performed during analysis, in the size of all
(integer and real/complex) workspace allocated during the numerical phase.
• The user can explicitly control the memory used by the package by providing in ICNTL(23) the
size of the total memory that is allowed to be used internally.
We provide the definitions of ICNTL(14) and ICNTL(23) below:
ICNTL(14) corresponds to the percentage increase in the estimated working space.
Phase: accessed by the host both during the analysis and the factorization phases.
Default value: between 20 and 35 (which corresponds to at most 35 % increase) and depends on
the number of MPI processes. It is set to 5 % with SYM=1 and one MPI process.
Related parameters: ICNTL(23)
Remarks: When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14)
may help.
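For example, a factorization that failed with error -9 (workspace too small) can be retried as in the following sketch (the value 50 is an arbitrary illustrative relaxation):

   IF (mumps_par%INFOG(1) == -9) THEN
     mumps_par%ICNTL(14) = 50       ! allow a 50% increase over the estimates
     mumps_par%JOB = 2
     CALL DMUMPS(mumps_par)         ! retry the factorization
   END IF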
ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS can
allocate per working processor. It covers all internal integer and real (complex in the complex version)
workspace allocated by MUMPS.
Phase: accessed by all processes at the beginning of the factorization phase. If the value is greater
than 0 only on the host, then the value on the host is used for all processes, otherwise ICNTL(23)
is interpreted locally on each MPI process.
Possible values :
0 : each processor will allocate workspace based on the estimates computed during the analysis
>0 : maximum size of the working memory in MegaBytes per working processor to be allocated
Default value: 0
Related parameters: ICNTL(14), ICNTL(38), ICNTL(39)
Remarks: If ICNTL(23) is greater than 0 then MUMPS automatically computes the size of
the internal workarrays such that the storage for all MUMPS internal data does not exceed
ICNTL(23). The relaxation ICNTL(14) is first applied to the internal integer workarray IS and to
communication and I/O buffers; the remaining available space is then shared between the main (and
often most critical) real/complex internal workarray S holding the factors, the stack of contribution
blocks and dynamic workarrays that are used either to expand the S array or to store low-rank
dynamic structures.
Lower bounds for ICNTL(23), in case ICNTL(23) is provided only on the host:
– In case of full-rank factors only (ICNTL(35)=0 or 3), a lower bound for ICNTL(23) (if ICNTL(14)
has not been modified since the analysis) is given by INFOG(16) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(26) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), a lower bound for ICNTL(23)
(if ICNTL(14) has not been modified since the analysis and ICNTL(38) is a good approximation
of the average compression rate of the factors) is given by INFOG(36) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(38) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank contribution blocks (CB) only (ICNTL(35)=0,3 and ICNTL(37)=1), a lower bound
for ICNTL(23) (if ICNTL(14) has not been modified since the analysis and ICNTL(39) is a good
approximation of the average compression rate of the CB) is given by INFOG(44) if the factorization is
in-core (ICNTL(22)=0), and by INFOG(46) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), a lower
bound for ICNTL(23) (if ICNTL(14) has not been modified since the analysis, and ICNTL(38) and
ICNTL(39) are good approximations of the average compression rates of the factors and the CB,
respectively) is given by INFOG(40) if the factorization is in-core (ICNTL(22)=0), and by INFOG(42) if the
factorization is out-of-core (ICNTL(22)=1).
Lower bounds for ICNTL(23), in case ICNTL(23) is provided locally to each MPI process:
– Full-rank factors only (ICNTL(35)=0 or 3) ⇒ INFO(15) if the factorization is in-core
(ICNTL(22)=0), INFO(17) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0) ⇒ INFO(30) if the factorization is
in-core (ICNTL(22)=0), INFO(31) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) ⇒ is given by
INFO(34) if the factorization is in-core (ICNTL(22)=0), INFO(35) if the factorization is out-of-
core (ICNTL(22)=1).
The above lower bounds include memory for the real/complex internal workarray S holding the
factors and stack of contribution blocks. In case WK USER is provided, the above quantities should
be diminished by the estimated memory for S/WK USER. This estimated memory can be obtained
from INFO(8), INFO(9), or INFO(20) (depending on MUMPS settings) by taking their absolute
value, if negative, or by dividing them by 10^6, if positive. See also the paragraph Recommended
values of LWK USER below.
If ICNTL(23) is left to its default value 0 then MUMPS will allocate, for the factorization phase,
a workspace based on the estimates computed during the analysis if ICNTL(14) has not been
modified since analysis, or larger if ICNTL(14) was increased. Note that even with full-rank
factorization, these estimates are only accurate in the sequential version of MUMPS; they can
be inaccurate in the parallel case, especially for the out-of-core version. Therefore, in parallel, we
recommend using ICNTL(23) and providing a value larger than the estimations.
Another possibility is for the user to provide the real/complex workarray instead of using the internal
main real/complex workarray S. Note that in case factors of large frontal matrices are stored in low-rank
form (ICNTL(35)=2), they will use a separate dynamic storage allocation, but providing a workarray to
store frontal matrices, contribution blocks and factors that stay full-rank is still possible. A pointer
array that points to that workspace must be provided. In this case, the value of ICNTL(23) excludes the
workspace corresponding to the user workspace.
is 0. At the beginning of the numerical phases, if the user sets LWK USER to a nonzero value
then LWK USER will define the size of the pointer array WK USER. If negative, -LWK USER
is a lower bound for the number of entries in millions of the pointer array WK USER so that
abs(LWK USER) × 10^6 ≤ size(WK USER) must hold.
Recommended values of LWK USER (otherwise an error with code -9 may occur):
• In case of full-rank factors (ICNTL(35)=0,3), we recommend the user to set LWK USER to a value larger
than INFO(8) (in-core factorization) or INFO(20) (out-of-core factorization).
• In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), we recommend the user to
set LWK USER to a value larger than INFO(29) (in-core factorization) or INFO(20) (out-of-core
factorization).
• In case of low-rank contribution blocks only (ICNTL(35)=0,3 and ICNTL(37)=1), we recommend the
user to set LWK USER to a value larger than INFO(36) (in-core factorization) or INFO(33) (out-of-
core factorization).
• In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), we
recommend the user to set LWK USER to a value larger than INFO(32) (in-core factorization) or
INFO(33) (out-of-core factorization).
Moreover, if the factorization is in-core, the value of LWK USER must not be modified between
factorization and subsequent solution phases.
If the numerical phases are out-of-core (ICNTL(22)=1), we recommend LWK USER to be
larger than INFO(20). In this case, the user can reduce the value for LWK USER between the
factorization phase and the solve phase.
mumps par%WK USER is a real/complex pointer array that can point to the workspace provided by
the user. It is only accessed by MUMPS when LWK USER has been set by the user to a non-zero
value. In that case, MUMPS will avoid the internal allocation of the main real/complex workarray S
and use WK USER instead.
Note that the type of WK USER should follow the arithmetic: single precision for SMUMPS, double
precision for DMUMPS, single complex for CMUMPS, and double complex for ZMUMPS.
If the factorization is in-core (ICNTL(22)=0), then WK USER should not be modified between
factorization and solution phases of MUMPS.
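A minimal sketch of a user-provided workspace for an in-core, full-rank factorization; it assumes INFO(8) returned by the analysis is positive (i.e., directly a number of entries) and adds an arbitrary margin:

   mumps_par%LWK_USER = mumps_par%INFO(8) + 1000000   ! entries, with margin
   ALLOCATE( mumps_par%WK_USER( mumps_par%LWK_USER ) )
   mumps_par%JOB = 2
   CALL DMUMPS(mumps_par)           ! S is not allocated; WK_USER is used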
Remarks: CNTL(3) is used to compute the threshold to decide if a pivot row is “null”.
Note that when ScaLAPACK is applied on the root node (see ICNTL(13) = 0), then exact null
pivots on the root will stop the factorization (INFO(1)=-10) while if tiny pivots are present on
the root node the ScaLAPACK routine will factorize the root matrix. Computing the root node
factorization sequentially (this can be forced by setting ICNTL(13) to 1) will help with the correct
detection of null pivots but may degrade performance.
Related data:
mumps par%INFOG(28) (integer): INFOG(28) is set to the number of null pivot rows detected
during the factorization step.
mumps par%PIVNUL LIST (integer array, dimension N):
If INFOG(28) ≠ 0 then PIVNUL LIST(1:INFOG(28)) will hold, on the host, the row indices
corresponding to the null pivot rows (ICNTL(24)= 1).
5.12 Computation of the determinant (ICNTL(33))
The user interested in computing the determinant of the matrix A can use the control parameter
ICNTL(33). See Subsection 3.15 for details on how it will be computed.
ICNTL(33) computes the determinant of the input matrix.
Phase: accessed by the host during the factorization phase.
Possible values :
0 : the determinant of the input matrix is not computed.
≠ 0: computes the determinant of the input matrix. The determinant is obtained by computing
(a + ib) × 2^c where a =RINFOG(12), b =RINFOG(13) and c = INFOG(34). In real
arithmetic b =RINFOG(13) is equal to 0.
Default value: 0 (determinant is not computed)
Related parameters: ICNTL(31)
Remarks: In case a Schur complement was requested (see ICNTL(19)), elements of the Schur
complement are excluded from the computation of the determinant, so that the determinant is that
of the matrix A1,1 (using the notations of Subsection 3.17).
Although we recommend computing the determinant only on non-singular matrices, null pivot rows
(ICNTL(24)) and static pivots (CNTL(4)) are excluded from the determinant, so that a non-zero
determinant is still returned on singular or near-singular matrices. This determinant is then not
unique and will depend on which equations were excluded.
Furthermore, we recommend switching off scaling (ICNTL(8)) in such cases. If scaling is kept
(ICNTL(8) ≠ 0), the current behaviour of the package is as follows:
– if static pivoting (CNTL(4)) is activated: all entries of the scaling arrays ROWSCA and
COLSCA are currently taken into account in the computation of the determinant.
– if the null pivot row detection (ICNTL(24)) is activated, then entries of ROWSCA and
COLSCA corresponding to pivots in PIVNUL LIST are excluded from the determinant so
that
* for symmetric matrices (SYM=1 or 2), the returned determinant correctly corresponds to
the matrix excluding rows and columns of PIVNUL LIST.
* for unsymmetric matrices (SYM=0), scaling may perturb the value of the determinant in
case off-diagonal pivoting has occurred (INFOG(12)̸=0).
Note that if the user is interested in computing only the determinant, we recommend discarding the
factors during the factorization (see ICNTL(31)).
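A sketch of such a determinant-only computation in real arithmetic, following the recommendations above (det is an assumed double precision variable; note that 2**INFOG(34) may overflow for very large exponents):

   mumps_par%ICNTL(33) = 1          ! compute the determinant
   mumps_par%ICNTL(31) = 1          ! discard the factors
   mumps_par%ICNTL(8)  = 0          ! switch off scaling
   mumps_par%JOB = 4                ! analysis + factorization
   CALL DMUMPS(mumps_par)
   det = mumps_par%RINFOG(12) * 2.0D0**mumps_par%INFOG(34)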
Note that for unsymmetric matrices, if the forward elimination is performed during factorization, the
L factor may be discarded (see ICNTL(31)). In the symmetric LDL^T case, the L factor must always
be kept in order to be able to perform the backward substitution, i.e., solve L^T x = y.
ICNTL(32) performs the forward elimination of the right-hand sides (Equation (3)) during the
factorization (JOB=2).
Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: RHS, NRHS, LRHS, and possibly REDRHS, LREDRHS when
ICNTL(26)=1
Possible values :
0: standard factorization not involving right-hand sides.
1: forward elimination (Equation (3)) of the right-hand side vectors is performed during
factorization (JOB=2). The solve phase (JOB=3) will then only involve backward substitution
(Equation (4)).
Other values are treated as 0.
Default value: 0 (standard factorization)
Related parameters: ICNTL(31),ICNTL(26)
Incompatibility: This option is incompatible with sparse right-hand sides (ICNTL(20)=1,2,3),
with the solution of the transposed system (ICNTL(9) ̸= 1), with the computation of entries of
the inverse (ICNTL(30)=1), and with BLR factorizations (ICNTL(35)=1,2,3). In such cases,
error -43 is raised.
Furthermore, iterative refinement (ICNTL(10)) and error analysis (ICNTL(11)) are disabled.
Finally, the current implementation imposes that all right-hand sides are processed in one pass
during the backward step. Therefore, the blocking size (ICNTL(27)) is ignored.
Remarks: The right-hand sides must be dense to use this functionality: RHS, NRHS, and LRHS
should be provided as described in Subsection 5.14.1. They should be provided at the beginning of
the factorization phase (JOB=2) rather than at the beginning of the solve phase (JOB=3).
For unsymmetric matrices, if the forward elimination is performed during factorization
(ICNTL(32) = 1), the L factor (see ICNTL(31)) may be discarded to save space. In fact,
the L factor will then always be discarded (even when ICNTL(31)=0) in the case of a full-rank
factorization (ICNTL(35)=0) or a BLR factorization with full-rank solve (ICNTL(35)=3). In the
case of a BLR factorization with ICNTL(35)=1 or 2, only the L factors corresponding to full-rank
frontal matrices are discarded in the current version.
We advise using this option only for a reasonably small number of dense right-hand side vectors,
because of the additional storage required when this option is activated and the number
of right-hand sides is large compared to ICNTL(27).
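A sketch of a forward elimination performed during the factorization, for a single dense centralized right-hand side (b is an assumed user array of size N):

   mumps_par%ICNTL(32) = 1
   mumps_par%NRHS = 1               ! must be set before the analysis here
   mumps_par%JOB  = 1
   CALL DMUMPS(mumps_par)           ! analysis
   ALLOCATE( mumps_par%RHS(mumps_par%N) )
   mumps_par%RHS  = b(1:mumps_par%N)
   mumps_par%LRHS = mumps_par%N
   mumps_par%JOB  = 2
   CALL DMUMPS(mumps_par)           ! factorization + forward elimination
   mumps_par%JOB  = 3
   CALL DMUMPS(mumps_par)           ! backward substitution only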
Phase: accessed by the host during the solve phase and before a JOB=9 call.
Possible variables/arrays involved: RHS, NRHS, LRHS, IRHS SPARSE, RHS SPARSE,
IRHS PTR, NZ RHS, Nloc RHS, LRHS loc, IRHS loc and RHS loc.
Possible values :
0 : the right-hand side is in dense format in the structure component RHS, NRHS, LRHS (see
Subsection 5.14.1)
1,2,3 : the right-hand side is in sparse format in the structure components IRHS SPARSE,
RHS SPARSE, IRHS PTR and NZ RHS.
1 : The decision of exploiting sparsity of the right-hand side to accelerate the solution phase
is done automatically.
2 : Sparsity of the right-hand side is NOT exploited to improve solution phase.
3 : Sparsity of the right-hand side is exploited during solution phase.
10, 11 : When provided before the solve phase, values 10 and 11 have the same meaning. The
right-hand side is provided distributed in the structure components Nloc RHS, LRHS loc,
IRHS loc, RHS loc (see Subsection 5.14.3).
When provided before a JOB=9 call, values 10 and 11 indicate which distribution MUMPS
should build and return to the user in IRHS loc. In this case, the user should provide a
workarray IRHS loc on each MPI process of size at least INFO(23), where INFO(23) is
returned after the factorization phase.
10 : fill IRHS loc to minimize internal communications of right-hand side data during the
solve phase.
11 : fill IRHS loc to match the distribution of the solution (imposed by MUMPS), in case of
distributed solution (ICNTL(21)=1).
Values different from 0, 1, 2, 3, 10, 11 are treated as 0. For a sparse right-hand side, the
recommended value is 1.
Default value: 0 (dense right-hand sides)
Incompatibility: When NRHS > 1 (multiple right-hand sides), the functionalities related to iterative
refinement (ICNTL(10)) and error analysis (ICNTL(11)) are currently disabled.
With sparse right-hand sides (ICNTL(20)=1,2,3), the forward elimination during the factorization
(ICNTL(32)=1) is not currently available.
Remarks: For details on how to set the input parameters see Subsection 5.14.1, Subsection 5.14.2
and Subsection 5.14.3. Please note that duplicate entries in the sparse or distributed right-hand sides
are summed. A JOB=9 call can only be done after a successful factorization phase and its result
depends on the transpose option ICNTL(9), which should not be modified between a JOB=9 and a
JOB=3 call. The distributed right-hand side feature enables the user to provide a sparse structured
RHS (i.e., a RHS with some empty rows).
5.14.1 Dense right-hand side (ICNTL(20)=0)
In this case, the matrix B of size n × nrhs is input in a one-dimensional array of size lrhs × nrhs, where
the leading dimension lrhs must be ≥ n, the dimension of the matrix A.
The following components of the MUMPS structure should be allocated by the user on the host before a
call to MUMPS with JOB= 3, 5, or 6 (call including the solve) if forward and backward elimination are both
computed during the solve (ICNTL(32)=0), or before a call to MUMPS with JOB= 2, 4 (call including
the factorization) if the forward elimination is computed during the factorization (ICNTL(32)=1). In
case of forward elimination performed during the factorization, NRHS should also be provided before the
analysis phase (JOB=1) and should be kept unchanged for the subsequent numerical phases.
mumps par%RHS (real/complex pointer array, dimension LRHS×NRHS) is a real (complex in the
complex version) array.
On entry RHS(i+(k-1)× LRHS) must hold the i-th component of the kth column of the right-hand
side matrix (1 ≤ k ≤ NRHS) of the equations being solved.
On exit, if the solution matrix has to be centralized (ICNTL(21)=0), then RHS(i+(k-1)×LRHS)
will hold the i-th component of the kth column of the solution matrix, 1 ≤ k ≤ NRHS.
Otherwise, if the solution matrix has to be distributed (ICNTL(21)=1), on exit to the package,
RHS will not contain any significant data for the user, even if it may have been modified.
mumps par%NRHS (integer) is an optional parameter that should be set by the user, on the host
processor, to the number of right-hand side vectors. Otherwise, the value 1 is assumed.
mumps par%LRHS (integer) is an optional parameter that should be set by the user, in the case where
NRHS is set by the user. In this case, it must hold the leading dimension of the array RHS and
should be greater than or equal to N (the matrix dimension). Otherwise, a single-column right-hand
side is assumed and LRHS is not accessed.
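A sketch with two dense right-hand sides on the host (b1 and b2 are assumed user arrays); on exit the solution overwrites RHS since ICNTL(21)=0:

   mumps_par%NRHS = 2
   mumps_par%LRHS = mumps_par%N
   ALLOCATE( mumps_par%RHS( mumps_par%LRHS * mumps_par%NRHS ) )
   mumps_par%RHS(1:mumps_par%N) = b1(1:mumps_par%N)
   mumps_par%RHS(mumps_par%LRHS+1:mumps_par%LRHS+mumps_par%N) = b2(1:mumps_par%N)
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)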
5.14.2 Sparse right-hand side (ICNTL(20)=1,2,3)
The following input parameters should be defined on the host only before a call to MUMPS including
the solve phase (JOB=3, 5, or 6):
mumps par%NZ RHS (integer) should hold the total number of non-zeros in all the right-hand side
vectors.
mumps par%NRHS (integer) is an optional parameter that should be set by the user on the host
processor, to the number of right-hand side vectors. Otherwise, the value 1 is assumed.
mumps par%RHS SPARSE (real/complex pointer array, dimension NZ RHS) should hold the
numerical values of the non-zero entries of each right-hand side vector. This means that the B
matrix should be input by columns.
mumps par%IRHS SPARSE (integer pointer array, dimension NZ RHS) should hold the indices of the
variables of the non-zero inputs of each right-hand side vector.
mumps par%IRHS PTR (integer pointer array, dimension NRHS+1) is such that the i-th right-hand
side vector is defined by its non-zero row indices IRHS SPARSE(IRHS PTR(i)...IRHS PTR(i+1)-
1) and the corresponding numerical values RHS SPARSE(IRHS PTR(i)...IRHS PTR(i+1)-1). Note
that IRHS PTR(1)=1 and IRHS PTR(NRHS+1)=NZ RHS+1.
mumps par%RHS (real/complex pointer array, dimension LRHS×NRHS) must be allocated by the user
on the host if the output solution should be centralized (ICNTL(21)=0). On exit from a call to
MUMPS it will hold the centralized solution (ICNTL(21) =0).
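A sketch with two sparse right-hand sides b1 = e1 and b2 = 2 e3 (columns input one after the other), letting MUMPS decide whether to exploit sparsity:

   mumps_par%ICNTL(20) = 1
   mumps_par%NRHS   = 2
   mumps_par%NZ_RHS = 2
   ALLOCATE( mumps_par%IRHS_PTR(3), mumps_par%IRHS_SPARSE(2), &
             mumps_par%RHS_SPARSE(2) )
   mumps_par%IRHS_PTR    = (/ 1, 2, 3 /)       ! one nonzero per column
   mumps_par%IRHS_SPARSE = (/ 1, 3 /)          ! row indices
   mumps_par%RHS_SPARSE  = (/ 1.0D0, 2.0D0 /)
   ALLOCATE( mumps_par%RHS(mumps_par%N * 2) )  ! centralized solution (ICNTL(21)=0)
   mumps_par%LRHS = mumps_par%N
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)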
We now describe how to obtain special IRHS loc distributions from MUMPS, that can guide the
distribution of the right-hand sides provided by the user. This can be useful in case the user does not
have specific constraints on the RHS distribution. For this, MUMPS should be called with JOB=9 after a
successful factorization, with the following parameters.
mumps par%IRHS loc (integer pointer array, of dimension at least INFO(23), where INFO(23)
was computed during the factorization phase) must be allocated by the user after the factorization
phase. On exit from a call to MUMPS with JOB=9, IRHS loc(1:INFO(23)) contains the list of
global RHS indices on the local MPI process corresponding to the internal MUMPS distributions.
mumps par%ICNTL(20)=10 if the user asks for a distribution that avoids RHS communication during
MUMPS solve phase, or 11 if the user asks for a distribution that is identical to the distribution of
the solution (the distribution of the solution is imposed by MUMPS, see ICNTL(21)). Other values
are treated as 10.
mumps par%ICNTL(9) indicates if the distribution provided should target the standard solve phase
(AX = B) or the solve phase for the transposed system (AT X = B). Note that if ICNTL(9)
changes between a JOB=9 call and a JOB=3 call, the distribution computed with ICNTL(20)=10
will be the one corresponding to the description of ICNTL(20)=11 and vice versa.
In the symmetric case (SYM=1 or 2), and in the unsymmetric case (SYM=0) when pivoting and
maximum weighted matching options are switched off (CNTL(1)=0 and ICNTL(6)=0), all possible
values of ICNTL(9) and ICNTL(20) lead to the same distribution. Therefore, these control parameters
need not be set on entry to the call to MUMPS with JOB=9. In case an unsymmetric factorization (SYM=0)
did not use an unsymmetric permutation of the matrix (see ICNTL(6)) and there were no off-diagonal
pivots (INFOG(12)=0), the same distribution will be returned for all values of ICNTL(9) and for
ICNTL(20)=10 or 11.
Special case of distributed right-hand side and distributed solution.
In case of both distributed RHS (ICNTL(20)=10,11) and distributed solution (ICNTL(21)=1),
IRHS loc and ISOL loc may point to the same workarrays. This should only be done when the
contents are known to be identical (symmetric case, unsymmetric case under some circumstances, see
above). Otherwise, the solve phase (JOB=3) with distributed solution will overwrite entries of IRHS loc
while building ISOL loc.
Furthermore, RHS loc and SOL loc may point to the same memory location, as long as LRHS loc
≥ LSOL loc.
total entries of A−1 to be computed: NZ RHS = 4
number of columns of A−1: NRHS = N = 4
pointers to the columns: IRHS PTR [1:NRHS+1] = 1 3 4 4 5
array of row indices: IRHS SPARSE [1:NZ RHS] = 1 3 3 4
Note that column 3 will be considered as empty, because no elements have to be computed.
The array RHS SPARSE should be allocated, but not initialized.
ICNTL(30) computes a user-specified set of entries in the inverse A−1 of the original matrix.
Phase: accessed during the solution phase.
Possible variables/arrays involved: NZ RHS, NRHS, RHS SPARSE, IRHS SPARSE, IRHS PTR
Possible values :
0: no entries in A−1 are computed.
1: computes entries in A−1 .
Other values are treated as 0.
Default value: 0 (no entries in A−1 are computed)
Incompatibility: Error analysis and iterative refinement will not be performed, even if the
corresponding options are set (ICNTL(10) and ICNTL(11)). Because the entries of A−1 are
returned in RHS SPARSE on the host, this functionality is incompatible with the distributed solution
option (ICNTL(21)). Furthermore, computing entries of A−1 is not possible in the case of partial
factorizations with a Schur complement (ICNTL(19)). Option to compute solution using A or
AT (ICNTL(9)) is meaningless and thus ignored.
Related parameters: ICNTL(27)
Remarks: When a set of entries of A−1 is requested, the associated set of columns will be
computed in blocks of size ICNTL(27). Larger ICNTL(27) values will most likely decrease the
number of accesses to the factors, enable more parallelism, and thus reduce the solution time [48, 44, 14].
The user must specify on input to a call of the solve phase in the arrays IRHS PTR and
IRHS SPARSE the target entries. The array RHS SPARSE should be allocated but not initialized.
Note that since selected entries of the inverse of the matrix are requested, NRHS must be set to N. On
output the arrays IRHS PTR, IRHS SPARSE and RHS SPARSE will hold the requested entries. If
duplicate target entries are provided then duplicate solutions will be returned.
When entries of A−1 are requested (ICNTL(30) = 1), mumps par%RHS need not be allocated.
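The example at the beginning of this subsection, which (reading the example arrays) requests entries (1,1), (3,1), (3,2) and (4,4) of A−1 for a matrix of order N = 4, can thus be sketched as:

   mumps_par%ICNTL(30) = 1
   mumps_par%NRHS   = mumps_par%N               ! must be equal to N
   mumps_par%NZ_RHS = 4
   ALLOCATE( mumps_par%IRHS_PTR(mumps_par%NRHS+1), &
             mumps_par%IRHS_SPARSE(4), mumps_par%RHS_SPARSE(4) )
   mumps_par%IRHS_PTR    = (/ 1, 3, 4, 4, 5 /)  ! column 3 is empty
   mumps_par%IRHS_SPARSE = (/ 1, 3, 3, 4 /)
   ! RHS_SPARSE is allocated but left uninitialized; on output it holds
   ! the requested entries of the inverse.
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)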
5.14.6 Distributed solution (ICNTL(21)=1)
On some networks with low bandwidth, and especially when there are many right-hand side vectors,
centralizing the solution on the host processor might be a costly part of the solution phase. If this
is critical to the user, this functionality allows the solution to be left distributed over the processors
(ICNTL(21)=1). The solution should then be exploited in its distributed form by the user application.
Note that this option can be used only with JOB=3 and should not be used with JOB= 5 or 6, because
some parameters needed for this option must be set using information output by the factorization.
The following input parameters should be allocated by the user before the solve phase (JOB=3) on
all processors in the case of the working host model of parallelism (PAR=1), and on all processors except
the host in the case of the non-working host model of parallelism (PAR=0).
mumps par%SOL loc (real/complex pointer array, dimension LSOL loc× NRHS where NRHS is
either equal to 1 or corresponds to the value provided by the user in NRHS on the host) must be
allocated by the user between the factorization and solve steps. Its leading dimension LSOL loc
should be larger than or equal to INFO(23), that is returned by the factorization phase.
On exit from the solve phase, SOL loc(i+(k-1)×LSOL loc) will contain the value corresponding
to variable ISOL loc(i) in the kth solution vector.
mumps par%LSOL loc (integer). LSOL loc must be set to the leading dimension of SOL loc (see
above) and should be larger than or equal to INFO(23), that is returned by the factorization phase.
mumps par%ISOL loc (integer pointer array, of dimension at least INFO(23), that is returned by the
factorization phase) must be allocated by the user between the factorization and solve steps.
On exit from the solve phase, ISOL loc(i) holds the index of the variable whose solution values
(in SOL loc) are available on the local processor.
If successive calls to the solve phase (JOB=3) are performed for a given matrix, ISOL loc will have
the same contents for each of these calls.
Note that if the solution is kept distributed, then functionalities related to error analysis and iterative
refinement (ICNTL(10) and ICNTL(11)) are currently not available.
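A sketch of a solve with a distributed solution (a single right-hand side is assumed), using INFO(23) returned by the factorization on each process:

   mumps_par%ICNTL(21) = 1
   mumps_par%LSOL_loc = mumps_par%INFO(23)
   ALLOCATE( mumps_par%ISOL_loc( mumps_par%INFO(23) ), &
             mumps_par%SOL_loc( mumps_par%LSOL_loc ) )
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)
   ! SOL_loc(i) now holds the solution component for variable ISOL_loc(i)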
The integer variables NPROW, NPCOL, MBLOCK, NBLOCK may also be defined (default values
will otherwise be provided).
Values not equal to 1, 2 or 3 are treated as 0.
Default value: 0 (complete factorization)
Incompatibility: Since the Schur complement is a partial factorization of the global matrix (with
partial ordering of the variables provided by the user), the following options of MUMPS are
incompatible with the Schur option: maximum transversal, scaling, iterative refinement, error
analysis and parallel analysis.
Related parameters: ICNTL(7), ICNTL(26)
Remarks: If the ordering is given (ICNTL(7)= 1) then the following property should hold:
PERM IN(LISTVAR SCHUR(i)) = N-SIZE SCHUR+i, for i=1,SIZE SCHUR.
Note that, in order to have a centralized Schur complement matrix by columns (see
Subsection 5.15.3), it is possible (and recommended) to use a particular case of the distributed
Schur complement (ICNTL(19)=2 or 3), where the Schur complement is assigned to only one
processor (NPCOL × NPROW = 1).
If ICNTL(19) = 1,2,3 the user should provide, on the host and before the analysis phase, the
following parameters:
mumps par%SIZE SCHUR (integer) must be initialized to the number of variables defining the Schur
complement. It is only accessed during the analysis phase and is not altered by MUMPS. Its value is
communicated internally to the other phases as required. SIZE SCHUR should be greater than or
equal to 0 and strictly smaller than N.
mumps par%LISTVAR SCHUR (integer pointer array, dimension SIZE SCHUR) must be allocated
and initialized by the user so that LISTVAR SCHUR(i), i=1, . . . , SIZE SCHUR, holds the
i-th variable of the Schur complement matrix. It is accessed during analysis (JOB=1) and is not
altered by MUMPS.
If a given ordering (Subsection 5.4.2, ICNTL(7) =1) is set by the user, the permutation should
also include the variables of the Schur complement, so that: PERM IN(LISTVAR SCHUR(i))=N-
SIZE SCHUR+i, for 1 ≤ i ≤SIZE SCHUR.
For symmetric matrices, if ICNTL(19) =2, only the lower part of the Schur matrix is generated,
otherwise, if ICNTL(19) =3, the complete Schur matrix is generated.
For unsymmetric matrices MUMPS always provides the complete Schur matrix, so that
ICNTL(19)=2 and ICNTL(19)=3 have the same effect.
On entry to the analysis phase (JOB = 1), the following parameters should be defined on the host:
mumps par%NPROW, mumps par%NPCOL, mumps par%MBLOCK, and mumps par%NBLOCK
are integers corresponding to the characteristics of a 2D block cyclic grid of processors. If any of
these quantities is smaller than or equal to zero or has not been defined by the user, or if NPROW×
NPCOL is larger than the number of slave processors available (total number of processors if
PAR=1, total number of processors minus 1 if PAR=0), then a grid shape will be computed by
the analysis phase of MUMPS and NPROW, NPCOL, MBLOCK, NBLOCK will be overwritten on
exit from the analysis phase. We briefly describe here the meaning of the four above parameters in
a 2D block cyclic distribution:
• NPROW is the number of rows of the process grid (or the number of processors in a column
of the process grid),
• NPCOL is the number of columns of the process grid (or the number of processors in a row
of the process grid),
• MBLOCK is the blocking factor used to distribute the rows of the Schur complement,
• NBLOCK is the blocking factor used to distribute the columns of the Schur complement.
As in ScaLAPACK, we use a row-major process grid of processors, that is, process ranks (as
provided to MUMPS in the MPI communicator) are consecutive in a row of the process grid.
NPROW, NPCOL, MBLOCK and NBLOCK should be passed unchanged from the analysis phase
to the factorization phase. If the matrix is symmetric (SYM=1 or 2) and ICNTL(19)=3 (see below),
then the values of MBLOCK and NBLOCK should be equal.
On exit from the analysis phase, the following two components are set by MUMPS on the first NPROW
× NPCOL slave processors (the host is excluded if PAR=0 and the processors with largest MPI ranks in
the communicator provided to MUMPS may not be part of the grid of processors).
mumps par%SCHUR MLOC is an integer giving the number of rows of the local Schur complement
matrix on the concerned processor. It is equal to MAX(1,NUMROC(SIZE SCHUR, MBLOCK,
myrow, 0, NPROW)), where
• NUMROC is an integer function defined in most ScaLAPACK implementations (also used
internally by the MUMPS package),
• SIZE SCHUR, MBLOCK, NPROW have been defined earlier, and
• myrow is defined as follows:
Let myid be the rank of the calling process in the communicator COMM provided to MUMPS.
(myid can be returned by the MPI routine MPI COMM RANK.)
– if PAR = 1 myrow is equal to myid / NPCOL,
– if PAR = 0 myrow is equal to (myid − 1) / NPCOL.
Note that an upper bound on the minimum value of the leading dimension (SCHUR LLD, defined below)
is equal to ((SIZE SCHUR+MBLOCK-1)/MBLOCK+NPROW-1)/NPROW*MBLOCK.
mumps par%SCHUR NLOC is an integer giving the number of columns of the local Schur
complement matrix on the concerned processor. It is equal to NUMROC(SIZE SCHUR, NBLOCK,
mycol, 0, NPCOL), where
• SIZE SCHUR, NBLOCK, NPCOL have been defined earlier, and
• mycol is defined as follows:
Let myid be the rank of the calling process in the communicator COMM provided to MUMPS.
(myid can be returned by the MPI routine MPI COMM RANK.)
– if PAR = 1 mycol is equal to MOD(myid, NPCOL),
– if PAR = 0 mycol is equal to MOD(myid − 1, NPCOL).
On entry to the factorization phase (JOB = 2), the user should give on input the following components
of the structure:
mumps par%SCHUR LLD (integer) should be set to the leading dimension of the local Schur
complement matrix. It should be larger or equal to the local number of rows of that matrix,
SCHUR MLOC (as returned by MUMPS on exit from the analysis phase on the processors that
participate in the computation of the Schur). SCHUR LLD is not modified by MUMPS.
mumps par%SCHUR (real/complex one-dimensional pointer array) should be allocated by the user
on the NPROW × NPCOL first slave processors (the host is excluded if PAR=0 and the processors
with largest MPI ranks in the communicator provided to MUMPS may not be part of the grid of
processors). Its size should be at least equal to SCHUR LLD × (SCHUR NLOC - 1) + SCHUR MLOC,
where SCHUR MLOC, SCHUR NLOC, and SCHUR LLD have been defined above.
On exit from the factorization phase, the pointer array SCHUR contains the Schur complement, stored
by columns, in the format corresponding to the 2D cyclic grid of NPROW × NPCOL processors, with
block sizes MBLOCK and NBLOCK, and local leading dimensions SCHUR LLD.
Note that if ICNTL(19)=3 and the Schur is symmetric (SYM=1 or 2), then the constraint
mumps par%MBLOCK = mumps par%NBLOCK should hold.
Note that setting NPCOL × NPROW = 1 will centralize the Schur complement matrix, stored by
columns (instead of by rows as in the ICNTL(19)=1 option). More details on this are presented in
Subsection 5.15.3.
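A sketch of this recommended centralized-by-columns approach, with a 1 × 1 process grid (nschur and schur_list are assumed user data):

   mumps_par%ICNTL(19)  = 3
   mumps_par%SIZE_SCHUR = nschur
   ALLOCATE( mumps_par%LISTVAR_SCHUR(nschur) )
   mumps_par%LISTVAR_SCHUR = schur_list(1:nschur)
   mumps_par%NPROW = 1;  mumps_par%NPCOL = 1
   mumps_par%MBLOCK = nschur;  mumps_par%NBLOCK = nschur  ! equal block sizes
   mumps_par%JOB = 1
   CALL DMUMPS(mumps_par)                 ! analysis
   mumps_par%SCHUR_LLD = nschur
   ALLOCATE( mumps_par%SCHUR( nschur * nschur ) )
   mumps_par%JOB = 2
   CALL DMUMPS(mumps_par)                 ! SCHUR holds the Schur complement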
See the description in the paragraph “Distributed Schur Complement” above for more information.
is returned by rows and the lower triangular part is not accessed. If the matrix is symmetric (SYM=1 or
2) and ICNTL(19)=3, then both the lower and upper triangular parts are returned. Because the Schur
complement is symmetric, this can be seen both as a row-major and as a column-major storage.
( L11     ) ( U11 U12 ) ( X1 )   ( B1 )
( L21  I  ) (      S  ) ( X2 ) = ( B2 )

where X2 and B2 have SIZE SCHUR rows, and X and B have NRHS columns.
Figure 3: Solving the complete system using the Schur complement matrix
mumps par%REDRHS is a real (complex in the complex version) one-dimensional pointer array that
should be allocated by the user before entering the solution phase. Its size should be at least equal
to LREDRHS ×(NRHS-1)+ SIZE SCHUR.
If the reduction/condensation phase should be performed (ICNTL(26)=1), then on exit from the
solution phase, REDRHS(i+(k-1)*LREDRHS), i=1, . . ., SIZE SCHUR, k=1, . . ., NRHS will hold
the reduced right-hand side (the y2 vector of Equation (13)).
If the expansion phase should be performed (ICNTL(26)=2), then REDRHS(i+(k-1)*LREDRHS),
i=1, . . ., SIZE SCHUR, k=1, . . ., NRHS must be set (on entry to the solution phase) to the solution
on the Schur variables (the x2 vector of Equation (14)). In this case (i.e., ICNTL(26)=2) REDRHS
is not altered by MUMPS.
Note that on exit, the solution matrix [X1 X2]^T in Figure 3 is stored in the RHS parameter, except in
the case of a distributed solution, where it is stored in ISOL loc and SOL loc (see ICNTL(21) and
Subsection 5.14).
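A sketch of this two-phase solve through the Schur variables (the solution of the Schur system itself is left to the user):

   mumps_par%LREDRHS = mumps_par%SIZE_SCHUR
   ALLOCATE( mumps_par%REDRHS( mumps_par%LREDRHS * mumps_par%NRHS ) )
   mumps_par%ICNTL(26) = 1          ! reduction/condensation phase
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)           ! REDRHS now holds y2
   ! ... solve S x2 = y2 by external means and store x2 back in REDRHS ...
   mumps_par%ICNTL(26) = 2          ! expansion phase; REDRHS is not altered
   mumps_par%JOB = 3
   CALL DMUMPS(mumps_par)           ! complete solution returned in RHS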
As these two options will lead to suboptimal performance, we recommend using a BLAS library
compatible with OpenMP when available.
Please note that low-rank approximations are computed using a truncated QR factorization with
column pivoting, implemented as a variant of the LAPACK GEQP3 and LAQPS routines; linking with
a LAPACK library is necessary to satisfy the dependencies of this feature. On many systems, BLAS and
LAPACK routines are provided by a single library (e.g., the Intel MKL library) but, if that is not the case,
the LAPACK library can simply be added in the LAPACK variable of the MUMPS Makefile.inc file.
CNTL(7) is the dropping parameter ε (double precision real value) controlling the accuracy of the Block
Low-Rank approximations.
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0.0 : full precision (no approximation).
> 0.0 : the dropping parameter is CNTL(7).
Default value: 0.0 (full precision (i.e., no approximation)).
Related parameters: ICNTL(35)
Remarks: The value of CNTL(7) is used as a stopping criterion for the compression of BLR
blocks, which is achieved through a truncated Rank-Revealing QR factorization. More precisely, to
compute the low-rank form of a block, we perform a QR factorization with column pivoting which
is stopped as soon as a diagonal coefficient of the R factor falls below the threshold, i.e., when
∥rkk∥ < ε. This is implemented as a variant of the LAPACK [19] GEQP3 routine. Larger values
of this parameter lead to more compression at the price of a lower accuracy. Note that ε is used as
an absolute tolerance, i.e., not relative to the norm of the input matrix, the frontal matrix, or the blocks;
for this reason we recommend scaling the matrix or letting the solver automatically preprocess (e.g.,
scale) the input matrix.
Note that, depending on the application, gains can be expected even with small values (close to
machine precision) of CNTL(7).
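For instance (a sketch in the double precision interface), a BLR factorization with factors kept in low-rank form and a dropping parameter of 1.0D-9 could be requested as follows:
      mumps_par%ICNTL(35) = 2       ! BLR activated, factors kept compressed
      mumps_par%CNTL(7)   = 1.0D-9  ! dropping parameter epsilon
      mumps_par%JOB = 1             ! BLR must be activated before the analysis
      CALL DMUMPS(mumps_par)
      mumps_par%JOB = 2             ! BLR factorization
      CALL DMUMPS(mumps_par)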
Estimating the memory footprint during analysis is difficult because the compression rate of the BLR
factors is only known after factorization. (When ICNTL(35)=2, factors are kept in compressed form
and the memory footprint can be reduced.) To enable the user to estimate during analysis the effect
of BLR compression on the memory footprint, ICNTL(38) has been introduced. ICNTL(38) only
influences the statistics printed during the analysis phase (only the INFO/INFOG arrays are affected).
Furthermore, out-of-core can also be combined with BLR compression. In this case, the factors that are
kept full-rank (all of them if ICNTL(35)=3, or those of the frontal matrices not considered for BLR
compression if ICNTL(35)=2) will be written to disk during the factorization. BLR estimates
of the memory peak for all possible situations are provided during analysis.
After factorization, statistics on the effective low-rank compression and on the memory effectively
allocated/used are provided to the user (see Subsection 5.16.3) and can then be used to adjust the value
of ICNTL(38).
Similarly, with ICNTL(39) the user can provide an estimation of the compression rate of the
contribution blocks.
ICNTL(38) estimated compression rate of LU factors
Phase: accessed by the host during the analysis and the factorization phases when ICNTL(35)=1,
2 or 3
Possible values : between 0 and 1000 (1000 is no compression and 0 is full compression); other
values are treated as 0. ICNTL(38)/10 is a percentage representing the typical compression of the
factor matrices in BLR fronts:
ICNTL(38)/10 = (size of compressed factors / size of uncompressed factors) × 100.
Default value: 600 (when factors of BLR fronts are compressed, their size is 60.0% of their full-
rank size).
Related parameters: ICNTL(35), CNTL(7)
Remarks: Influences the statistics provided in INFO(29), INFO(30), INFO(31), INFOG(36),
INFOG(37), INFOG(38), INFOG(39), but also INFO(32-35) and INFOG(40-43).
Statistics on operation counts (OPC):
RINFOG(3) Total theoretical operations = 1.985E+09 (100.0%)
RINFOG(14) Total effective OPC (% of RINFOG(3)) = 2.243E+08 ( 11.3%)
---------- End of BLR statistics -----------------------------------------
5.17 Save (JOB=7) / Restore (JOB=8) feature
To save to disk the MUMPS internal data associated with a given instance, MUMPS should be called with JOB=7
(see Subsection 5.1.1). These MUMPS internal data are saved in binary files. It is possible to use the save
feature (JOB=7) before or after any of the main phases (analysis, factorization, solve, JOB=1,2,3,4,5,6).
After that, it is possible to continue working with the existing instance until, at some point, the instance
should be terminated (JOB=-2). In order to restart MUMPS with the saved data, the user should first create
a new instance (JOB=-1, see Subsection 5.1.1) and then restore into that instance the saved data with
a call to MUMPS with JOB=8. Note that arrays that are allocated and freed by the user are neither saved
(JOB=7) nor restored (JOB=8), although some of them might be requested for further calls to MUMPS.
For example, the arrays associated with the input matrix are not saved. See important remarks on how to
use this feature in Subsection 5.17.3.
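Schematically (a sketch in the double precision interface; SAVE_DIR and SAVE_PREFIX are assumed to have been set in the structure before both the save and the restore):
      mumps_par%JOB = 7      ! save the internal data of the instance to disk
      CALL DMUMPS(mumps_par)
      mumps_par%JOB = -2     ! terminate the instance
      CALL DMUMPS(mumps_par)
      ! ... possibly in a later run, with the same COMM, PAR and SYM ...
      mumps_par%JOB = -1     ! create a new instance
      CALL DMUMPS(mumps_par)
      mumps_par%JOB = 8      ! restore the saved data into the new instance
      CALL DMUMPS(mumps_par)
      ! user arrays (e.g., the input matrix, RHS) must be provided again
      ! before the phases that need them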
Control parameters: during the restoration of MUMPS data, the values of ICNTL and CNTL are set back
to the saved values. So any intended change of these values should occur after the call of MUMPS
with JOB=8 in order to be taken into account during the following calls of MUMPS.
Constraint with binary files: because binary files are used to save the data, the binary storage format
(for example, little or big endian) must be identical on the machine that saves the data and the
one that restores it.
Arrays allocated and freed by the user As mentioned above, the arrays that are allocated and freed by
the user are not saved and cannot be restored by this feature, although some of them might be
requested for further calls to MUMPS. For example, the arrays associated with the input matrix are not
stored. The user is thus responsible for providing them again, if they are going to be needed. Data
from the user that are not saved but that might be required internally by MUMPS after the restoration
include:
Input matrix: the arrays associated with the input matrix (see Subsection 5.2.2) are not saved during
a call of MUMPS with JOB=7. The possible arrays concerned are: IRN, JCN, A, IRN_loc,
JCN_loc, A_loc, ELTPTR, ELTVAR, A_ELT. Consequently, the same matrix data must be
provided again by the user before a (possibly new) factorization or before a solve if iterative
refinement (ICNTL(10)) or error analysis (ICNTL(11)) is requested.
Scaling arrays: the scaling arrays ROWSCA and COLSCA are not saved by the save phase in case
they had been provided by the user before the factorization, and should then be provided again
before the solve phase.
Right-hand side vector: the right-hand side (or solution) vector RHS is not saved by the save
phase.
MPI context: the only constraint is that, before a call to MUMPS with JOB=8 using the communicator
COMM provided in the JOB=-1 call, the process with rank myid should have access to the file
<SAVE_DIR>/<SAVE_PREFIX>_<myid>.mumps. There are 6 possibilities:
<SAVE_DIR>/<SAVE_PREFIX>_<myid>.mumps
<SAVE_DIR>/<MUMPS_SAVE_PREFIX>_<myid>.mumps
<SAVE_DIR>/save_<myid>.mumps
<MUMPS_SAVE_DIR>/<SAVE_PREFIX>_<myid>.mumps
<MUMPS_SAVE_DIR>/<MUMPS_SAVE_PREFIX>_<myid>.mumps
<MUMPS_SAVE_DIR>/save_<myid>.mumps
Phase: accessed by the host during the save/restore files deletion phase (JOB=-3) in case of out-of-
core (ICNTL(22)=1).
Possible values :
0: the out-of-core files are marked for deletion
1: the out-of-core files should not be deleted because another saved instance references them.
Other values are treated as 0.
Default value: 0 (out-of-core files associated with a saved instance are marked for deletion at the
end of the out-of-core file lifetime)
Remarks: MUMPS will delete only the out-of-core files that are referenced in the saved data
identified by the value of SAVE_DIR and SAVE_PREFIX. Extra out-of-core files with the same
OOC_TMPDIR and OOC_PREFIX are not deleted.
5.19 Compact workarray id%S at the end of factorization phase
When memory for the solve phase is critical (case of a large number of right-hand sides) or when the
memory footprint at the end of the factorization phase needs to be reduced, setting ICNTL(49) to 1 or
2 makes it possible to compact the internal workarray id%S at the end of the factorization so that id%S
only holds the factor matrices needed for the solution phase.
ICNTL(49) compact workarray id%S at the end of factorization phase
Phase: accessed by the host during factorization phase
Possible values :
0 : nothing is done.
1 : compact workarray id%S(MAXS) at the end of the factorization phase while satisfying the
memory constraint that might have been provided with the ICNTL(23) feature.
2 : compact workarray id%S(MAXS) at the end of the factorization phase. The memory
constraint that might have been provided with the ICNTL(23) feature does not apply to this
process.
Other values are treated as 0.
Default value: 0
Incompatibility: with the use of the LWK_USER / WK_USER feature.
Remarks: ICNTL(49)=1,2 might require an intermediate memory allocation to reallocate id%S with
minimal size. If the memory allocation fails, then a warning is returned and nothing is done. If
ICNTL(49)=1 and the memory constraint provided with ICNTL(23)> 0 would not be satisfied,
then a warning is raised and nothing is done.
6 Control parameters
On exit from the initialization call (JOB = -1), the control parameters are set to default values. If the
user wishes to use values other than the defaults, the corresponding entries in mumps_par%ICNTL and
mumps_par%CNTL should be reset after this initial call and before the call in which they are used.
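For example (a sketch in the double precision interface; the particular values shown are arbitrary):
      mumps_par%JOB = -1          ! initialization: defaults are set on exit
      CALL DMUMPS(mumps_par)
      mumps_par%ICNTL(4) = 3      ! e.g., more verbose diagnostics
      mumps_par%ICNTL(7) = 5      ! e.g., request the Metis ordering
      ! subsequent calls (JOB=1,2,3,...) will use these values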
• ICNTL(16) controls the setting of the number of OpenMP threads
• ICNTL(18) defines the strategy for the distributed input matrix
• ICNTL(19) computes the Schur complement matrix
• ICNTL(20) determines the format (dense, sparse, or distributed) of the right-hand sides
• ICNTL(21) determines the distribution (centralized or distributed) of the solution vectors
• ICNTL(22) controls the in-core/out-of-core (OOC) factorization and solve
• ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS
can allocate per working processor
• ICNTL(24) controls the detection of “null pivot rows”
• ICNTL(25) allows the computation of a solution of a deficient matrix and also of a null space basis
• ICNTL(26) drives the solution phase if a Schur complement matrix has been computed
• ICNTL(27) controls the blocking size for multiple right-hand sides
• ICNTL(28) determines whether a sequential or parallel computation of the ordering is performed
• ICNTL(29) defines the parallel ordering tool to be used to compute the fill-in reducing permutation
• ICNTL(30) computes a user-specified set of entries in the inverse A^{-1} of the original matrix
• ICNTL(31) indicates which factors may be discarded during the factorization
• ICNTL(32) performs the forward elimination of the right-hand sides during the factorization
• ICNTL(33) computes the determinant of the input matrix
• ICNTL(34) controls the deletion of the files in case of save/restore
• ICNTL(35) controls the activation of the Block Low-Rank (BLR) feature
• ICNTL(36) controls the choice of BLR factorization variant
• ICNTL(37) controls the BLR compression of the contribution blocks
• ICNTL(38) estimates compression rate of LU factors
• ICNTL(39) estimates compression rate of contribution blocks
• ICNTL(40) reserved in current version
• ICNTL(41-48) reserved in current version
• ICNTL(49) compact workarray id%S at the end of factorization phase
• ICNTL(50) reserved in current version
• ICNTL(51) reserved in current version
• ICNTL(52-57) reserved in current version
• ICNTL(58) defines options for symbolic factorization
• ICNTL(59-60) not used in current version
ICNTL(2) is the output stream for diagnostic printing and statistics local to each MPI process.
Possible values :
≤ 0: these messages will be suppressed.
> 0 : is the output stream.
Default value: 0
Remarks: If ICNTL(2) > 0 and ICNTL(4) ≥ 2, then information on advancement (flops done)
is also printed.
ICNTL(3) is the output stream for global information, collected on the host.
Possible values :
≤ 0: these messages will be suppressed.
> 0 : is the output stream.
Default value: 6 (standard output stream)
ICNTL(4) is the level of printing for error, warning, and diagnostic messages.
Possible values :
≤ 0: No messages output.
1: Only error messages printed.
2: Errors, warnings, and main statistics printed.
3: Errors, warnings, and terse diagnostics (only the first ten entries of arrays) printed.
≥ 4: Errors, warnings, and information on input and output parameters printed.
Default value: 2 (errors, warnings and main statistics printed)
ICNTL(6) permutes the matrix to a zero-free diagonal and/or scales the matrix (see Subsection 3.2 and
Subsection 5.3.2).
Phase: accessed by the host and only during sequential analysis (ICNTL(28)=1)
Possible variables/arrays involved: optionally UNS_PERM, mumps_par%A, COLSCA and ROWSCA
Possible values :
0 : No column permutation is computed.
1 : The permuted matrix has as many entries on its diagonal as possible. The values on the
diagonal are of arbitrary size.
2 : The permutation is such that the smallest value on the diagonal of the permuted matrix is
maximized. The numerical values of the original matrix (mumps_par%A) must be provided
by the user during the analysis phase.
3 : Variant of option 2 with different performance. The numerical values of the original matrix
(mumps_par%A) must be provided by the user during the analysis phase.
4 : The sum of the diagonal entries of the permuted matrix is maximized. The numerical values of
the original matrix (mumps_par%A) must be provided by the user during the analysis phase.
5 : The product of the diagonal entries of the permuted matrix is maximized. Scaling vectors
are also computed and stored in COLSCA and ROWSCA, if ICNTL(8) is set to -2 or 77.
With these scaling vectors, the nonzero diagonal entries in the permuted matrix are one in
absolute value and all the off-diagonal entries are less than or equal to one in absolute value.
For unsymmetric matrices, COLSCA and ROWSCA are meaningful on the permuted matrix
AQ_c (see Equation (5)). For symmetric matrices, COLSCA and ROWSCA are meaningful on
the original matrix A. The numerical values of the original matrix, mumps_par%A, must be
provided by the user during the analysis phase.
6 : Similar to 5 but with a more costly (in time and memory footprint) algorithm. The numerical
values of the original matrix, mumps_par%A, must be provided by the user during the analysis
phase.
7 : Based on the structural symmetry of the input matrix and on the availability of the numerical
values, the value of ICNTL(6) is automatically chosen by the software.
Other values are treated as 0. On output from the analysis phase, INFOG(23) holds the value of
ICNTL(6) that was effectively used.
Default value: 7 (automatic choice done by the package)
Incompatibility: If the matrix is symmetric positive definite (SYM = 1), or in elemental format
(ICNTL(5)=1), or the parallel analysis is requested (ICNTL(28)=2) or the ordering is provided
by the user (ICNTL(7)=1), or the Schur option (ICNTL(19) = 1, 2, or 3) is required, or the
matrix is initially distributed (ICNTL(18)=1,2,3), then ICNTL(6) is treated as 0.
Related parameters: ICNTL(8), ICNTL(12)
Remarks: On assembled centralized unsymmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM
= 0), if ICNTL(6)=1, 2, 3, 4, 5, 6, a column permutation (based on the weighted bipartite matching
algorithms described in [24, 25]) is applied to the original matrix to get a zero-free diagonal.
The user is advised to set ICNTL(6) to a nonzero value when the matrix is very unsymmetric
in structure. On output from the analysis phase, when the column permutation is not the identity, the
pointer UNS_PERM (internal data valid until a call to MUMPS with JOB=-2) provides access to the
permutation on the host processor (see Subsection 5.3.1). Otherwise, the pointer is not associated.
The column permutation is such that entry a_{i,perm(i)} is on the diagonal of the permuted matrix.
On general assembled centralized symmetric matrices (ICNTL(5)=0, ICNTL(18)=0, SYM =
2), if ICNTL(6)=1, 2, 3, 4, 5, 6, the column permutation is internally used to determine a set of
recommended 1×1 and 2×2 pivots (see [26] and the description of ICNTL(12) in Subsection 6.1
for more details). We advise either to let MUMPS select the strategy (ICNTL(6) = 7) or to set
ICNTL(6) = 5 if the user knows that the matrix is, for example, an augmented system (which is a
system with a large zero diagonal block). On output from the analysis, the pointer UNS_PERM is not
associated.
ICNTL(7) computes a symmetric permutation (ordering) to determine the pivot order to be used for the
factorization in case of sequential analysis (ICNTL(28)=1). See Subsection 3.2 and Subsection 5.4.
Phase: accessed by the host and only during the sequential analysis phase (ICNTL(28) = 1).
Possible variables/arrays involved: PERM_IN, SYM_PERM
Possible values :
0 : Approximate Minimum Degree (AMD) [7] is used,
1 : The pivot order should be set by the user in PERM_IN, on the host processor. In that case,
PERM_IN must be allocated on the host by the user and PERM_IN(i), (i=1, ..., N) must hold the
position of variable i in the pivot order. In other words, row/column i in the original matrix
corresponds to row/column PERM_IN(i) in the reordered matrix (a sketch is given at the end
of this ICNTL(7) description).
2 : Approximate Minimum Fill (AMF) is used,
3 : The SCOTCH⁹ [42] package is used if previously installed by the user; otherwise treated as 7.
4 : PORD¹⁰ [46] is used if previously installed by the user; otherwise treated as 7.
5 : The Metis¹¹ [34] package is used if previously installed by the user; otherwise treated as 7.
It is possible to modify some components of the internal options array of Metis (see the
Metis manual) in order to fine-tune and modify various aspects of the internal algorithms
used by Metis. This can be done by setting some elements (see the file metis.h in the
Metis installation to check the position of each option in the array) of the MUMPS array
mumps_par%METIS_OPTIONS after the MUMPS initialization phase (JOB=-1) and before
the analysis phase. Note that the METIS_OPTIONS array of the MUMPS structure is of size 40,
which is large enough for both Metis 4.x and Metis 5.x versions. It is passed by MUMPS as the
argument “options” to the Metis ordering routine METIS_NodeND (METIS_NodeWND is
sometimes also called in case MUMPS was installed with Metis 4.x) during the analysis phase.
6 : Approximate Minimum Degree with automatic quasi-dense row detection (QAMD) is used.
7 : Automatic choice by the software during analysis phase. This choice will depend on the
ordering packages made available, on the matrix (type and size), and on the number of
processors.
Other values are treated as 7.
Default value: 7 (automatic choice)
Incompatibility: ICNTL(7) is meaningless if the parallel analysis is chosen (ICNTL(28)=2).
Related parameters: ICNTL(28)
Remarks: Even when the ordering is provided by the user, the analysis must be performed before
numerical factorization.
For assembled matrices (centralized or distributed) (ICNTL(5)=0) all the options are available.
For elemental matrices (ICNTL(5)=1), only options 0, 1, 5 and 7 are available, with option 7
leading to an automatic choice between AMD and Metis (options 0 or 5); other values are treated
as 7.
If the user asks for a Schur complement matrix (ICNTL(19)= 1, 2, 3) and
– the matrix is assembled (ICNTL(5)=0) then only options 0, 1, 5 and 7 are currently available.
Other options are treated as 7.
– the matrix is elemental (ICNTL(5)=1), only options 0, 1 and 7 are currently available. Other
options are treated as 7, which will (currently) be treated as 0 (AMD).
– in both cases (assembled or elemental matrix) if the pivot order is given by the user
(ICNTL(7)=1), then the following property should hold: PERM_IN(LISTVAR_SCHUR(i)) =
N-SIZE_SCHUR+i, for i=1, ..., SIZE_SCHUR.
For matrices with relatively dense rows, we highly recommend option 6 which may significantly
reduce the time for analysis.
On output, the pointer array SYM_PERM provides access, on the host processor, to the symmetric
permutation that is effectively computed during the analysis phase by the MUMPS package, and
INFOG(7) to the ordering option that was effectively chosen. Indeed, the option corresponding to
ICNTL(7) may be overridden by MUMPS when, for example, the ordering option chosen by the user
is not compatible with the value of ICNTL(12), or the necessary package is not installed.
⁹ See http://gforge.inria.fr/projects/scotch/ to obtain a copy.
¹⁰ Distributed within MUMPS by permission of J. Schulze (University of Paderborn).
¹¹ See http://glaros.dtc.umn.edu/gkhome/metis/metis/overview to obtain a copy.
SYM_PERM(i), i=1, ..., N, holds the position of variable i in the pivot order. In other words,
row/column i in the original matrix corresponds to row/column SYM_PERM(i) in the reordered
matrix. See also Subsection 5.4.1.
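As announced in the description of option 1 above, a minimal sketch of a user-provided ordering (double precision interface; the reverse ordering shown is arbitrary and for illustration only, and myid is the MPI rank of the calling process):
      IF (myid == 0) THEN          ! PERM_IN is needed on the host only
         ALLOCATE( mumps_par%PERM_IN(mumps_par%N) )
         DO i = 1, mumps_par%N
            mumps_par%PERM_IN(i) = mumps_par%N - i + 1  ! reverse ordering
         END DO
      END IF
      mumps_par%ICNTL(7) = 1       ! use the user-provided pivot order
      mumps_par%JOB = 1            ! analysis
      CALL DMUMPS(mumps_par)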
Possible values :
1 : AX = B is solved.
≠ 1 : A^T X = B is solved.
Default value: 1
Related parameters: ICNTL(10), ICNTL(11), ICNTL(21), ICNTL(32)
Remarks: when a forward elimination is performed during the factorization (see ICNTL(32))
only ICNTL(9)=1 is allowed.
ICNTL(10) applies the iterative refinement to the computed solution (see Subsection 5.6).
Phase: accessed by the host during the solve phase.
Possible variables/arrays involved: NRHS
Possible values :
< 0 : Fixed number of steps of iterative refinement. No stopping criterion is used.
0 : No iterative refinement.
> 0 : Maximum number of steps of iterative refinement. A stopping criterion is used, therefore a
test for convergence is done at each step of the iterative refinement algorithm.
Default value: 0 (no iterative refinement)
Related parameters: CNTL(2)
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), then iterative refinement is disabled and ICNTL(10) is treated
as 0.
Remarks: Note that if ICNTL(10)< 0, |ICNTL(10)| steps of iterative refinement are performed,
without any test of convergence (see Algorithm 3). This means that the iterative refinement may
diverge, that is, instead of being improved, the solution may become less accurate.
But it has been shown [20] that with only two to three steps of iterative refinement the solution can
often be significantly improved. So if the convergence test is not wanted, we recommend setting
ICNTL(10) to -2 or -3.
Note also that it is not necessary to activate the error analysis option (ICNTL(11)= 1,2) to be
able to run the iterative refinement with a stopping criterion (ICNTL(10) > 0). However, since
the backward errors ω1 and ω2 have been computed, they are still returned in RINFOG(7) and
RINFOG(8), respectively.
It must be noticed that iterative refinement with a stopping criterion (ICNTL(10) > 0) will stop
when
1. either the requested accuracy is reached (ω1 + ω2 < CNTL(2)),
2. or the convergence rate is too slow (ω1 + ω2 does not decrease by at least a factor of 5),
3. or exactly ICNTL(10) steps have been performed.
In the first two cases the number of iterative refinement steps (INFOG(15)) may be lower than
ICNTL(10).
ICNTL(11) computes statistics related to an error analysis of the linear system solved (Ax = b or
A^T x = b, see ICNTL(9)). See Subsection 5.7.
Phase: accessed by the host and only during the solve phase.
Possible variables/arrays involved: NRHS
Possible values :
0 : no error analysis is performed (no statistics).
1 : compute all the statistics (very expensive).
2 : compute the main statistics (norms, residuals, componentwise backward errors), but not the
most expensive ones (condition number and forward error estimates).
Values different from 0, 1, and 2 are treated as 0.
Default value: 0 (no statistics).
Incompatibility: If ICNTL(21)=1 (solution kept distributed) or if ICNTL(32)=1 (forward
elimination during factorization), or if NRHS>1 (multiple right hand sides), or if ICNTL(20)=10
or 11 (distributed right hand sides), or if ICNTL(25)=-1 (computation of the null space basis),
then error analysis is not performed and ICNTL(11) is treated as 0.
Related parameters: ICNTL(9)
Remarks: The computed statistics are returned in various informational parameters, see also
Subsection 3.3:
– If ICNTL(11)= 2, then the infinite norm of the input matrix (∥A∥∞ or ∥A^T∥∞, in
RINFOG(4)), the infinite norm of the computed solution (∥x̄∥∞, in RINFOG(5)), the
scaled residual ∥Ax̄ − b∥∞ / (∥A∥∞ ∥x̄∥∞) (in RINFOG(6)), and a componentwise backward
error estimate (in RINFOG(7) and RINFOG(8)) are computed.
– If ICNTL(11)= 1, then, in addition to the above statistics, an estimate for the error in the
solution (in RINFOG(9)) and condition numbers for the linear system (in RINFOG(10) and
RINFOG(11)) are also returned.
If performance is critical, ICNTL(11) should be set to 0. If performance is critical but statistics
are still requested, then ICNTL(11) should be set to 2. If ICNTL(11)=1, the error analysis is very
costly (typically significantly more costly than the solve phase itself).
ICNTL(12) defines an ordering strategy for symmetric matrices (SYM = 2) (see [26] for more details)
and is used, in conjunction with ICNTL(6), to add constraints to the ordering algorithm (ICNTL(7)
option).
Phase: accessed by the host and only during the analysis phase.
Possible values :
0: automatic choice
1: usual ordering (nothing done)
2: ordering on the compressed graph associated with the matrix.
3: constrained ordering, only available with AMF (ICNTL(7)=2).
Other values are treated as 1.
Default value: 0 (automatic choice).
Incompatibility: If the matrix is unsymmetric (SYM=0) or symmetric positive definite
(SYM=1), or the matrix is in elemental format (ICNTL(5)=1), or the matrix is initially distributed
(ICNTL(18)=1,2,3), or the ordering is provided by the user (ICNTL(7)=1), or the Schur option
(ICNTL(19) ≠ 0) is required, or the analysis is performed by blocks (ICNTL(15) ≠ 0), then
ICNTL(12) is treated as 1 (nothing done).
Related parameters: ICNTL(6), ICNTL(7)
Remarks: If MUMPS detects some incompatibility between control parameters then it uses the
following rules to automatically reset the control parameters. Firstly ICNTL(12) has a lower
priority than ICNTL(7) so that if ICNTL(12) = 3 and the ordering required is not AMF then
ICNTL(12) is internally treated as 2. Secondly ICNTL(12) has a higher priority than ICNTL(6)
and ICNTL(8). Thus if ICNTL(12) = 2 and ICNTL(6) was not active (ICNTL(6)=0) then
ICNTL(6) is treated as 5 if numerical values are provided, or as 1 otherwise. Furthermore, if
ICNTL(12) = 3 then ICNTL(6) is treated as 5 and ICNTL(8) is treated as -2 (scaling computed
during analysis).
On output from the analysis phase, INFOG(24) holds the value of ICNTL(12) that was
effectively used. Note that INFOG(7) and INFOG(23) hold the values of ICNTL(7) and
ICNTL(6) (respectively) that were effectively used.
ICNTL(13) controls the parallelism of the root node (enabling or not the use of ScaLAPACK) and also
its splitting.
Phase: accessed by the host during the analysis phase.
Possible values :
< -1 : treated as 0.
-1 : force splitting of the root node in all cases (even sequentially)
0 : parallel factorization of the root node based on ScaLAPACK. If the size of the root frontal node
(last Schur complement to be factored) is larger than an internal threshold, then ScaLAPACK
will be used for factorizing it. Otherwise, the root node will be processed by a single MPI
process.
> 0 : ScaLAPACK is not used: a sequential factorization of the root node is forced (the
recommended value is 1, to partly recover parallelism on the root node). To recover the
parallelism lost by not using ScaLAPACK, splitting of the root node can be activated: if the
number of working processors is strictly larger than ICNTL(13) (always the case with
ICNTL(13)=1), then splitting of the root node is performed to enable node-level parallelism.
Default value: 0 (parallel factorization on the root node)
Remarks: Processing the root sequentially (ICNTL(13) > 0) can be useful when the user is
interested in the inertia of the matrix (see INFO(12) and INFOG(12)), or when the user wants
to detect null pivots (see Subsection 5.10) or to activate BLR compression (Subsection 5.16) on the
root node.
Although ICNTL(13) controls the efficiency of the factorization and solve phases, preprocessing
work is performed during analysis and this option must be set on entry to the analysis phase.
With SYM=1, if ScaLAPACK is allowed (ICNTL(13)≤ 0) then Cholesky factorization will be
performed on the root node and thus negative pivots will raise an error (code -40 is returned in
INFOG(1)).
ICNTL(14) controls the percentage increase in the estimated working space, see Subsection 5.9.
Phase: accessed by the host both during the analysis and the factorization phases.
Default value: between 20 and 35 (which corresponds to at most a 35% increase), depending on
the number of MPI processes. It is set to 5% with SYM=1 and one MPI process.
Related parameters: ICNTL(23)
Remarks: When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14)
may help.
ICNTL(15) exploits compression of the input matrix resulting from a block format, see Subsection 5.5.
Phase: accessed by the host process during the analysis phase.
Possible variables/arrays involved: NBLK, BLKPTR, BLKVAR
Possible values :
0: no compression
-k: all blocks are of fixed size k > 0. N (the order of the matrix A) must be a multiple of k. NBLK
and BLKPTR should not be provided by the user and will be computed internally. Concerning
BLKVAR, please refer to the Remarks below.
1: block format provided by the user. NBLK must be provided on the host by the user and
holds the number of blocks. BLKPTR(1:NBLK+1) must be provided by the user on the host.
Concerning BLKVAR, please refer to the Remarks below.
Any other values will be treated as 0.
Default value: 0
Remarks: If BLKVAR is not provided by the user then BLKVAR is internally treated as the identity
(BLKVAR(i)=i, (i=1, ..., N)). It corresponds to contiguous variables in blocks.
– If ICNTL(15)=1 then BLKVAR(BLKPTR(iblk):BLKPTR(iblk+1)-1), (iblk=1, NBLK) holds
the variables associated to block iblk.
– If ICNTL(15) < 0 then BLKPTR need not be provided by the user and NBLK = N/k where
N must be a multiple of k.
In case the pivot order is provided on entry by the user at the analysis phase (ICNTL(7)= 1), then
PERM_IN should be compatible with the compression. This means that PERM_IN, of size N, should
result from an expansion of a pivot order on the compressed matrix, i.e., variables in a block should
be consecutive in the pivot order.
Incompatibility: with the elemental entry format (ICNTL(5)= 1), with the Schur complement
(ICNTL(19) ≠ 0), and with the permutation to a zero-free diagonal and the related
compressed/constrained orderings for symmetric matrices (ICNTL(6) ≠ 0, ICNTL(12) ≠ 1).
ICNTL(16) controls the setting of the number of OpenMP threads, see Subsection 5.18.
Phase: accessed by the host at the beginning of all phases
Possible values :
0 : nothing is done, MUMPS uses the number of OpenMP threads configured by the calling
application.
> 0 : MUMPS sets the number of OpenMP threads on entry and restores the previous value on exit.
Other values are treated as 0.
Default value: 0 (no setting of the number of OpenMP threads done by MUMPS internally)
ICNTL(18) defines the strategy for the distributed input matrix (only for assembled matrix, see
Subsection 5.2.2).
Phase: accessed by the host during the analysis phase.
Possible values :
0 : the input matrix is centralized on the host (see Subsection 5.2.2.1).
1 : the user provides the structure of the matrix on the host at analysis, MUMPS returns a mapping
and the user should then provide the matrix entries distributed according to the mapping on
entry to the numerical factorization phase (see Subsection 5.2.2.2).
2 : the user provides the structure of the matrix on the host at analysis, and the distributed
matrix entries on all slave processors at factorization. Any distribution is allowed (see
Subsection 5.2.2.2).
3 : the user directly provides the distributed matrix, pattern and entries, as input to both the
analysis and the factorization (see Subsection 5.2.2.2).
Other values are treated as 0.
Default value: 0 (input matrix centralized on the host)
Related parameters: ICNTL(5)
Remarks: In case of a distributed matrix, we recommend options 2 or 3. Among them, we
recommend option 3, which is easier to use. Option 1 is kept for backward compatibility but is
deprecated and we plan to remove it in a future release.
0 : complete factorization. No Schur complement is returned.
1 : the Schur complement matrix will be returned centralized by rows on the host after the
factorization phase. On the host, before the analysis phase, the user must set the integer variable
SIZE_SCHUR to the size of the Schur matrix and the integer pointer array LISTVAR_SCHUR to
the list of indices of the Schur matrix.
2 or 3 : the Schur complement matrix will be returned distributed by columns: the Schur will
be returned on the slave processors in the form of a 2D block cyclic distributed matrix
(ScaLAPACK style) after factorization. Workspace should be allocated by the user before
the factorization phase in order for MUMPS to store the Schur complement (see SCHUR,
SCHUR_MLOC, SCHUR_NLOC, and SCHUR_LLD in Subsection 5.15). On the host, before the
analysis phase, the user must set the integer variable SIZE_SCHUR to the size of the Schur
matrix and the integer pointer array LISTVAR_SCHUR to the list of indices of the Schur matrix.
The integer variables NPROW, NPCOL, MBLOCK, NBLOCK may also be defined (default values
will otherwise be provided).
Values not equal to 1, 2 or 3 are treated as 0.
Default value: 0 (complete factorization)
Incompatibility: Since the Schur complement is a partial factorization of the global matrix (with
partial ordering of the variables provided by the user), the following options of MUMPS are
incompatible with the Schur option: maximum transversal, scaling, iterative refinement, error
analysis and parallel analysis.
Related parameters: ICNTL(7), ICNTL(26)
Remarks: If the ordering is given (ICNTL(7)= 1), then the following property should hold:
PERM_IN(LISTVAR_SCHUR(i)) = N-SIZE_SCHUR+i, for i=1, ..., SIZE_SCHUR.
Note that, in order to have a centralized Schur complement matrix by columns (see
Subsection 5.15.3), it is possible (and recommended) to use a particular case of the distributed
Schur complement (ICNTL(19)=2 or 3), where the Schur complement is assigned to only one
processor (NPCOL × NPROW = 1).
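For instance (a sketch in the double precision interface; nschur and the content of LISTVAR_SCHUR are user data), a Schur complement centralized by columns on one process could be requested before the analysis as:
      mumps_par%ICNTL(19) = 2              ! ScaLAPACK-style Schur
      mumps_par%SIZE_SCHUR = nschur
      ALLOCATE( mumps_par%LISTVAR_SCHUR(nschur) )
      ! ... fill LISTVAR_SCHUR with the indices of the Schur variables ...
      mumps_par%NPROW = 1                  ! 1 x 1 grid: Schur centralized,
      mumps_par%NPCOL = 1                  ! stored by columns
      mumps_par%JOB = 1
      CALL DMUMPS(mumps_par)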
ICNTL(20) determines the format (dense, sparse, or distributed) of the right-hand sides
Phase: accessed by the host during the solve phase and before a JOB=9 call.
Possible variables/arrays involved: RHS, NRHS, LRHS, IRHS_SPARSE, RHS_SPARSE,
IRHS_PTR, NZ_RHS, Nloc_RHS, LRHS_loc, IRHS_loc and RHS_loc.
Possible values :
0 : the right-hand side is in dense format in the structure component RHS, NRHS, LRHS (see
Subsection 5.14.1)
1,2,3 : the right-hand side is in sparse format in the structure components IRHS_SPARSE,
RHS_SPARSE, IRHS_PTR and NZ_RHS.
1 : The decision of exploiting sparsity of the right-hand side to accelerate the solution phase
is done automatically.
2 : Sparsity of the right-hand side is NOT exploited to improve solution phase.
3 : Sparsity of the right-hand side is exploited during solution phase.
10, 11 : When provided before the solve phase, values 10 and 11 have the same meaning. The
right-hand side is provided distributed in the structure components Nloc_RHS, LRHS_loc,
IRHS_loc, RHS_loc (see Subsection 5.14.3).
When provided before a JOB=9 call, values 10 and 11 indicate which distribution MUMPS
should build and return to the user in IRHS_loc. In this case, the user should provide a
workarray IRHS_loc on each MPI process of size at least INFO(23), where INFO(23) is
returned after the factorization phase.
10 : fill IRHS_loc to minimize internal communications of right-hand side data during the
solve phase.
11 : fill IRHS_loc to match the distribution of the solution (imposed by MUMPS), in case of
distributed solution (ICNTL(21)=1).
Values different from 0, 1, 2, 3, 10, 11 are treated as 0. For a sparse right-hand side, the
recommended value is 1.
Default value: 0 (dense right-hand sides)
Incompatibility: When NRHS > 1 (multiple right-hand side), the functionalities related to iterative
refinement ( ICNTL(10)) and error analysis (ICNTL(11)) are currently disabled.
With sparse right-hand sides (ICNTL(20)=1,2,3), the forward elimination during the factorization
(ICNTL(32)=1) is not currently available.
Remarks: For details on how to set the input parameters, see Subsection 5.14.1, Subsection 5.14.2
and Subsection 5.14.3. Please note that duplicate entries in the sparse or distributed right-hand sides
are summed. A JOB=9 call can only be done after a successful factorization phase and its result
depends on the transpose option ICNTL(9), which should not be modified between a JOB=9 and a
JOB=3 call. The distributed right-hand side feature enables the user to provide a sparse structured
RHS (i.e., a RHS with some empty rows).
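A possible sequence for distributed right-hand sides built with the help of a JOB=9 call (a sketch in the double precision interface; it assumes a successful factorization and that Nloc_RHS can be set to INFO(23)):
      mumps_par%ICNTL(20) = 10    ! distribution minimizing communications
      ALLOCATE( mumps_par%IRHS_loc( mumps_par%INFO(23) ) )
      mumps_par%JOB = 9
      CALL DMUMPS(mumps_par)      ! IRHS_loc holds the local row indices
      mumps_par%Nloc_RHS = mumps_par%INFO(23)   ! assumption, see lead-in
      mumps_par%LRHS_loc = mumps_par%INFO(23)
      ALLOCATE( mumps_par%RHS_loc( mumps_par%LRHS_loc * mumps_par%NRHS ) )
      ! ... fill RHS_loc(i+(k-1)*LRHS_loc) with the entry of right-hand
      !     side k corresponding to row IRHS_loc(i) ...
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)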
The files containing the factors will be deleted if a new factorization starts or when a termination
phase (JOB=-2) is called, except if the save/restore feature has been used and the files containing
the factors are associated with a saved instance (see Subsection 5.17.4).
Note that, in case of abnormal termination of an application calling MUMPS (for example, a
termination of the calling process with a segmentation fault, or, more generally, a termination of
the calling process without a call to MUMPS with JOB=-2), the files containing the factors are not
deleted. It is then the user’s responsibility to delete them, as shown in bold in the example below,
where the application calling MUMPS is launched from a bash script and environment variables are
used to define the OOC environment:
#!/bin/bash
export MUMPS_OOC_TMPDIR="/local/mumps_data/"
export MUMPS_OOC_PREFIX="job_myapp_"
mpirun -np 128 ./myapplication
# Suppress MUMPS OOC files in case of bad application termination
rm -f ${MUMPS_OOC_TMPDIR}/${MUMPS_OOC_PREFIX}*
ICNTL(23) corresponds to the maximum size of the working memory in MegaBytes that MUMPS can
allocate per working processor, see Subsection 5.9 for more details.
Phase: accessed by all processes at the beginning of the factorization phase. If the value is greater
than 0 only on the host, then the value on the host is used for all processes, otherwise ICNTL(23)
is interpreted locally on each MPI process.
Possible values :
0 : each processor will allocate workspace based on the estimates computed during the analysis
>0 : maximum size of the working memory in MegaBytes per working processor to be allocated
Default value: 0
Related parameters: ICNTL(14), ICNTL(38), ICNTL(39)
Remarks: If ICNTL(23) is greater than 0 then MUMPS automatically computes the size of
the internal workarrays such that the storage for all MUMPS internal data does not exceed
ICNTL(23). The relaxation ICNTL(14) is first applied to the internal integer workarray IS and to
communication and I/O buffers; the remaining available space is then shared between the main (and
often most critical) real/complex internal workarray S holding the factors, the stack of contribution
blocks and dynamic workarrays that are used either to expand the S array or to store low-rank
dynamic structures.
Lower bounds for ICNTL(23), in case ICNTL(23) is provided only on the host:
– In case of full-rank factors only (ICNTL(35)=0 or 3), a lower bound for ICNTL(23) (if ICNTL(14)
has not been modified since the analysis) is given by INFOG(16) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(26) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0), a lower bound for ICNTL(23)
(if ICNTL(14) has not been modified since the analysis and ICNTL(38) is a good approximation
of the average compression rate of the factors) is given by INFOG(36) if the factorization is in-core
(ICNTL(22)=0), and by INFOG(38) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank contribution blocks (CB) only (ICNTL(35)=0,3 and ICNTL(37)=1), a lower bound
for ICNTL(23) (if ICNTL(14) has not been modified since the analysis and ICNTL(39) is a good
approximation of the average compression rate of the CB) is given by INFOG(44) if the factorization is
in-core (ICNTL(22)=0), and by INFOG(46) if the factorization is out-of-core (ICNTL(22)=1).
– In case of low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1), a lower
bound for ICNTL(23) (if ICNTL(14) has not been modified since the analysis, and ICNTL(38) and
ICNTL(39) are good approximations of the average compression rates of respectively the factors and the
CB) is given by INFOG(40) if the factorization is in-core (ICNTL(22)=0), and by INFOG(42) if the
factorization is out-of-core (ICNTL(22)=1).
Lower bounds for ICNTL(23), in case ICNTL(23) is provided locally to each MPI process:
– Full-rank factors only (ICNTL(35)=0 or 3) ⇒ INFO(15) if the factorization is in-core
(ICNTL(22)=0), INFO(17) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors (ICNTL(35)=1 or 2) only (ICNTL(37)=0) ⇒ INFO(30) if the factorization is
in-core (ICNTL(22)=0), INFO(31) if the factorization is out-of-core (ICNTL(22)=1).
– Low-rank factors and contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) ⇒ INFO(34)
if the factorization is in-core (ICNTL(22)=0), INFO(35) if the factorization is out-of-core
(ICNTL(22)=1).
The above lower bounds include memory for the real/complex internal workarray S holding the
factors and the stack of contribution blocks. In case WK_USER is provided, the above quantities should
be diminished by the estimated memory for S/WK_USER. This estimated memory can be obtained
from INFO(8), INFO(9), or INFO(20) (depending on MUMPS settings) by taking their absolute
value, if negative, or by dividing them by 10^6, if positive. See also the paragraph Recommended
values of LWK_USER below.
If ICNTL(23) is left to its default value 0, then MUMPS will allocate for the factorization phase
a workspace based on the estimates computed during the analysis if ICNTL(14) has not been
modified since analysis, or larger if ICNTL(14) was increased. Note that, even with full-rank
factorization, these estimates are only accurate in the sequential version of MUMPS; they can
be inaccurate in the parallel case, especially for the out-of-core version. Therefore, in parallel, we
recommend using ICNTL(23) and providing a value larger than the returned estimates.
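For instance (a sketch for the in-core, full-rank case, with a 20% safety margin chosen arbitrarily), the host could set, after the analysis:
      ! on the host, after the analysis phase (JOB=1):
      mumps_par%ICNTL(23) = INT( 1.2D0 * mumps_par%INFOG(16) )
      ! ICNTL(23) provided only on the host applies to all processes
      mumps_par%JOB = 2
      CALL DMUMPS(mumps_par)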
ICNTL(25) allows the computation of a solution of a deficient matrix and also of a null space basis.
Phase: accessed by the host during the solution phase
Possible variables/arrays involved: RHS, PIVNUL_LIST, ISOL_loc and SOL_loc
Possible values :
0: A normal solution step is performed. If the matrix was found singular during factorization
then one of the possible solutions is returned.
i: with 1 ≤ i ≤ INFOG(28). The i-th vector of the null space basis is computed.
-1: The complete null space basis is computed.
Default value: 0 (normal solution step)
Incompatibility: Iterative refinement, error analysis, and the option to solve the transposed system
(ICNTL(9) ≠ 1) are ignored when the solution step is used to return vectors from the null space
(ICNTL(25) ≠ 0).
Related parameters: ICNTL(21), ICNTL(24)
Remarks: Null space basis computation can be activated when a zero-pivot detection option was
requested (ICNTL(24) ≠ 0) during the factorization and the matrix was found to be deficient
(INFOG(28) > 0).
Note that when vectors from the null space are requested (ICNTL(25) ≠ 0), both the centralized
(ICNTL(21)=0) and distributed (ICNTL(21)=1) solution options can be used. If the solution
is centralized (ICNTL(21)=0), then the null space vectors are returned to the user in the array
RHS, allocated by the user on the host. If the solution is distributed (ICNTL(21)=1), then the
null space vectors are returned in the array SOL_loc, which must be allocated by the user on
all working processors (see Subsection 5.14.6). In both cases, the number of columns of RHS or
SOL_loc must be equal to the number of vectors requested, so that NRHS must be equal to:
– 1 if 1 ≤ ICNTL(25) ≤ INFOG(28)
– INFOG(28) if ICNTL(25)=-1.
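For example (a sketch in the double precision interface, centralized solution), the complete null space basis could be retrieved as follows, after a factorization with null pivot row detection (ICNTL(24)=1) that found INFOG(28) > 0 deficiencies:
      mumps_par%ICNTL(25) = -1              ! request the whole null space basis
      mumps_par%NRHS = mumps_par%INFOG(28)  ! one column per null space vector
      mumps_par%LRHS = mumps_par%N
      ALLOCATE( mumps_par%RHS( mumps_par%LRHS * mumps_par%NRHS ) )  ! host only
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)    ! columns of RHS hold the null space vectors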
ICNTL(26) drives the solution phase if a Schur complement matrix has been computed (ICNTL(19) ≠
0); see Subsection 3.17 for details.
Phase: accessed by the host during the solution phase. It will be accessed also during factorization
if the forward elimination is performed during factorization (ICNTL(32)=1)
Possible variables/arrays involved: REDRHS, LREDRHS
Possible values :
0 : standard solution phase on the internal problem; referring to the notations from
Subsection 3.17, only the system A1,1 x1 = b1 is solved and the entries of the right-hand
side corresponding to the Schur are explicitly set to 0 on output.
1 : condense/reduce the right-hand side on the Schur. Only a forward elimination is performed.
The solution corresponding to the “internal” (non-Schur) variables is returned together with
the reduced/condensed right-hand side. The reduced right-hand side is made available on the
host in the pointer array REDRHS, which must be allocated by the user. Its leading dimension
LREDRHS must be provided, too.
2 : expand the Schur local solution on the complete solution variables. REDRHS is considered
to be the solution corresponding to the Schur variables. It must be allocated by the user, and
its leading dimension LREDRHS must be provided. The backward substitution is then
performed with the given right-hand side to compute the solution associated with the “internal”
variables. Note that the solution corresponding to the Schur variables is also made available
in the main solution vector/matrix.
Values different from 1 and 2 are treated as 0.
Default value: 0 (normal solution phase)
Incompatibility: If ICNTL(26) = 1 or 2, then error analysis and iterative refinement are disabled
(ICNTL(11) and ICNTL(10))
Related parameters: ICNTL(19), ICNTL(32)
Remarks: If ICNTL(26) ≠ 0, then the user should provide workspace in the pointer array REDRHS,
as well as a leading dimension LREDRHS (see Subsection 5.15). Note that if no Schur complement
was computed, ICNTL(26) = 1 or 2 results in an error.
0 : no blocking, it is treated as 1.
> 0 : blocksize = min(id%NRHS,ICNTL(27))
Default value: -32
Remarks: It influences both the memory usage (see INFOG(30) and INFOG(31)) and the
solution time. Larger values of ICNTL(27) lead to larger memory requirements and a better
performance (except if the larger memory requirements induce swapping effects). Tuning
ICNTL(27) is critical, especially when factors are on disk (ICNTL(22)=1 at the factorization
stage) because factors must be accessed once for each block of right-hand sides.
ICNTL(29) defines the parallel ordering tool (when ICNTL(28)=2) to be used to compute the fill-in
reducing permutation. See Subsection 3.2 and Subsection 5.4.
Phase: accessed by the host process only during the parallel analysis phase (ICNTL(28)=2).
Possible variables/arrays involved: SYM_PERM
Possible values :
0: automatic choice.
1: PT-SCOTCH is used to reorder the input matrix, if available.
2: ParMetis is used to reorder the input matrix, if available.
Other values are treated as 0.
Default value: 0 (automatic choice)
Related parameters: ICNTL(28)
Remarks: On output, the pointer array SYM_PERM provides access, on the host processor, to the
symmetric permutation that is effectively considered during the analysis phase, and INFOG(7)
to the ordering option that was effectively used. SYM_PERM(i), (i=1, ..., N) holds the position of
variable i in the pivot order; see Subsection 5.4.1 for a full description.
ICNTL(30) computes a user-specified set of entries in the inverse A^{-1} of the original matrix (see
Subsection 5.14.4).
Phase: accessed during the solution phase.
Possible variables/arrays involved: NZ_RHS, NRHS, RHS_SPARSE, IRHS_SPARSE, IRHS_PTR
Possible values :
0: no entries in A^{-1} are computed.
1: computes entries in A^{-1}.
Other values are treated as 0.
Default value: 0 (no entries in A^{-1} are computed)
Incompatibility: Error analysis and iterative refinement will not be performed, even if the
corresponding options are set (ICNTL(10) and ICNTL(11)). Because the entries of A^{-1} are
returned in RHS_SPARSE on the host, this functionality is incompatible with the distributed solution
option (ICNTL(21)). Furthermore, computing entries of A^{-1} is not possible in the case of partial
factorizations with a Schur complement (ICNTL(19)). The option to compute the solution using A or
A^T (ICNTL(9)) is meaningless and thus ignored.
Related parameters: ICNTL(27)
Remarks: When a set of entries of A^{-1} is requested, the associated set of columns will be
computed in blocks of size ICNTL(27). Larger ICNTL(27) values will most likely decrease the
amount of factor accesses, enable more parallelism and thus reduce the solution time [48, 44, 14].
The user must specify on input to a call of the solve phase, in the arrays IRHS_PTR and
IRHS_SPARSE, the target entries. The array RHS_SPARSE should be allocated but not initialized.
Note that since selected entries of the inverse of the matrix are requested, NRHS must be set to N. On
output, the arrays IRHS_PTR, IRHS_SPARSE and RHS_SPARSE will hold the requested entries. If
duplicate target entries are provided, then duplicate solutions will be returned.
When entries of A^{-1} are requested (ICNTL(30) = 1), mumps_par%RHS need not be allocated.
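As an illustration (a sketch in the double precision interface), requesting all diagonal entries of A^{-1} could look as follows:
      ! request the N diagonal entries of A^{-1}: one entry per column
      mumps_par%ICNTL(30) = 1
      mumps_par%NRHS   = mumps_par%N
      mumps_par%NZ_RHS = mumps_par%N
      ALLOCATE( mumps_par%IRHS_PTR(mumps_par%N+1) )
      ALLOCATE( mumps_par%IRHS_SPARSE(mumps_par%N) )
      ALLOCATE( mumps_par%RHS_SPARSE(mumps_par%N) )  ! allocated, not initialized
      DO i = 1, mumps_par%N
         mumps_par%IRHS_PTR(i)    = i   ! one requested entry in column i ...
         mumps_par%IRHS_SPARSE(i) = i   ! ... located in row i (diagonal)
      END DO
      mumps_par%IRHS_PTR(mumps_par%N+1) = mumps_par%N + 1
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)      ! RHS_SPARSE holds the requested entries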
For unsymmetric matrices, if the forward elimination is performed during the factorization
(ICNTL(32) = 1), then the L factor is always discarded during factorization. In this case
(ICNTL(32) = 1), both ICNTL(31) = 0 and ICNTL(31) = 2 have the same behaviour.
ICNTL(32) performs the forward elimination of the right-hand sides (Equation (3)) during the
factorization (JOB=2); see Subsection 5.13.
Phase: accessed by the host during the analysis phase.
Possible variables/arrays involved: RHS, NRHS, LRHS, and possibly REDRHS, LREDRHS when
ICNTL(26)=1
Possible values :
0: standard factorization not involving right-hand sides.
1: forward elimination (Equation (3)) of the right-hand side vectors is performed during
factorization (JOB=2). The solve phase (JOB=3) will then only involve backward substitution
(Equation (4)).
Other values are treated as 0.
Default value: 0 (standard factorization)
Related parameters: ICNTL(31),ICNTL(26)
Incompatibility: This option is incompatible with sparse right-hand sides (ICNTL(20)=1,2,3),
with the solution of the transposed system (ICNTL(9) ≠ 1), with the computation of entries of
the inverse (ICNTL(30)=1), and with BLR factorizations (ICNTL(35)=1,2,3). In such cases,
error -43 is raised.
Furthermore, iterative refinement (ICNTL(10)) and error analysis (ICNTL(11)) are disabled.
Finally, the current implementation imposes that all right-hand sides are processed in one pass
during the backward step. Therefore, the blocking size (ICNTL(27)) is ignored.
Remarks: The right-hand sides must be dense to use this functionality: RHS, NRHS, and LRHS
should be provided as described in Subsection 5.14.1. They should be provided at the beginning of
the factorization phase (JOB=2) rather than at the beginning of the solve phase (JOB=3).
For unsymmetric matrices, if the forward elimination is performed during factorization
(ICNTL(32) = 1), the L factor (see ICNTL(31)) may be discarded to save space. In fact,
the L factor will then always be discarded (even when ICNTL(31)=0) in the case of a full-rank
factorization (ICNTL(35)=0) or a BLR factorization with full-rank solve (ICNTL(35)=3). In the
case of a BLR factorization with ICNTL(35)=1 or 2, only the L factors corresponding to full-rank
frontal matrices are discarded in the current version.
We advise using this option only for a reasonably small number of dense right-hand side vectors,
because of the additional storage required when this option is activated and the number of
right-hand sides is large compared to ICNTL(27).
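Schematically (a sketch in the double precision interface; nrhs is a user variable holding the number of dense right-hand sides):
      mumps_par%ICNTL(32) = 1     ! must be set on entry to the analysis
      mumps_par%JOB = 1
      CALL DMUMPS(mumps_par)      ! analysis
      mumps_par%NRHS = nrhs       ! dense right-hand sides, on the host
      mumps_par%LRHS = mumps_par%N
      ALLOCATE( mumps_par%RHS( mumps_par%LRHS * nrhs ) )
      ! ... fill RHS ...
      mumps_par%JOB = 2
      CALL DMUMPS(mumps_par)      ! factorization + forward elimination
      mumps_par%JOB = 3
      CALL DMUMPS(mumps_par)      ! backward substitution only; RHS holds X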
Phase: accessed by the host during the factorization phase.
Possible values :
0 : the determinant of the input matrix is not computed.
≠ 0 : computes the determinant of the input matrix. The determinant is obtained by computing
(a + ib) × 2^c, where a = RINFOG(12), b = RINFOG(13) and c = INFOG(34). In real
arithmetic, b = RINFOG(13) is equal to 0.
Default value: 0 (determinant is not computed)
Related parameters: ICNTL(31)
Remarks: In case a Schur complement was requested (see ICNTL(19)), elements of the Schur
complement are excluded from the computation of the determinant, so that the determinant is that
of the matrix A1,1 (using the notations of Subsection 3.17).
Although we recommend computing the determinant on non-singular matrices, null pivot rows
(ICNTL(24)) and static pivots (CNTL(4)) are excluded from the determinant, so that a non-zero
determinant is still returned on singular or near-singular matrices. This determinant is then not
unique and will depend on which equations were excluded.
Furthermore, we recommend switching off scaling (ICNTL(8)) in such cases. If not (ICNTL(8)
≠ 0), we describe in the following the current behaviour of the package:
– if static pivoting (CNTL(4)) is activated: all entries of the scaling arrays ROWSCA and
COLSCA are currently taken into account in the computation of the determinant.
– if the null pivot row detection (ICNTL(24)) is activated, then entries of ROWSCA and
COLSCA corresponding to pivots in PIVNUL_LIST are excluded from the determinant, so
that
* for symmetric matrices (SYM=1 or 2), the returned determinant correctly corresponds to
the matrix excluding the rows and columns of PIVNUL_LIST;
* for unsymmetric matrices (SYM=0), scaling may perturb the value of the determinant in
case off-diagonal pivoting has occurred (INFOG(12) ≠ 0).
Note that if the user is interested in computing only the determinant, we recommend discarding the
factors during factorization (see ICNTL(31)).
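After the factorization, the determinant can then be formed as follows (a sketch in real double precision arithmetic, ignoring a possible overflow of 2^c):
      DOUBLE PRECISION :: a, det
      INTEGER :: c
      a   = mumps_par%RINFOG(12)  ! mantissa (b = RINFOG(13) = 0 in real arithmetic)
      c   = mumps_par%INFOG(34)   ! exponent
      det = a * 2.0D0**c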
ICNTL(34) controls the conservation of the OOC files during JOB=-3 (see Subsection 5.17).
Phase: accessed by the host during the save/restore files deletion phase (JOB=-3) in case of out-of-
core (ICNTL(22)=1).
Possible values :
0: the out-of-core files are marked for deletion
1: the out-of-core files should not be deleted because another saved instance references them.
Other values are treated as 0.
Default value: 0 (out-of-core files associated with a saved instance are marked for deletion at the
end of the out-of-core file lifetime)
Remarks: MUMPS will delete only the out-of-core files that are referenced in the saved data
identified by the value of SAVE_DIR and SAVE_PREFIX. Extra out-of-core files with the same
OOC_TMPDIR and OOC_PREFIX are not deleted.
ICNTL(35) controls the activation of the BLR feature (see Subsection 5.16).
Phase: accessed by the host during the analysis and during the factorization phases
Possible values :
0 : Standard analysis and factorization (BLR feature is not activated).
1 : BLR feature is activated and automatic choice of BLR option is performed by the software.
2 : BLR feature is activated during both the factorization and solution phases, which allows for
memory gains by storing the factors in low-rank.
3 : BLR feature is activated during the factorization phase but not the solution phase, which is still
performed in full-rank. As a consequence, the full-rank factors must be kept and no memory
gains can be obtained. In an OOC context (ICNTL(22)=1), this option enables the user to
write all factors to disk, which is not the case with ICNTL(35)=2, since factors in low-rank
form are not written to disk.
Other values are treated as 0.
Default value: 0 (standard multifrontal factorization).
Related parameters: CNTL(7) (BLR approximations accuracy), ICNTL(36) (BLR factorization
variant), ICNTL(37) (compression of the contribution blocks), ICNTL(38) (estimation of the
compression rate of the factors) and ICNTL(39) (estimation of the compression rate of the
contribution blocks).
Incompatibility: Note that the activation of the BLR feature is currently incompatible with
elemental matrices (ICNTL(5) = 1) (see error -800, subject to change in the future), and when
the forward elimination during the factorization is requested (ICNTL(32) = 1), see error -43.
Remarks: If ICNTL(35)=1, then the automatic choice of BLR option is to activate BLR
feature during both factorization and solution phases (ICNTL(35)=2). In order to activate the
BLR factorization, ICNTL(35) must be equal to 1, 2 or 3 before the analysis, where some
preprocessing on the graph of the matrix is needed to prepare the low-rank factorization. The value
of ICNTL(35) can then be set to any of the above values on entry to the factorization (e.g., taking
into account the values returned by the analysis). On the other hand, if ICNTL(35)=0 at analysis,
only ICNTL(35)=0 is allowed for the factorization (full-rank factorization). When activating
BLR, it is recommended to set ICNTL(35) to 1 or 2 rather than 3 to benefit from memory gains.
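The following C fragment is only an illustrative sketch (assuming a double precision instance id of
type DMUMPS_STRUC_C already initialized with JOB=-1; the CNTL(7) value is a made-up example). It
shows the point made above: ICNTL(35) must be set before the analysis. Recall that the C arrays are
0-based, so ICNTL(I) corresponds to id.icntl[I-1] (see Section 9).

/* Illustrative sketch: activate BLR for both factorization and solve. */
id.icntl[35-1] = 2;    /* ICNTL(35)=2: factors kept in low-rank form    */
id.cntl[7-1]   = 1e-9; /* CNTL(7): BLR approximation accuracy (example) */
id.job = 1; dmumps_c(&id);  /* analysis: BLR must already be requested  */
id.job = 2; dmumps_c(&id);  /* BLR factorization                        */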
ICNTL(36) controls the choice of BLR factorization variant (see Subsection 5.16).
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : Standard UFSC variant with low-rank updates accumulation (LUA)
1 : UCFS variant with low-rank updates accumulation (LUA). This variant consists of performing
the compression earlier in order to further reduce the number of operations. Although it
may have a numerical impact, the current implementation remains compatible with numerical
pivoting.
Other values are treated as 0.
Default value: 0 (UFSC variant).
Related parameters: ICNTL(35) and CNTL(1)
Remarks: If numerical pivoting is not required, so that CNTL(1) can be set to 0.0, further
performance gains can be expected from the UCFS variant.
ICNTL(37) controls the BLR compression of the contribution blocks (see Subsection 5.16).
Phase: accessed by the host during the factorization phase when ICNTL(35)=1, 2 or 3
Possible values :
0 : contribution blocks are not compressed
1 : contribution blocks are compressed, reducing the memory consumption at the cost of some
additional operations
Other values are treated as 0.
Default value: 0 (contribution blocks not compressed).
Related parameters: ICNTL(35), CNTL(7)
ICNTL(38) sets the estimated compression rate of the factors of BLR fronts (see Subsection 5.16).
Possible values : between 0 and 1000 (1000 means no compression and 0 means full compression);
other values are treated as 0. ICNTL(38)/10 is a percentage representing the typical compression
of the factor matrices in BLR fronts: ICNTL(38)/10 = (size of compressed factors / size of
uncompressed factors) × 100.
Default value: 600 (when factors of BLR fronts are compressed, their size is estimated at 60.0%
of their full-rank size).
Related parameters: ICNTL(35), CNTL(7)
Remarks: Influences the statistics provided in INFO(29), INFO(30), INFO(31), INFOG(36),
INFOG(37), INFOG(38) and INFOG(39), but also INFO(32)-INFO(35) and INFOG(40)-INFOG(43).
Phase: accessed by the host during the analysis when centralized ordering is performed,
ICNTL(28)= 0, 1
Possible values :
1: Symbolic factorization based on quotient graph, mixing right looking and left looking updates
2: Column count based symbolic factorization based on [30]
Other values are treated as 2.
Default value: 2
Related parameters: ICNTL(7), ICNTL(28)
Remarks: When the symbolic factorization is not performed within the ordering (case of a given
ordering, ICNTL(7)=1, or of the centralized Metis ordering, ICNTL(7)=5), the symbolic
factorization will be performed automatically. When SCOTCH is used (ICNTL(7)=3), a fast block
symbolic factorization (exploiting graph separator information), provided within the SCOTCH
library libesmumps.a, is used.
CNTL(3) is used to determine null pivot rows when the null pivot row detection option is enabled
(ICNTL(24) = 1).
Phase: accessed by the host during the numerical factorization phase.
Possible values : we define the threshold thres as follows:
> 0.0: thres = CNTL(3) × ∥Apre∥
= 0.0: thres = ϵ × ∥Apre∥ × √Nh
< 0.0: thres = |CNTL(3)|
where Apre is the preprocessed matrix to be factorized (see Equation (5)), Nh is the number of variables
on the deepest branch of the elimination tree, ϵ is the machine precision and ∥.∥ is the infinity norm.
Default value: 0.0
Related parameters: ICNTL(24)
Remarks:
– When null pivot row detection is enabled (ICNTL(24)=1), a pivot is considered to be null
if the infinity norm of its row/column is smaller than thres.
CNTL(4) determines the threshold for static pivoting. See Subsection 3.9
Related parameters: CNTL(1), INFOG(25)
Phase: accessed by the host, and must be set either before the factorization phase, or before the
analysis phase.
Possible values :
< 0.0: static pivoting is not activated.
> 0.0: static pivoting is activated and the pivots whose magnitude is smaller than CNTL(4) will be
set to CNTL(4).
= 0.0: static pivoting is activated and the threshold value used to define a small pivot is determined
automatically. In the current version, this threshold is equal to √ϵ × ∥Apre∥, where Apre is
the preprocessed matrix to be factored (see Equation (5)).
Default value: -1.0 (no static pivoting)
Related parameters: CNTL(1)
Incompatibility: This option is incompatible with null pivot row detection (ICNTL(24)=1) and
will then be ignored.
Remarks: By static pivoting (as in [39]) we mean replacing small pivots, whose elimination would
otherwise be postponed by partial threshold pivoting (thus increasing the memory and operation
estimates), by a small perturbation of the original matrix controlled by CNTL(4). The number of
modified pivots is returned in INFOG(25).
CNTL(5) defines the fixation for null pivots and is effective only when null pivot row detection is active
(ICNTL(24) = 1).
Phase: accessed by the host during the numerical factorization phase.
Possible values :
≤ 0.0: In the symmetric case (SYM = 2), the pivot column of the L factors is set to zero and the pivot
entry in matrix D is set to one.
In the unsymmetric case (SYM = 0), the fixation is automatically set to a large positive value
and the pivot row of the U factors is set to zero.
> 0.0: when a pivot piv is detected as null, in order to limit the impact of this pivot on the rest of
the matrix, it is set to sign(piv) × CNTL(5) × ∥Apre∥, where Apre is the preprocessed matrix
to be factored (see Equation (5)). We recommend setting CNTL(5) to a large floating-point
value (e.g., 10^20).
Default value: 0.0
Related parameters: ICNTL(24)
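As an illustration, the following C sketch (assuming an initialized double precision DMUMPS_STRUC_C
instance id; the numerical values are arbitrary examples, not recommendations) combines the null pivot
parameters described above:

/* Illustrative sketch: null pivot row detection with an absolute
   threshold and a large fixation value.                            */
id.icntl[24-1] = 1;      /* ICNTL(24)=1: detect null pivot rows     */
id.cntl[3-1]   = -1e-12; /* CNTL(3)<0: thres = |CNTL(3)|            */
id.cntl[5-1]   = 1e20;   /* CNTL(5): fixation for null pivots       */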
Functionality (Control)                  Matrix input format (ICNTL(18) and ICNTL(5))
                                         Centralised           Centralised        Distributed assembled
                                         assembled             elemental          (distr. elemental not avail.)
Unsymmetric permutations (ICNTL(6))      All options           Not available      Not available
                                                               (ICNTL(6)=0)       (ICNTL(6)=0)
Scalings (ICNTL(8))                      All options           Only option 1      Only options 7, 8, or 1
                                                               (user-provided)    (option 1: user-provided)
Constrained/compressed                   All options           Not available      Not available
orderings (ICNTL(12))                                          (ICNTL(12)=0)      (ICNTL(12)=0)
Type of analysis (ICNTL(28))             Seq. or parallel      Sequential only    Sequential or parallel
Schur complement (ICNTL(19))             All options except                       All options except
                                         parallel analysis                        parallel analysis
Block Low-Rank (ICNTL(35)=1,2,3)         All options except    Not available      All options except
                                         ICNTL(32)=1                              ICNTL(32)=1
                                         (fwd in facto)                           (fwd in facto)
Table 3 (columns: ScaLAPACK OFF / ScaLAPACK ON): MUMPS relies on ScaLAPACK to factorize the
last dense Schur complement. If the exact inertia (number of negative pivots) or the list of null pivots is
critical, ScaLAPACK can be switched off (see ICNTL(13)), although this might imply a small
performance degradation.
Table 4: List of incompatibilities with the postprocessing options available at the end of the solve phase
(columns: iterative refinement, ICNTL(10); error analysis, ICNTL(11)).
7 Information parameters
The parameters described in this section are returned by MUMPS and hold information that may be of
interest to the user. Some of the information is local to each processor and some is available only on the
host. If an error is detected (see Section 8), the information may be incomplete.
mumps par%RINFO is a double precision array of dimension 20. It contains the following local
information on the execution of MUMPS:
RINFO(1) - after analysis: The estimated number of floating-point operations on the processor for the
elimination process.
RINFO(2) - after factorization: The number of floating-point operations on the processor for the
assembly process.
RINFO(3) - after factorization: The number of floating-point operations on the processor for the
elimination process. In case the BLR feature is activated (ICNTL(35)=1, 2 or 3), RINFO(3)
represents the theoretical number of operations for the standard full-rank factorization.
RINFO(4) - after factorization: The effective number of floating-point operations on the processor
for the elimination process. It is equal to RINFO(3) when the BLR feature is not activated
(ICNTL(35)=0) and will typically be smaller than RINFO(3) when the BLR feature is activated
and leads to compression.
RINFO(5) - after analysis: if the user decides to perform an out-of-core factorization
(ICNTL(22)=1), then a rough estimation of the size of the disk space in MegaBytes of the
files written by the concerned processor is provided in RINFO(5). If the analysis is full-
rank (ICNTL(35)=0 for the analysis step), then the factorization is necessarily full-rank so that
RINFO(5) is computed for a full-rank factorization (ICNTL(35)=0 also for the factorization).
If ICNTL(35)=1, 2 or 3 at analysis, then RINFO(5) is computed assuming a low-rank (in-
core) storage of the factors of the BLR fronts during the factorization (ICNTL(35)=1 or 2 during
factorization). In case ICNTL(35)=1, 2 or 3 at analysis and the factors are stored in full-rank
format (ICNTL(35)=0 or 3 for the factorization), we refer the user to INFO(3) in order to obtain
a rough estimate of the necessary disk space for the concerned processor.
The effective size in MegaBytes of the files written by the current processor will be returned in
RINFO(6), but only after the factorization. The total estimated disk space (sum of the values of
RINFO(5) over all processors) is returned in RINFOG(15).
RINFO(6) - after factorization: in the case of an out-of-core execution (ICNTL(22)=1), the size in
MegaBytes of the disk space used by the files written by the concerned processor is provided. The
total disk space (for all processors) is returned in RINFOG(16).
RINFO(7) - after each job: The size (in MegaBytes) of the file used to save the data on the processor
(See Subsection 5.17).
RINFO(8) - after each job: The size (in MegaBytes) of the MUMPS structure.
RINFO(9) - RINFO(40) are not used in the current version.
mumps par%INFO is an integer array of dimension 80. It contains the following local information on
the execution of MUMPS:
INFO(1) is 0 if the call to MUMPS was successful, negative if an error occurred (see Section 8), or
positive if a warning is returned. In particular, after successfully saving or restoring an instance
(call to MUMPS with JOB=7 or JOB=8), INFO(1) will be 0 even if INFO(1) was different from
0 at the moment of saving the MUMPS instance to disk.
INFO(2) holds additional information about the error or the warning. If INFO(1) = -1, INFO(2) is
the processor number (in communicator COMM) on which the error was detected.
INFO(3) - after analysis: Estimated size of the real/complex space needed on the processor to
store the factors, assuming the factors are stored in full-rank format (ICNTL(35)=0 or 3 during
factorization). If INFO(3) is negative, then its absolute value corresponds to millions of
real/complex entries used to store the factor matrices. Assuming that the factors will be stored
in full-rank format during the factorization (ICNTL(35)=0 or 3), a rough estimation of the size of
the disk space in bytes of the files written by the concerned processor can be obtained by multiplying
INFO(3) (or its absolute value multiplied by 1 million when negative) by 4, 8, 8, or 16 for single
precision, double precision, single complex, and double complex arithmetics, respectively. See also
RINFO(5).
Note that, when all factors are discarded (ICNTL(31)=1), INFO(3) corresponds to the factors
storage if factors were not discarded (rather than 0). However, if only the L factor is discarded
(case of forward substitution during factorization, ICNTL(32)=1, or case of ICNTL(31)=2),
then INFO(3) corresponds to the factor storage excluding L.
The effective size of the real/complex space needed to store the factors will be returned in INFO(9)
(see below), but only after the factorization. Furthermore, after an out-of-core factorization
(ICNTL(22)=1), the size of the disk space for the files written by the local processor is returned
in RINFO(6). Finally, the total estimated size of the full-rank factors for all processors (sum of
the INFO(3) values over all processors) is returned in INFOG(3).
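As an illustration with made-up numbers: if INFO(3)=-5 is returned in double precision arithmetic,
the factors on that processor hold about 5 million real entries, so the corresponding files require roughly
5 × 10^6 × 8 = 40 million bytes of disk space.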
INFO(4) - after analysis: Estimated integer space needed on the processor for factors (assuming a
full-rank storage for the factors)
INFO(5) - after analysis: Estimated maximum front size on the processor.
INFO(6) - after analysis: Number of nodes in the complete tree. The same value is returned on all
processors.
INFO(7) - after analysis: Minimum estimated size of the main internal integer workarray IS to run the
numerical factorization in-core .
INFO(8) - after analysis: Minimum estimated size of the main internal real/complex workarray S to
run the numerical factorization in-core when factors are stored full-rank (ICNTL(35)=0 or 3).
If negative, then the absolute value corresponds to millions of real/complex entries needed in this
workarray. It is also the estimated minimum size of LWK USER in that case, if the user provides
WK USER.
INFO(9) - after factorization: Size of the real/complex space used on the processor to store the factor
matrices, possibly including low-rank factor matrices (ICNTL(35)=1 or 2). If negative, then the
absolute value corresponds to millions of real/complex entries used to store the factor matrices.
Finally, the total size of the factor matrices for all processors (sum of the INFO(9) values over all
processors) is returned in INFOG(9).
INFO(10) - after factorization: Size of the integer space used on the processor to store the factor
matrices.
INFO(11) - after factorization: Order of the largest frontal matrix processed on the processor.
INFO(12) - after factorization: Number of off-diagonal pivots selected on the processor if SYM=0
or number of negative pivots on the processor if SYM=1 or 2. If ICNTL(13)=0 (the default),
this excludes pivots from the parallel root node treated by ScaLAPACK. (This means that the user
should set ICNTL(13)=1 or use a single processor in order to get the exact number of off-diagonal
or negative pivots rather than a lower bound.) Furthermore, when ICNTL(24) is set to 1 and
SYM=1 or 2, INFOG(12) excludes the null12 pivots, even if their sign is negative. In other words,
a pivot cannot be both null and negative. Note that for complex symmetric matrices (SYM=1 or 2),
INFO(12) will be 0. See also INFOG(12), which provides the total number of off-diagonal
or negative pivots over all processors. For real symmetric matrices, see also INFO(40) and
INFOG(50), which provide the local (resp. global) number of negative pivots among the null
pivots detected when ICNTL(24) is activated.
INFO(13) - after factorization: The number of postponed eliminations due to numerical issues.
INFO(14) - after factorization: Number of memory compresses.
INFO(15) - after analysis: estimated size in MegaBytes (millions of bytes) of all working space
to perform full-rank numerical phases (factorization/solve) in-core (ICNTL(22)=0 for the
factorization). The maximum and sum over all processors are returned respectively in INFOG(16)
and INFOG(17). See also INFO(22) which provides the actual memory that was needed but
only after factorization.
INFO(16) - after factorization: total size (in millions of bytes) of all MUMPS internal data allocated
during the numerical factorization. This excludes the memory for WK USER, in the case where
WK USER is provided. The maximum and sum over all processors are returned respectively in
INFOG(18) and INFOG(19).
INFO(17) - after analysis: estimated size in MegaBytes (millions of bytes) of all working space to
run the numerical phases out-of-core (ICNTL(22)̸=0) with the default strategy. The maximum
and sum over all processors are returned respectively in INFOG(26) and INFOG(27). See also
INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(18) - after factorization: local number of null pivot rows detected locally when ICNTL(24)=1
or ICNTL(56)= 1.
INFO(19) - after analysis: Estimated size of the main internal integer workarray IS to run the
numerical factorization out-of-core .
INFO(20) - after analysis: Estimated size of the main internal real/complex workarray S to run the
numerical factorization out-of-core . If negative, then the absolute value corresponds to millions of
real/complex entries needed in this workarray. It is also the estimated minimum size of LWK USER
in that case, if the user provides WK USER.
INFO(21) - after factorization: Effective space used in the main real/complex workarray S– or in the
workarray WK USER, in the case where WK USER is provided. If negative, then the absolute value
corresponds to millions of real/complex entries needed in this workarray.
INFO(22) - after factorization: Size in millions of bytes of memory effectively used during
factorization. This includes the part of the memory effectively used from the workarray WK USER,
in the case where WK USER is provided. The maximum and sum over all processors are
returned respectively in INFOG(21) and INFOG(22). The difference between the estimated and
the effective memory may result from numerical pivoting difficulties, parallelism and the effective
BLR compression rates.
12 i.e., whose magnitude is smaller than the tolerance defined by CNTL(3).
INFO(23) - after factorization: total number of pivots eliminated on the processor. In the case of a
distributed solution (see ICNTL(21)), this should be used by the user to allocate solution vectors
ISOL loc and SOL loc of appropriate dimensions (ISOL loc of size INFO(23), SOL loc
of size LSOL loc × NRHS where LSOL loc ≥ INFO(23)) on that processor, between the
factorization and solve steps.
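A possible C sketch of this allocation (assuming a double precision DMUMPS_STRUC_C instance id
with distributed solution requested, ICNTL(21)=1, id.nrhs already set, and <stdlib.h> included):

/* Illustrative sketch: allocate the local solution arrays between the
   factorization (JOB=2) and the solve (JOB=3).                        */
id.lsol_loc = id.info[23-1];   /* LSOL_loc >= INFO(23)                 */
id.isol_loc = (int *) malloc((size_t)id.lsol_loc * sizeof(int));
id.sol_loc  = (double *) malloc((size_t)id.lsol_loc * id.nrhs * sizeof(double));
id.job = 3;
dmumps_c(&id);                 /* solve, returns the distributed solution */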
INFO(24) - after analysis: estimated number of entries in the factor matrices on the processor. If
negative, then the absolute value corresponds to millions of entries in the factors. Note that in the
unsymmetric case, INFO(24)=INFO(3). In the symmetric case, however, INFO(24) < INFO(3).
The total number of entries in the factor matrices for all processors (sum of the INFO(24) values
over all processors) is returned in INFOG(20)
INFO(25) - after factorization: number of tiny pivots (number of pivots modified by static pivoting)
detected on the processor (see INFOG(25) for the total number of tiny pivots).
INFO(26) - after solution: effective size in MegaBytes (millions of bytes) of all working space to run
the solution phase. (The maximum and sum over all processors are returned in INFOG(30) and
INFOG(31), respectively).
INFO(27) - after factorization: effective number of entries in factor matrices assuming full-rank
factorization has been performed. If negative, then the absolute value corresponds to millions of
entries in the factors. Note that in case full-rank storage of factors (ICNTL(35)=0 or 3), we have
INFO(27)=INFO(9) in the unsymmetric case and INFO(27) ≤ INFO(9) in the symmetric case.
The sum of INFO(27) over all processors is available in INFOG(29).
INFO(28) - after factorization: effective number of entries in factors on the processor taking into
account BLR compression. If negative, then the absolute value corresponds to millions of entries in
the factors. It is equal to INFO(27) when BLR functionality (see ICNTL(35)) is not activated
or leads to no compression.
INFO(29) - after analysis: minimum estimated size of the main internal real/complex workarray S
to run the numerical factorization in-core when factors are stored low-rank (ICNTL(35)=1,2).
If negative, then the absolute value corresponds to millions of real/complex entries needed in this
workarray. It is also the estimated minimum size of LWK USER in that case, if the user provides
WK USER.
INFO(30) and INFO(31) - after analysis: estimated size in MegaBytes (millions of bytes) of all
working space to perform low-rank numerical phases (factorization/solve) with low-rank factors
(ICNTL(35)=1,2) and estimated compression rate given by ICNTL(38).
• (30): in-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(36) and INFOG(37).
• (31): out-of-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(38) and INFOG(39).
See also INFO(22) which provides the actual memory that was needed but only after
factorization. Numerical pivoting difficulties and the effective compression of the factors (in case
ICNTL(35)=1,2) typically impact the difference between estimated and effective memory.
INFO(32) - after analysis: minimum estimated size of the main internal real/complex workarray S
to run the numerical factorization in-core when factors and contribution blocks are stored low-
rank (ICNTL(35)=1,2 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size of
LWK USER in that case, if the user provides WK USER.
INFO(33) - after analysis: minimum estimated size of the main internal real/complex workarray S to
run the numerical factorization out-of-core when factors and contribution blocks are stored low-
rank (ICNTL(35)=1,2 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size of
LWK USER in that case, if the user provides WK USER.
INFO(34) and INFO(35) - after analysis: estimated size in MegaBytes (millions of bytes) of
all working space to perform low-rank numerical phases (factorization/solve) with low-rank
factors and low-rank contribution blocks (ICNTL(35)=1,2 and ICNTL(37)=1) and estimated
compression rates given by ICNTL(38) and ICNTL(39), respectively.
• (34): in-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(40) and INFOG(41).
• (35): out-of-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(42) and INFOG(43).
See also INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(36) - after analysis: minimum estimated size of the main internal real/complex workarray
S to run the numerical factorization out-of-core when contribution blocks are stored low-rank
(ICNTL(35)=0,3 and ICNTL(37)=1). If negative, then the absolute value corresponds to
millions of real/complex entries needed in this workarray. It is also the estimated minimum size
of LWK USER in that case, if the user provides WK USER.
INFO(37) and INFO(38) - after analysis: estimated size in MegaBytes (millions of bytes) of
all working space to perform low-rank numerical phases (factorization/solve) with low-rank
contribution blocks only (ICNTL(35)=0,3 and ICNTL(37)=1) and estimated compression rate
given by ICNTL(39).
• (37): in-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(44) and INFOG(45).
• (38): out-of-core factorization and solve. The maximum and sum over all processors are
returned respectively in INFOG(46) and INFOG(47).
See also INFO(22) which provides the actual memory that was needed but only after factorization.
INFO(39) - after factorization: effective size of the main internal real/complex workarray S (allocated
internally or by the user when WK USER is provided) to run the numerical factorization. If negative,
then the absolute value corresponds to millions of real/complex entries needed in this workarray.
INFO(40) - after factorization: can only be nonzero for real symmetric matrices, in case the null pivot
row detection (see ICNTL(24)) feature is activated. INFO(40) is the number of negative pivots
among the null pivots/deficiency detected. Note that, for singular matrices, INFO(40) may vary
from one run to another due to floating-point rounding effects. A pivot counted in INFO(12),
the number of negative non-null pivots, will not be counted in INFO(40). See also INFOG(28)
which provides the number of null pivots/deficiency over all processors and INFOG(50), which
provides the number of negative null pivots over all processors.
INFO(41) - INFO(80) are not used in the current version.
mumps par%RINFOG is a double precision array of dimension 20. It contains the following global
information on the execution of MUMPS:
RINFOG(1) - after analysis: the estimated number of floating-point operations (on all processors) for
the elimination process.
RINFOG(2) - after factorization: the total number of floating-point operations (on all processors) for
the assembly process.
RINFOG(3) - after factorization: the total number of floating-point operations (on all processors) for
the elimination process. In case the BLR feature is activated (ICNTL(35)=1, 2 or 3), RINFOG(3)
represents the theoretical number of operations for the standard full-rank factorization.
RINFOG(4) to RINFOG(8) - after solve with error analysis: Only returned if ICNTL(11) = 1 or 2.
See description in Subsection 5.7 .
RINFOG(9) to RINFOG(11) - after solve with error analysis: Only returned if ICNTL(11) = 2.
See description in Subsection 5.7 .
RINFOG(12) - after factorization: if the computation of the determinant was requested (see
ICNTL(33)), RINFOG(12) contains the real part of the determinant. The determinant may
contain an imaginary part in case of complex arithmetic (see RINFOG(13)). It is obtained by
multiplying (RINFOG(12), RINFOG(13)) by 2 to the power INFOG(34).
RINFOG(13) - after factorization: if the computation of the determinant was requested (see
ICNTL(33)), RINFOG(13) contains the imaginary part of the determinant. The determinant
is then obtained by multiplying (RINFOG(12), RINFOG(13)) by 2 to the power INFOG(34).
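In C, for a real (DMUMPS) instance id after a factorization with the determinant option on, this
computation can be sketched as follows (ldexp(x, e) from <math.h> computes x × 2^e):

/* Illustrative sketch: determinant = RINFOG(12) * 2^INFOG(34). */
double det = ldexp(id.rinfog[12-1], id.infog[34-1]);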
RINFOG(14) - after factorization: the total effective number of floating-point operations (on all
processors) for the elimination process. It is equal to RINFOG(3) when the BLR feature is
not activated (ICNTL(35)=0) and will typically be smaller than RINFOG(3) when the BLR
functionality is activated and leads to compression.
RINFOG(15) - after analysis: if the user decides to perform an out-of-core factorization
(ICNTL(22)=1), then a rough estimation of the total size of the disk space in MegaBytes of
the files written by all processors is provided in RINFOG(15). If the analysis is full-rank
(ICNTL(35)=0 for the analysis step), then the factorization is necessarily full-rank so that
RINFOG(15) is computed for a full-rank factorization (ICNTL(35)=0 also for the factorization).
If ICNTL(35)=1, 2 or 3 at analysis, then RINFOG(15) is computed assuming a low-rank (in-
core) storage of the factors of the BLR fronts during the factorization (ICNTL(35)=2 during
factorization). In case ICNTL(35)=1, 2 or 3 for the analysis and the factors will be stored in full-
rank format (ICNTL(35)=0 or 3 for the factorization), we refer the user to INFOG(3) in order
to obtain a rough estimate of the necessary disk space for all processors.
The effective size in Megabytes of the files written by all processors will be returned in
RINFOG(16), but only after the factorization.
RINFOG(16) - after factorization: in the case of an out-of-core execution (ICNTL(22)=1), the total
size in MegaBytes of the disk space used by the files written by all processors is provided.
RINFOG(17) - after each job: sum over all processors of the sizes (in MegaBytes) of the files used to
save the instance (See Subsection 5.17).
RINFOG(18) - after each job: sum over all processors of the sizes (in MegaBytes) of the MUMPS
structures.
RINFOG(19) - RINFOG(40) are not used in the current version.
mumps par%INFOG is an integer array of dimension 80. It contains the following global information on
the execution of MUMPS:
INFOG(1) is 0 if the last call to MUMPS was successful, negative if an error occurred (see Section 8),
or positive if a warning is returned. In particular, after successfully saving or restoring an instance
(call to MUMPS with JOB=7 or JOB=8), INFOG(1) will be 0 even if INFOG(1) was different
from 0 at the moment of saving the MUMPS instance to disk.
INFOG(2) holds additional information about the error or the warning.
The difference between INFOG(1:2) and INFO(1:2) is that INFOG(1:2) is identical on all processors. It
has the value of INFO(1:2) of the processor which returned with the most negative INFO(1) value. For
example, if processor p returns with INFO(1)=-13, and INFO(2)=10000, then all other processors will
return with INFOG(1)=-13 and INFOG(2)=10000, and with INFO(1)=-1 and INFO(2)=p.
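This property allows a uniform error check in user code, sketched below in C (assuming an initialized
instance id and <stdio.h> included):

/* Illustrative sketch: INFOG(1:2) is identical on all processes, so
   every process can take the same decision after a call to MUMPS.   */
dmumps_c(&id);
if (id.infog[1-1] < 0) {
  fprintf(stderr, "MUMPS error: INFOG(1)=%d, INFOG(2)=%d\n",
          id.infog[1-1], id.infog[2-1]);
  /* e.g., for INFOG(1)=-9, increase ICNTL(14) and rerun JOB=2 */
}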
INFOG(3) - after analysis: total (sum over all processors) estimated real/complex workspace to
store the factors, assuming the factors are stored in full-rank format (ICNTL(35)=0 or 3). If
INFOG(3) is negative, then its absolute value corresponds to millions of real/complex entries
used to store the factor matrices. Assuming that the factors will be stored in full-rank format during
the factorization (ICNTL(35)=0 or 3), a rough estimation of the size of the disk space in bytes
of the files written by all processors can be obtained by multiplying INFOG(3) (or its absolute value
multiplied by 1 million when negative) by 4, 8, 8, or 16 for single precision, double precision,
single complex, and double complex arithmetics, respectively. See also RINFOG(15).
Note that, when all factors are discarded (ICNTL(31)=1), INFOG(3) corresponds to the factors
storage if factors were not discarded (rather than 0). However, if only the L factor is discarded
(case of forward substitution during factorization, ICNTL(32)=1, or case of ICNTL(31)=2),
then INFOG(3) corresponds to the factor storage excluding L.
The effective size of the real/complex space needed will be returned in INFOG(9) (see below),
but only after the factorization. Furthermore, after an out-of-core factorization, the size of the disk
space for the files written by all processors is returned in RINFOG(16).
INFOG(4) - after analysis: total (sum over all processors) estimated integer workspace to store the
factor matrices (assuming a full-rank storage of the factors). If INFOG(4) is negative, then its
absolute value corresponds to millions of integer entries used to store the factor matrices.
INFOG(5) - after analysis: estimated maximum front size in the complete tree.
INFOG(6) - after analysis: number of nodes in the complete tree.
INFOG(7) - after analysis: the ordering method actually used. The returned value will depend on
the type of analysis performed, e.g. sequential or parallel (see INFOG(32)). Please refer to
ICNTL(7) and ICNTL(29) for more details on the ordering methods available in sequential and
parallel analysis respectively.
INFOG(8) - after analysis: structural symmetry in percent (100 : symmetric, 0 : fully unsymmetric)
of the (permuted) matrix, -1 indicates that the structural symmetry was not computed (which
will be the case if the input matrix is in elemental form or if analysis by block was performed
(ICNTL(15))).
INFOG(9) - after factorization: total (sum over all processors) real/complex workspace to store the
factor matrices, possibly including low-rank factor matrices (ICNTL(35)=2). If negative, then
the absolute value corresponds to the size in millions of real/complex entries used to store the factor
matrices.
INFOG(10) - after factorization: total (sum over all processors) integer workspace to store the factor
matrices. If negative the absolute value corresponds to millions of integer entries in the integer
workspace to store the factor matrices.
INFOG(11) - after factorization: order of largest frontal matrix.
INFOG(12) - after factorization: total number of off-diagonal pivots if SYM=0 or total number of
negative pivots (real arithmetic) if SYM=1 or 2. If ICNTL(13)=0 (the default) this excludes
pivots from the parallel root node treated by ScaLAPACK. (This means that the user should set
ICNTL(13) to a positive value, say 1, or use a single processor in order to get the exact number
of off-diagonal or negative pivots rather than a lower bound.) Furthermore, when ICNTL(24) is
set to 1 and SYM=1 or 2, INFOG(12) excludes the null pivots13, even if their sign is negative. In
other words, a pivot cannot be both null and negative.
Note that if SYM=1 or 2, INFOG(12) will be 0 for complex symmetric matrices. For real
symmetric matrices, see also INFOG(50), which provides the number of negative pivots among
the null pivots detected when ICNTL(24) is activated.
INFOG(13) - after factorization: total number of delayed pivots. A variable of the original matrix may
be delayed several times between successive frontal matrices. In that case, it accounts for several
delayed pivots. A large number (more than 10% of the order of the matrix) indicates numerical
problems. Settings related to numerical preprocessing (ICNTL(6),ICNTL(8), ICNTL(12))
might then be modified by the user.
INFOG(14) - after factorization: total number of memory compresses.
INFOG(15) - after solution: number of steps of iterative refinement.
INFOG(16) and INFOG(17) - after analysis: estimated size (in millions of bytes) of all MUMPS
internal data for running the full-rank factorization in-core for a given value of ICNTL(14).
• (16): max over all processors
• (17): sum over all processors.
INFOG(18) and INFOG(19) - after factorization: size in millions of bytes of all MUMPS internal data
allocated during factorization.
• (18): max over all processors
• (19): sum over all processors.
Note that in the case where WK USER is provided, the memory allocated by the user for the local
arrays WK USER is not counted in INFOG(18) and INFOG(19).
13 i.e., whose magnitude is smaller than the tolerance defined by CNTL(3).
INFOG(20) - after analysis: estimated number of entries in the factors assuming full-rank factorization.
If negative the absolute value corresponds to millions of entries in the factors. Note that in the
unsymmetric case, INFOG(20)=INFOG(3). In the symmetric case, however, INFOG(20) <
INFOG(3).
INFOG(21) and INFOG(22) - after factorization: size in millions of bytes of memory effectively
used during factorization.
• (21): max over all processors
• (22): sum over all processors.
This includes the memory effectively used in the local workarrays WK USER, in the case where the
arrays WK USER are provided.
INFOG(23) - after analysis: value of ICNTL(6) effectively used.
INFOG(24) - after analysis: value of ICNTL(12) effectively used.
INFOG(25) - after factorization: number of tiny pivots (number of pivots modified by static pivoting).
INFOG(26) and INFOG(27) - after analysis: estimated size (in millions of bytes) of all MUMPS
internal data for running full-rank factorization out-of-core (ICNTL(22)̸= 0) for a given value
of ICNTL(14).
• (26): max over all processors
• (27): sum over all processors
INFOG(28) - after factorization: number of null pivot rows encountered. See ICNTL(24) and
CNTL(3) for the definition of a null pivot row.
INFOG(29) - after factorization: effective number of entries in the factor matrices (sum over all
processors) assuming that full-rank factorization has been performed. If negative, then the absolute
value corresponds to millions of entries in the factors. Note that in case the factor matrices are
stored full-rank (ICNTL(35)=0 or 3), we have INFOG(29)=INFOG(9) in the unsymmetric case
and INFOG(29) ≤ INFOG(9) in the symmetric case.
INFOG(30) and INFOG(31) - after solution: size in millions of bytes of memory effectively used
during solution phase:
• (30): max over all processors
• (31): sum over all processors
INFOG(32) - after analysis: the type of analysis actually done (see ICNTL(28)). INFOG(32) has
value 1 if sequential analysis was performed, in which case INFOG(7) returns the sequential
ordering option used, as defined by ICNTL(7). INFOG(32) has value 2 if parallel analysis
was performed, in which case INFOG(7) returns the parallel ordering used, as defined by
ICNTL(29).
INFOG(33): effective value used for ICNTL(8). It is set both after the analysis and the factorization
phases. If ICNTL(8)=77 on entry to the analysis and INFOG(33) has value 77 on exit from the
analysis, then no scaling was computed during the analysis and the automatic decision will only be
made during the factorization (except if the user modifies ICNTL(8) to set a specific option on entry
to the factorization).
INFOG(34): if the computation of the determinant was requested (see ICNTL(33)), INFOG(34)
contains the exponent of the determinant. See also RINFOG(12) and RINFOG(13): the
determinant is obtained by multiplying (RINFOG(12), RINFOG(13)) by 2 to the power
INFOG(34).
INFOG(35) - after factorization: effective number of entries in the factors (sum over all processors)
taking into account BLR factor compression. If negative, then the absolute value corresponds
to millions of entries in the factors. It is equal to INFOG(29) when BLR functionality (see
ICNTL(35)) is not activated or leads to no compression.
INFOG(36), INFOG(37), INFOG(38), and INFOG(39) - after analysis: estimated size (in millions
of bytes) of all MUMPS internal data for running the low-rank factorization with low-rank factors for a
given value of ICNTL(14) and ICNTL(38).
• in-core:
– (36): max over all processors
– (37): sum over all processors.
• out-of-core:
– (38): max over all processors
– (39): sum over all processors.
INFOG(40), INFOG(41), INFOG(42), and INFOG(43) - after analysis: estimated size (in millions
of bytes) of all MUMPS internal data for running the low-rank factorization with low-rank factors and
low-rank contribution blocks for a given value of ICNTL(14), ICNTL(38) and ICNTL(39).
• in-core:
– (40): value on the most memory consuming processor
– (41): sum over all processors.
• out-of-core:
– (42): value on the most memory consuming processor
– (43): sum over all processors.
INFOG(44), INFOG(45), INFOG(46), and INFOG(47) - after analysis: estimated size (in millions
of bytes) of all MUMPS internal data for running the low-rank factorization with low-rank
contribution blocks only, for a given value of ICNTL(14) and ICNTL(39).
• in-core:
– (44): value on the most memory consuming processor
– (45): sum over all processors.
• out-of-core:
– (46): value on the most memory consuming processor
– (47): sum over all processors.
INFOG(48) - INFOG(49) are reserved.
INFOG(50) - after factorization: can only be nonzero for symmetric matrices, in case the null pivot row
detection (see ICNTL(24)) feature is activated. INFOG(50) is the total, over all MPI processes,
number of negative pivots among the null pivots/deficiency detected. Note that, for singular
matrices, INFOG(50) may vary from one run to another due to floating-point rounding effects.
A pivot counted in INFOG(12), the number of negative non-null pivots, will not be counted in
INFOG(50). See also INFOG(28), the number of null pivots/deficiency, and INFO(40), the
local statistic corresponding to INFOG(50) on a given MPI process.
INFOG(51) - INFOG(80) are not used in the current version.
8 Error diagnostics
On an unsuccessful call to MUMPS, INFO(1) holds a negative error code on the processor where the
error was detected; for example, if processor p detects error INFO(1)=-8 with INFO(2)=1000, the
processors that did not produce an error will set INFO(1) to -1 and INFO(2) to the rank of the
processor having the most negative error code.
The behaviour is slightly different for the global information parameters INFOG(1) and INFOG(2):
in the previous example, all processors would return with INFOG(1) = -8 and INFOG(2)=1000.
The possible error codes returned in INFO(1) (and INFOG(1)) have the following meaning:
-1 An error occurred on processor INFO(2).
-2 NNZ/NZ, NNZ loc/NZ loc or Σ NNZ loc/Σ NZ loc are out of range. INFO(2)=NNZ/NZ,
NNZ loc/NZ loc or Σ NNZ loc/Σ NZ loc.
-3 MUMPS was called with an invalid value for JOB. This may happen if the analysis (JOB=1) was not
performed (or failed) before the factorization (JOB=2), or the factorization was not performed (or
failed) before the solve (JOB=3), or the initialization phase (JOB=-1) was performed a second time
on an instance not freed (JOB=-2). See description of JOB in Section 4. This error also occurs if
JOB does not contain the same value on all processes on entry to MUMPS. INFO(2) is then set to
the local value of JOB.
-4 Error in user-provided permutation array PERM IN at position INFO(2). This error may only occur
on the host.
-5 Problem of real workspace allocation of size INFO(2) during analysis. The unit for INFO(2)
is the number of real values (single precision for SMUMPS/CMUMPS, double precision for
DMUMPS/ZMUMPS), in the Fortran “ALLOCATE” statement that did not succeed. If INFO(2)
is negative, then its absolute value should be multiplied by 1 million.
-6 Matrix is singular in structure. INFO(2) holds the structural rank.
-7 Problem of integer workspace allocation of size INFO(2) during analysis. The unit for INFO(2) is
the number of integer values that MUMPS tried to allocate in the Fortran ALLOCATE statement that
did not succeed. If INFO(2) is negative, then its absolute value should be multiplied by 1 million.
-8 Main internal integer workarray IS too small for factorization. This may happen, for example, if
numerical pivoting leads to significantly more fill-in than was predicted by the analysis. The user
should increase the value of ICNTL(14) before calling the factorization again (JOB=2).
-9 The main internal real/complex workarray S is too small. If INFO(2) is positive, then the number
of entries that are missing in S at the moment when the error is raised is available in INFO(2).
If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error -9
occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2)
again, except if LWK USER is provided, in which case LWK USER should be increased.
-10 Numerically singular matrix. INFO(2) holds the number of eliminated pivots.
-11 Internal real/complex workarray S or LWK USER too small for solution. If INFO(2) is positive,
then the number of entries that are missing in S/LWK USER at the moment when the error is raised
is available in INFO(2). If the numerical phases are out-of-core and LWK USER is provided for
the solution phase and is smaller than the value provided for the factorization, it should be increased
by at least INFO(2). In other cases, please contact us.
-12 Internal real/complex workarray S too small for iterative refinement. Please contact us.
-13 Problem of workspace allocation of size INFO(2) during the factorization or solve steps. The size
that the package tried to allocate with a Fortran ALLOCATE statement is available in INFO(2).
If INFO(2) is negative, then the size that the package requested is obtained by multiplying the
absolute value of INFO(2) by 1 million. In general, the unit for INFO(2) is the number of scalar
entries of the type of the input matrix (real, complex, single or double precision).
-14 Internal integer workarray IS too small for solution. See error INFO(1) = -8.
-15 Integer workarray IS too small for iterative refinement and/or error analysis. See error INFO(1) =
-8.
-16 N is out of range. INFO(2)=N.
-17 The internal send buffer that was allocated dynamically by MUMPS on the processor is too small.
The user should increase the value of ICNTL(14) before calling MUMPS again.
-18 The blocking size for multiple RHS (ICNTL(27)) is too large and may lead to an integer overflow.
This error may only occur for very large matrices with large values of ICNTL(27) (e.g., several
thousands). INFO(2) provides an estimate of the maximum value of ICNTL(27) that should be
used.
-19 The maximum allowed size of working memory ICNTL(23) is too small to run the factorization
phase and should be increased. If INFO(2) is positive, then the number of entries that are missing
at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its
absolute value should be multiplied by 1 million.
-20 The internal reception buffer that was allocated dynamically by MUMPS is too small. Normally, this
error is raised on the sender side when detecting that the message to be sent is too large for the
reception buffer on the receiver. INFO(2) holds the minimum size of the reception buffer required
(in bytes). The user should increase the value of ICNTL(14) before calling MUMPS again.
-21 Value of PAR=0 is not allowed because only one processor is available; running MUMPS in host-
node mode (the host is not a slave processor itself) requires at least two processors. The user should
either set PAR to 1 or increase the number of processors.
-22 A pointer array is provided by the user that is either
• not associated, or
• has insufficient size, or
• is associated and should not be associated (for example, RHS on non-host processors).
INFO(2) points to the incorrect pointer array in the table below:
INFO(2) array
1 IRN or ELTPTR
2 JCN or ELTVAR
3 PERM IN
4 A or A ELT
5 ROWSCA
6 COLSCA
7 RHS
8 LISTVAR SCHUR
9 SCHUR
10 RHS SPARSE
11 IRHS SPARSE
12 IRHS PTR
13 ISOL loc
14 SOL loc
15 REDRHS
16 IRN loc, JCN loc or A loc
17 IRHS loc
18 RHS loc
-23 MPI was not initialized by the user prior to a call to MUMPS with JOB = -1.
-24 NELT is out of range. INFO(2)=NELT.
-25 A problem has occurred in the initialization of the BLACS. This may be because you are using a
vendor’s BLACS. Try using a BLACS version from netlib instead.
-26 LRHS is out of range. INFO(2)=LRHS.
-27 NZ RHS and IRHS PTR(NRHS+1) do not match. INFO(2) = IRHS PTR(NRHS+1).
-28 IRHS PTR(1) is not equal to 1. INFO(2) = IRHS PTR(1).
-29 LSOL loc is smaller than INFO(23). INFO(2)=LSOL loc.
-30 SCHUR LLD is out of range. INFO(2) = SCHUR LLD.
-31 A 2D block cyclic symmetric (SYM=1 or 2) Schur complement is required with the option
ICNTL(19)=3, but the user has provided a process grid that does not satisfy the constraint
MBLOCK=NBLOCK. INFO(2)=MBLOCK-NBLOCK.
-32 Incompatible values of NRHS and ICNTL(25). Either ICNTL(25) was set to -1 and NRHS is
different from INFOG(28); or ICNTL(25) was set to i, 1 ≤ i ≤ INFOG(28) and NRHS is
different from 1. Value of NRHS is stored in INFO(2).
-33 ICNTL(26) was asked for during solve phase (or during the factorization – see ICNTL(32))
but the Schur complement was not asked for at the analysis phase (ICNTL(19)).
INFO(2)=ICNTL(26).
-34 LREDRHS is out of range. INFO(2)=LREDRHS.
-35 This error is raised when the expansion phase is called (ICNTL(26) = 2) but reduction phase
(ICNTL(26)=1) was not called before. This error also occurs in case the reduction phase
(ICNTL(26) = 1) is asked for at the solution phase (JOB=3) but the forward elimination
was already performed during the factorization phase (JOB=2 and ICNTL(32)=1). INFO(2)
contains the value of ICNTL(26).
-36 Incompatible values of ICNTL(25) and INFOG(28). The value of ICNTL(25) is stored in
INFO(2).
-37 Value of ICNTL(25) incompatible with some other parameter. If ICNTL(25) is incompatible
with ICNTL(xx), the index xx is stored in INFO(2).
-38 Parallel analysis was set (i.e., ICNTL(28)=2) but PT-SCOTCH or ParMetis were not provided.
-39 Incompatible values for ICNTL(28) and ICNTL(5) and/or ICNTL(19) and/or ICNTL(6).
Parallel analysis is not possible in the cases where the matrix is unassembled and/or a Schur
complement is requested and/or a maximum transversal is requested on the matrix.
-40 The matrix was indicated to be positive definite (SYM=1) by the user but a negative or null pivot
was encountered during the processing of the root by ScaLAPACK. SYM=2 should be used.
-41 Incompatible value of LWK USER between the factorization and solution phases. This error may
only occur when the factorization is out-of-core (ICNTL(22)=1), in which case both the contents
of WK USER and LWK USER should be passed unchanged between the factorization (JOB=2) and
solution (JOB=3) phases.
-42 ICNTL(32) was set to 1 (forward during factorization), but the value of NRHS on the host
processor is incorrect: either the value of NRHS provided at analysis is negative or zero, or the
value provided at factorization or solve is different from the value provided at analysis. INFO(2)
holds the value of id%NRHS that was provided at analysis.
-43 Incompatible values of ICNTL(32) and ICNTL(xx). The index xx is stored in INFO(2).
-44 The solve phase (JOB=3) cannot be performed because the factors or part of the factors are not
available. INFO(2) contains the value of ICNTL(31).
-45 NRHS ≤ 0. INFO(2) contains the value of NRHS.
-46 NZ RHS ≤ 0. This is currently not allowed in case of reduced right-hand-side (ICNTL(26)=1) and
in case entries of A−1 are requested (ICNTL(30)=1). INFO(2) contains the value of NZ RHS.
-47 Entries of A−1 were requested during the solve phase (JOB=3, ICNTL(30)=1) but the constraint
NRHS=N is not respected. The value of NRHS is provided in INFO(2).
-48 Incompatible values of ICNTL(30) (computation of entries of A−1) and ICNTL(xx). The index
xx is stored in INFO(2).
-49 SIZE SCHUR has an incorrect value: SIZE SCHUR < 0 or SIZE SCHUR ≥N, or SIZE SCHUR
was modified on the host since the analysis phase. The value of SIZE SCHUR is provided in
INFO(2).
-50 An error occurred while computing the fill-reducing ordering during the analysis phase. This
commonly happens when an (external) ordering tool returns an error code or a wrong result.
-51 An external ordering (Metis/ParMetis, SCOTCH/PT-SCOTCH, PORD), with 32-bit default
integers, was invoked to process a graph of size larger than 2^31 − 1. INFO(2) holds the size
required to store the graph as a number of integer values; it is negative and its absolute value should
be multiplied by 1 million.
-52 When default Fortran integers are 64 bit (e.g. Fortran compiler flag -i8 -fdefault-integer-8 or
something equivalent depending on your compiler) then external ordering libraries (Metis/ParMetis,
SCOTCH/PT-SCOTCH, PORD) should also have 64-bit default integers. INFO(2) = 1, 2, 3
means that respectively Metis/ParMetis, SCOTCH/PT-SCOTCH or PORD were invoked and were
not generated with 64-bit default integers.
-53 Internal error that could be due to inconsistent input data between two consecutive calls.
-54 The analysis phase (JOB=1) was called with ICNTL(35)=0 but the factorization phase was
called with ICNTL(35)=1, 2 or 3. In order to perform the factorization with BLR compression,
please perform the analysis phase again using ICNTL(35)=1, 2 or 3 (see the documentation of
ICNTL(35)).
-55 During a call to MUMPS including the solve phase with distributed right-hand side, LRHS loc was
detected to be smaller than Nloc RHS. INFO(2)=LRHS loc.
-56 During a call to MUMPS including the solve phase with distributed right-hand side and distributed
solution, RHS loc and SOL loc point to the same workarray but LRHS loc < LSOL loc.
INFO(2)=LRHS loc.
-57 During a call to MUMPS analysis phase with a block format
(ICNTL(15) ̸= 0), an error in the interface provided by the user
was detected. INFO(2) holds additional information about the issue:
INFO(2) issue
1 NBLK is incorrect (or not compatible with BLKPTR size),
or -ICNTL(15) is not compatible with N
2 BLKPTR is not provided or its content is incorrect
3 BLKVAR, if provided, should be of size N
-70 During a call to MUMPS with JOB=7, the file specified to save the current instance, as derived from
SAVE DIR and/or SAVE PREFIX, already exists. Before saving an instance into this file, it should
be first suppressed (see JOB=-3). Otherwise, a different file should be specified by changing the
values of SAVE DIR and/or SAVE PREFIX.
-71 An error has occurred during the creation of one of the files needed to save MUMPS data (JOB=7).
-72 Error while saving data (JOB=7); a write operation did not succeed (e.g., disk full, I/O error, . . . ).
INFO(2) is the size that should have been written during that operation.
If INFO(2) is negative, then its absolute value should be multiplied by 1 million.
-73 During a call to MUMPS with JOB=8, one parameter of the current instance is not compatible with
the corresponding one in the saved instance.
INFO(2) points to the incorrect parameter in the table below:
INFO(2) parameter
1 fortran version (after/before 2003)
2 integer size(32/64 bit)
3 saved instance not compatible over MPI processes
4 number of MPI processes
5 arithmetic
6 SYM
7 PAR
-74 The file resulting from the setting of SAVE DIR and SAVE PREFIX could not be opened for
restoring data (JOB=8). INFO(2) is the rank of the process (in the communicator COMM) on
which the error was detected.
-75 Error while restoring data (JOB=8); a read operation did not succeed (e.g., end of file reached, I/O
error, . . . ). INFO(2) is the size still to be read. If INFO(2) is negative, then the size that the
package requested is obtained by multiplying the absolute value of INFO(2) by 1 million.
-76 Error while deleting the files (JOB=-3); some files to be erased were not found or could not be
suppressed. INFO(2) is the rank of the process (in the communicator COMM) on which the error
was detected.
-77 Neither SAVE DIR nor the environment variable MUMPS SAVE DIR are defined.
-78 Problem of workspace allocation during the restore step. The size still to be allocated is available
in INFO(2). If INFO(2) is negative, then the size that the package requested is obtained by
multiplying the absolute value of INFO(2) by 1 million.
-79 MUMPS could not find a Fortran file unit to perform I/O’s. INFO(2) provides additional
information on the error:
• INFO(2)=1: the problem occurs in the analysis phase, when attempting to find a free Fortran
unit for the WRITE PROBLEM feature (see Subsection 5.2.3).
• INFO(2)=2: the problem occurs during a call to MUMPS with JOB=7 or 8 (save-restore
feature, see Subsection 3.19).
-89 Internal error during SCOTCH kway-partitioning in SCOTCHFGRAPHPART. METIS package
should be made available to MUMPS.
-90 Error in out-of-core management. See the error message returned on output unit ICNTL(1) for
more information.
-800 Temporary error associated to the current MUMPS release, subject to change or disappearance
in the future. If INFO(2)=5, then this error is due to the fact that the elemental matrix format
(ICNTL(5)=1) is currently incompatible with a BLR factorization (ICNTL(35)̸=0).
A positive value of INFO(1) is associated with a successful MUMPS execution but with a warning.
The corresponding warning message will be output on unit ICNTL(2) when ICNTL(4) ≥ 2.
+1 Index (in IRN or JCN) out of range. Action taken by subroutine is to ignore any such entries and
continue. INFO(2) is set to the number of faulty entries.
+2 During error analysis the max-norm of the computed solution is close to zero. In some cases, this
could cause difficulties in the computation of RINFOG(6).
+4 ICNTL(49)=1,2 and not enough memory to compact id%S at the end of the factorization.
+8 Warning return from the iterative refinement routine. More than ICNTL(10) iterations are required.
+ Combinations of the above warnings correspond to summing the constituent warning values. For
example, if an MPI process exits the package with INFO(1)=6, this indicates that both warnings
+2 and +4 occurred on this MPI process. In case several warnings occur on an MPI process,
INFO(2) corresponds to the warning that occurred last. Finally, in case of multiple MPI processes,
INFOG(1) combines the warnings raised on the different MPI processes (for example, INFOG(1)
will be equal to 5 if INFO(1)=1, 1 and 4 on MPI processes with ranks 0, 1 and 2, respectively). In
that case, INFOG(2) is simply set to the number of MPI processes on which a warning occurred
and the values of INFO(1) and INFO(2) on each MPI process can be checked for more detailed
information.
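Since the individual warning codes +1, +2, +4 and +8 are powers of two and are combined by
summation, a returned warning can be decoded with bitwise tests, as in this C sketch (assuming an
instance id):

/* Illustrative sketch: decode a combined warning in the local INFO(1). */
int w = id.info[1-1];
if (w > 0) {
  if (w & 1) { /* out-of-range entries in IRN/JCN were ignored      */ }
  if (w & 2) { /* max-norm of the computed solution close to zero   */ }
  if (w & 4) { /* id%S could not be compacted (ICNTL(49))           */ }
  if (w & 8) { /* more than ICNTL(10) refinement iterations needed  */ }
}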
typedef struct
{
  int sym, par, job;
  int comm_fortran;    /* Fortran communicator */
  int icntl[60];
  real cntl[15];
  int n;
  /* Assembled entry */
  int nz; int64_t nnz; int *irn; int *jcn; real/complex *a;
  /* Distributed entry */
  int nz_loc; int *irn_loc; int *jcn_loc; real/complex *a_loc;
  /* Element entry */
  int nelt; int *eltptr; int *eltvar; real/complex *a_elt;
  /* Ordering, if given by user, Metis options */
  int *perm_in;
  int metis_options[40];
  /* Scaling */
  real/complex *colsca; real/complex *rowsca;
  /* RHS, solution, output data and statistics */
  real/complex *rhs, *redrhs, *rhs_sparse, *sol_loc, *rhs_loc;
  int *irhs_sparse, *irhs_ptr, *isol_loc, *irhs_loc;
  int nrhs, lrhs, lredrhs, nz_rhs, lsol_loc, lrhs_loc, nloc_rhs;
  int info[80], infog[80];
  real rinfo[40], rinfog[40];
  int *sym_perm, *uns_perm;
  /* mapping, null pivots */
  int *mapping, *pivnul_list;
  /* Schur */
  int size_schur; int *listvar_schur; real/complex *schur;
  int nprow, npcol, mblock, nblock, schur_lld, schur_mloc, schur_nloc;
  /* Version number */
  char version_number[80];
  char ooc_tmpdir[256], ooc_prefix[64]; char write_problem[256];
  /* Internal parameters */
  int instance_number;
} [SDCZ]MUMPS_STRUC_C;
Figure 4: Definition of the C structure [SDCZ]MUMPS_STRUC_C. real/complex is used for data that can
be either real or complex, real for data that stays real (float or double) in the complex version.
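As a complement to Figure 4, the following minimal C sketch (not taken from the MUMPS distribution)
shows how the structure is typically driven for a centralized assembled matrix; it assumes MPI_Init has
already been called, and uses the value -987654 for comm_fortran to mean MPI_COMM_WORLD, as in
the examples of Section 11. The 1x1 system is purely illustrative.

#include "dmumps_c.h"

static void tiny_solve(void)
{
  DMUMPS_STRUC_C id;
  int    irn[1] = {1}, jcn[1] = {1};   /* 1-based indices, as in Fortran */
  double a[1]   = {3.0}, rhs[1] = {6.0};

  id.comm_fortran = -987654;           /* means: use MPI_COMM_WORLD      */
  id.par = 1; id.sym = 0;
  id.job = -1; dmumps_c(&id);          /* initialize the instance        */

  id.n = 1; id.nnz = 1;                /* centralized assembled matrix   */
  id.irn = irn; id.jcn = jcn; id.a = a;
  id.rhs = rhs;
  id.job = 6;  dmumps_c(&id);          /* analysis, factorization, solve;
                                          on exit, rhs[0] == 2.0         */
  id.job = -2; dmumps_c(&id);          /* free internal data structures  */
}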
To keep C code close to the Fortran-style 1-based descriptions, one can define a macro such as
#define IRN(I) irn[(I)-1] and then use the uppercase notation with parentheses (instead of
lowercase/brackets). In that case, the notation id.IRN(I), where I is in {1, 2, ..., NNZ}, can be
used instead of id.irn[I-1]; this notation then matches exactly the description in Sections 5 and 6,
where arrays are assumed to start at 1.
This can be slightly more confusing for elemental matrix input (see Subsection 5.2.2.3), where some
arrays are used to index other arrays. For instance, the first value in eltptr, eltptr[0], pointing
into the list of variables of the first element in eltvar, should be equal to 1. Indeed, using the
notation above, the list of variables for element j = 1 starts at location ELTVAR(ELTPTR(j)) =
ELTVAR(eltptr[j-1]) = eltvar[eltptr[j-1]-1].
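The convention can be made concrete with a few illustrative macros; they are not part of the MUMPS
API, and the helper below is purely hypothetical:

#include "dmumps_c.h"

DMUMPS_STRUC_C id;

#define IRN(I)    (id.irn[(I)-1])     /* id.IRN(I)    == id.irn[I-1]    */
#define ELTPTR(J) (id.eltptr[(J)-1])  /* id.ELTPTR(J) == id.eltptr[J-1] */
#define ELTVAR(K) (id.eltvar[(K)-1])  /* id.ELTVAR(K) == id.eltvar[K-1] */

/* First variable of element j (j starting at 1), as in the text:
   ELTVAR(ELTPTR(j)) == eltvar[eltptr[j-1]-1] */
int first_var_of_element(int j)
{
  return ELTVAR(ELTPTR(j));
}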
9.5 Integer, real and complex datatypes in C and Fortran
We assume that the int, int64_t, float and double types are compatible with the Fortran
INTEGER, INTEGER(KIND=8), REAL and DOUBLE PRECISION datatypes. These assumptions are
used in the files [dscz]mumps_c_types.h.
Note that Fortran compilers often provide an option to make all default INTEGER datatypes 64-bit
integers. In that case, one should add the option -DINTSIZE64 during the installation of MUMPS to
indicate that the default Fortran INTEGER should match a 64-bit integer of type int64_t.
When including MUMPS header files from a C application, one can then check at compilation time
the preprocessing constants MUMPS_INTSIZE32 and MUMPS_INTSIZE64 to know how MUMPS_INT
was defined. At runtime, one can simply check the value of sizeof(MUMPS_INT).
Since not all C compilers define the complex datatype (it appeared in the C99 standard), we define
the following types, compatible with the Fortran COMPLEX and DOUBLE COMPLEX types:
typedef struct {float r,i;} mumps_complex; for single precision (cmumps), and
typedef struct {double r,i;} mumps_double_complex; for double precision
(zmumps).
Types for complex data from the user program should be compatible with those above.
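Both checks can be combined in a few lines of C; this sketch only assumes that the double precision
header dmumps_c.h is available:

#include <stdio.h>
#include "dmumps_c.h"

int main(void)
{
#if defined(MUMPS_INTSIZE64)
  printf("MUMPS_INT is 64-bit (MUMPS installed with -DINTSIZE64)\n");
#elif defined(MUMPS_INTSIZE32)
  printf("MUMPS_INT is 32-bit\n");
#endif
  /* Runtime check, as suggested above: */
  printf("sizeof(MUMPS_INT) = %zu\n", sizeof(MUMPS_INT));
  return 0;
}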
Please refer to the report [29] for a more detailed description of these interfaces. Please also refer to
the README file in the MATLAB and Scilab directories of the main MUMPS distribution for more
information on installation. One important point is that, at installation, the user must provide the
Fortran 95 runtime libraries corresponding to the compiled MUMPS package. This can be done in the
makefile for the MATLAB interface (file make.inc) and in the builder for the Scilab interface (file
builder.sce).
Finally, note that examples of usage of the MATLAB and Scilab interfaces are provided in the
directories MATLAB and SCILAB/examples, respectively. In the following, we describe the input
and output parameters of the function [dz]mumps that are relevant in the context of this interface to the
sequential version of MUMPS.
Input Parameters
• mat : sparse matrix which has to be provided as the second argument of dmumps if id.JOB is
strictly larger than 0.
• id.SYM : controls the matrix type (symmetric positive definite, symmetric indefinite or
unsymmetric); it has to be initialized by the user before the initialization phase of MUMPS
(see id.JOB). Its value is set to 0 after the call to initmumps.
• id.JOB : defines the action that will be realized by MUMPS: initialize, analyze and/or factorize
and/or solve and release MUMPS internal C/Fortran data. It has to be set by the user before any call
to MUMPS (except after a call to initmumps, which sets its value to -1).
• id.ICNTL and id.CNTL : define control parameters that can be set after the initialization call
(id.JOB = -1). See Section “Control parameters” for more details. If the user does not modify
an entry in id.ICNTL then MUMPS uses the default parameter. For example, if the user wants to
use the AMD ordering, he/she should set id.ICNTL(7) = 0. Note that the following parameters
are inhibited because they are automatically set within the interface: id.ICNTL(19), which controls
the Schur complement option, and id.ICNTL(20), which controls the format of the right-hand side.
Note that parameters id.ICNTL(1:4) may not work properly depending on your compiler and your
environment. In case of problems, we recommend switching printing off by setting id.ICNTL(1:4)=-1.
• id.PERM_IN : corresponds to the given ordering option (see Section “Input and output parameters”
for more details). Note that this permutation is only accessed if the parameter id.ICNTL(7) is set to
1.
• id.COLSCA and id.ROWSCA : optional scaling arrays (see Section “Input and output
parameters” for more details).
• id.RHS : defines the right-hand side. The parameter id.ICNTL(20) related to its format (sparse or
dense) is automatically set within the interface. Note that, contrary to the Fortran and C interfaces
of MUMPS, id.RHS is not modified: the solution is returned in id.SOL.
• id.VAR_SCHUR : corresponds to the list of variables that appear in the Schur complement matrix
(see Section “Input and output parameters” for more details).
• id.REDRHS (input parameter only if id.VAR_SCHUR was provided during the factorization and
if ICNTL(26)=2 on entry to the solve phase): partial solution on the variables corresponding
to the Schur complement. It is provided by the user and normally results from both the Schur
complement and the reduced right-hand side that were returned by MUMPS in a previous call. When
ICNTL(26)=2, MUMPS uses this information to build the solution id.SOL on the complete problem.
See Section “Schur complement” for more details.
Output Parameters
• id.SCHUR : if id.VAR_SCHUR is provided with size SIZE_SCHUR, then id.SCHUR corresponds to
a dense array of size (SIZE_SCHUR,SIZE_SCHUR) that holds the Schur complement matrix (see
Section “Input and output parameters” for more details). The user does not have to initialize it.
• id.REDRHS (output parameter only if ICNTL(26)=1 and id.VAR_SCHUR was defined): reduced
right-hand side (or condensed right-hand side on the variables associated with the Schur complement).
It is computed by MUMPS during the solve stage if ICNTL(26)=1. It can then be used outside
MUMPS, together with the Schur complement, to build a solution on the interface. See Section
“Schur complement” for more details.
• id.INFOG and id.RINFOG : information parameters (see Section “Information parameters”).
• id.SYM_PERM : corresponds to a symmetric permutation of the variables (see discussion
regarding ICNTL(7) in Section “Control parameters”). This permutation is computed during the
analysis and is followed by the numerical factorization except when numerical pivoting occurs.
• id.UNS_PERM : column permutation (if any) on exit from the analysis phase of MUMPS (see
discussion regarding ICNTL(6) in Section “Control parameters”).
• id.SOL : dense vector or matrix containing the solution after the MUMPS solution phase. It also
contains the null space in case of null-space computation, or entries of the inverse in case of
computation of inverse entries.
Internal Parameters
• id.INST: (MUMPS reserved component) MUMPS internal parameter.
• id.TYPE: (MUMPS reserved component) defines the arithmetic (complex or double precision).
      PROGRAM MUMPS_EXAMPLE
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'dmumps_struc.h'
      TYPE (DMUMPS_STRUC) mumps_par
      INTEGER IERR, I
      INTEGER(8) I8
      CALL MPI_INIT(IERR)
C Define a communicator for the package.
      mumps_par%COMM = MPI_COMM_WORLD
C Initialize an instance of the package
C for L U factorization (sym = 0, with working host)
      mumps_par%JOB = -1
      mumps_par%SYM = 0
      mumps_par%PAR = 1
      CALL DMUMPS(mumps_par)
C Define problem on the host (processor 0)
      IF ( mumps_par%MYID .eq. 0 ) THEN
        READ(5,*) mumps_par%N
        READ(5,*) mumps_par%NNZ
        ALLOCATE( mumps_par%IRN ( mumps_par%NNZ ) )
        ALLOCATE( mumps_par%JCN ( mumps_par%NNZ ) )
        ALLOCATE( mumps_par%A( mumps_par%NNZ ) )
        ALLOCATE( mumps_par%RHS ( mumps_par%N ) )
        DO I8 = 1_8, mumps_par%NNZ
          READ(5,*) mumps_par%IRN(I8), mumps_par%JCN(I8), mumps_par%A(I8)
        END DO
        READ(5,*) ( mumps_par%RHS(I), I=1, mumps_par%N )
      END IF
C Call package for solution
      mumps_par%JOB = 6
      CALL DMUMPS(mumps_par)
      IF ( mumps_par%INFOG(1) .LT. 0 ) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par%INFOG(1)= ", mumps_par%INFOG(1),
     &    "  mumps_par%INFOG(2)= ", mumps_par%INFOG(2)
        GOTO 500
      END IF
C Solution has been assembled on the host
      IF ( mumps_par%MYID .eq. 0 ) THEN
        WRITE( 6, * ) ' Solution is ', (mumps_par%RHS(I),I=1,mumps_par%N)
      END IF
C Deallocate user data
      IF ( mumps_par%MYID .eq. 0 ) THEN
        DEALLOCATE( mumps_par%IRN )
        DEALLOCATE( mumps_par%JCN )
        DEALLOCATE( mumps_par%A )
        DEALLOCATE( mumps_par%RHS )
      END IF
C Destroy the instance (deallocate internal data structures)
      mumps_par%JOB = -2
      CALL DMUMPS(mumps_par)
 500  CALL MPI_FINALIZE(IERR)
      STOP
      END
and we obtain the solution RHS(i) = i, i = 1, ..., 5.
The calling sequence is similar to that for the assembled problem in Subsection 11.1 but now the host
reads the problem in components N, NELT, ELTPTR, ELTVAR, A_ELT, and RHS. Note that for elemental
problems ICNTL(5) must be set to 1 and that elemental matrices always have a symmetric structure. For
the two-element matrix and right-hand side
          1  2  3                3  4  5
     1 ( -1  2  3 )         3 (  2 -1  3 )          ( 12 )
     2 (  2  1  1 )    ,    4 (  1  2 -1 )    ,     (  7 )
     3 (  1  1  1 )         5 (  3  2  1 )          ( 23 )
                                                    (  6 )
                                                    ( 22 )
we could have as input (the successive lines contain N, NELT, LELTVAR, NA_ELT, then the arrays
ELTPTR, ELTVAR, A_ELT (each element matrix stored column by column) and RHS):
5
2
6
18
1 4 7
1 2 3 3 4 5
-1.0 2.0 1.0 2.0 1.0 1.0 3.0 1.0 1.0 2.0 1.0 3.0 -1.0 2.0 2.0 3.0 -1.0 1.0
12.0 7.0 23.0 6.0 22.0
      PROGRAM MUMPS_EXAMPLE
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'dmumps_struc.h'
      TYPE (DMUMPS_STRUC) mumps_par
      INTEGER I, IERR, LELTVAR, NA_ELT
      CALL MPI_INIT(IERR)
C Define a communicator for the package
      mumps_par%COMM = MPI_COMM_WORLD
C Ask for unsymmetric code
      mumps_par%SYM = 0
C Host working
      mumps_par%PAR = 1
C Initialize an instance of the package
      mumps_par%JOB = -1
      CALL DMUMPS(mumps_par)
C Define the problem on the host (processor 0)
      IF ( mumps_par%MYID .eq. 0 ) THEN
        READ(5,*) mumps_par%N
        READ(5,*) mumps_par%NELT
        READ(5,*) LELTVAR
        READ(5,*) NA_ELT
        ALLOCATE( mumps_par%ELTPTR ( mumps_par%NELT+1 ) )
        ALLOCATE( mumps_par%ELTVAR ( LELTVAR ) )
        ALLOCATE( mumps_par%A_ELT ( NA_ELT ) )
        ALLOCATE( mumps_par%RHS ( mumps_par%N ) )
        READ(5,*) ( mumps_par%ELTPTR(I), I=1, mumps_par%NELT+1 )
        READ(5,*) ( mumps_par%ELTVAR(I), I=1, LELTVAR )
        READ(5,*) ( mumps_par%A_ELT(I), I=1, NA_ELT )
        READ(5,*) ( mumps_par%RHS(I), I=1, mumps_par%N )
      END IF
C Specify element entry
      mumps_par%ICNTL(5) = 1
C Call package for solution
      mumps_par%JOB = 6
      CALL DMUMPS(mumps_par)
      IF ( mumps_par%INFOG(1) .LT. 0 ) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par%INFOG(1)= ", mumps_par%INFOG(1),
     &    "  mumps_par%INFOG(2)= ", mumps_par%INFOG(2)
        GOTO 500
      END IF
C Solution has been assembled on the host
      IF ( mumps_par%MYID .eq. 0 ) THEN
        WRITE( 6, * ) ' Solution is ', (mumps_par%RHS(I),I=1,mumps_par%N)
C Deallocate user data
        DEALLOCATE( mumps_par%ELTPTR )
        DEALLOCATE( mumps_par%ELTVAR )
        DEALLOCATE( mumps_par%A_ELT )
        DEALLOCATE( mumps_par%RHS )
      END IF
C Destroy the instance (deallocate internal data structures)
      mumps_par%JOB = -2
      CALL DMUMPS(mumps_par)
 500  CALL MPI_FINALIZE(IERR)
      STOP
      END
11.3 An example of calling MUMPS from C
An example of a driver to use MUMPS from C is given in Figure 7.
/* Example program using the C interface to the
 * double precision version of MUMPS, dmumps_c.
 * We solve the system A x = RHS with
 *   A = diag(1 2) and RHS = [1 4]^T
 * Solution is [1 2]^T */
#include <stdio.h>
#include "mpi.h"
#include "dmumps_c.h"
#define JOB_INIT -1
#define JOB_END -2
#define USE_COMM_WORLD -987654
int main(int argc, char ** argv) {
  DMUMPS_STRUC_C id;
  int n = 2;
  int64_t nnz = 2;
  int irn[] = {1,2};
  int jcn[] = {1,2};
  double a[2];
  double rhs[2];
  int myid, ierr;
  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  /* Define A and rhs */
  rhs[0]=1.0; rhs[1]=4.0;
  a[0]=1.0; a[1]=2.0;
11.4 An example of calling MUMPS from Fortran using the Save/Restore
feature and Out-Of-Core
An example program illustrating a possible use of the Save/Restore feature combined with Out-Of-Core:
      PROGRAM MUMPS_TEST_SAVE_RESTORE
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'cmumps_struc.h'
      TYPE (CMUMPS_STRUC) mumps_par_save, mumps_par_restore
      INTEGER IERR, I
      CALL MPI_INIT(IERR)
C Define a communicator for the package.
      mumps_par_save%COMM = MPI_COMM_WORLD
C Initialize an instance of the package
C for L U factorization (sym = 0, with working host)
      mumps_par_save%JOB = -1
      mumps_par_save%SYM = 0
      mumps_par_save%PAR = 1
      CALL CMUMPS(mumps_par_save)
      IF (mumps_par_save%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
     &    "  mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
        GOTO 500
      END IF
C Define problem on the host (processor 0)
      IF ( mumps_par_save%MYID .eq. 0 ) THEN
        READ(5,*) mumps_par_save%N
        READ(5,*) mumps_par_save%NZ
        ALLOCATE( mumps_par_save%IRN ( mumps_par_save%NZ ) )
        ALLOCATE( mumps_par_save%JCN ( mumps_par_save%NZ ) )
        ALLOCATE( mumps_par_save%A( mumps_par_save%NZ ) )
        DO I = 1, mumps_par_save%NZ
          READ(5,*) mumps_par_save%IRN(I), mumps_par_save%JCN(I)
     &              , mumps_par_save%A(I)
        END DO
      END IF
C Activate OOC
      mumps_par_save%ICNTL(22)=1
C Call package for factorization
      mumps_par_save%JOB = 4
      CALL CMUMPS(mumps_par_save)
      IF (mumps_par_save%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
     &    "  mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
        GOTO 500
      END IF
C Call package for save
      mumps_par_save%JOB = 7
      mumps_par_save%SAVE_DIR="/tmp"
      mumps_par_save%SAVE_PREFIX="mumps_simpletest_save"
      CALL CMUMPS(mumps_par_save)
      IF (mumps_par_save%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_save%INFOG(1)= ", mumps_par_save%INFOG(1),
     &    "  mumps_par_save%INFOG(2)= ", mumps_par_save%INFOG(2)
        GOTO 500
      END IF
C Deallocate user data
      IF ( mumps_par_save%MYID .eq. 0 ) THEN
        DEALLOCATE( mumps_par_save%IRN )
        DEALLOCATE( mumps_par_save%JCN )
        DEALLOCATE( mumps_par_save%A )
      END IF
C Destroy the instance (deallocate internal data structures)
      mumps_par_save%JOB = -2
      CALL CMUMPS(mumps_par_save)
C Now mumps_par_save has been destroyed;
C we use a new instance, mumps_par_restore, to finish the computation.
C
C Define a communicator for the package on the new instance.
      mumps_par_restore%COMM = MPI_COMM_WORLD
C Initialize a new instance of the package
C for L U factorization (sym = 0, with working host)
      mumps_par_restore%JOB = -1
      mumps_par_restore%SYM = 0
      mumps_par_restore%PAR = 1
      CALL CMUMPS(mumps_par_restore)
      IF (mumps_par_restore%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_restore%INFOG(1)= ",
     &    mumps_par_restore%INFOG(1),
     &    "  mumps_par_restore%INFOG(2)= ",
     &    mumps_par_restore%INFOG(2)
        GOTO 500
      END IF
C Call package for restore with OOC feature
      mumps_par_restore%JOB = 8
      mumps_par_restore%SAVE_DIR="/tmp"
      mumps_par_restore%SAVE_PREFIX="mumps_simpletest_save"
      CALL CMUMPS(mumps_par_restore)
      IF (mumps_par_restore%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_restore%INFOG(1)= ",
     &    mumps_par_restore%INFOG(1),
     &    "  mumps_par_restore%INFOG(2)= ",
     &    mumps_par_restore%INFOG(2)
        GOTO 500
      END IF
C Define rhs on the host (processor 0)
      IF ( mumps_par_restore%MYID .eq. 0 ) THEN
        ALLOCATE( mumps_par_restore%RHS ( mumps_par_restore%N ) )
        DO I = 1, mumps_par_restore%N
          READ(5,*) mumps_par_restore%RHS(I)
        END DO
      END IF
C Call package for solution
      mumps_par_restore%JOB = 3
      CALL CMUMPS(mumps_par_restore)
      IF (mumps_par_restore%INFOG(1).LT.0) THEN
        WRITE(6,'(A,A,I6,A,I9)') " ERROR RETURN: ",
     &    "  mumps_par_restore%INFOG(1)= ",
     &    mumps_par_restore%INFOG(1),
     &    "  mumps_par_restore%INFOG(2)= ",
     &    mumps_par_restore%INFOG(2)
        GOTO 500
      END IF
C Solution has been assembled on the host
      IF ( mumps_par_restore%MYID .eq. 0 ) THEN
        WRITE( 6, * ) ' Solution is ',
     &    (mumps_par_restore%RHS(I),I=1,mumps_par_restore%N)
      END IF
C Deallocate user data
      IF ( mumps_par_restore%MYID .eq. 0 ) THEN
        DEALLOCATE( mumps_par_restore%RHS )
      END IF
C Delete the saved files
C (mumps_par_restore%ICNTL(34) is kept at its default (0) so that
C the OOC files are deleted as well).
      mumps_par_restore%JOB = -3
      CALL CMUMPS(mumps_par_restore)
C Destroy the instance (deallocate internal data structures)
      mumps_par_restore%JOB = -2
      CALL CMUMPS(mumps_par_restore)
 500  CALL MPI_FINALIZE(IERR)
      STOP
      END
The MUMPS instance mumps_par_save is initialized by calling MUMPS with JOB = -1, the problem is
read in by the host (in the components N, NZ, IRN, JCN, A), and the factorization is performed out-of-core
(ICNTL(22) = 1) with a call on all processors to MUMPS with JOB = 4. The instance mumps_par_save is
saved by calling MUMPS with JOB = 7, and a call to MUMPS with JOB = -2 is then performed to deallocate
the data structures used by the instance mumps_par_save.
The MUMPS instance mumps_par_restore is initialized by calling MUMPS with JOB = -1. The instance
mumps_par_restore is restored to the same state as mumps_par_save by calling MUMPS with JOB = 8.
The rest of the problem is read in by the host (in the component RHS), and the solution is computed in
RHS with a call on all processors to MUMPS with JOB = 3. Finally, a call to MUMPS with JOB = -3 is
performed to deallocate the data structures used by the instance mumps_par_restore; all files used for
restarting (OOC and Save/Restore) are also deleted because ICNTL(34) = 0.
11.5 An example of calling MUMPS from C using the Save/Restore feature
An example of a driver to use MUMPS from C:
/* Example program using the C interface to the
 * double precision version of MUMPS, dmumps_c.
 * We solve the system A x = RHS with
 *   A = diag(1 2) and RHS = [1 4]^T
 * Solution is [1 2]^T */
#include <stdio.h>
#include <string.h>
#include "mpi.h"
#include "dmumps_c.h"
#define JOB_INIT -1
#define JOB_END -2
#define USE_COMM_WORLD -987654

#if defined(MAIN_COMP)
/*
 * Some Fortran compilers (COMPAQ fort) define "main" in
 * their runtime library while a Fortran program translates
 * to MAIN_ or MAIN__ which is then called from "main".
 * We define argc/argv arbitrarily in that case.
 */
int MAIN__();
int MAIN_()
{
  return MAIN__();
}

int MAIN__()
{
  int argc = 1;
  char * name = "c_example_save_restore";
  char ** argv;
#else
int main(int argc, char ** argv)
{
#endif
  DMUMPS_STRUC_C id_save, id_restore;
  MUMPS_INT n = 2;
  MUMPS_INT8 nnz = 2;
  MUMPS_INT irn[] = {1,2};
  MUMPS_INT jcn[] = {1,2};
  double a[2];
  double rhs[2];

  int error = 0;
  /* When compiling with -DINTSIZE64, MUMPS_INT is 64-bit but MPI
     ilp64 versions may still require standard int for C interface. */
  /* MUMPS_INT myid, ierr; */
  int myid, ierr;
#if defined(MAIN_COMP)
  argv = &name;
#endif
  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  /* Define A and rhs */
  rhs[0]=1.0; rhs[1]=4.0;
  a[0]=1.0; a[1]=2.0;

  /* Initialize MUMPS save instance. Use MPI_COMM_WORLD */
  id_save.comm_fortran=USE_COMM_WORLD;
  id_save.par=1; id_save.sym=0;
  id_save.job=JOB_INIT;
  dmumps_c(&id_save);
  /* Define the problem on the host */
  if (myid == 0) {
    id_save.n = n; id_save.nnz = nnz; id_save.irn = irn; id_save.jcn = jcn;
    id_save.a = a;
  }
#define ICNTL(I) icntl[(I)-1] /* macro s.t. indices match documentation */
  /* No outputs */
  id_save.ICNTL(1)=-1; id_save.ICNTL(2)=-1;
  id_save.ICNTL(3)=-1; id_save.ICNTL(4)=0;
  /* Call the MUMPS package on the save instance (analysis and factorization). */
  id_save.job=4;
  dmumps_c(&id_save);

  /* MUMPS save feature on the save instance. */
  strcpy(id_save.save_prefix,"csave_restore");
  strcpy(id_save.save_dir,"/tmp");
  if (myid == 0) {
    printf("Saving MUMPS instance in %s with prefix %s.\n",
           id_save.save_dir, id_save.save_prefix);
  }
  id_save.job=7;
  dmumps_c(&id_save);
  if (id_save.infog[0] < 0) {
    printf("\n (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
           myid, id_save.infog[0], id_save.infog[1]);
    error = 1;
  } else if (myid == 0) {
    printf(" DONE\n\n");
  }

  /* Terminate the save instance. */
  id_save.job=JOB_END;
  dmumps_c(&id_save);

  if (!error) {
    /* Initialize MUMPS restore instance. Use MPI_COMM_WORLD */
    id_restore.comm_fortran=USE_COMM_WORLD;
    id_restore.par=1; id_restore.sym=0;
    id_restore.job=JOB_INIT;
    dmumps_c(&id_restore);
    /* Define the rhs on the host */
    if (myid == 0) {
      id_restore.rhs = rhs;
    }

    /* No outputs (set on the restore instance) */
    id_restore.ICNTL(1)=-1; id_restore.ICNTL(2)=-1;
    id_restore.ICNTL(3)=-1; id_restore.ICNTL(4)=0;

    /* MUMPS restore feature on the restore instance. */
    strcpy(id_restore.save_prefix,"csave_restore");
    strcpy(id_restore.save_dir,"/tmp");
    if (myid == 0) {
      printf("Restoring MUMPS instance from %s with prefix %s.\n",
             id_restore.save_dir, id_restore.save_prefix);
    }
    id_restore.job=8;
    dmumps_c(&id_restore);
    if (id_restore.infog[0] < 0) {
      printf("\n (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
             myid, id_restore.infog[0], id_restore.infog[1]);
      error = 1;
    } else if (myid == 0) {
      printf(" DONE\n\n");
    }
  }

  if (!error) {
    /* Call the MUMPS package on the restore instance (solve). */
    if (myid == 0) {
      printf("Calling MUMPS package (solve).\n");
    }
    id_restore.job=3;
    dmumps_c(&id_restore);
    if (id_restore.infog[0] < 0) {
      printf("=> (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
             myid, id_restore.infog[0], id_restore.infog[1]);
      error = 1;
    } else if (myid == 0) {
      printf(" DONE\n\n");
    }

    /* Delete the saved and the OOC files. */
    if (myid == 0) {
      printf("Removing save files.\n");
    }
    id_restore.job=-3;
    dmumps_c(&id_restore);
    if (id_restore.infog[0] < 0) {
      printf("=> (PROC %d) ERROR RETURN: \tINFOG(1)= %d\n\t\t\t\tINFOG(2)= %d\n",
             myid, id_restore.infog[0], id_restore.infog[1]);
      error = 1;
    } else if (myid == 0) {
      printf(" DONE\n\n");
    }

    /* Terminate the restore instance. */
    id_restore.job=JOB_END;
    dmumps_c(&id_restore);
  }

  if (myid == 0) {
    if (!error) {
      printf("Solution is : (%8.2f %8.2f)\n", rhs[0], rhs[1]);
    } else {
      printf("An error has occurred, please check the error code returned by MUMPS.\n");
    }
  }
  ierr = MPI_Finalize();
  return 0;
}
12 License
Copyright 1991-2023 CERFACS, CNRS, ENS Lyon, INP Toulouse, Inria,
Mumps Technologies, University of Bordeaux.
You can acknowledge (using references [1] and [2]) the contribution
of this package in any scientific publication dependent upon the use
of the package. Please use reasonable endeavours to notify the authors
of the package of this publication.
The fact that you are presently reading this means that you have had
knowledge of the CeCILL-C license and that you accept its terms.
13 Credits
This version of MUMPS has been developed by employees of CERFACS, ENS Lyon,
INPT(ENSEEIHT)-IRIT, Inria, Mumps Technologies and University of Bordeaux:
Emmanuel Agullo, Patrick Amestoy, Maurice Bremond, Alfredo Buttari, Philippe
Combes, Marie Durand, Aurelia Fevre, Abdou Guermouche, Guillaume Joslin,
Jacko Koster, Jean-Yves L’Excellent, Theo Mary, Stephane Pralet, Chiara
Puglisi, Francois-Henry Rouet, Wissam Sid-Lakhdar, Tzvetomila Slavova,
Bora Ucar and Clement Weisbecker.
We are also grateful to Juergen Schulze for letting us distribute PORD developed
at the University of Paderborn. We thank Eddy Caron for the administration of a
server used on a daily basis for MUMPS.
Finally we want to thank the institutions that have provided access to their
parallel machines: Centre Informatique National de l’Enseignement Superieur
(CINES), CERFACS, CALMIP ("Centre Interuniversitaire de Calcul" located in Toulouse),
Federation Lyonnaise de Calcul Haute-Performance, Institut du Developpement et
des Ressources en Informatique Scientifique (IDRIS), Lawrence Berkeley National
Laboratory, Laboratoire de l’Informatique du Parallelisme, Inria, and PARALLAB.
References
[1] E. Agullo. On the Out-of-core Factorization of Large Sparse Matrices. PhD thesis, École Normale
Supérieure de Lyon, Nov. 2008.
[2] P. Amestoy, O. Boiteau, A. Buttari, M. Gerest, F. Jézéquel, J.-Y. L’Excellent, and T. Mary. Mixed
Precision Low Rank Approximations and their Application to Block Low Rank LU Factorization.
Working paper or preprint, June 2021.
[3] P. R. Amestoy. Recent progress in parallel multifrontal solvers for unsymmetric sparse matrices.
In Proceedings of the 15th World Congress on Scientific Computation, Modelling and Applied
Mathematics, IMACS 97, Berlin, 1997.
[4] P. R. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L’Excellent, and C. Weisbecker. Improving
multifrontal methods by means of block low-rank representations. SIAM Journal on Scientific
Computing, 37(3):A1451–A1474, 2015.
[5] P. R. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. On the complexity of the Block Low-Rank
multifrontal factorization. SIAM Journal on Scientific Computing, 39(4):A1710–A1740, 2017.
[6] P. R. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. Performance and Scalability of the
Block Low-Rank Multifrontal Factorization on Multicore Architectures. ACM Transactions on
Mathematical Software, 45:2:1–2:26, 2019.
[7] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm.
SIAM Journal on Matrix Analysis and Applications, 17(4):886–905, 1996.
[8] P. R. Amestoy and I. S. Duff. Vectorization of a multiprocessor multifrontal code. International
Journal of Supercomputer Applications, 3:41–59, 1989.
[9] P. R. Amestoy, I. S. Duff, J. Koster, and J.-Y. L’Excellent. A fully asynchronous multifrontal
solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications,
23(1):15–41, 2001.
[10] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Multifrontal solvers within the PARASOL
environment. In B. Kågström, J. Dongarra, E. Elmroth, and J. Waśniewski, editors, Applied Parallel
Computing, PARA’98, Lecture Notes in Computer Science, No. 1541, pages 7–11, Berlin, 1998.
Springer-Verlag.
[11] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Parallélisation de la factorisation LU de matrices
creuses non-symétriques pour des architectures à mémoire distribuée. Calculateurs Parallèles
Réseaux et Systèmes Répartis, 10(5):509–520, 1998.
[12] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent. Multifrontal parallel distributed symmetric and
unsymmetric solvers. Comput. Methods Appl. Mech. Eng., 184:501–520, 2000.
[13] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and X. S. Li. Analysis and comparison of two general
sparse solvers for distributed memory computers. ACM Transactions on Mathematical Software,
27(4):388–421, 2001.
[14] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, Y. Robert, F.-H. Rouet, and B. Uçar. On computing
inverse entries of a sparse matrix in an out-of-core environment. SIAM Journal on Scientific
Computing, 34(4):A1975–A1999, 2012.
[15] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and F.-H. Rouet. Parallel computation of entries of A^-1.
SIAM Journal on Scientific Computing, 37(2):C268–C284, 2015.
[16] P. R. Amestoy, I. S. Duff, D. Ruiz, and B. Uçar. A parallel matrix scaling algorithm. In J. M.
L. M. Palma, P. R. Amestoy, M. J. Daydé, M. Mattoso, and J. C. Lopes, editors, High Performance
Computing for Computational Science, VECPAR’08, number 5336 in Lecture Notes in Computer
Science, pages 309–321. Springer-Verlag, 2008.
[17] P. R. Amestoy, A. Guermouche, J.-Y. L’Excellent, and S. Pralet. Hybrid scheduling for the parallel
solution of linear systems. Parallel Computing, 32(2):136–156, 2006.
[18] P. R. Amestoy, J.-Y. L’Excellent, F.-H. Rouet, and W. M. Sid-Lakhdar. Modeling 1D distributed-
memory dense kernels for an asynchronous multifrontal sparse solver. In High Performance
Computing for Computational Science, VECPAR 2014 - 11th International Conference, Eugene,
Oregon, USA, June 30 - July 3, 2014, Revised Selected Papers, pages 156–169, 2014.
[19] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum,
S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. SIAM Press, Philadelphia,
PA, third edition, 1995.
[20] M. Arioli, J. Demmel, and I. S. Duff. Solving sparse linear systems with sparse backward error.
SIAM Journal on Matrix Analysis and Applications, 10(2):165–190, 1989.
[21] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra,
S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’
Guide. SIAM Press, 1997.
[22] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm 679: A set of Level 3 Basic
Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16:1–17, 1990.
[23] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm 679. A set of Level 3 Basic
Linear Algebra Subprograms: model implementation and test programs. ACM Transactions on
Mathematical Software, 16:18–28, 1990.
[24] I. S. Duff and J. Koster. The design and use of algorithms for permuting large entries to the diagonal
of sparse matrices. SIAM Journal on Matrix Analysis and Applications, 20(4):889–901, 1999.
[25] I. S. Duff and J. Koster. On algorithms for permuting large entries to the diagonal of a sparse matrix.
SIAM Journal on Matrix Analysis and Applications, 22(4):973–996, 2001.
[26] I. S. Duff and S. Pralet. Strategies for scaling and pivoting for sparse symmetric indefinite problems.
SIAM Journal on Matrix Analysis and Applications, 27(2):313–340, 2005.
[27] I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear systems.
ACM Transactions on Mathematical Software, 9:302–325, 1983.
[28] I. S. Duff and J. K. Reid. The multifrontal solution of unsymmetric sets of linear systems. SIAM
Journal on Scientific and Statistical Computing, 5:633–641, 1984.
[29] A. Fèvre, J.-Y. L’Excellent, and S. Pralet. Scilab and MATLAB interfaces to MUMPS. Technical
Report RR-5816, INRIA, Jan. 2006. Also appeared as ENSEEIHT-IRIT report TR/TLSE/06/01 and
LIP report RR2006-06.
[30] J. R. Gilbert, E. G. Ng, and B. W. Peyton. An efficient algorithm to compute row and column counts
for sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications, 15:1075–
1091, 1994.
[31] A. Guermouche. Étude et optimisation du comportement mémoire dans les méthodes parallèles de
factorisation de matrices creuses. PhD thesis, École Normale Supérieure de Lyon, July 2004.
[32] A. Guermouche and J.-Y. L’Excellent. Constructing memory-minimizing schedules for multifrontal
methods. ACM Transactions on Mathematical Software, 32(1):17–32, 2006.
[33] A. Guermouche, J.-Y. L’Excellent, and G. Utard. Impact of reordering on the memory of a
multifrontal solver. Parallel Computing, 29(9):1191–1218, 2003.
[34] G. Karypis and V. Kumar. METIS – A Software Package for Partitioning Unstructured Graphs,
Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices – Version 4.0.
University of Minnesota, Sept. 1998.
[35] P. Knight and D. Ruiz. A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis,
33(3):1029–1047, October 2012.
[36] P. A. Knight, D. Ruiz, and B. Uçar. A symmetry preserving algorithm for matrix scaling. SIAM
Journal on Matrix Analysis and Applications, 35(3):931–955, 2014.
[37] J.-Y. L’Excellent. Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects.
Habilitation à diriger des recherches, École normale supérieure de Lyon, Sept. 2012.
[38] J.-Y. L’Excellent and M. W. Sid-Lakhdar. A study of shared-memory parallelism in a multifrontal
solver. Parallel Computing, 40(3-4):34–46, 2014.
[39] X. S. Li and J. W. Demmel. SuperLU DIST: A scalable distributed-memory sparse direct solver for
unsymmetric linear systems. ACM Transactions on Mathematical Software, 29(2):110–140, 2003.
[40] J. W. H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis
and Applications, 11:134–172, 1990.
[41] T. Mary. Block Low-Rank multifrontal solvers: complexity, performance, and scalability. PhD
thesis, Université de Toulouse, November 2017.
[42] F. Pellegrini. SCOTCH and LIBSCOTCH 5.0 User’s Guide. Technical Report, LaBRI, Université
Bordeaux I, 2007.
[43] S. Pralet. Constrained orderings and scheduling for parallel sparse linear algebra. PhD thesis,
Institut National Polytechnique de Toulouse, Sept 2004. Available as CERFACS technical report,
TH/PA/04/105.
[44] F.-H. Rouet. Memory and performance issues in parallel multifrontal factorizations and triangular
solutions with sparse right-hand sides. PhD thesis, Institut National Polytechnique de Toulouse,
Oct. 2012.
[45] D. Ruiz. A scaling algorithm to equilibrate both rows and columns norms in matrices. Technical
Report RT/APO/01/4, ENSEEIHT-IRIT, 2001. Also appeared as RAL report RAL-TR-2001-034.
[46] J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods.
BIT, 41(4):800–841, 2001.
[47] W. M. Sid-Lakhdar. Scaling multifrontal methods for the solution of large sparse linear systems on
hybrid shared-distributed memory architectures. Ph.D. dissertation, ENS Lyon, Dec. 2014.
[48] Tz. Slavova. Parallel triangular solution in the out-of-core multifrontal approach for solving large
sparse linear systems. Ph.D. dissertation, Institut National Polytechnique de Toulouse, Apr. 2009.
Available as CERFACS Report TH/PA/09/59.
[49] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete
Reference. The MIT Press, Cambridge, Massachusetts, 1996.
[50] C. Weisbecker. Improving multifrontal solvers by means of algebraic block low-rank
representations. PhD thesis, Institut National Polytechnique de Toulouse, Oct. 2013.