RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra
Abstract
The R statistical environment and language has demonstrated particular strengths
for interactive development of statistical algorithms, as well as data modelling
and visualisation. Its current implementation has an interpreter at its core which
may result in a performance penalty in comparison to directly executing user
algorithms in the native machine code of the host CPU. In contrast, the C++
language has no built-in visualisation capabilities, handling of linear algebra or
even basic statistical algorithms; however, user programs are converted to high-
performance machine code, ahead of execution. A new method avoids possible
speed penalties in R by using the Rcpp extension package in conjunction with
the Armadillo C++ matrix library. In addition to the inherent performance
advantages of compiled code, Armadillo provides an easy-to-use template-based
meta-programming framework, allowing the automatic pooling of several linear
algebra operations into one, which in turn can lead to further speedups. With
the aid of Rcpp and Armadillo, conversion of linear algebra centered algorithms
from R to C++ becomes straightforward. The algorithms retain the overall
structure as well as readability, all while maintaining a bidirectional link with
the host R environment. Empirical timing comparisons of R and C++
implementations of a Kalman filtering algorithm indicate a speedup of several orders
of magnitude.
Keywords: Software, R, C++, linear algebra
1. Overview
This vignette corresponds to a paper published in Computational Statistics and Data
Analysis. Currently still identical to the paper, this vignette version may over time receive
minor updates. For citations, please use Eddelbuettel and Sanderson (2014) as provided by
citation("RcppArmadillo"). This version corresponds to RcppArmadillo version 0.9.850.1.0
and was typeset on February 9, 2020.
template meta-programming techniques to attain efficiency. However, it has
been written to target modern C++ compilers as well as providing a much
larger set of linear algebra operations than Blitz++. R programs augmented
to use Armadillo retain the overall structure as well as readability, all while
retaining a bidirectional link with the host R environment.
Section 2 provides an overview of Armadillo, followed by its integration with
the Rcpp extension package in Section 3. Section 4 shows an example of an R program and
its conversion to C++ via Rcpp and Armadillo. Section 5 discusses an empirical
timing comparison between the R and C++ versions before Section 6 concludes.
2. Armadillo
The Armadillo C++ library provides vector, matrix and cube types (supporting
integer, floating point and complex numbers) as well as a subset of
trigonometric and statistics functions (Sanderson, 2010). In addition to
elementary operations such as addition and matrix multiplication, various matrix
factorisations and submatrix manipulation operations are provided. The
corresponding application programming interface (syntax) enables the programmer to
write code which is both concise and easy to read for those familiar with
scripting languages such as Matlab and R. Table 1 lists a few common Armadillo
functions.
Matrix multiplication and factorisations are accomplished through integration
with the underlying operations stemming from standard numerical libraries
such as BLAS and LAPACK (Demmel, 1997). Similar to how environments
such as R are implemented, these underlying libraries can be replaced in a
transparent manner with variants that are optimised to the specific hardware
platform and/or multi-threaded to automatically take advantage of the
now-common multi-core platforms (Kurzak et al., 2010).
Armadillo uses a delayed evaluation approach to combine several operations
into one and reduce (or eliminate) the need for temporary objects. In contrast
to brute-force evaluations, delayed evaluation can provide considerable
performance improvements as well as reduced memory usage. The delayed evaluation
machinery is accomplished through template meta-programming (Vandevoorde
and Josuttis, 2002; Abrahams and Gurtovoy, 2004), where the C++ compiler
is induced to reason about mathematical expressions at compile time. Where
possible, the C++ compiler can generate machine code that is tailored for each
expression.
As an example of the possible efficiency gains, let us consider the expression
X = A − B + C, where A, B and C are matrices. A brute-force implementation
would evaluate A − B first and store the result in a temporary matrix T. The
next operation would be T + C, with the result finally stored in X. The creation
of the temporary matrix, and using two separate loops for the subtraction and
addition of matrix elements is suboptimal from an efficiency point of view.
Through the overloading of mathematical operators, Armadillo avoids the
generation of the temporary matrix by first converting the expression into a
set of lightweight Glue objects, which only store references to the matrices
and Armadillo’s representations of mathematical expressions (e.g., other Glue
objects). To indicate that an operation comprised of subtraction and addition
is required, the exact type of the Glue objects is automatically inferred from the
given expression through template meta-programming. More specifically, given
the expression X = A − B + C, Armadillo automatically induces the compiler to
generate an instance of the lightweight Glue storage object with the following
C++ type:

    Glue< Glue<Mat, Mat, glue_minus>, Mat, glue_plus >

where Glue<...> indicates that Glue is a C++ template class, with the items
between ‘<’ and ‘>’ specifying template parameters; the outer Glue<..., Mat,
glue_plus> is the Glue object indicating an addition operation, storing a
reference to a matrix as well as a reference to another Glue object; the inner
Glue<Mat, Mat, glue_minus> stores references to two matrices and indicates
a subtraction operation. In both the inner and outer Glue, the type Mat specifies
that a reference to a matrix object is to be held.
The expression evaluator in Armadillo is then automatically invoked through
the “=” operation, which interprets (at compile time) the template parameters
of the compound Glue object and generates C++ code equivalent to:

    for (int i = 0; i < N; i++) { X[i] = (A[i] - B[i]) + C[i]; }

where N is the number of elements in A, B and C, with A[i] indicating the i-th
element in A. As such, apart from the lightweight Glue objects (for which
memory is pre-allocated at compile time), no other temporary object is generated,
and only one loop is required instead of two. Given a sufficiently advanced C++
compiler, the lightweight Glue objects can be optimised away, as they are
automatically generated by the compiler and only contain compile-time generated
references; the resultant machine code can appear as if the Glue objects never
existed in the first place.
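
The delayed evaluation idea can be illustrated with a small stand-alone sketch. The code below is not Armadillo's implementation: Vec, Expr, MinusOp, PlusOp and fused_example are hypothetical names chosen for this illustration, and a flat vector stands in for a matrix. It only shows how overloaded operators can build nested, reference-holding expression objects whose type encodes the operation, and how assignment can then evaluate the whole expression in a single loop.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Operation tags, loosely mirroring glue_minus / glue_plus (hypothetical names).
struct MinusOp { static double apply(double a, double b) { return a - b; } };
struct PlusOp  { static double apply(double a, double b) { return a + b; } };

// Lightweight expression node: stores only references to its two operands.
template <typename L, typename R, typename Op>
struct Expr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
};

// Toy dense container standing in for a matrix (elements stored contiguously).
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from an expression: the whole expression tree is evaluated
    // element by element in one loop, without a temporary Vec.
    template <typename L, typename R, typename Op>
    Vec& operator=(const Expr<L, R, Op>& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// A - B builds a node; nothing is computed yet.
inline Expr<Vec, Vec, MinusOp> operator-(const Vec& a, const Vec& b) {
    return {a, b};
}

// (expression) + C wraps the inner node in an outer node.
template <typename L, typename R, typename Op>
Expr<Expr<L, R, Op>, Vec, PlusOp> operator+(const Expr<L, R, Op>& e, const Vec& c) {
    return {e, c};
}

// The type of A - B + C is a nested node, fixed at compile time.
static_assert(
    std::is_same<decltype(std::declval<Vec>() - std::declval<Vec>() + std::declval<Vec>()),
                 Expr<Expr<Vec, Vec, MinusOp>, Vec, PlusOp>>::value,
    "A - B + C should be encoded as a nested expression type");

// Evaluates X = A - B + C for constant-filled inputs of length n.
inline Vec fused_example(std::size_t n, double a, double b, double c) {
    Vec A(n, a), B(n, b), C(n, c), X(n);
    X = A - B + C;  // single pass over the elements
    return X;
}
```

For instance, fused_example(4, 5.0, 2.0, 1.0) yields a vector whose four elements all equal 4. The expression nodes live only until the end of the assignment statement, which is why holding references (rather than copies) is safe in this sketch.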
Note that due to the ability of the Glue object to hold references to other
Glue objects, far longer and more complicated operations can be easily
accommodated. Further discussion of template meta-programming is beyond the scope
of this paper; for more details, the interested reader is referred to Vandevoorde
and Josuttis (2002) as well as Abrahams and Gurtovoy (2004). Reddy et al.
(2013) provide a recent application of Armadillo in computer vision and pattern
recognition.
3. RcppArmadillo
$inner
[1] 415
Listing 1: Integrating Armadillo-based C++ code via the RcppArmadillo package.
Consider the simple example in Listing 1. Given a vector, the g() function
returns both the outer and inner products. We load the inline package (Sklyar
et al., 2012), which provides cxxfunction() that we use to compile, link and
load the C++ code which is passed as the body argument. We declare the
function signature to contain a single argument named ‘vs’. On line five, this
argument is used to instantiate an Armadillo column vector object named ‘v’
(using the templated conversion function as() from Rcpp). In lines six and
seven, the outer and inner product of the column vector are calculated by
appropriately multiplying the vector with its transpose. This shows how the *
operator for multiplication has been overloaded to provide the appropriate
operation for the types implemented by Armadillo. The inner product creates a
scalar variable, and in contrast to R where each object is a vector type (even if
of length one), we have to explicitly convert using as_scalar() to assign the
value to a variable of type double.
Finally, the last line creates an R named list type containing both results.
As a result of calling cxxfunction(), a new function is created. It contains
a reference to the native code, compiled on the fly based on the C++ code
provided to cxxfunction() and makes it available directly from R under a
user-assigned function name, here g(). The listing also shows how the Rcpp and
arma namespaces are used to disambiguate symbols from the two libraries; the
:: operator is already familiar to R programmers who use the NAMESPACE
directive in R in a similar fashion.
The listing also demonstrates how the new function g() can be called with
a suitable argument. Here we create a vector of five elements, containing values
ranging from 7 to 11. The function’s output, here the list containing both outer
and inner product, is then displayed as it is not assigned to a variable.
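
To make the two results concrete, the following sketch computes the same quantities in plain C++ without Rcpp or Armadillo. It is only an illustration of the arithmetic: the function names inner_product_of and outer_product_of are hypothetical, and the explicit loops stand in for the overloaded * operator and the transpose used in the Armadillo version.

```cpp
#include <cstddef>
#include <vector>

// Inner product v'v: a single scalar.
inline double inner_product_of(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x * x;
    return sum;
}

// Outer product v v': an n x n matrix, stored here as a vector of rows.
inline std::vector<std::vector<double>> outer_product_of(const std::vector<double>& v) {
    std::vector<std::vector<double>> op(v.size(), std::vector<double>(v.size()));
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j)
            op[i][j] = v[i] * v[j];
    return op;
}
```

For the vector with elements 7 to 11, inner_product_of returns 49 + 64 + 81 + 100 + 121 = 415, which agrees with the $inner value displayed in Listing 1, and outer_product_of returns a 5 x 5 matrix whose first diagonal element is 49.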
This simple example illustrates how R objects can be transferred directly into
corresponding Armadillo objects using the interface code provided by Rcpp.
It also shows how deployment of RcppArmadillo is straightforward, even for
interactive work where functions can be compiled on the fly. Similarly, usage
in packages is also uncomplicated and follows the documentation provided with
Rcpp (Eddelbuettel and François, 2012; Eddelbuettel, 2013).
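and $V_Y$ relative to the two coordinates, as well as two acceleration variables $A_X$
and $A_Y$.
We have the positions being updated as a function of the velocity
$X = X_0 + V_X \, dt$ and $Y = Y_0 + V_Y \, dt$,
and the velocity being updated as a function of the acceleration
$V_X = V_{X,0} + A_X \, dt$ and $V_Y = V_{Y,0} + A_Y \, dt$.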
With covariance matrices Q and R for (Gaussian) error terms, the standard
Kalman filter estimation involves a linear prediction step resulting in a new
predicted state vector, and a new covariance estimate. This leads to a residuals
vector and a covariance matrix for residuals which are used to determine the
(optimal) Kalman gain, which is then used to update the state estimate and
covariance matrix.
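
The steps described above can be written out as follows. This is a sketch in our own notation: the subscripts prd and est denote predicted and estimated quantities, $z$ is the current observation and $y$ the estimated measurement. The prediction equations are assumed to take the standard Kalman filter form; the remaining equations mirror the statements in the implementations shown later.

```latex
\begin{align*}
x_{\mathrm{prd}} &= A \, x_{\mathrm{est}}
  && \text{predicted state (assumed standard form)} \\
P_{\mathrm{prd}} &= A \, P_{\mathrm{est}} \, A^{\top} + Q
  && \text{predicted covariance (assumed standard form)} \\
S &= H \, P_{\mathrm{prd}}^{\top} H^{\top} + R
  && \text{covariance of the residuals} \\
B &= H \, P_{\mathrm{prd}}^{\top} \\
K &= \left( S^{-1} B \right)^{\top}
  && \text{Kalman gain} \\
x_{\mathrm{est}} &= x_{\mathrm{prd}} + K \left( z - H \, x_{\mathrm{prd}} \right)
  && \text{updated state, using the residuals } z - H x_{\mathrm{prd}} \\
P_{\mathrm{est}} &= P_{\mathrm{prd}} - K \, H \, P_{\mathrm{prd}}
  && \text{updated covariance} \\
y &= H \, x_{\mathrm{est}}
  && \text{estimated measurement}
\end{align*}
```

Note that $S^{-1} B$ is best obtained by solving the linear system $S X = B$ rather than by forming the inverse explicitly.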
All of these steps involve only matrix multiplication and inversions, making
the algorithm very suitable for an implementation in any language which can
use matrix expressions. An example for Matlab is provided on the MathWorks
website (footnote 2) and shown in Listing 2.
    % Copyright 2010 The MathWorks, Inc.
    function y = kalmanfilter(z)
    dt = 1;
    % Initialize state transition matrix
    A = [ 1 0 dt 0 0 0; 0 1 0 dt 0 0;...   % [x ], [y ]
          0 0 1 0 dt 0; 0 0 0 1 0 dt;...   % [Vx], [Vy]
          0 0 0 0 1 0 ; 0 0 0 0 0 1 ];     % [Ax], [Ay]
    H = [ 1 0 0 0 0 0; 0 1 0 0 0 0 ];      % Init. measurement mat
    Q = eye(6);
    R = 1000 * eye(2);
    persistent x_est p_est                 % Init. state cond.
    if isempty(x_est)
        x_est = zeros(6, 1);               % x_est=[x,y,Vx,Vy,Ax,Ay]'
        p_est = zeros(6, 6);
    end
Listing 2: Basic Kalman filter in Matlab.
Footnote 2: See http://www.mathworks.com/products/matlab-coder/demos.html?file=/products/demos/shipping/coder/coderdemo_kalman_filter.html.
in Listing 3. A slightly improved version (where several invariant statements
are moved out of the repeatedly-called function) is provided in Listing 4 on
page 9 showing the function KalmanR. The estimates of the state vector and
its covariance matrix are updated iteratively. The Matlab implementation uses
two variables declared ‘persistent’ for this. In R, which does not have such an
attribute for variables, we store them in the enclosing environment of the outer
function KalmanR, which contains an inner function kalmanfilter that is called
for each observation.
Armadillo provides efficient vector and matrix classes to implement the
Kalman filter. In Listing 5 on page 10, we show a simple C++ class containing
a basic constructor as well as one additional member function. The constructor
can be used to initialise all variables as we are guaranteed that the code in the
class constructor will be executed exactly once when this class is instantiated.
A class also makes it easy to add ‘persistent’ local variables, which is a feature
we need here. Given such a class, the estimation can be accessed from R via a
short and simple routine such as the one shown in Listing 6.
    KalmanR <- function(pos) {

            ## estimation
            S <- H %*% t(pprd) %*% t(H) + R
            B <- H %*% t(pprd)

        dt <- 1
        A <- matrix(c(1, 0, dt, 0, 0, 0,    # x
                      0, 1, 0, dt, 0, 0,    # y
                      0, 0, 1, 0, dt, 0,    # Vx
                      0, 0, 0, 1, 0, dt,    # Vy
                      0, 0, 0, 0, 1, 0,     # Ax
                      0, 0, 0, 0, 0, 1),    # Ay
                    6, 6, byrow = TRUE)
        H <- matrix(c(1, 0, 0, 0, 0, 0,
                      0, 1, 0, 0, 0, 0),
                    2, 6, byrow = TRUE)
        Q <- diag(6)
        R <- 1000 * diag(2)
        N <- nrow(pos)
        Y <- matrix(NA, N, 2)

        for (i in 1:N) {
            Y[i,] <- kalmanfilter(t(pos[i,,drop=FALSE]))
        }
        invisible(Y)
    }
Listing 4: An improved Kalman filter implemented in R (referred to as KalmanR).
    class Kalman {
    private:
        mat A, H, Q, R, xest, pest;
        double dt;

    public:
        // constructor, sets up data structures
        Kalman() : dt(1.0) {
            A.eye(6,6);
            A(0,2) = A(1,3) = A(2,4) = A(3,5) = dt;
            H.zeros(2,6);
            H(0,0) = H(1,1) = 1.0;
            Q.eye(6,6);
            R = 1000 * eye(2,2);
            xest.zeros(6,1);
            pest.zeros(6,6);
        }

                S = H * pprd.t() * H.t() + R;
                B = H * pprd.t();
                kalmangain = (solve(S, B)).t();
                // estimated state and covariance
                xest = xprd + kalmangain * (z - H * xprd);
                pest = pprd - kalmangain * H * pprd;
                // compute the estimated measurements
                y = H * xest;
                Y.row(i) = y.t();
            }
            return Y;
        }
    };
Listing 5: A Kalman filter class in C++, using Armadillo classes.
[Figure 1 plot: legend entries Trajectory and Estimate; vertical axis y, ranging from −1.0 to 1.0.]
Figure 1: An example of object trajectory and the corresponding Kalman filter estimate.
Implementation     Time in seconds     Relative to best solution
KalmanCpp                     0.73                           1.0
KalmanRimpC                  21.10                          29.0
KalmanRimp                   22.01                          30.2
KalmanRC                     28.64                          39.3
KalmanR                      31.30                          43.0
FirstKalmanRC                49.40                          67.9
FirstKalmanR                 64.00                          88.0
Footnote 3: The improved version replaces explicit transpose and multiplication with the crossprod function.
6. Conclusion
This paper introduced the RcppArmadillo package for use within the R
statistical environment. By using the Rcpp interface package, RcppArmadillo
brings the speed of C++ along with the highly expressive Armadillo linear
algebra library to the R language. A small example implementing a Kalman filter
illustrated two key aspects. First, orders of magnitude of performance gains can
be obtained by deploying C++ code along with R. Second, the ease of use and
readability of the corresponding C++ code is similar to the R code from which
it was derived.
This combination makes RcppArmadillo a compelling tool in the arsenal of
applied researchers deploying computational methods in statistical computing
and data analysis. As of early 2013, about 30 R packages on CRAN deploy
RcppArmadillo, showing both the usefulness of Armadillo and its acceptance
by the R community.
Acknowledgements
References
Demmel, J. W., 1997. Applied Numerical Linear Algebra. SIAM, ISBN
978-0898713893.
Eddelbuettel, D., 2013. Seamless R and C++ Integration with Rcpp. Springer,
New York.
Eddelbuettel, D., François, R., 2011. Rcpp: Seamless R and C++ integration.
Journal of Statistical Software 40 (8), 1–18.
URL http://www.jstatsoft.org/v40/i08/
Eddelbuettel, D., François, R., 2012. Rcpp: Seamless R and C++ Integration.
R package version 0.10.2.
URL http://CRAN.R-Project.org/package=Rcpp
Eddelbuettel, D., François, R., Bates, D., Ni, B., 2018. RcppArmadillo: ’Rcpp’
Integration for the ’Armadillo’ Templated Linear Algebra Library. R package
version 0.8.600.0.
URL http://CRAN.R-Project.org/package=RcppArmadillo
Eddelbuettel, D., Sanderson, C., March 2014. RcppArmadillo: Accelerating R
with high-performance C++ linear algebra. Computational Statistics and
Data Analysis 71, 1054–1063.
URL http://dx.doi.org/10.1016/j.csda.2013.02.005
Kurzak, J., Bader, D. A., Dongarra, J. (Eds.), 2010. Scientific Computing with
Multicore and Accelerators. CRC Press, ISBN 978-1439825365.
Li, J., 2013. An unscented Kalman smoother for volatility extraction: Evidence
from stock prices and options. Computational Statistics and Data Analysis
58, 15–26.
Meyers, S., 2005. Effective C++: 55 Specific Ways to Improve Your
Programs and Designs, 3rd Edition. Addison-Wesley Professional, ISBN
978-0321334879.
Morandat, F., Hill, B., Osvald, L., Vitek, J., 2012. Evaluating the design of
the R language. In: ECOOP 2012: Proceedings of European Conference on
Object-Oriented Programming.
Sklyar, O., Murdoch, D., Smith, M., Eddelbuettel, D., François, R., 2012. inline:
Inline C, C++, Fortran function calls from R. R package version 0.3.10.
URL http://CRAN.R-Project.org/package=inline
Tierney, L., 2012. A byte-code compiler for R. Manuscript, Department of
Statistics and Actuarial Science, University of Iowa.
URL www.stat.uiowa.edu/~luke/R/compiler/compiler.pdf
Tusell, F., 2011. Kalman filtering in R. Journal of Statistical Software 39 (2),
1–27.
URL http://www.jstatsoft.org/v39/i02
Vandevoorde, D., Josuttis, N. M., 2002. C++ Templates: The Complete Guide.
Addison-Wesley Professional.
Veldhuizen, T. L., 1998. Arrays in Blitz++. In: ISCOPE ’98: Proceedings of the
Second International Symposium on Computing in Object-Oriented Parallel
Environments. Springer-Verlag, London, UK, pp. 223–230, ISBN 3-540-65387-2.