Abstract
This paper presents a generic approach to highly efficient image registration in two and three dimensions. Both monomodal and multimodal registration problems are considered. We focus on the important class of affine-linear transformations in a derivative-based optimization framework. Our main contribution is an explicit formulation of the objective function gradient and Hessian approximation that allows for very efficient, parallel derivative calculation with virtually no memory requirements. The flexible parallelism of our concept allows for direct implementation on various hardware platforms. Derivative calculations are fully matrix free and operate directly on the input data, thereby reducing the auxiliary space requirements from \({\mathcal {O}}(n)\) to \({\mathcal {O}}(1)\). The proposed approach is implemented on multicore CPU and GPU. Our GPU code outperforms a conventional matrix-based CPU implementation by more than two orders of magnitude, thus enabling usage in real-time scenarios. The computational properties of our approach are extensively evaluated, thereby demonstrating the performance gain for a variety of real-life medical applications.








References
Alavi, A., et al.: Is PET-CT the only option? Eur. J. Nucl. Med. Mol. Imaging 34, 819–821 (2007)
Berg, R., König, L., Rühaak, J., Lausen, R., Fischer, B.: Highly efficient image registration for embedded systems using a distributed multicore DSP architecture. J. Real Time Image Process. (2014). doi:10.1007/s11554-014-0457-3
Björck, A.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
Bronsert, P., Enderle-Ammour, K., Bader, M., Timme, S., Kuehs, M., Csanadi, A., Kayser, G., Kohler, I., Bausch, D., Hoeppner, J., et al.: Cancer cell invasion and EMT marker expression: a three-dimensional study of the human cancer-host interface. J. Pathol. 234(3), 410–422 (2014)
Brown, L.G.: A survey of image registration techniques. ACM Comput. Surv. (CSUR) 24(4), 325–376 (1992)
Buluc, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. SIAM J. Sci. Comput. 34(4), C170–C191 (2012)
Castro-Pareja, C.R., Jagadeesh, J.M., Shekhar, R.: FAIR: a hardware architecture for real-time 3-D image registration. IEEE Trans. Inf. Technol. Biomed. 7(4), 426–434 (2003)
Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Inf. Process. Med. Imaging 3, 264–274 (1995)
Davis, T.A.: Direct Methods for Sparse Linear Systems, vol. 2. SIAM, Philadelphia (2006)
De Luca, V., Benz, T., Kondo, S., König, L., Lübke, D., Rothlübbers, S., Somphone, O., Allaire, S., Bell, M.L., Chung, D., et al.: The 2014 liver ultrasound tracking benchmark. Phys. Med. Biol. 60(14), 5571 (2015)
Dennis Jr, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations, vol. 16. SIAM, Philadelphia (1996)
Ferroli, P., Franzini, A., Marras, C., Maccagnano, E., D’Incerti, L., Broggi, G.: A simple method to assess accuracy of deep brain stimulation electrode placement: pre-operative stereotactic CT + postoperative MR image fusion. Stereotact. Func. Neurosurg. 82(1), 14–19 (2004)
Fischer, B., Modersitzki, J.: Fast diffusion registration. Contemp. Math. 313, 117–128 (2002)
Gigengack, F., Ruthotto, L., Burger, M., Wolters, C.H., Jiang, X., Schäfers, K.P.: Motion correction in dual gated cardiac PET using mass-preserving image registration. IEEE Trans. Med. Imaging 31(3), 698–712 (2012)
Haber, E., Modersitzki, J.: A multilevel method for image registration. SIAM J. Sci. Comput. 27(5), 1594–1607 (2006)
Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multi-modal images. Methods Inf. Med. 46, 292–299 (2007)
Haber, E., Heldmann, S., Modersitzki, J.: An octree method for parametric image registration. SIAM J. Sci. Comput. 29(5), 2008–2023 (2007)
Haber, E., Heldmann, S., Modersitzki, J.: Adaptive mesh refinement for nonparametric image registration. SIAM J. Sci. Comput. 30(6), 3012–3027 (2008)
Harris, M., et al.: Optimizing parallel reduction in CUDA. NVIDIA Developer Technology 2(4) (2007). http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf
Kabus, S., Lorenz, C.: Fast elastic image registration. In: Proceedings of the medical image analysis for the clinic: a grand challenge, pp. 81–89. (2010)
Köhn, A., Drexl, J., Ritter, F., König, M., Peitgen, H.O.: GPU accelerated image registration in two and three dimensions. In: Bildverarbeitung für die Medizin 2006, Springer, pp. 261–265 (2006)
König, L., Rühaak, J.: A fast and accurate parallel algorithm for non-linear image registration using normalized gradient fields. In: 2014 IEEE 11th international symposium on biomedical imaging (ISBI), pp. 580–583 (2014)
König, L., Kipshagen, T., Rühaak, J.: A non-linear image registration scheme for real-time liver ultrasound tracking using normalized gradient fields. In: Proceedings of MICCAI challenge on liver ultrasound tracking (CLUST 2014) (2014)
König, L., Derksen, A., Hallmann, M., Papenberg, N.: Parallel and memory efficient multimodal image registration for radiotherapy using normalized gradient fields. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (2015)
Lange, T., Papenberg, N., Heldmann, S., Modersitzki, J., Fischer, B., Lamecker, H., Schlag, P.M.: 3D ultrasound-CT registration of the liver using combined landmark-intensity information. Int. J. Comput. Assist. Radiol. Surg. 4(1), 79–88 (2009)
Lombardi, F., Spigler, R.: The evolution of the approach to scientific computing: a survey. J. Parallel Cloud Comput. 3(2), 32–42 (2014)
Maintz, J., Viergever, M.A.: A survey of medical image registration. Med. Image Anal. 2(1), 1–36 (1998)
Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
Modersitzki, J.: FAIR: Flexible Algorithms for Image Registration, vol. 6. SIAM, Philadelphia (2009)
Murphy, K., Van Ginneken, B., Reinhardt, J.M., Kabus, S., Ding, K., Deng, X., Cao, K., Du, K., Christensen, G.E., Garcia, V., et al.: Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE Trans. Med. Imaging 30(11), 1901–1920 (2011)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (1999)
NVIDIA Corporation: NVIDIA CUDA C Programming Guide. NVIDIA Corporation, Santa Clara (2014)
Powell, M.J.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7(2), 155–162 (1964)
Rühaak, J., Heldmann, S., Kipshagen, T., Fischer, B.: Highly accurate fast lung CT registration. In: SPIE Medical Imaging 2013, image processing, pp. 86690Y–86690Y-9 (2013a)
Rühaak, J., König, L., Hallmann, M., Papenberg, N., Heldmann, S., Schumacher, H., Fischer, B.: A fully parallel algorithm for multimodal image registration using normalized gradient fields. In: 2013 IEEE 10th international symposium on biomedical imaging (ISBI), pp. 572–575 (2013b)
Rühaak, J., Derksen, A., Heldmann, S., Hallmann, M., Meine, H.: Accurate CT-MR image registration for deep brain stimulation: a multi-observer evaluation study. In: SPIE Medical Imaging 2015: image processing (2015)
Salas Gonzalez, D., Górriz, J., Ramírez, J., Lassl, A., Puntonet, C.: Improved Gauss–Newton optimisation methods in affine registration of SPECT brain images. Electr. Lett. 44(22), 1291–1292 (2008)
Schmitt, O., Modersitzki, J., Heldmann, S., Wirtz, S., Fischer, B.: Image registration of sectioned brains. Int. J. Comput. Vis. 73(1), 5–39 (2007)
Shams, R., Sadeghi, P., Kennedy, R., Hartley, R.: A survey of medical image registration on multicore and the GPU. IEEE Sig. Process. Mag. 27(2), 50–60 (2010a)
Shams, R., Sadeghi, P., Kennedy, R., Hartley, R.: Parallel computation of mutual information on the GPU with application to real-time registration of 3D medical images. Comput. Methods Prog. Biomed. 99(2), 133–146 (2010b)
Shi, L., Liu, W., Zhang, H., Xie, Y., Wang, D.: A survey of GPU-based medical image computing techniques. Quant. Imaging Med. Surg. 2(3), 188 (2012)
Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
Soza, G., Bauer, M., Hastreiter, P., Nimsky, C., Greiner, G.: Non-rigid registration with use of hardware-based 3D Bézier functions. In: Medical image computing and computer-assisted intervention—MICCAI 2002, Springer, pp. 549–556 (2002)
Stone, J.E., Gohara, D., Shi, G.: OpenCL: a parallel programming standard for heterogeneous computing systems. Comput. Sci. Eng. 12(3), 66 (2010)
Stürmer, M., Köstler, H., Rüde, U.: A fast full multigrid solver for applications in image processing. Numer. Linear Algebra Appl. 15(2–3), 187–200 (2008)
Tramnitzke, F., Rühaak, J., König, L., Modersitzki, J., Köstler, H.: GPU based affine linear image registration using normalized gradient fields. In: Proceedings of 7th international workshop on high performance computing for biomedical image analysis (HPC-MICCAI) (2014)
Vercauteren, T., Pennec, X., Perchant, A., Ayache, N.: Diffeomorphic demons: efficient non-parametric image registration. NeuroImage 45(1), S61–S72 (2009)
Verma, P.S., Wu, H., Langer, M.P., Das, I.J., Sandison, G.: Survey: real-time tumor motion prediction for image-guided radiation treatment. Comput. Sci. Eng. 13(5), 24–35 (2011)
Viola, P., Wells III, W.M.: Alignment by maximization of mutual information. Int. J. Comput. Vis. 24(2), 137–154 (1997)
Wilt, N.: The CUDA handbook: a comprehensive guide to GPU programming. Pearson Education, Upper Saddle River (2013)
Zitova, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)
Acknowledgments
J. Rühaak, L. König, F. Tramnitzke and J. Modersitzki received funding from the European Union, European Regional Development Fund, Grant No. 122-10-002. All authors declare that they have no conflicts of interest.
Appendix
Extension to the three-dimensional case
In this appendix, explicit matrix-free calculation rules are derived for affine-linear registration of three-dimensional images with the SSD and NGF distance measures. Most definitions of the occurring functions are briefly repeated here to improve readability.
2.1 Sum of squared differences (SSD)
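For any \(y:\Omega _{{\mathcal {R}}}\rightarrow {\mathbb {R}}^{3}\), the sum of squared differences (SSD) distance measure [28] is given by
\[ {\mathcal {D}}_{\text {SSD}}({\mathcal {R}},{\mathcal {T}};y) := \frac{1}{2}\int _{\Omega _{{\mathcal {R}}}} \big ({\mathcal {T}}(y(x)) - {\mathcal {R}}(x)\big )^2\,{\mathrm {d}}x. \]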
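Let \(y_w:{\mathbb {R}^{3}}\rightarrow {\mathbb {R}^{3}}, \quad x\,\mapsto \,Ax+b\) denote a three-dimensional affine-linear transformation with \(w=(w_1,\ldots ,w_{12})\) and
\[ A = \begin{pmatrix} w_1 & w_2 & w_3\\ w_5 & w_6 & w_7\\ w_9 & w_{10} & w_{11} \end{pmatrix},\quad b = \begin{pmatrix} w_4\\ w_8\\ w_{12} \end{pmatrix}. \]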
Setting \({\mathcal {D}}_{\text {SSD}}(w):={\mathcal {D}}_{\text {SSD}}({\mathcal {R}},{\mathcal {T}};y_w)\) yields the formulation of affine-linear image registration with SSD as minimization problem
\[ \min _{w\in {\mathbb {R}}^{12}} {\mathcal {D}}_{\text {SSD}}(w) \]
with \({\mathcal {D}}_{\text {SSD}}:{\mathbb {R}^{12}}\rightarrow {\mathbb {R}}\). For discretization, the domain \(\Omega _{{\mathcal {R}}}\) is assumed to be cuboid and decomposed into n cells of equal size with center points \({\mathbf {x}}_{i},\, i=1,\ldots ,n\), arranged in lexicographical ordering. Using the midpoint quadrature rule for numerical integration, a discretized version of (25) reads
\[ {\mathcal {D}}_{\text {SSD}}(w)\approx D_{\mathrm {SSD}}(w) := \frac{\bar{h}}{2}\sum _{i=1}^{n}\big ({\mathcal {T}}(y_w({\mathbf {x}}_{i})) - {\mathcal {R}}({\mathbf {x}}_{i})\big )^2, \]
where \(\bar{h}\) denotes the volume of each cell. Multilinear interpolation with Dirichlet zero boundary conditions is used to evaluate the discrete template image at arbitrary coordinates.
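The discretization described above can be illustrated with a small, unoptimized sketch. It is not the authors' implementation. It assumes unit cell size (cell centers at half-integer coordinates), images stored as nested lists indexed `img[k][j][i]`, and a parameter layout in which each row of \(A\) is followed by the corresponding entry of \(b\). The helper names `trilinear` and `ssd` are hypothetical.

```python
import math


def trilinear(img, x, y, z):
    """Multilinear interpolation on a cell-centered grid; zero outside the grid."""
    n3, n2, n1 = len(img), len(img[0]), len(img[0][0])
    # Cell i has its center at i + 0.5, so shift to index space first.
    fx, fy, fz = x - 0.5, y - 0.5, z - 0.5
    i0, j0, k0 = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - i0, fy - j0, fz - k0

    def voxel(i, j, k):
        # Dirichlet zero boundary: samples outside the grid contribute 0.
        if 0 <= i < n1 and 0 <= j < n2 and 0 <= k < n3:
            return img[k][j][i]
        return 0.0

    value = 0.0
    for dk, wz in ((0, 1.0 - tz), (1, tz)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for di, wx in ((0, 1.0 - tx), (1, tx)):
                value += wx * wy * wz * voxel(i0 + di, j0 + dj, k0 + dk)
    return value


def ssd(ref, tmpl, w, h_bar=1.0):
    """Midpoint-rule SSD for an affine map; w layout is an assumption (see text)."""
    n3, n2, n1 = len(ref), len(ref[0]), len(ref[0][0])
    total = 0.0
    for k in range(n3):
        for j in range(n2):
            for i in range(n1):  # i runs fastest: lexicographical ordering
                x = (i + 0.5, j + 0.5, k + 0.5)  # cell center
                y = [w[4 * r] * x[0] + w[4 * r + 1] * x[1]
                     + w[4 * r + 2] * x[2] + w[4 * r + 3] for r in range(3)]
                res = trilinear(tmpl, y[0], y[1], y[2]) - ref[k][j][i]
                total += res * res
    return 0.5 * h_bar * total


identity = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0]
image = [[[1.0, 3.0]]]  # a tiny 2 x 1 x 1 volume
print(ssd(image, image, identity))
```

Registering an image to itself with the identity transformation gives a distance of zero, because the cell centers coincide with the interpolation nodes.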
Let \(({\mathbf {x}}_{i})_j\) denote the j-th component of \({\mathbf {x}}_{i}\in {\mathbb {R}}^{3}\). For transformation parameters \(w\in {\mathbb {R}^{12}}\), we define the vector
to construct the function
Using \({{\mathbf {y}}_{i}} = (y_i,y_{i+n},y_{i+2n})^\top \), we define
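With \(R_i := {{\mathcal {R}}}({{\mathbf {x}}}_{i})\), we set
\[ r:{\mathbb {R}^{n}}\rightarrow {\mathbb {R}^{n}},\quad (T_1,\ldots ,T_n)^\top \mapsto (T_1-R_1,\ldots ,T_n-R_n)^\top \]
as residual function and finally
\[ \psi :{\mathbb {R}^{n}}\rightarrow {\mathbb {R}},\quad (r_1,\ldots ,r_n)^\top \mapsto \frac{\bar{h}}{2}\sum _{i=1}^{n} r_i^2 \]
as the sum of all squared residual elements. Now, \(D_{\mathrm {SSD}}\) can be written as a concatenation of four functions:
\[ D_{\mathrm {SSD}}(w) = \psi \big (r(T(y(w)))\big ). \tag{28} \]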
2.1.1 Matrix-based differentiation
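The differentiation of (28) is performed with the chain rule as
\[ \frac{\partial D_{\mathrm {SSD}}}{\partial w} = \frac{\partial \psi }{\partial r}\,\frac{\partial r}{\partial T}\,\frac{\partial T}{\partial y}\,\frac{\partial y}{\partial w}, \tag{29} \]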
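just as in the two-dimensional case. Again, we define the gradient as a row vector. The first two individual derivatives in (29) are given by
\[ \frac{\partial \psi }{\partial r} = \bar{h}\,r^\top \in {\mathbb {R}^{1\;\times\;n}},\quad \frac{\partial r}{\partial T} = I_{n}, \tag{30} \]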
with \(I_{n} \in {\mathbb {R}^{n\;\times\;n}} \) as the identity matrix. Denoting the partial derivative with respect to the i-th component by \(\partial _i\) and defining \(\partial _i {\mathcal {T[y]}}\) as
it holds that
Finally, the derivative of the function y is given by
with the Kronecker product \(\otimes \) and the grid matrix X as
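thus completing the analysis of the gradient components from (29). With
\[ {\mathrm {d}}r := \frac{\partial r}{\partial T}\,\frac{\partial T}{\partial y}\,\frac{\partial y}{\partial w}\in {\mathbb {R}^{n\;\times\;12}} \tag{33} \]
the Gauss–Newton approximation \(H_{\text {SSD}}\) of the Hessian matrix is given by
\[ H_{\text {SSD}} = {\mathrm {d}}r^\top \,{\mathrm {d}}_2\psi \,{\mathrm {d}}r \]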
with \({\mathrm {d}}_2\psi =\bar{h}\). Again, note \(\frac{\partial r}{\partial T}=I_n\).
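To make the Kronecker structure of the matrix-based formulation concrete, the following sketch builds \(I_3\otimes X\) for three sample points and applies it to a parameter vector. It rests on assumptions that are not spelled out in the text above: the grid matrix is taken to be \(X=({\mathbf {x}}_{\cdot ,1},{\mathbf {x}}_{\cdot ,2},{\mathbf {x}}_{\cdot ,3},{\mathbf {1}})\in {\mathbb {R}}^{n\times 4}\), and each row of \(A\) is followed by the matching entry of \(b\) in \(w\). The helpers `kron` and `matvec` are hypothetical and written with plain lists.

```python
def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Three sample cell centers (n = 3).
points = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (0.5, 1.5, 0.5)]

# Assumed grid matrix X: one row (x1, x2, x3, 1) per point.
X = [[x1, x2, x3, 1.0] for (x1, x2, x3) in points]
I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]

# Assumed derivative of y with respect to w: a 3n x 12 matrix.
dy_dw = kron(I3, X)

# Assumed layout of w: (a11, a12, a13, b1, a21, a22, a23, b2, a31, a32, a33, b3).
w = [2.0, 0.0, 0.0, 1.0,
     0.0, 1.0, 0.0, -1.0,
     0.0, 0.0, 1.0, 0.5]

# y stacks all first components, then all second, then all third components.
y = matvec(dy_dw, w)
print(y)
```

The matrix has \(3n\) rows, so storing it explicitly costs memory proportional to the number of grid points. This is the overhead that a matrix-free formulation avoids.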
2.1.2 Matrix-free derivative calculation
With (31) and (32), it follows that
Using (30), it holds that
with \({\mathcal {T}_{w}}({{\mathbf {x}}}_i)\,\,{:=}\,\,{{\mathcal {T}}}(y_w({{\mathbf {x}}}_i))\). The explicit calculation rule for the objective function gradient in the three-dimensional case is therefore given by
The Gauss–Newton approximation to the Hessian for the SSD distance measure is defined as
By utilizing (34) and setting
it directly follows that
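The idea behind the matrix-free rules can be sketched as a single loop over the grid points. The sketch is an illustration derived from the chain rule (29) and is not the authors' code. It assumes that each row of \(A\) is followed by the matching entry of \(b\) in \(w\), and it takes the residuals \(r_i\) and the template derivatives \(\partial _j{\mathcal {T}}\) at the transformed points as inputs, whereas an actual implementation would compute them on the fly. The function name `ssd_derivatives` is hypothetical.

```python
def ssd_derivatives(points, residuals, template_grads, h_bar=1.0):
    """Accumulate the SSD gradient and Gauss-Newton Hessian point by point.

    points         -- cell centers x_i as (x1, x2, x3)
    residuals      -- r_i = T(y_w(x_i)) - R(x_i)
    template_grads -- (d1 T, d2 T, d3 T) evaluated at y_w(x_i)
    """
    grad = [0.0] * 12
    hess = [[0.0] * 12 for _ in range(12)]
    for x, r_i, d_t in zip(points, residuals, template_grads):
        xh = (x[0], x[1], x[2], 1.0)
        # Row i of dr under the assumed layout: entry 4*j + k is d_j T * xh[k].
        dr_i = [d_t[j] * xh[k] for j in range(3) for k in range(4)]
        for a in range(12):
            grad[a] += h_bar * r_i * dr_i[a]
            for b in range(12):
                hess[a][b] += h_bar * dr_i[a] * dr_i[b]
    return grad, hess


grad, hess = ssd_derivatives(
    points=[(1.0, 2.0, 3.0)],
    residuals=[2.0],
    template_grads=[(1.0, 0.0, -1.0)],
    h_bar=0.5,
)
print(grad)
```

The accumulators have fixed sizes of 12 and 12 × 12 regardless of the number of grid points, and each point contributes independently. The loop can therefore be split across threads and combined with a parallel reduction.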
2.2 Normalized gradient fields (NGF)
We consider the NGF distance measure [16]
Here, \(\langle a,b \rangle _{\alpha ,\beta }:=\sum _{i=1}^{3}a_ib_i+\alpha \beta ,\ a,b\in {\mathbb {R}}^{3}\), and \(\Vert a\Vert _\varepsilon :=\sqrt{\sum _{i=1}^3 a_i^2+\varepsilon ^2}\), with separate edge parameters for reference and template image, cf. [35]. Setting \({\mathcal {D}}_{\text {NGF}}(w) := {\mathcal {D}}_{\text {NGF}}({\mathcal {R}},{\mathcal {T}};y_w)\), affine-linear image registration with NGF translates to
For numerical optimization, the continuous formulation in (37) is discretized. For a reference image of size \(n_1\;\times\;n_2\;\times\;n_3\) and an index \(i,\ i=1,\ldots ,n\), let \(i', j',k'\in {\mathbb {N}},1\le i'\le n_1,\ 1\le j'\le n_2,\ 1\le k'\le n_3\) such that \(i = i' + (j'-1)n_1 + (k'-1)n_1n_2\). The indices of neighboring points with Neumann zero boundary conditions are given by
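A minimal sketch of such a neighbor lookup is given below. It uses zero-based indices, as is customary in Python, so the linear index is \(i = i' + j'n_1 + k'n_1n_2\) with \(0\le i'<n_1\), \(0\le j'<n_2\), \(0\le k'<n_3\). As an assumption, the Neumann zero boundary condition is realized by clamping: a neighbor that would fall outside the grid is replaced by the boundary point itself. The function name `neighbors` is hypothetical.

```python
def neighbors(i, n1, n2, n3):
    """Linear indices of the six axis neighbors of point i (zero-based)."""
    # Recover the three grid coordinates from the lexicographical index.
    ip = i % n1
    jp = (i // n1) % n2
    kp = i // (n1 * n2)

    def lin(a, b, c):
        return a + b * n1 + c * n1 * n2

    # Clamping (an assumption) keeps every neighbor inside the grid.
    return {
        "-z": lin(ip, jp, max(kp - 1, 0)),
        "-y": lin(ip, max(jp - 1, 0), kp),
        "-x": lin(max(ip - 1, 0), jp, kp),
        "+x": lin(min(ip + 1, n1 - 1), jp, kp),
        "+y": lin(ip, min(jp + 1, n2 - 1), kp),
        "+z": lin(ip, jp, min(kp + 1, n3 - 1)),
    }


corner = neighbors(0, 3, 2, 2)
print(corner)
```

At a corner, the three "minus" neighbors coincide with the point itself, so the same index can appear several times in one stencil.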
We define functions
and
for gradient and scalar product type operations at the position i, respectively. Further setting
the discretized version of (37) is given by
with \((T_w)_i = {\mathcal {T}}(y_w({{\mathbf {x}}}_i))\).
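The per-point NGF term can be sketched from the definitions of \(\langle \cdot ,\cdot \rangle _{\alpha ,\beta }\) and \(\Vert \cdot \Vert _\varepsilon \) given above. In this sketch, the gradient vectors are passed in directly rather than computed by finite differences, the template edge parameter is called `tau` and the reference edge parameter `rho`, and their pairing inside the scalar product is an assumption. All function names are hypothetical.

```python
import math


def inner(a, b, alpha, beta):
    """Scalar product <a, b>_{alpha, beta} = sum_i a_i * b_i + alpha * beta."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b)) + alpha * beta


def norm_eps(a, eps):
    """Regularized norm ||a||_eps = sqrt(sum_i a_i^2 + eps^2)."""
    return math.sqrt(sum(a_i * a_i for a_i in a) + eps * eps)


def ngf_residual(grad_t, grad_r, tau, rho):
    """Normalized scalar product of template and reference gradients."""
    return inner(grad_t, grad_r, tau, rho) / (
        norm_eps(grad_t, tau) * norm_eps(grad_r, rho))


aligned = ngf_residual((1.0, 2.0, 0.0), (1.0, 2.0, 0.0), 0.1, 0.1)
opposite = ngf_residual((10.0, 0.0, 0.0), (-10.0, 0.0, 0.0), 0.1, 0.1)
orthogonal = ngf_residual((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.1, 0.1)
print(aligned, opposite, orthogonal)
```

By the Cauchy–Schwarz inequality, the value always lies in \([-1,1]\). Strong parallel and strong antiparallel gradients both give a squared value close to one, which is why a measure built on the squared term is insensitive to contrast inversion between modalities.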
2.2.1 Matrix-based differentiation
Let y and T be as in (26) and (27). We define the residual function \(r:{\mathbb {R}^{n}}\rightarrow {\mathbb {R}^{n}}\) by setting the i-th component function \(r_i:{\mathbb {R}^{n}}\rightarrow {\mathbb {R}}\) to
The reduction function \(\psi :{\mathbb {R}^{n}}\rightarrow {\mathbb {R}}\) is given by
yielding the function chain
The derivatives of T and y have already been computed in (31) and (32). For the reduction function \(\psi \), it holds that
The calculation of \(\frac{\partial r}{\partial T}\) is performed by differentiating the component functions \(r_i,\; i=1,\ldots ,n\). The functions \(r_i\) are composed of \(s_i\), \(g_i\) and \(n_\varepsilon \) whose derivatives are given by

and
with \(\frac{\partial g_i}{\partial T}\in {\mathbb {R}^{3\;\times\;n}}\). Applying the chain rule in both numerator and denominator of \(r_i\) yields
with the vector entries at positions \(i_{-z},i_{-y},i_{-x},i_{+x},i_{+y},\) and \(i_{+z}\) (in that order) as defined in (38). Note that these positions may coincide, in which case the values are added.
The Gauss–Newton approximation \(H_{\text{NGF}}\) to the Hessian is given by
with \({\mathrm {d}}r\) defined as in (33) and \({\mathrm {d}}_2\psi =-{\bar{h}}\).
2.2.2 Matrix-free derivative calculation
Setting \(r_i:=\frac{s_i(g_i(T))}{ n_\varrho ( g_i(R))\ n_\tau ( g_i(T))}\) and \({\mathrm {d}}r_i:=\frac{\partial r_i}{\partial T}\frac{\partial T}{\partial y}\frac{\partial y}{\partial w}\), it holds with (39) that
As \(r_i\in {\mathbb {R}}\) are scalars, it suffices to derive a matrix-free description of the vectors \({\mathrm {d}}r_i\in {\mathbb {R}}^{12}\) to achieve a fully matrix-free formulation of the objective function gradient. Let \(1\le i \le n\) and define indices \(i_{-z},i_{-y},i_{-x},i_{+x},i_{+y},i_{+z}\) as in (38). With the definition
for \(i=1,\ldots ,n\), \(j=1,2,3\), \(k=1,\dots ,4\) and
it follows that
which according to (40) yields
completing the gradient calculation for the three-dimensional case. Since
the calculation of the Hessian approximation can directly be performed with the help of the matrix-free formulation of \({\mathrm {d}}r_i\) from (41). By defining the matrices \(l_k\in {\mathbb {R}^{12\;\times\;12}}\) as
analogous to the case of SSD, the matrix-free formulation for the Gauss–Newton approximation to the Hessian is given by
This finalizes the derivation of matrix-free calculation rules for objective function gradient and Gauss–Newton approximation to the Hessian also for the Normalized Gradient Fields distance measure with three-dimensional images.
Rühaak, J., König, L., Tramnitzke, F. et al. A matrix-free approach to efficient affine-linear image registration on CPU and GPU. J Real-Time Image Proc 13, 205–225 (2017). https://doi.org/10.1007/s11554-016-0564-4
DOI: https://doi.org/10.1007/s11554-016-0564-4