A matrix-free approach to efficient affine-linear image registration on CPU and GPU

Rühaak, Jan; König, Lars; Tramnitzke, Florian; Köstler, Harald; Modersitzki, Jan

doi:10.1007/s11554-016-0564-4

Special Issue Paper
Published: 05 April 2016

Volume 13, pages 205–225, (2017)
Journal of Real-Time Image Processing

Abstract

This paper presents a generic approach to highly efficient image registration in two and three dimensions. Both monomodal and multimodal registration problems are considered. We focus on the important class of affine-linear transformations in a derivative-based optimization framework. Our main contribution is an explicit formulation of the objective function gradient and Hessian approximation that allows for very efficient, parallel derivative calculation with virtually no memory requirements. The flexible parallelism of our concept allows for direct implementation on various hardware platforms. Derivative calculations are fully matrix free and operate directly on the input data, thereby reducing the auxiliary space requirements from ${\mathcal {O}}(n)$ to ${\mathcal {O}}(1)$. The proposed approach is implemented on multicore CPU and GPU. Our GPU code outperforms a conventional matrix-based CPU implementation by more than two orders of magnitude, thus enabling usage in real-time scenarios. The computational properties of our approach are extensively evaluated, thereby demonstrating the performance gain for a variety of real-life medical applications.

References

1. Alavi, A., et al.: Is PET-CT the only option? Eur. J. Nucl. Med. Mol. Imaging 34, 819–821 (2007)
2. Berg, R., König, L., Rühaak, J., Lausen, R., Fischer, B.: Highly efficient image registration for embedded systems using a distributed multicore DSP architecture. J. Real Time Image Process. (2014). doi:10.1007/s11554-014-0457-3
3. Björck, A.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
4. Bronsert, P., Enderle-Ammour, K., Bader, M., Timme, S., Kuehs, M., Csanadi, A., Kayser, G., Kohler, I., Bausch, D., Hoeppner, J., et al.: Cancer cell invasion and EMT marker expression: a three-dimensional study of the human cancer-host interface. J. Pathol. 234(3), 410–422 (2014)
5. Brown, L.G.: A survey of image registration techniques. ACM Comput. Surv. (CSUR) 24(4), 325–376 (1992)
6. Buluc, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. SIAM J. Sci. Comput. 34(4), C170–C191 (2012)
7. Castro-Pareja, C.R., Jagadeesh, J.M., Shekhar, R.: FAIR: a hardware architecture for real-time 3-D image registration. IEEE Trans. Inf. Technol. Biomed. 7(4), 426–434 (2003)
8. Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Inf. Process. Med. Imaging 3, 264–274 (1995)
9. Davis, T.A.: Direct Methods for Sparse Linear Systems, vol. 2. SIAM, Philadelphia (2006)
10. De Luca, V., Benz, T., Kondo, S., König, L., Lübke, D., Rothlübbers, S., Somphone, O., Allaire, S., Bell, M.L., Chung, D., et al.: The 2014 liver ultrasound tracking benchmark. Phys. Med. Biol. 60(14), 5571 (2015)
11. Dennis Jr, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations, vol. 16. SIAM, Philadelphia (1996)
12. Ferroli, P., Franzini, A., Marras, C., Maccagnano, E., D’Incerti, L., Broggi, G.: A simple method to assess accuracy of deep brain stimulation electrode placement: pre-operative stereotactic CT + postoperative MR image fusion. Stereotact. Func. Neurosurg. 82(1), 14–19 (2004)
13. Fischer, B., Modersitzki, J.: Fast diffusion registration. Contemp. Math. 313, 117–128 (2002)
14. Gigengack, F., Ruthotto, L., Burger, M., Wolters, C.H., Jiang, X., Schafers, K.P.: Motion correction in dual gated cardiac PET using mass-preserving image registration. IEEE Trans. Med. Imaging 31(3), 698–712 (2012)
15. Haber, E., Modersitzki, J.: A multilevel method for image registration. SIAM J. Sci. Comput. 27(5), 1594–1607 (2006)
16. Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multi-modal images. Methods Inf. Med. 46, 292–9 (2007)
17. Haber, E., Heldmann, S., Modersitzki, J.: An octree method for parametric image registration. SIAM J. Sci. Comput. 29(5), 2008–2023 (2007)
18. Haber, E., Heldmann, S., Modersitzki, J.: Adaptive mesh refinement for nonparametric image registration. SIAM J. Sci. Comput. 30(6), 3012–3027 (2008)
19. Harris, M., et al.: Optimizing parallel reduction in CUDA. NVIDIA Developer Technology 2(4) (2007). http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf
20. Kabus, S., Lorenz, C.: Fast elastic image registration. In: Proceedings of the medical image analysis for the clinic: a grand challenge, pp. 81–89 (2010)
21. Köhn, A., Drexl, J., Ritter, F., König, M., Peitgen, H.O.: GPU accelerated image registration in two and three dimensions. In: Bildverarbeitung für die Medizin 2006, Springer, pp. 261–265 (2006)
22. König, L., Rühaak, J.: A fast and accurate parallel algorithm for non-linear image registration using normalized gradient fields. In: 2014 IEEE 11th international symposium on biomedical imaging (ISBI), pp. 580–583 (2014)
23. König, L., Kipshagen, T., Rühaak, J.: A non-linear image registration scheme for real-time liver ultrasound tracking using normalized gradient fields. In: Proceedings of MICCAI challenge on liver ultrasound tracking (CLUST 2014) (2014)
24. König, L., Derksen, A., Hallmann, M., Papenberg, N.: Parallel and memory efficient multimodal image registration for radiotherapy using normalized gradient fields. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (2015)
25. Lange, T., Papenberg, N., Heldmann, S., Modersitzki, J., Fischer, B., Lamecker, H., Schlag, P.M.: 3D ultrasound-CT registration of the liver using combined landmark-intensity information. Int. J. Comput. Assist. Radiol. Surg. 4(1), 79–88 (2009)
26. Lombardi, F., Spigler, R.: The evolution of the approach to scientific computing: a survey. J. Parallel Cloud Comput. 3(2), 32–42 (2014)
27. Maintz, J., Viergever, M.A.: A survey of medical image registration. Med. Image Anal. 2(1), 1–36 (1998)
28. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
29. Modersitzki, J.: FAIR: Flexible Algorithms for Image Registration, vol. 6. SIAM, Philadelphia (2009)
30. Murphy, K., Van Ginneken, B., Reinhardt, J.M., Kabus, S., Ding, K., Deng, X., Cao, K., Du, K., Christensen, G.E., Garcia, V., et al.: Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE Trans. Med. Imaging 30(11), 1901–1920 (2011)
31. Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (1999)
32. NVIDIA Corporation: NVIDIA CUDA C Programming Guide. NVIDIA Corporation, Santa Clara (2014)
33. Powell, M.J.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Comput. J. 7(2), 155–162 (1964)
34. Rühaak, J., Heldmann, S., Kipshagen, T., Fischer, B.: Highly accurate fast lung CT registration. In: SPIE Medical Imaging 2013, image processing, pp. 86,690Y–86,690Y–9 (2013a)
35. Rühaak, J., König, L., Hallmann, M., Papenberg, N., Heldmann, S., Schumacher, H., Fischer, B.: A fully parallel algorithm for multimodal image registration using normalized gradient fields. In: 2013 IEEE 10th international symposium on biomedical imaging (ISBI), pp. 572–575 (2013b)
36. Rühaak, J., Derksen, A., Heldmann, S., Hallmann, M., Meine, H.: Accurate CT-MR image registration for deep brain stimulation: a multi-observer evaluation study. In: SPIE Medical Imaging 2015: image processing (2015)
37. Salas Gonzalez, D., Górriz, J., Ramírez, J., Lassl, A., Puntonet, C.: Improved Gauss–Newton optimisation methods in affine registration of SPECT brain images. Electr. Lett. 44(22), 1291–1292 (2008)
38. Schmitt, O., Modersitzki, J., Heldmann, S., Wirtz, S., Fischer, B.: Image registration of sectioned brains. Int. J. Comput. Vis. 73(1), 5–39 (2007)
39. Shams, R., Sadeghi, P., Kennedy, R., Hartley, R.: A survey of medical image registration on multicore and the GPU. IEEE Sig. Process. Mag. 27(2), 50–60 (2010a)
40. Shams, R., Sadeghi, P., Kennedy, R., Hartley, R.: Parallel computation of mutual information on the GPU with application to real-time registration of 3D medical images. Comput. Methods Prog. Biomed. 99(2), 133–146 (2010b)
41. Shi, L., Liu, W., Zhang, H., Xie, Y., Wang, D.: A survey of GPU-based medical image computing techniques. Quant. Imaging Med. Surg. 2(3), 188 (2012)
42. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
43. Soza, G., Bauer, M., Hastreiter, P., Nimsky, C., Greiner, G.: Non-rigid registration with use of hardware-based 3D Bézier functions. In: Medical image computing and computer-assisted intervention, MICCAI 2002, Springer, pp. 549–556 (2002)
44. Stone, J.E., Gohara, D., Shi, G.: OpenCL: a parallel programming standard for heterogeneous computing systems. Comput. Sci. Eng. 12(3), 66 (2010)
45. Stürmer, M., Köstler, H., Rüde, U.: A fast full multigrid solver for applications in image processing. Numer. Linear Algebra Appl. 15(2–3), 187–200 (2008)
46. Tramnitzke, F., Rühaak, J., König, L., Modersitzki, J., Köstler, H.: GPU based affine linear image registration using normalized gradient fields. In: Proceedings of 7th international workshop on high performance computing for biomedical image analysis (HPC-MICCAI) (2014)
47. Vercauteren, T., Pennec, X., Perchant, A., Ayache, N.: Diffeomorphic demons: efficient non-parametric image registration. NeuroImage 45(1), S61–S72 (2009)
48. Verma, P.S., Wu, H., Langer, M.P., Das, I.J., Sandison, G.: Survey: real-time tumor motion prediction for image-guided radiation treatment. Comput. Sci. Eng. 13(5), 24–35 (2011)
49. Viola, P., Wells III, W.M.: Alignment by maximization of mutual information. Int. J. Comput. Vis. 24(2), 137–154 (1997)
50. Wilt, N.: The CUDA handbook: a comprehensive guide to GPU programming. Pearson Education, Upper Saddle River (2013)
51. Zitova, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)

Acknowledgments

J. Rühaak, L. König, F. Tramnitzke and J. Modersitzki received funding from the European Union, European Regional Development Fund, Grant No. 122-10-002. All authors declare that they have no conflicts of interest.

Author information

Authors and Affiliations

Fraunhofer MEVIS, Maria-Goeppert-Str. 3, 23562, Lübeck, Germany
Jan Rühaak, Lars König & Florian Tramnitzke
Universität Erlangen-Nürnberg, Lehrstuhl für Systemsimulation, Cauerstr. 11, 91058, Erlangen, Germany
Harald Köstler
Universität zu Lübeck, Institute of Mathematics and Image Computing, Maria-Goeppert-Str. 3, 23562, Lübeck, Germany
Jan Modersitzki

Corresponding author

Correspondence to Jan Rühaak.

Appendices

Appendix

Extension to the three-dimensional case

In this appendix, explicit matrix-free calculation rules are derived for affine-linear registration of three-dimensional images with the SSD and NGF distance measures. Most definitions of the occurring functions are briefly repeated here to improve readability.

2.1 Sum of squared differences (SSD)

For any $y:\Omega _{{\mathcal {R}}}\rightarrow {\mathbb {R}}^{3}$, the sum of squared differences (SSD) distance measure [28] is given by

$$\begin{aligned} {\mathcal {D}}_{\text {SSD}}({\mathcal {R}},{\mathcal {T}};y) := \frac{1}{2} \int _{\Omega _{\mathcal {R}}} \left( {\mathcal {T}}(y({\mathbf {x}})) - {\mathcal {R}}({\mathbf {x}}) \right) ^2 {\mathrm {d}}{\mathbf {x}}. \end{aligned}$$

Let $y_w:{\mathbb {R}^{3}}\rightarrow {\mathbb {R}^{3}}, \quad x\,\mapsto \,Ax+b$ denote a three-dimensional affine-linear transformation with $w=(w_1,\ldots ,w_{12})$ and

$$ A: = \left( {\begin{array}{*{20}c} {w_{1} } & {w_{2} } & {w_{3} } \\ {w_{5} } & {w_{6} } & {w_{7} } \\ {w_{9} } & {w_{{10}} } & {w_{{11}} } \\ \end{array} } \right),\;b: = \left( {\begin{array}{*{20}c} {w_{4} } \\ {w_{8} } \\ {w_{{12}} } \\ \end{array} } \right). $$
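The parameter layout above can be illustrated with a minimal Python sketch. The function name `affine_3d` and the plain-list representation of $w$ are assumptions made for illustration only; the zero-based index `4 * r + c` corresponds to the one-based parameter $w_{4r+c+1}$.

```python
def affine_3d(w, x):
    """Apply y_w(x) = A x + b for w = (w1, ..., w12).

    Each group of four parameters holds one row of A followed by the
    matching entry of b, as in the definition of A and b above.
    """
    if len(w) != 12 or len(x) != 3:
        raise ValueError("expected 12 parameters and a 3D point")
    return tuple(
        w[4 * r] * x[0] + w[4 * r + 1] * x[1] + w[4 * r + 2] * x[2] + w[4 * r + 3]
        for r in range(3)
    )


# Identity transformation: A = I, b = 0.
w_identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```

With this layout, a pure translation only changes the entries at positions 4, 8 and 12 of $w$.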

Setting ${\mathcal {D}}_{\text {SSD}}(w):={\mathcal {D}}_{\text {SSD}}({\mathcal {R}},{\mathcal {T}};y_w)$ yields the formulation of affine-linear image registration with SSD as the minimization problem

$$\begin{aligned} \min _w \ {\mathcal {D}}_{\text {SSD}}(w) \end{aligned}$$

(25)

with ${\mathcal {D}}_{\text {SSD}}:{\mathbb {R}^{12}}\rightarrow {\mathbb {R}}$. For discretization, the domain $\Omega _{{\mathcal {R}}}$ is assumed to be cuboid and decomposed into n cells of equal size with center points ${\mathbf {x}}_{i},\, i=1,\ldots ,n$, arranged in lexicographical ordering. Using the midpoint quadrature rule for numerical integration, a discretized version of (25) reads

$$\begin{aligned} \min _w \ D_{\mathrm {SSD}}(w) :=\frac{\bar{h}}{2} \displaystyle \sum _{i=1}^{n} \left( {\mathcal {T}}(y_w({\mathbf {x}}_i)) - {\mathcal {R}}({\mathbf {x}}_i) \right) ^2, \end{aligned}$$

where $\bar{h}$ denotes the volume of each cell. Multilinear interpolation with Dirichlet zero boundary conditions is used to evaluate the discrete template image at arbitrary coordinates.
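As a rough sketch of the midpoint-rule discretization, the following Python code evaluates the discrete SSD on a cell-centered grid. The callables `template` and `reference` are hypothetical stand-ins for the interpolated images (the multilinear interpolation with zero boundary values is assumed to be hidden inside them), and `omega`, `m` and all function names are illustrative assumptions.

```python
def affine_3d(w, x):
    # y_w(x) = A x + b with rows of A and entries of b interleaved in w
    return tuple(
        w[4 * r] * x[0] + w[4 * r + 1] * x[1] + w[4 * r + 2] * x[2] + w[4 * r + 3]
        for r in range(3)
    )


def cell_centers(omega, m):
    """Return cell centers (x index running fastest) and the cell widths."""
    h = [(hi - lo) / cells for (lo, hi), cells in zip(omega, m)]
    centers = [
        (omega[0][0] + (i + 0.5) * h[0],
         omega[1][0] + (j + 0.5) * h[1],
         omega[2][0] + (k + 0.5) * h[2])
        for k in range(m[2]) for j in range(m[1]) for i in range(m[0])
    ]
    return centers, h


def ssd_value(template, reference, w, omega, m):
    """Midpoint-rule SSD: (h_bar / 2) * sum_i (T(y_w(x_i)) - R(x_i))^2."""
    centers, h = cell_centers(omega, m)
    h_bar = h[0] * h[1] * h[2]  # volume of one cell
    total = 0.0
    for x in centers:
        diff = template(affine_3d(w, x)) - reference(x)
        total += diff * diff
    return 0.5 * h_bar * total
```

Only a running sum is kept while looping over the cells, so no per-voxel intermediate data has to be stored.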

Let $({\mathbf {x}}_{i})_j$ denote the j-th component of ${\mathbf {x}}_{i}\in {\mathbb {R}}^{3}$. For transformation parameters $w\in {\mathbb {R}^{12}}$, we define the vector

$$ v_{i} : = \left( {\begin{array}{*{20}c} {\left( {A{\mathbf{x}}_{1} + b} \right)_{i} } \\ {\left( {A{\mathbf{x}}_{2} + b} \right)_{i} } \\ \vdots \\ {\left( {A{\mathbf{x}}_{n} + b} \right)_{i} } \\ \end{array} } \right) \in {\mathbb{R}^{n}} , \quad i = 1,2,3, $$

to construct the function

$$ y:{\mathbb{R}^{{12}}} \to {\mathbb{R}^{{3n}}} ,\quad w \; \mapsto \left( {\begin{array}{*{20}c} {v_{1} } \\ {v_{2} } \\ {v_{3} } \\ \end{array} } \right). $$

(26)

Using ${{\mathbf {y}}_{i}} = (y_i,y_{i+n},y_{i+2n})^\top $, we define

$$ T:{\mathbb{R}^{{3n}}} \to {\mathbb{R}^{n}} ,\;\left( {\begin{array}{*{20}c} {y_{1} } \\ \vdots \\ {y_{{3n}} } \\ \end{array} } \right) \mapsto \left( {\begin{array}{*{20}c} {{\mathcal{T}}({\mathbf{y}}_{1} )} \\ \vdots \\ {{\mathcal{T}}({\mathbf{y}}_{n} )} \\ \end{array} } \right). $$

(27)

With $R_i := {{\mathcal {R}}}({{\mathbf {x}}}_{i})$, we set

$$ r:{\mathbb{R}^{n}} \to {\mathbb{R}^{n}} ,\;\left( {\begin{array}{*{20}c} {T_{1} } \\ \vdots \\ {T_{n} } \\ \end{array} } \right) \mapsto \left( {\begin{array}{*{20}c} {T_{1} - R_{1} } \\ \vdots \\ {T_{n} - R_{n} } \\ \end{array} } \right) $$

as residual function and finally

$$ \psi :{\mathbb{R}^{n}} \to {\mathbb{R}},\;\left( {\begin{array}{*{20}c} {r_{1} } \\ \vdots \\ {r_{n} } \\ \end{array} } \right) \mapsto \frac{{\bar{h}}}{2}\sum\limits_{{i = 1}}^{n} {r_{i}^{2} } $$

as the sum of all squared residual elements. Now, $D_{\mathrm {SSD}}$ can be written as a concatenation of four functions:

$$\begin{aligned} D_{\text {SSD}}: {{\mathbb {R}}^{12}\xrightarrow {y}}{{\mathbb {R}}^{3n}\xrightarrow {T}}{{\mathbb {R}}^{n}\xrightarrow {r}}{{\mathbb {R}}^{n}\xrightarrow {\psi }}{\mathbb {R}}. \end{aligned}$$

(28)

2.1.1 Matrix-based differentiation

The differentiation of (28) is performed with the chain rule as

$$\begin{aligned} \nabla D_{\text {SSD}}(w) = \frac{\partial \psi }{\partial r}\frac{\partial r}{\partial T}\frac{\partial T}{\partial y} \frac{\partial y}{\partial w} \end{aligned}$$

(29)

just as in the two-dimensional case. Again, we define the gradient as a row vector. The first two individual derivatives in (29) are given by

$$\begin{aligned} \frac{\partial \psi }{\partial r}[r]&= {\bar{h}}(r_1,\ldots ,r_{n}) \ \text {and} \nonumber \\ \frac{\partial r}{\partial T}[T]&= I_{n}, \end{aligned}$$

(30)

with $I_{n} \in {\mathbb {R}^{n\;\times\;n}} $ as the identity matrix. Denoting the partial derivative with respect to the i-th component by $\partial _i$ and defining $\partial _i {\mathcal {T}}[y]$ as

$$ \partial _{i} {\mathcal{T}}[y]: = \left( {\begin{array}{*{20}c} {\partial _{i} {\mathcal{T}}({\mathbf{y}}_{1} )} & {} & {} \\ {} & \ddots & {} \\ {} & {} & {\partial _{i} {\mathcal{T}}({\mathbf{y}}_{n} )} \\ \end{array} } \right),\quad i = 1,2,3, $$

it holds that

$$ \frac{{\partial T}}{{\partial y}}[y] = \left( {\begin{array}{*{20}c} {\partial _{1} {\mathcal{T}}} & {\partial _{2} {\mathcal{T}}} & {\partial _{3} {\mathcal{T}}} \\ \end{array} } \right) \in {\mathbb{R}}^{{n\; \times \;3n}} . $$

(31)

Finally, the derivative of the function y is given by

$$\begin{aligned} \frac{\partial y}{\partial w}[w] = I_3 \otimes {\mathbf {X}} \in {\mathbb {R}^{3n\;\times\;12}} \end{aligned}$$

(32)

with the Kronecker product $\otimes $ and the grid matrix X as

$$ {\mathbf{X}}: = \left( {\begin{array}{*{20}c} {({\mathbf{x}}_{1} )_{1} } & {({\mathbf{x}}_{1} )_{2} } & {({\mathbf{x}}_{1} )_{3} } & 1 \\ {({\mathbf{x}}_{2} )_{1} } & {({\mathbf{x}}_{2} )_{2} } & {({\mathbf{x}}_{2} )_{3} } & 1 \\ \vdots & \vdots & \vdots & \vdots \\ {({\mathbf{x}}_{n} )_{1} } & {({\mathbf{x}}_{n} )_{2} } & {({\mathbf{x}}_{n} )_{3} } & 1 \\ \end{array} } \right) \in {\mathbb{R}^{{n\; \times \;4}}} , $$

thus completing the analysis of the gradient components from (29). With

$$\begin{aligned} {\mathrm {d}} r := \frac{\partial r}{\partial T}\frac{\partial T}{\partial y}\frac{\partial y}{\partial w}\in {\mathbb {R}}^{n\;\times\;12}, \end{aligned}$$

(33)

the Gauss–Newton approximation $H_{\text {SSD}}$ of the Hessian matrix is given by

$$\begin{aligned} H_{\text {SSD}}(w) := {\mathrm {d}} r^\top {\mathrm {d}}_2\psi {\mathrm {d}} r \end{aligned}$$

with ${\mathrm {d}}_2\psi =\bar{h}$. Again, note $\frac{\partial r}{\partial T}=I_n$.
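To make the Kronecker structure of (32) concrete, the following small Python sketch builds the grid matrix $\mathbf{X}$ and the dense matrix $I_3 \otimes \mathbf{X}$ for a tiny point set. The helper names `grid_matrix`, `kron_identity3` and `matvec` are hypothetical and the dense list-of-lists storage is chosen only for readability.

```python
def grid_matrix(points):
    """Grid matrix X with rows (x1, x2, x3, 1)."""
    return [[x[0], x[1], x[2], 1.0] for x in points]


def kron_identity3(X):
    """Dense I_3 (Kronecker product) X with 3n rows and 12 columns."""
    n = len(X)
    rows = []
    for block in range(3):
        for i in range(n):
            row = [0.0] * 12
            row[4 * block:4 * block + 4] = X[i]  # X sits on the block diagonal
            rows.append(row)
    return rows


def matvec(M, v):
    """Plain matrix-vector product for list-of-lists matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]
```

Since $y$ is linear in $w$, applying this matrix to $w$ reproduces the stacked vector $(v_1, v_2, v_3)$ from (26). The dense matrix has $3n \times 12$ entries, which illustrates the storage growth with the number of cells that the matrix-free formulation avoids.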

2.1.2 Matrix-free derivative calculation

With (31) and (32), it follows that

$$\left( \frac{\partial T}{\partial y}\frac{\partial y}{\partial w}\right) _{i,j} \,= \, \left\{ \begin{array}{ll} \partial _1 {\mathcal{T}}({\mathbf{y}}_i) {\mathbf{X}}_{i,j}&{}1\le j \le 4\\ \partial _2 {\mathcal{T}}({\mathbf{y}}_{i}){\mathbf{X}}_{i,j-4}&{}5\le j \le 8\\ \partial _3 {\mathcal{T}}({\mathbf{y}}_{i}){\mathbf{X}}_{i,j-8}&{}9\le j \le 12 \end{array}\right. . $$

(34)

Using (30), it holds that

$$\begin{aligned} \left( \frac{\partial \psi }{\partial r}\right) _i = \bar{h}\left( {\mathcal {T}}_{w}({{\mathbf {x}}}_i) - {\mathcal R}({{\mathbf {x}}}_i)\right) \end{aligned}$$

with ${\mathcal {T}}_{w}({{\mathbf {x}}}_i)\,\,{:=}\,\,{{\mathcal {T}}}(y_w({{\mathbf {x}}}_i))$. The explicit calculation rule for the objective function gradient in the three-dimensional case is therefore given by

$$ \nabla D_{{{\text{SSD}}}} (w) = \bar{h}\sum\limits_{{i = 1}}^{n} {\left( {{\mathcal{T}}_{w} ({\mathbf{x}}_{i} ) - {\mathcal{R}}({\mathbf{x}}_{i} )} \right)} \left( {\begin{array}{*{20}c} {\partial _{1} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{1} } \\ {\partial _{1} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{2} } \\ {\partial _{1} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{3} } \\ {\partial _{1} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )} \\ {\partial _{2} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{1} } \\ {\partial _{2} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{2} } \\ {\partial _{2} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{3} } \\ {\partial _{2} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )} \\ {\partial _{3} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{1} } \\ {\partial _{3} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{2} } \\ {\partial _{3} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )({\mathbf{x}}_{i} )_{3} } \\ {\partial _{3} {\mathcal{T}}_{w} ({\mathbf{x}}_{i} )} \\ \end{array} } \right)^{{ \top }}. $$

(35)
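The calculation rule (35) can be sketched as a single loop over the grid points. In the following Python code, `template`, `grad_template` and `reference` are hypothetical callables standing in for the interpolated template image, its three partial derivatives and the reference image; they are assumptions of this sketch and not part of the original implementation.

```python
def affine_3d(w, x):
    # y_w(x) = A x + b with rows of A and entries of b interleaved in w
    return tuple(
        w[4 * r] * x[0] + w[4 * r + 1] * x[1] + w[4 * r + 2] * x[2] + w[4 * r + 3]
        for r in range(3)
    )


def ssd_gradient(template, grad_template, reference, w, points, h_bar):
    """Matrix-free SSD gradient: one pass over the points, 12 accumulators."""
    grad = [0.0] * 12
    for x in points:
        y = affine_3d(w, x)
        residual = template(y) - reference(x)
        dT = grad_template(y)              # (d1 T, d2 T, d3 T) at y_w(x)
        xh = (x[0], x[1], x[2], 1.0)       # row of the grid matrix X
        for j in range(3):
            for k in range(4):
                grad[4 * j + k] += residual * dT[j] * xh[k]
    return [h_bar * g for g in grad]
```

Each point contributes independently to the twelve sums, so in a parallel setting the loop can be split across threads and the partial sums combined by a reduction.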

The Gauss–Newton approximation to the Hessian for the SSD distance measure is defined as

$$\begin{aligned} H_{\mathrm {SSD}}&= {\mathrm {d}}r^\top {\mathrm {d}}_2\psi {\mathrm {d}}r\\&= {\bar{h}}\left( \frac{\partial T}{\partial y}\frac{\partial y}{\partial w}\right) ^\top \left( \frac{\partial T}{\partial y}\frac{\partial y}{\partial w}\right) \in {\mathbb {R}}^{12\;\times\;12}. \end{aligned}$$

By utilizing (34) and setting

$$ l_{k} : = \left( {\begin{array}{*{20}c} {} \\ {\left( {\frac{{\partial T}}{{\partial y}}\frac{{\partial y}}{{\partial w}}} \right)_{{k,i}} \cdot\left( {\frac{{\partial T}}{{\partial y}}\frac{{\partial y}}{{\partial w}}} \right)_{{k,j}} } \\ {} \\ \end{array} } \right)_{{1 \le i,j \le 12}} , $$

(36)

it directly follows that

$$\begin{aligned} H_{\mathrm {SSD}}(w) = \bar{h} \displaystyle \sum _{k=1}^{n} l_k. \end{aligned}$$
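The sum of the matrices $l_k$ amounts to accumulating outer products of one 12-vector per grid point. A minimal Python sketch, again with a hypothetical `grad_template` callable for the template derivatives and illustrative function names:

```python
def affine_3d(w, x):
    # y_w(x) = A x + b with rows of A and entries of b interleaved in w
    return tuple(
        w[4 * r] * x[0] + w[4 * r + 1] * x[1] + w[4 * r + 2] * x[2] + w[4 * r + 3]
        for r in range(3)
    )


def ssd_gauss_newton_hessian(grad_template, w, points, h_bar):
    """Accumulate H = h_bar * sum_k l_k without storing any n-sized matrix."""
    H = [[0.0] * 12 for _ in range(12)]
    for x in points:
        dT = grad_template(affine_3d(w, x))
        xh = (x[0], x[1], x[2], 1.0)
        # Row k of (dT/dy)(dy/dw), see (34): entry 4*j + c is d_j T * X[k, c].
        row = [dT[j] * xh[c] for j in range(3) for c in range(4)]
        for a in range(12):
            for b in range(12):
                H[a][b] += h_bar * row[a] * row[b]
    return H
```

Because each $l_k$ is symmetric, an implementation could restrict the accumulation to the upper triangle; the full loop is kept here for clarity.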

2.2 Normalized gradient fields (NGF)

We consider the NGF distance measure [16]

$$ {\mathcal{D}}_{{{\text{NGF}}}} : = \frac{1}{2}\int\limits_{{\Omega _{{\mathcal{R}}} }} 1 - \left( {\frac{{\langle \nabla{\mathcal{R}}({\mathbf{x}}),\nabla {\mathcal{T}}(y({\mathbf{x}}))\rangle _{{{\varrho },\tau }} }}{{|| \nabla {\mathcal{R}}({\mathbf{x}})|| _{{\varrho }} \, || \nabla{\mathcal{T}}(y({\mathbf{x}}))|| _{\tau } }}} \right)^{2} \;{\text{d}}{\mathbf{x}}, $$

where $\langle a,b \rangle _{\alpha ,\beta }:=\sum _{i=1}^{3}a_ib_i+\alpha \beta $ for $a,b\in {\mathbb {R}}^{3}$ and $\Vert a\Vert _\varepsilon :=\sqrt{\sum _{i=1}^3 a_i^2+\varepsilon ^2}$, with separate edge parameters $\varrho $ and $\tau $ for the reference and template image, respectively, cf. [35]. Setting ${\mathcal {D}}_{\text {NGF}}(w) := {\mathcal {D}}_{\text {NGF}}({\mathcal {R}},{\mathcal {T}};y_w)$, affine-linear image registration with NGF translates to

$$\begin{aligned} \min _w \ {\mathcal {D}}_{\text {NGF}}(w). \end{aligned}$$

(37)

For numerical optimization, the continuous formulation in (37) is discretized. For a reference image of size $n_1\;\times\;n_2\;\times\;n_3$ with $n=n_1n_2n_3$ and an index $i,\ i=1,\ldots ,n$, let $i', j',k'\in {\mathbb {N}},1\le i'\le n_1,\ 1\le j'\le n_2,\ 1\le k'\le n_3$ such that $i = i' + (j'-1)n_1 + (k'-1)n_1n_2$. The indices of neighboring points with Neumann zero boundary conditions are given by

$$\begin{aligned} i_{-x}&= \max (i'-1,1)+ (j'-1)n_1 + (k'-1)n_1n_2, \nonumber \\ i_{+x}&= \min (i'+1,n_1)+(j'-1)n_1+ (k'-1)n_1n_2, \nonumber \\ i_{-y}&= i'+(\max (j'-1,1)-1)n_1+ (k'-1)n_1n_2, \nonumber \\ i_{+y}&= i'+(\min (j'+1,n_2)-1)n_1+ (k'-1)n_1n_2, \nonumber \\ i_{-z}&= i'+(j'-1)n_1+ (\max (k'-1,1)-1)n_1n_2, \nonumber \\ i_{+z}&= i'+(j'-1)n_1+ (\min (k'+1,n_3)-1)n_1n_2. \end{aligned}$$

(38)
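The neighbor indices can be sketched in a few lines of Python. Note that this sketch uses zero-based indices throughout (linear index $i = i' + j'n_1 + k'n_1n_2$ with $0 \le i' < n_1$ and so on), which is an assumption made here to match common array conventions; the function name and the dictionary keys are illustrative.

```python
def neighbor_indices(i, n1, n2, n3):
    """Zero-based linear indices of the six neighbors of voxel i.

    Indices are clamped at the image boundary, so a boundary voxel is
    its own neighbor in the outward direction.
    """
    ip = i % n1
    jp = (i // n1) % n2
    kp = i // (n1 * n2)

    def linear(a, b, c):
        return a + b * n1 + c * n1 * n2

    return {
        "-x": linear(max(ip - 1, 0), jp, kp),
        "+x": linear(min(ip + 1, n1 - 1), jp, kp),
        "-y": linear(ip, max(jp - 1, 0), kp),
        "+y": linear(ip, min(jp + 1, n2 - 1), kp),
        "-z": linear(ip, jp, max(kp - 1, 0)),
        "+z": linear(ip, jp, min(kp + 1, n3 - 1)),
    }
```

The neighbors are computed on the fly from `i` alone, so no index tables need to be stored.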

We define functions

$$ g_{i} :{\mathbb{R}^{n}} \to {\mathbb{R}^{3}} , \quad T \mapsto \left( {\begin{array}{*{20}c} {\frac{1}{{2h_{1} }}( - T_{{i_{{ - x}} }} + T_{{i_{{ + x}} }} )} \\ {\frac{1}{{2h_{2} }}( - T_{{i_{{ - y}} }} + T_{{i_{{ + y}} }} )} \\ {\frac{1}{{2h_{3} }}( - T_{{i_{{ - z}} }} + T_{{i_{{ + z}} }} )} \\ \end{array} } \right) $$

and

$$\begin{aligned} s_i:&{\mathbb {R}^{3}} \rightarrow {\mathbb {R}}, \quad a \; \mapsto \; \langle g_i(R),a\rangle + \varrho \tau \end{aligned}$$

for gradient and scalar product type operations at the position i, respectively. Further setting

$$\begin{aligned} n_\varepsilon :&{\mathbb {R}^{3}} \rightarrow {\mathbb {R}}, \quad a \; \mapsto\; \sqrt{a_1^2 + a_2^2 + a_3^2 + \varepsilon ^{2}}, \end{aligned}$$

the discretized version of (37) is given by

$$\begin{aligned} \min _w \ D_{\mathrm {NGF}}(w) :=\frac{\bar{h}}{2} \displaystyle \sum _{i=1}^{n} 1 - \left( \frac{s_i(g_i(T_w))}{n_\varrho (g_i(R)) \ n_\tau (g_i(T_w))} \right) ^2 \end{aligned}$$

with $(T_w)_i = {\mathcal {T}}(y_w({{\mathbf {x}}}_i))$.
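A compact Python sketch of the discretized NGF measure follows. It assumes that the deformed template values $(T_w)_i$ and the reference values $R_i$ are already available as flat lists `T` and `R` in lexicographical order, uses zero-based indices, and introduces the hypothetical helper `central_gradient` for the operator $g_i$.

```python
import math


def central_gradient(img, i, shape, h):
    """g_i: central differences with neighbor indices clamped at the boundary."""
    n1, n2, n3 = shape
    pos = (i % n1, (i // n1) % n2, i // (n1 * n2))
    strides = (1, n1, n1 * n2)
    g = []
    for p, size, stride, hd in zip(pos, shape, strides, h):
        lo = i - stride if p > 0 else i
        hi = i + stride if p < size - 1 else i
        g.append((img[hi] - img[lo]) / (2.0 * hd))
    return g


def ngf_distance(T, R, shape, h, rho, tau):
    """(h_bar / 2) * sum_i (1 - r_i^2) with r_i = s_i / (n_rho * n_tau)."""
    h_bar = h[0] * h[1] * h[2]
    total = 0.0
    for i in range(len(T)):
        gT = central_gradient(T, i, shape, h)
        gR = central_gradient(R, i, shape, h)
        s = sum(a * b for a, b in zip(gR, gT)) + rho * tau
        n_R = math.sqrt(sum(a * a for a in gR) + rho * rho)
        n_T = math.sqrt(sum(a * a for a in gT) + tau * tau)
        r = s / (n_R * n_T)
        total += 1.0 - r * r
    return 0.5 * h_bar * total
```

For identical images and equal edge parameters every $r_i$ equals one, so the distance vanishes up to rounding.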

2.2.1 Matrix-based differentiation

Let y and T be as in (26) and (27). We define the residual function $r:{\mathbb {R}^{n}}\rightarrow {\mathbb {R}^{n}}$ by setting the i-th component function $r_i:{\mathbb {R}^{n}}\rightarrow {\mathbb {R}}$ to

$$\begin{aligned} r_i:T \; \mapsto \; \frac{s_i(g_i(T))}{ n_\varrho ( g_i(R))\ n_\tau ( g_i(T))}. \end{aligned}$$

The reduction function $\psi :{\mathbb {R}^{n}}\rightarrow {\mathbb {R}}$ is given by

$$\begin{aligned} \psi (r)&= \frac{\bar{h}}{2} \sum _{i=1}^{n} 1 - r_i^2, \end{aligned}$$

yielding the function chain

$$\begin{aligned} D_{\text {NGF}}: {{\mathbb {R}}^{12}\xrightarrow {y}}{{\mathbb {R}}^{3n}\xrightarrow {T}}{{\mathbb {R}}^{n}\xrightarrow {r}}{{\mathbb {R}}^{n}\xrightarrow {\psi }}{\mathbb {R}}. \end{aligned}$$

The derivatives of T and y have already been computed in (31) and (32). For the reduction function $\psi $, it holds that

$$\begin{aligned} \frac{\partial \psi }{\partial r}&= -\bar{h} r^\top \in {\mathbb {R}^{1\;\times\;n}}. \end{aligned}$$

(39)

The calculation of $\frac{\partial r}{\partial T}$ is performed by differentiating the component functions $r_i,\; i=1,\ldots ,n$. The functions $r_i$ are composed of $s_i$, $g_i$ and $n_\varepsilon $ whose derivatives are given by

$$\begin{aligned} \frac{\partial s_i}{\partial a} = g_i(R)^\top \in {\mathbb {R}^{1\;\times\;3}}, \end{aligned}$$

and

$$\begin{aligned} \frac{\partial n_\varepsilon }{\partial a} = \frac{1}{n_\varepsilon (a)}a^\top \in {\mathbb {R}^{1\;\times\;3}} \end{aligned}$$

with $\frac{\partial g_i}{\partial T}\in {\mathbb {R}^{3\;\times\;n}}$. Applying the chain rule in both numerator and denominator of $r_i$ yields

$$ \frac{{\partial r_{i} }}{{\partial T}} = \left( {\begin{array}{*{20}c} \vdots \\ {\frac{1}{{2h_{3} }}\left[ {\frac{{ - g_{i} (R)_{3} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} + \frac{{s_{i} (g_{i} (T))g_{i} (T)_{3} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ \vdots \\ {\frac{1}{{2h_{2} }}\left[ {\frac{{ - g_{i} (R)_{2} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} + \frac{{s_{i} (g_{i} (T))g_{i} (T)_{2} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ \vdots \\ {\frac{1}{{2h_{1} }}\left[ {\frac{{ - g_{i} (R)_{1} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} + \frac{{s_{i} (g_{i} (T))g_{i} (T)_{1} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ 0 \\ {\frac{1}{{2h_{1} }}\left[ {\frac{{g_{i} (R)_{1} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} - \frac{{s_{i} (g_{i} (T))g_{i} (T)_{1} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ \vdots \\ {\frac{1}{{2h_{2} }}\left[ {\frac{{g_{i} (R)_{2} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} - \frac{{s_{i} (g_{i} (T))g_{i} (T)_{2} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ \vdots \\ {\frac{1}{{2h_{3} }}\left[ {\frac{{g_{i} (R)_{3} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))}} - \frac{{s_{i} (g_{i} (T))g_{i} (T)_{3} }}{{n_{{\varrho }} (g_{i} (R))n_{\tau } (g_{i} (T))^{3} }}} \right]} \\ \vdots \\ \end{array} } \right)^{\top} $$

with the vector entries at positions $i_{-z},i_{-y},i_{-x},i_{+x},i_{+y},$ and $i_{+z}$ (in that order) as defined in (38). Note that these positions may coincide, in which case the values are added.
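The sparse structure of $\frac{\partial r_i}{\partial T}$ can be sketched as follows. The Python code returns the (at most six) nonzero entries as a dictionary keyed by zero-based position, summing entries whose positions coincide at the boundary. All function names are hypothetical, and `T` and `R` are assumed to be flat lists of image values.

```python
import math


def clamped_neighbors(i, shape):
    """(lower, upper) neighbor index per axis, clamped at the boundary."""
    n1, n2, n3 = shape
    pos = (i % n1, (i // n1) % n2, i // (n1 * n2))
    strides = (1, n1, n1 * n2)
    return [
        (i - s if p > 0 else i, i + s if p < size - 1 else i)
        for p, size, s in zip(pos, shape, strides)
    ]


def gradient(img, i, shape, h):
    return [(img[hi] - img[lo]) / (2.0 * hd)
            for (lo, hi), hd in zip(clamped_neighbors(i, shape), h)]


def _terms(T, R, i, shape, h, rho, tau):
    gT, gR = gradient(T, i, shape, h), gradient(R, i, shape, h)
    s = sum(a * b for a, b in zip(gR, gT)) + rho * tau
    n_R = math.sqrt(sum(a * a for a in gR) + rho * rho)
    n_T = math.sqrt(sum(a * a for a in gT) + tau * tau)
    return gT, gR, s, n_R, n_T


def residual(T, R, i, shape, h, rho, tau):
    _, _, s, n_R, n_T = _terms(T, R, i, shape, h, rho, tau)
    return s / (n_R * n_T)


def residual_derivative(T, R, i, shape, h, rho, tau):
    """Nonzero entries of d r_i / d T as {position: value}."""
    gT, gR, s, n_R, n_T = _terms(T, R, i, shape, h, rho, tau)
    # Derivative of r_i with respect to the template gradient g_i(T).
    dr_dg = [gR[d] / (n_R * n_T) - s * gT[d] / (n_R * n_T ** 3) for d in range(3)]
    entries = {}
    for d, (lo, hi) in enumerate(clamped_neighbors(i, shape)):
        weight = dr_dg[d] / (2.0 * h[d])
        entries[lo] = entries.get(lo, 0.0) - weight  # coinciding positions add up
        entries[hi] = entries.get(hi, 0.0) + weight
    return entries
```

A finite-difference comparison against `residual` is a convenient way to check such hand-derived entries.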

The Gauss–Newton approximation $H_{\text{NGF}}$ to the Hessian is given by

$$\begin{aligned} H_{\text {NGF}}(w) := {\mathrm {d}} r^\top {\mathrm {d}}_2\psi {\mathrm {d}}r \approx \nabla ^2 D_{\text {NGF}}(w) \end{aligned}$$

with ${\mathrm {d}}r$ defined as in (33) and ${\mathrm {d}}_2\psi =-{\bar{h}}$.

2.2.2 Matrix-free derivative calculation

Setting $r_i:=\frac{s_i(g_i(T))}{ n_\varrho ( g_i(R))\ n_\tau ( g_i(T))}$ and ${\mathrm {d}}r_i:=\frac{\partial r_i}{\partial T}\frac{\partial T}{\partial y}\frac{\partial y}{\partial w}$, it holds with (39) that

$$\begin{aligned} \nabla D_{\text {NGF}}(w) = -\bar{h} \sum _{i=1}^{n}r_i{\mathrm {d}}r_i. \end{aligned}$$

(40)

As $r_i\in {\mathbb {R}}$ are scalars, it suffices to derive a matrix-free description of the vectors ${\mathrm {d}}r_i\in {\mathbb {R}}^{12}$ to achieve a fully matrix-free formulation of the objective function gradient. Let $1\le i \le n$ and define indices $i_{-z},i_{-y},i_{-x},i_{+x},i_{+y},i_{+z}$ as in (38). With the definition

$$ \begin{aligned} d_{i}^{{j,k}}& \,\,{:=}\,\,\,\, \partial r_{i} [i_{{ - z}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ - z}} }} ){\mathbf{X}}_{{i_{{ - z}} ,k}} \\ &\,\, + \,\partial r_{i} [i_{{ - y}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ - y}} }} ){\mathbf{X}}_{{i_{{ - y}} ,k}} \\ & \,\,+ \,\partial r_{i} [i_{{ - x}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ - x}} }} ){\mathbf{X}}_{{i_{{ - x}} ,k}} \\ & \,\, + \, \partial r_{i} [i_{{ + x}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ + x}} }} ){\mathbf{X}}_{{i_{{ + x}} ,k}} \; \\ & \,\, + \, \partial r_{i} [i_{{ + y}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ + y}} }} ){\mathbf{X}}_{{i_{{ + y}} ,k}} \; \\ & \,\, + \, \partial r_{i} [i_{{ + z}} ]\partial _{j} {\mathcal{T}}({\mathbf{y}}_{{i_{{ + z}} }} ){\mathbf{X}}_{{i_{{ + z}} ,k}} \\ \end{aligned} $$

for $i=1,\ldots ,n$, $j=1,2,3$, $k=1,\dots ,4$ and

$$ d_{i}^{j} : = \left( {\begin{array}{*{20}c} {d_{i}^{{j,1}} ,d_{i}^{{j,2}} ,d_{i}^{{j,3}} ,d_{i}^{{j,4}} } \\ \end{array} } \right), $$

it follows that

$$ {\text{d}}r_{i} = \left( {\begin{array}{*{20}c} {d_{i}^{1} } & {d_{i}^{2} } & {d_{i}^{3} } \\ \end{array} } \right)^{{ \top }} \in {\mathbb{R}^{{12}}} , $$

(41)

which according to (40) yields

$$ \nabla D_{{{\text{NGF}}}} (w) = - \bar{h}\sum\limits_{{i = 1}}^{n} {\frac{{s_{i} (g_{i} (T))}}{{n_{{\varrho }} (g_{i} (R))\;n_{\tau } (g_{i} (T))}}} \left( {\begin{array}{*{20}c} {{\text{d}}r_{i} [1]} \\ {{\text{d}}r_{i} [2]} \\ \vdots \\ {{\text{d}}r_{i} [12]} \\ \end{array} } \right)^{{ \top }} , $$

(42)

completing the gradient calculation for the three-dimensional case. Since

$$ H_{{{\text{NGF}}}} (w) = \left( {\frac{{\partial r}}{{\partial T}}\frac{{\partial T}}{{\partial y}}\frac{{\partial y}}{{\partial w}}} \right)^{{ \top }} {\text{d}}_{2} \psi \left( {\frac{{\partial r}}{{\partial T}}\frac{{\partial T}}{{\partial y}}\frac{{\partial y}}{{\partial w}}} \right) = \left( {\begin{array}{*{20}c} {{\text{d}}r_{1}^{{ \top }} } & \ldots & {{\text{d}}r_{n}^{{ \top }} } \\ \end{array} } \right){\text{d}}_{2} \psi \left( {\begin{array}{*{20}c} {{\text{d}}r_{1} } \\ \vdots \\ {{\text{d}}r_{n} } \\ \end{array} } \right), $$

the calculation of the Hessian approximation can directly be performed with the help of the matrix-free formulation of ${\mathrm {d}}r_i$ from (41). By defining the matrices $l_k\in {\mathbb {R}}^{12\;\times\;12}$ as

$$ l_{k} : = \left( {\begin{array}{*{20}c} {} \\ {{\text{d}}r_{k} [i]\cdot{\text{d}}r_{k} [j]} \\ {} \\ \end{array} } \right)_{{1 \le i,j \le 12}} $$

(43)

analogous to the SSD case, the matrix-free formulation for the Gauss–Newton approximation to the Hessian is given by

$$\begin{aligned} H_{\mathrm {NGF}}(w) = \bar{h} \displaystyle \sum _{k=1}^{n} l_k. \end{aligned}$$

This finalizes the derivation of matrix-free calculation rules for objective function gradient and Gauss–Newton approximation to the Hessian also for the Normalized Gradient Fields distance measure with three-dimensional images.
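The assembly of ${\mathrm {d}}r_i$ from (41) and the accumulation of gradient (40) and Hessian approximation can be sketched together in Python. The inputs are assumptions of this sketch: `r` holds the residuals $r_i$, `dr_dT[i]` is a dictionary with the nonzero entries of $\frac{\partial r_i}{\partial T}$ keyed by zero-based position, `dT[m]` holds the three template derivatives at ${\mathbf{y}}_m$, and `points[m]` is the grid point ${\mathbf{x}}_m$. The Hessian accumulation follows the final formula $H_{\mathrm{NGF}} = \bar{h}\sum_k l_k$.

```python
def ngf_gradient_and_hessian(r, dr_dT, dT, points, h_bar):
    """Matrix-free NGF gradient and Gauss-Newton matrix from per-point data."""
    grad = [0.0] * 12
    H = [[0.0] * 12 for _ in range(12)]
    for i, entries in enumerate(dr_dT):
        # Assemble dr_i in R^12 from the neighbors of point i, see (41).
        dr_i = [0.0] * 12
        for m, value in entries.items():
            x = points[m]
            xh = (x[0], x[1], x[2], 1.0)   # row m of the grid matrix X
            for j in range(3):
                for k in range(4):
                    dr_i[4 * j + k] += value * dT[m][j] * xh[k]
        # Accumulate gradient (40) and the outer product l_i from (43).
        for a in range(12):
            grad[a] -= h_bar * r[i] * dr_i[a]
            for b in range(12):
                H[a][b] += h_bar * dr_i[a] * dr_i[b]
    return grad, H
```

Only the twelve-element vector `dr_i` lives per point, so the working memory is independent of the number of voxels apart from the input images themselves.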

About this article

Cite this article

Rühaak, J., König, L., Tramnitzke, F. et al. A matrix-free approach to efficient affine-linear image registration on CPU and GPU. J Real-Time Image Proc 13, 205–225 (2017). https://doi.org/10.1007/s11554-016-0564-4

Received: 11 March 2015
Accepted: 14 January 2016
Published: 05 April 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11554-016-0564-4
