Abstract
We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss–Newton minimization framework. We propose a reformulation of the necessary derivative computations that eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. Reducing the data transfer to the individual DSP chips is key to an efficient calculation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images with 4096 × 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
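The idea of computing the Gauss–Newton terms pixel by pixel, without assembling sparse matrices, can be illustrated with a small NumPy sketch. This is not the authors' DSP implementation: the function names (`sample`, `block_terms`, `accumulate`, `gauss_newton_step`), the bilinear interpolation, the finite-difference image gradient, the rotation about the image center, and the split into row blocks are all assumptions chosen for illustration. Each block returns only a scalar, a 3-vector, and a 3 × 3 matrix, so partial results from different cores or chips could simply be summed.

```python
import numpy as np

def sample(img, ys, xs):
    """Bilinear interpolation of img at float positions; zero outside the image."""
    h, w = img.shape
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0

    def at(yy, xx):
        ok = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        return np.where(ok, img[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)], 0.0)

    top = (1 - fx) * at(y0, x0) + fx * at(y0, x0 + 1)
    bot = (1 - fx) * at(y0 + 1, x0) + fx * at(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bot

def block_terms(ref, tmpl, grads, w, row_start, row_stop):
    """Least-squares value, J^T r and J^T J for the rows [row_start, row_stop)."""
    theta, tx, ty = w
    h, wd = ref.shape
    cy, cx = (h - 1) / 2.0, (wd - 1) / 2.0
    rows, cols = np.mgrid[row_start:row_stop, 0:wd]
    x1, x2 = cols - cx, rows - cy              # pixel coordinates relative to the center
    c, s = np.cos(theta), np.sin(theta)
    y1 = c * x1 - s * x2 + tx                  # rigid transformation (assumed parametrization)
    y2 = s * x1 + c * x2 + ty
    gy_img, gx_img = grads
    r = sample(tmpl, y2 + cy, y1 + cx) - ref[row_start:row_stop]
    gx = sample(gx_img, y2 + cy, y1 + cx)      # template gradient at the transformed points
    gy = sample(gy_img, y2 + cy, y1 + cx)
    # Chain rule per pixel: derivative of the residual w.r.t. (theta, tx, ty).
    j_theta = gx * (-s * x1 - c * x2) + gy * (c * x1 - s * x2)
    J = np.stack([j_theta.ravel(), gx.ravel(), gy.ravel()], axis=1)
    r = r.ravel()
    return 0.5 * (r @ r), J.T @ r, J.T @ J

def accumulate(ref, tmpl, w, n_blocks=4):
    """Sum the per-block terms; every block could run on a separate core."""
    grads = np.gradient(tmpl)                  # finite differences along rows, then columns
    bounds = np.linspace(0, ref.shape[0], n_blocks + 1).astype(int)
    parts = [block_terms(ref, tmpl, grads, w, a, b)
             for a, b in zip(bounds[:-1], bounds[1:])]
    return (sum(p[0] for p in parts),
            sum(p[1] for p in parts),
            sum(p[2] for p in parts))

def gauss_newton_step(ref, tmpl, w, n_blocks=4):
    """One undamped Gauss-Newton update of w = (theta, tx, ty)."""
    val, g, H = accumulate(ref, tmpl, w, n_blocks)
    return val, np.asarray(w, dtype=float) - np.linalg.solve(H, g)
```

The sketch omits the multilevel pyramid and any line search, both of which a practical solver would normally add; it only shows that the reduction over pixels needs no matrix larger than 3 × 3 once the per-pixel contributions are summed.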
Notes
The Jacobian is the derivative with respect to the transformation parameters \(w\) and should not be confused with the image gradient obtained, e.g., by the Sobel operator. The same applies to the (approximated) Hessian.
For example, for 4096 × 4096 px on one DSP: Ethernet measurement 1058 ms, PCIe prediction 211 ms, PCIe measurement 202 ms.
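The distinction drawn in the first note can be written out for a generic least-squares data term. The symbols \(R\) (reference image), \(T\) (template image), \(y(w; x_i)\) (transformed position of pixel \(x_i\)) and \(r_i\) (residual) are assumed notation for illustration, not taken from the article; only \(w\) appears in the note itself.

```latex
\begin{align}
  D(w) &= \tfrac{1}{2} \sum_i r_i(w)^2,
  \qquad r_i(w) = T\bigl(y(w; x_i)\bigr) - R(x_i), \\
  J_i(w) &= \frac{\partial r_i}{\partial w}
          = \nabla T\bigl(y(w; x_i)\bigr)^{\top}\,
            \frac{\partial y(w; x_i)}{\partial w}
          \in \mathbb{R}^{1 \times 3}, \\
  \nabla D(w) &= \sum_i J_i(w)^{\top} r_i(w),
  \qquad H(w) \approx \sum_i J_i(w)^{\top} J_i(w).
\end{align}
```

Here \(\nabla T\) is the image gradient (the quantity a Sobel-like operator approximates), whereas \(J_i\) combines it with the derivative of the transformation by the chain rule. For a rigid two-dimensional transformation with three parameters, \(\nabla D\) is a 3-vector and the approximated Hessian \(H\) is a 3 × 3 matrix.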
Acknowledgments
The software created during this work is open source and can be accessed at http://www.github.com/RoelofBerg/fimreg.
In deep sorrow, we commemorate Prof. Dr. rer. nat. Bernd Fischer who passed away during the creation of this paper. Our thoughts are with his family.
Additional information
B. Fischer passed away during the creation of this paper.
Cite this article
Berg, R., König, L., Rühaak, J. et al. Highly efficient image registration for embedded systems using a distributed multicore DSP architecture. J Real-Time Image Proc 14, 341–361 (2018). https://doi.org/10.1007/s11554-014-0457-3
DOI: https://doi.org/10.1007/s11554-014-0457-3