Abstract
Learning-based visual localization has become a promising research direction over the past decade. Since ground-truth pose labels are difficult to obtain, recent methods attempt to learn pose estimation networks from pixel-perfect synthetic data; however, this introduces the problem of domain bias. In this paper, we first build the Tuebingen Buildings dataset of RGB images collected by a drone in urban scenes and create a 3D model for each scene. A large number of synthetic images are generated from these 3D models. We leverage image style transfer and cycle-consistent adversarial training to predict the relative camera poses of image pairs from models trained on synthetic data. We propose a relative camera pose estimation approach that addresses the continuous localization problem for the autonomous navigation of unmanned systems. Unlike existing learning-based camera pose estimation methods that train and test within a single scene, our approach estimates the relative camera poses at multiple city locations with a single trained model. We evaluate our approach both within a single scene and across scenes on the Tuebingen Buildings and Cambridge Landmarks datasets, and for each dataset we compare models trained on real images against models trained on synthetic images. We also test our model on the indoor 7Scenes dataset to demonstrate its generalization ability.
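The relative pose labels that supervise such a network are derived from pairs of absolute camera poses. A minimal sketch of that label computation is shown below; the helper names and the camera-to-world pose convention are our own assumptions, not necessarily the conventions used in the paper:

```python
import numpy as np

def quat_to_rot(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def relative_pose(R1, t1, R2, t2):
    """Relative pose of camera 2 expressed in the frame of camera 1.

    Assumes camera-to-world poses, i.e. x_world = R @ x_cam + t.
    """
    R_rel = R1.T @ R2          # rotation taking cam2 axes into cam1 frame
    t_rel = R1.T @ (t2 - t1)   # position of cam2 as seen from cam1
    return R_rel, t_rel
```

A network trained on such pairs regresses `(R_rel, t_rel)` (typically with the rotation re-encoded as a quaternion) directly from the two input images, so no absolute map coordinates are needed at test time.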
Funding
Open Access funding enabled and organized by Projekt DEAL. This research was supported by the German Federal Ministry of Education and Research (BMBF) project ‘Training Center Machine Learning, Tuebingen’ with grant number 01IS17054.
Author information
Contributions
All authors contributed to the concept and design of the research. Chenhao Yang provided the research ideas and the theoretical analysis, collected the dataset, and wrote the code and the paper. Yuyi Liu and Andreas Zell critically revised and edited the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Availability of data and materials
The original dataset is available upon email request and may be used only for non-commercial applications.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, C., Liu, Y. & Zell, A. Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks. J Intell Robot Syst 102, 79 (2021). https://doi.org/10.1007/s10846-021-01439-6