Abstract
Learning-based visual localization has become a promising research direction over the past decade. Since ground-truth pose labels are difficult to obtain, recent methods attempt to learn pose estimation networks from pixel-perfect synthetic data; however, this introduces the problem of domain bias. In this paper, we first build the Tuebingen Buildings dataset of RGB images collected by a drone in urban scenes and create a 3D model for each scene. A large number of synthetic images are generated from these 3D models. We leverage image style transfer and cycle-consistent adversarial training to predict the relative camera poses of image pairs from models trained on synthetic data. We propose a relative camera pose estimation approach that addresses the continuous localization problem for the autonomous navigation of unmanned systems. Unlike existing learning-based camera pose estimation methods that train and test within a single scene, our approach estimates the relative camera poses at multiple city locations with a single trained model. We evaluate our approach both within a single scene and across scenes on the Tuebingen Buildings and Cambridge Landmarks datasets, and for each dataset we compare models trained on real images against models trained on synthetic images. We also test our model on the indoor 7Scenes dataset to demonstrate its generalization ability.
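The relative pose labels that supervise such a network are derived from pairs of absolute camera poses. A minimal sketch of that label computation is shown below; the helper names and the camera-to-world pose convention are our own assumptions, not necessarily the conventions used in the paper:

```python
import numpy as np

def quat_to_rot(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def relative_pose(R1, t1, R2, t2):
    """Relative pose of camera 2 expressed in the frame of camera 1.

    Assumes camera-to-world poses, i.e. x_world = R @ x_cam + t.
    """
    R_rel = R1.T @ R2          # rotation taking cam2 axes into cam1 frame
    t_rel = R1.T @ (t2 - t1)   # position of cam2 as seen from cam1
    return R_rel, t_rel
```

A network trained on such pairs regresses `(R_rel, t_rel)` (typically with the rotation re-encoded as a quaternion) directly from the two input images, so no absolute map coordinates are needed at test time.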
Funding
Open Access funding enabled and organized by Projekt DEAL. This research was supported by the German Federal Ministry of Education and Research (BMBF) project ‘Training Center Machine Learning, Tuebingen’ with grant number 01IS17054.
Author information
Contributions
All authors contributed to the concept and design of the research. Chenhao Yang provided the research ideas and the theoretical analysis, collected the dataset, and wrote the code and the paper. Yuyi Liu and Andreas Zell critically revised and edited the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Availability of data and materials
The original dataset is available upon email request and may be used only for non-commercial applications.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, C., Liu, Y. & Zell, A. Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks. J Intell Robot Syst 102, 79 (2021). https://doi.org/10.1007/s10846-021-01439-6