Abstract
To reconstruct 3D models of specific targets in real time, this paper proposes a 3D reconstruction algorithm based on multi-sensor data fusion. The network takes a camera image and lidar point-cloud data as inputs, processes each modality separately in an RGB channel and a lidar channel, and fuses their outputs into a dense depth map of the targets. In the RGB channel, a transformer network rather than a CNN (convolutional neural network) extracts multi-scale image features with a global receptive field at high resolution, and produces a monocular depth estimate, a guidance map, and a semantic segmentation. In the lidar channel, the sparse lidar data is fused with the guidance map to generate the final dense depth prediction. In testing, the algorithm achieved a high ranking on the leaderboard. In application, at equal reconstruction quality, the proposed method reconstructs 3D models five times faster than the traditional image-based method.
Y. Ma (1981) is an associate professor at Beihang University, researching M&S (modeling and simulation) theory and practice and intelligent behavior modeling.
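The abstract describes a two-branch pipeline: a transformer-based RGB channel that predicts monocular depth, a guidance map, and a semantic segmentation, and a lidar channel that fuses sparse lidar depth with the guidance map into a dense depth map. The following PyTorch sketch is a minimal, hypothetical illustration of that data flow only; the module names, the small convolutional encoder standing in for the transformer backbone, and the guidance-weighted blending rule are assumptions of this sketch, not the authors' implementation.

import torch
import torch.nn as nn

class RGBChannel(nn.Module):
    """Stand-in for the transformer branch: maps an RGB image to monocular
    depth, a guidance map, and semantic logits (hypothetical heads)."""
    def __init__(self, num_classes: int = 19, dim: int = 64):
        super().__init__()
        # A real implementation would use a vision-transformer backbone;
        # a tiny conv encoder keeps this sketch self-contained.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(dim, 1, 1)          # monocular depth
        self.guidance_head = nn.Conv2d(dim, 1, 1)       # guidance map
        self.seg_head = nn.Conv2d(dim, num_classes, 1)  # semantic logits

    def forward(self, rgb):
        feat = self.encoder(rgb)
        return (self.depth_head(feat),
                torch.sigmoid(self.guidance_head(feat)),
                self.seg_head(feat))

class LidarChannel(nn.Module):
    """Fuses sparse lidar depth with the guidance map to predict dense depth."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 1, 1),
        )

    def forward(self, sparse_depth, mono_depth, guidance):
        # Where lidar returns are valid, trust them; elsewhere fall back on
        # the guidance-weighted monocular estimate, then refine the blend.
        # (This blending rule is an assumption, not the paper's fusion.)
        valid = (sparse_depth > 0).float()
        blended = valid * sparse_depth + (1 - valid) * guidance * mono_depth
        return self.refine(torch.cat([blended, sparse_depth, guidance], dim=1))

# Usage on dummy data: a 3xHxW image and a projected sparse lidar depth map.
rgb = torch.rand(1, 3, 128, 256)
sparse = torch.rand(1, 1, 128, 256) * (torch.rand(1, 1, 128, 256) > 0.95)
mono, guide, seg = RGBChannel()(rgb)
dense = LidarChannel()(sparse, mono, guide)
print(dense.shape)  # torch.Size([1, 1, 128, 256])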
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhou, Y., Lv, J., Ma, Y., Ma, X. (2022). A 3D Reconstruction Network Based on Multi-sensor. In: Fan, W., Zhang, L., Li, N., Song, X. (eds) Methods and Applications for Modeling and Simulation of Complex Systems. AsiaSim 2022. Communications in Computer and Information Science, vol 1712. Springer, Singapore. https://doi.org/10.1007/978-981-19-9198-1_44