PLIN: A Network for Pseudo-LiDAR Point Cloud Interpolation
Abstract
1. Introduction
- To mitigate the low-frequency limitation of LiDAR sensors, we present a Pseudo-LiDAR interpolation network (PLIN) to generate temporally and spatially high-quality point cloud sequences.
- We use bidirectional optical flow as an explicit motion guidance for interpolation. In addition, a warping layer is applied to improve the accuracy of depth prediction by approximating an intermediate frame (a sketch of this warping step follows this list). Finally, the in-between color image is leveraged to provide rich texture information of the realistic scene for a more accurate and dense spatial reconstruction.
- We evaluate the proposed model on the KITTI benchmark [21], which reasonably recovers the original intermediate 3D scene and outperforms other interpolation methods.
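The outline does not spell out the exact formulation of the warping layer, so the following is only a minimal PyTorch-style sketch of how flow-guided warping towards the intermediate time step is commonly implemented; the function name, the bilinear `grid_sample` sampling, and the linear-motion assumption for t = 0.5 are ours, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def backward_warp(depth, flow):
    """Bilinearly sample `depth` at locations displaced by `flow`.

    depth: (B, 1, H, W) sparse depth map of a neighbouring frame.
    flow:  (B, 2, H, W) flow in pixels (channel 0 = x, channel 1 = y) from the
           intermediate frame to that neighbouring frame.
    """
    b, _, h, w = depth.shape
    # Pixel coordinate grid (x to the right, y downwards).
    ys, xs = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                            torch.arange(w, dtype=depth.dtype), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(b, -1, -1, -1).to(depth.device)
    src = grid + flow                                   # sampling locations in the source frame
    # Normalise to [-1, 1], the convention expected by grid_sample.
    src_x = 2.0 * src[:, 0] / (w - 1) - 1.0
    src_y = 2.0 * src[:, 1] / (h - 1) - 1.0
    src_norm = torch.stack((src_x, src_y), dim=-1)      # (B, H, W, 2)
    return F.grid_sample(depth, src_norm, mode="bilinear",
                         padding_mode="zeros", align_corners=True)


# Assuming roughly linear motion, the flow from the intermediate time t = 0.5
# back to frame 0 is approximately -0.5 * flow_0to1 (analogously for frame 1),
# so two coarse estimates of the intermediate depth map can be obtained as:
d0 = torch.zeros(1, 1, 256, 1216)         # sparse depth map of frame 0 (dummy data)
d1 = torch.zeros(1, 1, 256, 1216)         # sparse depth map of frame 1 (dummy data)
flow_0to1 = torch.zeros(1, 2, 256, 1216)  # optical flow frame 0 -> frame 1 (dummy data)
flow_1to0 = torch.zeros(1, 2, 256, 1216)  # optical flow frame 1 -> frame 0 (dummy data)
d_mid_from_0 = backward_warp(d0, -0.5 * flow_0to1)
d_mid_from_1 = backward_warp(d1, -0.5 * flow_1to0)
```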
2. Related Research
2.1. Video Interpolation
2.2. Depth Completion
3. Approach
3.1. Intermediate Depth Map Interpolation
3.1.1. Baseline Network
3.1.2. Motion Guidance Module
3.1.3. Scene Guidance Module
3.2. Transformation Module
3.3. Loss Function
4. Experiments
4.1. Dataset and Strategy
4.1.1. Dataset
4.1.2. Strategy
4.2. Ablation Study
- The baseline network only takes two consecutive sparse depth maps as the input (baseline).
- The forward and backward sparse depth maps and estimated optical flow maps are fed into the baseline network (baseline + flow).
- The baseline network receives the forward and backward depth maps, the bidirectional optical flow, and the depth maps derived by the warping layer (baseline + warp_flow).
- The refined network takes the intermediate color image and two depth maps as its inputs (baseline + rgb).
- The complete configuration combines the coarse interpolation network with motion guidance using the warping operation and the refined interpolation network with scene guidance (ours); a sketch of these input configurations is given after this list.
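The outline does not give the exact tensor layout of these inputs, so the following is a small, hypothetical PyTorch sketch that simply concatenates the listed inputs channel-wise to make the five configurations concrete; all variable names and the channel ordering are illustrative assumptions, not the authors' implementation.

```python
import torch

B, H, W = 1, 256, 1216                    # example KITTI-like resolution (dummy data)
d0 = torch.zeros(B, 1, H, W)              # forward sparse depth map
d1 = torch.zeros(B, 1, H, W)              # backward sparse depth map
flow01 = torch.zeros(B, 2, H, W)          # optical flow frame 0 -> frame 1
flow10 = torch.zeros(B, 2, H, W)          # optical flow frame 1 -> frame 0
d0_warp = torch.zeros(B, 1, H, W)         # depth warped towards t = 0.5 from frame 0
d1_warp = torch.zeros(B, 1, H, W)         # depth warped towards t = 0.5 from frame 1
rgb_mid = torch.zeros(B, 3, H, W)         # intermediate colour image

inputs = {
    "baseline":           torch.cat([d0, d1], dim=1),                                   # 2 channels
    "baseline+flow":      torch.cat([d0, d1, flow01, flow10], dim=1),                   # 6 channels
    "baseline+warp_flow": torch.cat([d0, d1, flow01, flow10, d0_warp, d1_warp], dim=1), # 8 channels
    "baseline+rgb":       torch.cat([d0, d1, rgb_mid], dim=1),                          # 5 channels
}
# "ours" combines the warp_flow coarse stage with an RGB-guided refinement stage.
for name, x in inputs.items():
    print(name, tuple(x.shape))
```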
- Root mean squared error (RMSE), in mm;
- Mean absolute error (MAE), in mm;
- Root mean squared error of the inverse depth (iRMSE), in 1/km;
- Mean absolute error of the inverse depth (iMAE), in 1/km. The standard definitions of these metrics are given below.
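The outline lists the metric names without their formulas; for reference, the standard KITTI depth-completion definitions are reproduced below. The notation is ours: $\hat{d}_i$ is the predicted depth, $d_i^{gt}$ the ground truth, and $V$ the set of pixels with valid ground truth.

```latex
\begin{align}
\mathrm{RMSE}  &= \sqrt{\frac{1}{|V|}\sum_{i \in V}\left(\hat{d}_i - d_i^{gt}\right)^2} \\
\mathrm{MAE}   &= \frac{1}{|V|}\sum_{i \in V}\left|\hat{d}_i - d_i^{gt}\right| \\
\mathrm{iRMSE} &= \sqrt{\frac{1}{|V|}\sum_{i \in V}\left(\frac{1}{\hat{d}_i} - \frac{1}{d_i^{gt}}\right)^2} \\
\mathrm{iMAE}  &= \frac{1}{|V|}\sum_{i \in V}\left|\frac{1}{\hat{d}_i} - \frac{1}{d_i^{gt}}\right|
\end{align}
```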
4.3. Comparison Results
4.3.1. Quantitative Comparison
4.3.2. Visual Comparison
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- Li, M.; Hu, Y.; Zhao, N.; Qian, Q. One-Stage Multi-Sensor Data Fusion Convolutional Neural Network for 3D Object Detection. Sensors 2019, 19, 1434.
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1887–1893.
- Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4376–4382.
- Cai, G.; Jiang, Z.; Wang, Z.; Huang, S.; Chen, K.; Ge, X.; Wu, Y. Spatial Aggregation Net: Point Cloud Semantic Segmentation Based on Multi-Directional Convolution. Sensors 2019, 19, 4329.
- Ma, F.; Cavalheiro, G.V.; Karaman, S. Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3288–3295.
- Zou, N.; Xiang, Z.; Chen, Y.; Chen, S.; Qiao, C. Simultaneous Semantic Segmentation and Depth Completion with Constraint of Boundary. Sensors 2020, 20, 635.
- Van Gansbeke, W.; Neven, D.; De Brabandere, B.; Van Gool, L. Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty. In Proceedings of the International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 27–31 May 2019; pp. 1–6.
- Qiu, J.; Cui, Z.; Zhang, Y.; Zhang, X.; Liu, S.; Zeng, B.; Pollefeys, M. DeepLiDAR: Deep surface normal guided depth prediction for outdoor scene from sparse LiDAR data and single color image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3313–3322.
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E.G. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-View 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3773–3777.
- Li, Q.; Lin, C.; Zhao, Y. Geometric features-based parking slot detection. Sensors 2018, 18, 2821.
- Wang, Y.; Chao, W.L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 8445–8453.
- Peleg, T.; Szekely, P.; Sabo, D.; Sendik, O. IM-Net for High Resolution Video Frame Interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2398–2407.
- Bao, W.; Lai, W.S.; Ma, C.; Zhang, X.; Gao, Z.; Yang, M.H. Depth-aware video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3703–3712.
- Jiang, H.; Sun, D.; Jampani, V.; Yang, M.H.; Learned-Miller, E.; Kautz, J. Super SloMo: High quality estimation of multiple intermediate frames for video interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9000–9008.
- Zhang, Y.; Funkhouser, T.A. Deep Depth Completion of a Single RGB-D Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 175–185.
- Lee, B.U.; Jeon, H.G.; Im, S.; Kweon, I.S. Depth Completion with Deep Geometry and Context Guidance. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3281–3287.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Rob. Res. 2013, 32, 1231–1237.
- Camplani, M.; Salgado, L. Efficient spatio-temporal hole filling strategy for Kinect depth maps. In Proceedings of the SPIE—The International Society for Optical Engineering, Burlingame, CA, USA, 6–7 February 2012.
- Ma, F.; Carlone, L.; Ayaz, U.; Karaman, S. Sparse depth sensing for resource-constrained robots. Int. J. Rob. Res. 2017, 18.
- Barron, J.T.; Poole, B. The Fast Bilateral Solver. Available online: https://arxiv.org/abs/1511.03296 (accessed on 22 July 2016).
- Uhrig, J.; Schneider, N.; Schneider, L.; Franke, U.; Brox, T.; Geiger, A. Sparsity invariant CNNs. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 11–20.
- Liu, L.K.; Chan, S.H.; Nguyen, T.Q. Depth reconstruction from sparse samples: Representation, algorithm, and sampling. IEEE Trans. Image Process. 2015, 24, 1983–1996.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Hui, T.W.; Tang, X.; Change Loy, C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8981–8989.
- Yin, Z.; Darrell, T.; Yu, F. Hierarchical Discrete Distribution Decomposition for Match Density Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 6044–6053.
- Ilg, E.; Mayer, N.; Saikia, T.; Keuper, M.; Dosovitskiy, A.; Brox, T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2462–2470.
- Butler, D.J.; Wulff, J.; Stanley, G.B.; Black, M.J. A naturalistic open source movie for optical flow evaluation. In Proceedings of the 12th European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; pp. 611–625.
- Liu, Z.; Yeh, R.A.; Tang, X.; Liu, Y.; Agarwala, A. Video frame synthesis using deep voxel flow. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4463–4471.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional Networks for Biomedical Image Segmentation. Available online: https://arxiv.org/abs/1505.04597 (accessed on 18 May 2015).
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in Pytorch. Available online: https://openreview.net/forum?id=BJJsrmfCZ (accessed on 28 October 2017).
| Configuration | RMSE [mm] | MAE [mm] | iRMSE [1/km] | iMAE [1/km] |
| --- | --- | --- | --- | --- |
| Baseline | 1408.80 | 513.06 | 7.63 | 3.01 |
| Baseline + flow | 1335.06 | 514.40 | 8.04 | 3.47 |
| Baseline + warp_flow | 1216.63 | 532.84 | 9.03 | 4.04 |
| Baseline + rgb | 1238.25 | 495.24 | 6.38 | 2.95 |
| Ours | 1168.27 | 546.37 | 6.84 | 3.68 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).