Abstract
Occupancy prediction plays a pivotal role in autonomous driving. Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and incurring high computational costs. To bridge this gap, we introduce a novel fully sparse occupancy network, termed SparseOcc. SparseOcc first reconstructs a sparse 3D representation from visual inputs and then predicts semantic/instance occupancy from this sparse representation using sparse queries. A mask-guided sparse sampling scheme enables the sparse queries to interact with 2D features in a fully sparse manner, thereby circumventing costly dense features or global attention. Additionally, we design a thoughtful ray-based evaluation metric, namely RayIoU, to resolve the inconsistent depth penalty inherent in the traditional voxel-level mIoU criterion. SparseOcc demonstrates its effectiveness by achieving a RayIoU of 34.0 while maintaining a real-time inference speed of 17.3 FPS with 7 history frames as input. Increasing the number of preceding frames to 15 further improves performance to 35.1 RayIoU without bells and whistles. Code is available at https://github.com/MCG-NJU/SparseOcc.
H. Liu and Y. Chen—Equal contribution.
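To make the ray-based evaluation concrete, below is a minimal, illustrative sketch of how such a score can be computed. It is not the authors' implementation (see the repository above for the official metric): the voxel size, depth threshold, the FREE label, and the helper names `first_hit`/`ray_iou` are all assumptions made for this example.

```python
import numpy as np

FREE = 0  # assumed label for free space; the released benchmark may encode this differently


def first_hit(grid, origin, direction, step=0.4, max_range=51.2):
    """March a ray through a dense voxel grid and return (depth, class)
    of the first occupied voxel, or (None, None) if the ray escapes.
    Assumes the grid origin is at (0, 0, 0) and voxel size equals `step`."""
    direction = direction / np.linalg.norm(direction)
    for t in np.arange(step, max_range, step):
        idx = np.floor((origin + t * direction) / step).astype(int)
        if np.any(idx < 0) or np.any(idx >= np.array(grid.shape)):
            break  # ray left the volume
        label = grid[tuple(idx)]
        if label != FREE:
            return t, label
    return None, None


def ray_iou(pred, gt, rays, depth_thresh=1.0):
    """A ray counts as a true positive when prediction and ground truth
    agree on the class of the first hit and their depths differ by less
    than `depth_thresh`; the score is TP / (TP + FP + FN)."""
    tp = fp = fn = 0
    for origin, direction in rays:
        d_p, c_p = first_hit(pred, origin, direction)
        d_g, c_g = first_hit(gt, origin, direction)
        if c_p is None and c_g is None:
            continue  # both rays escape the volume: not counted
        if (c_p is not None and c_g is not None
                and c_p == c_g and abs(d_p - d_g) < depth_thresh):
            tp += 1
        else:
            fp += int(c_p is not None)  # wrong or spurious predicted surface
            fn += int(c_g is not None)  # missed or mislocated true surface
    return tp / max(tp + fp + fn, 1)
```

Given two labeled voxel grids and a list of `(origin, direction)` ray pairs, `ray_iou(pred_grid, gt_grid, rays)` then yields a depth-aware score: because each ray is judged by its first hit, a prediction that places a surface at the wrong depth is penalized the same amount regardless of how far off it is, which is exactly the inconsistency in voxel-level mIoU that RayIoU is designed to avoid.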
Acknowledgements
We thank the anonymous reviewers for their suggestions, which made this work better. This work is supported by the National Key R&D Program of China (No. 2022ZD0160900), the National Natural Science Foundation of China (No. 62076119, No. 61921006), the Fundamental Research Funds for the Central Universities (No. 020214380119), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, H. et al. (2025). Fully Sparse 3D Occupancy Prediction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15083. Springer, Cham. https://doi.org/10.1007/978-3-031-72698-9_4
DOI: https://doi.org/10.1007/978-3-031-72698-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72697-2
Online ISBN: 978-3-031-72698-9
eBook Packages: Computer Science, Computer Science (R0)