MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Huang, Kuan-Chih; Wu, Tsung-Han; Su, Hung-Ting; Hsu, Winston H.


arXiv:2203.10981 (cs)

[Submitted on 21 Mar 2022 (v1), last revised 28 Mar 2022 (this version, v2)]

Abstract: Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but they suffer from an additional computational burden and achieve limited performance because of inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module, which implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module, which globally integrates context- and depth-aware features. Moreover, unlike conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve their performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at this https URL
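The abstract names the components but not their internals, so the following is only an illustrative NumPy sketch of how a depth positional encoding and a depth-aware attention step could fit together. It is not the authors' implementation. Assumptions (none stated in the abstract): depth is discretized into `D` bins, each pixel takes the argmax bin of a per-pixel depth score map, the DPE is a lookup into a learnable `(D, C)` embedding table (randomly initialized here), and the transformer is reduced to a single-head cross-attention in which depth-aware tokens plus DPE act as queries over context-aware tokens. All function names are hypothetical.

```python
import numpy as np


def depth_bin_indices(depth_logits: np.ndarray) -> np.ndarray:
    """Pick the most likely depth bin for every pixel.

    depth_logits: (D, H, W) scores over D hypothetical depth bins.
    Returns an (H, W) integer map of bin indices.
    """
    return np.argmax(depth_logits, axis=0)


def depth_positional_encoding(depth_logits: np.ndarray,
                              bin_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical DPE: one embedding per pixel, chosen by its depth bin.

    bin_embeddings: (D, C) table (learned during training in a real model).
    Returns (H*W, C) encodings in row-major pixel order.
    """
    bins = depth_bin_indices(depth_logits)      # (H, W)
    return bin_embeddings[bins.reshape(-1)]     # (H*W, C)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, keys: np.ndarray,
                    values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention: (N, C) x (M, C) -> (N, C)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def fuse(context_feat: np.ndarray, depth_feat: np.ndarray,
         depth_logits: np.ndarray, bin_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical fusion of context- and depth-aware features.

    context_feat, depth_feat: (C, H, W) feature maps.
    Depth-aware tokens (plus DPE) attend globally over context-aware tokens.
    """
    C, H, W = context_feat.shape
    ctx = context_feat.reshape(C, H * W).T      # (H*W, C) tokens
    dep = depth_feat.reshape(C, H * W).T        # (H*W, C) tokens
    dpe = depth_positional_encoding(depth_logits, bin_embeddings)
    out = cross_attention(dep + dpe, ctx, ctx)  # (H*W, C)
    return out.T.reshape(C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, D = 8, 4, 6, 16
    context = rng.standard_normal((C, H, W))
    depth_aware = rng.standard_normal((C, H, W))
    logits = rng.standard_normal((D, H, W))
    table = rng.standard_normal((D, C))
    print(fuse(context, depth_aware, logits, table).shape)  # (8, 4, 6)
```

The point of the sketch is the contrast the abstract draws: a pixel-wise positional encoding depends on image coordinates, whereas this DPE gives the same encoding to any two pixels that fall into the same depth bin, regardless of where they sit in the image.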

Comments:	Accepted to CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.10981 [cs.CV]
	(or arXiv:2203.10981v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.10981

Submission history

From: Kuan-Chih Huang
[v1] Mon, 21 Mar 2022 13:40:10 UTC (8,992 KB)
[v2] Mon, 28 Mar 2022 17:56:53 UTC (4,032 KB)
