Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Cao, Yang; Zeng, Yihan; Xu, Hang; Xu, Dan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.00830 (cs)

[Submitted on 2 Jun 2024]

Title:Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Authors:Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

View PDF HTML (experimental)

Abstract:Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the proposed 3D Novel Object Discovery (3D-NOD) strategy utilizes 3D geometries and 2D open-vocabulary semantic priors to discover pseudo labels for novel objects during training. 3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, and then enhances the model's ability to localize more novel objects. The 3D-NOD with Enrichment is termed 3D-NODE. For classification, the Discovery-driven Cross-modal Alignment (DCMA) module aligns features from 3D point clouds and 2D/textual modalities, employing both class-agnostic and class-specific alignments that are iteratively refined to handle the expanding vocabulary of objects. Besides, 2D box guidance boosts the classification accuracy against complex background noises, which is coined as Box-DCMA. Extensive evaluation demonstrates the superiority of CoDAv2. CoDAv2 outperforms the best-performing method by a large margin (AP_Novel of 9.17 vs. 3.61 on SUN-RGBD and 9.12 vs. 3.74 on ScanNetv2). Source code and pre-trained models are available at the GitHub project page.

Comments:	Code Page: this https URL This paper has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for possible publication
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.00830 [cs.CV]
	(or arXiv:2406.00830v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.00830

Submission history

From: Yang Cao [view email]
[v1] Sun, 2 Jun 2024 18:32:37 UTC (2,908 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Computer Science > Computer Vision and Pattern Recognition

Title:Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.