VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models

Qingxing Cao, Junhao Cheng, Xiaodan Liang, Liang Lin


Abstract
Despite the significant success of large vision-language models (LVLMs), some studies have revealed that LVLMs suffer from the hallucination problem, where the LVLMs’ response contains descriptions of non-existent objects. Although various benchmarks have been proposed to investigate this problem, they mostly focus on single-turn evaluation and overlook the hallucination raised by textual inputs. To investigate the hallucination problem of LVLMs when given long-term misleading textual history, we propose a novel visual dialogue hallucination evaluation benchmark VisDiaHalBench. The benchmark consists of samples with five-turn questions about an edited image and its original version. VisDiaHalBench differs from previous hallucination benchmarks in the following three points: 1) The questions and answers are unambiguously grounded by annotated scene graphs. 2) The images are uncommonly edited to inspect the visual model and common-object hallucination in LLMs. 3) The carefully designed dialogue refers a same object in different turns to assess the image consistency and influence of history for LVLMs. The detailed analysis of several state-of-the-art LVLMs across image consistency, visual understanding, history influence, and other dimensions reveals their substantial performance gap with single-turn VQA tasks. The benchmark is released in: https://github.com/qingxingcao/VisDiaHalBench
Anthology ID:
2024.acl-long.658
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
12161–12176
Language:
URL:
https://aclanthology.org/2024.acl-long.658/
DOI:
10.18653/v1/2024.acl-long.658
Bibkey:
Cite (ACL):
Qingxing Cao, Junhao Cheng, Xiaodan Liang, and Liang Lin. 2024. VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12161–12176, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models (Cao et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-long.658.pdf

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy