TVQA: Localized, Compositional Video Question Answering

Lei, Jie; Yu, Licheng; Bansal, Mohit; Berg, Tamara L.

arXiv:1809.01696 (cs)

[Submitted on 5 Sep 2018 (v1), last revised 7 May 2019 (this version, v2)]

Abstract: Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. We provide analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task. The dataset is publicly available at this http URL.

Comments: EMNLP 2018 (13 pages; Data and Leaderboard at: this http URL). Updated with test-public results
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1809.01696 [cs.CL] (or arXiv:1809.01696v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.1809.01696

Submission history

From: Jie Lei
[v1] Wed, 5 Sep 2018 19:14:11 UTC (8,613 KB)
[v2] Tue, 7 May 2019 21:34:05 UTC (8,450 KB)

