Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19

Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Dinesh Pabbi, Kunal Verma, Rannie Lin


Abstract
We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 268 countries), longitudinal (goes as far back as 2007), multilingual (comes in 100+ languages), and has a significant number of location-tagged tweets (~169M tweets). We release tweet IDs from the dataset. We also develop two powerful models, one for identifying whether or not a tweet is related to the pandemic (best F1=97%) and another for detecting misinformation about COVID-19 (best F1=92%). A human annotation study reveals the utility of our models on a subset of Mega-COV. Our data and models can be useful for studying a wide host of phenomena related to the pandemic. Mega-COV and our models are publicly available.
Anthology ID:
2021.eacl-main.298
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
3402–3420
URL:
https://aclanthology.org/2021.eacl-main.298/
DOI:
10.18653/v1/2021.eacl-main.298
Cite (ACL):
Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Dinesh Pabbi, Kunal Verma, and Rannie Lin. 2021. Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3402–3420, Online. Association for Computational Linguistics.
Cite (Informal):
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 (Abdul-Mageed et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.298.pdf
Code
 UBC-NLP/megacov
Data
Mega-COV
