MMGA: Multimodal Learning with Graph Alignment

Yang, Xuan; Tao, Quanjin; Feng, Xiao; Cai, Donghong; Ren, Xiang; Yang, Yang

Computer Science > Multimedia

arXiv:2210.09946 (cs)

[Submitted on 18 Oct 2022 (v1), last revised 31 Oct 2022 (this version, v2)]

Abstract: Multimodal pre-training breaks down modality barriers and lets individual modalities augment one another with information, resulting in significant advances in representation learning. However, the graph modality, a very general and important form of data, cannot easily interact with other modalities because of its non-regular nature. In this paper, we propose MMGA (Multimodal learning with Graph Alignment), a novel multimodal pre-training framework that incorporates information from the graph (social network), image, and text modalities on social media to enhance user representation learning. In MMGA, a multi-step graph alignment mechanism adds self-supervision from the graph modality to optimize the image and text encoders, while information from the image and text modalities guides the learning of the graph encoder. We conduct experiments on a dataset crawled from Instagram. The experimental results show that MMGA works well on this dataset and improves performance on the fans prediction task. To facilitate future research, we release our dataset, the first social media multimodal dataset with a graph, which contains 60,000 users labeled with specific topics based on 2 million posts.
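
The abstract does not specify the alignment objective, so the following is only a minimal sketch of one common way to align two embedding spaces: a symmetric contrastive (InfoNCE-style) loss between per-user graph embeddings and per-user content embeddings. It is not the multi-step mechanism proposed in MMGA. The function names, the `temperature` value, and the toy data are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean of -log softmax(logits[i])[i], treating row i's match as column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def alignment_loss(graph_emb, content_emb, temperature=0.1):
    """Symmetric contrastive loss; row i of both inputs is assumed to be the same user."""
    g = [normalize(v) for v in graph_emb]
    c = [normalize(v) for v in content_emb]
    # Cosine similarity between every graph/content pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(gi, cj)) / temperature for cj in c]
        for gi in g
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the graph-to-content and content-to-graph directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy data (hypothetical): 8 users with 16-dimensional embeddings.
random.seed(0)
graph_emb = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
matched = [[x + random.gauss(0, 0.01) for x in v] for v in graph_emb]
mismatched = matched[::-1]  # pair every user with another user's content

print("matched loss:   ", alignment_loss(graph_emb, matched))
print("mismatched loss:", alignment_loss(graph_emb, mismatched))
```

In this sketch the loss is low when each user's graph embedding is closest to that user's own content embedding and high when the pairing is scrambled, which is the property an alignment objective needs in order to pass supervision between encoders.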

Comments:	Please contact xuany@zju.this http URL for the dataset
Subjects:	Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2210.09946 [cs.MM]
	(or arXiv:2210.09946v2 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2210.09946

Submission history

From: Xuan Yang
[v1] Tue, 18 Oct 2022 15:50:31 UTC (951 KB)
[v2] Mon, 31 Oct 2022 08:06:13 UTC (951 KB)
