mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

Lin, Peiqin; Hu, Chengzhi; Zhang, Zheyu; Martins, André F. T.; Schütze, Hinrich

arXiv:2305.13684 (cs)

[Submitted on 23 May 2023 (v1), last revised 5 Jul 2024 (this version, v3)]

Abstract: Recent multilingual pretrained language models (mPLMs) have been shown to encode strong language-specific signals, which are not explicitly provided during pretraining. It remains an open question whether it is feasible to employ mPLMs to measure language similarity, and subsequently use the similarity results to select source languages for boosting cross-lingual transfer. To investigate this, we propose mPLM-Sim, a language similarity measure that induces the similarities across languages from mPLMs using multi-parallel corpora. Our study shows that mPLM-Sim exhibits moderately high correlations with linguistic similarity measures, such as lexicostatistics, genealogical language family, and geographical sprachbund. We also conduct a case study on languages with low correlation and observe that mPLM-Sim yields more accurate similarity results. Additionally, we find that similarity results vary across different mPLMs and different layers within an mPLM. We further investigate whether mPLM-Sim is effective for zero-shot cross-lingual transfer by conducting experiments on both low-level syntactic tasks and high-level semantic tasks. The experimental results demonstrate that mPLM-Sim is capable of selecting better source languages than linguistic measures, resulting in a 1%-2% improvement in zero-shot cross-lingual transfer performance.
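The abstract does not spell out how similarities are computed, so the following is only a minimal sketch of one way to induce language similarity from a multi-parallel corpus and use it to rank source languages. It assumes that each language is represented by one embedding per parallel sentence (for example, pooled hidden states from some layer of an mPLM) and that similarity is the average cosine over aligned sentences. The function names, the cosine-averaging rule, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def language_similarity(emb_a, emb_b):
    """Average cosine similarity over aligned (parallel) sentence embeddings.

    Assumption: emb_a[i] and emb_b[i] embed translations of the same sentence.
    """
    if not emb_a or len(emb_a) != len(emb_b):
        raise ValueError("embeddings must be non-empty and sentence-aligned")
    return sum(cosine(u, v) for u, v in zip(emb_a, emb_b)) / len(emb_a)

def rank_source_languages(target, embeddings):
    """Return candidate source languages, most similar to the target first."""
    scores = {
        lang: language_similarity(embeddings[target], vecs)
        for lang, vecs in embeddings.items()
        if lang != target
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings for two parallel sentences in three "languages".
# Real inputs would come from an mPLM layer; these numbers are made up.
toy = {
    "tgt": [[1.0, 0.0], [0.0, 1.0]],
    "near": [[0.9, 0.1], [0.1, 0.9]],
    "far": [[0.0, 1.0], [1.0, 0.0]],
}

print(rank_source_languages("tgt", toy))  # ['near', 'far']
```

Because the abstract reports that results vary across mPLMs and across layers, a sketch like this would need to be run per model and per layer, with the embeddings swapped accordingly.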

Comments:	EACL 2024 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.13684 [cs.CL]
	(or arXiv:2305.13684v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.13684

Submission history

From: Peiqin Lin
[v1] Tue, 23 May 2023 04:44:26 UTC (6,958 KB)
[v2] Mon, 29 Jan 2024 09:03:43 UTC (254 KB)
[v3] Fri, 5 Jul 2024 17:19:52 UTC (254 KB)
