MuLER: Detailed and Scalable Reference-based Evaluation

Karidi, Taelin; Choshen, Leshem; Patel, Gal; Abend, Omri

arXiv:2305.14991 (cs)

[Submitted on 24 May 2023 (v1), last revised 29 Nov 2023 (this version, v2)]

Abstract: We propose a novel methodology, MuLER, that transforms any reference-based evaluation metric for text generation, such as machine translation (MT), into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis that can lead to targeted improvement efforts for specific phenomena. We perform experiments in both synthetic and naturalistic settings to support MuLER's validity and showcase its usability in MT evaluation and in other tasks, such as summarization. Analyzing all submissions to WMT in 2014-2020, we find consistent trends. For example, nouns and verbs are among the most frequent POS tags, yet they are also among the hardest to translate. Performance on most POS tags improves with overall system performance, but a few are not correlated with it (their identity changes from language to language). Preliminary experiments with summarization reveal similar trends.
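To make the idea of "how much a metric penalizes a specific error type" concrete, here is a minimal sketch of one possible probe. It is an illustration under stated assumptions, not the procedure defined in the paper: the toy metric `unigram_f1`, the helpers `mask` and `penalty_for_tag`, the `<MASK>` placeholder, and the hand-written tags are all hypothetical. The probe replaces every token of one category with a shared placeholder in both candidate and reference, so that mismatches in that category stop counting, and reports how much the score rises.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Toy reference-based metric: unigram-overlap F1 between two token lists."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def mask(tokens, tags, target_tag, placeholder="<MASK>"):
    """Replace every token carrying target_tag with a shared placeholder."""
    return [placeholder if tag == target_tag else tok
            for tok, tag in zip(tokens, tags)]


def penalty_for_tag(cand, cand_tags, ref, ref_tags, target_tag,
                    metric=unigram_f1):
    """Score gain when tokens of target_tag are forced to match.

    A larger gain suggests the metric was penalizing the candidate
    more heavily for errors on that category (illustrative assumption).
    """
    original = metric(cand, ref)
    masked = metric(mask(cand, cand_tags, target_tag),
                    mask(ref, ref_tags, target_tag))
    return masked - original


# Hand-tagged toy example; a real setup would use a tagger or NER model.
ref = "the flight to Paris was late".split()
cand = "the flight to London was late".split()
tags = ["DET", "NOUN", "ADP", "LOC", "AUX", "ADJ"]

loc_penalty = penalty_for_tag(cand, tags, ref, tags, "LOC")
noun_penalty = penalty_for_tag(cand, tags, ref, tags, "NOUN")
print(f"LOC penalty:  {loc_penalty:.3f}")
print(f"NOUN penalty: {noun_penalty:.3f}")
```

In this toy case the only error is the location name, so masking `LOC` raises the score while masking `NOUN` leaves it unchanged. Any reference-based metric with the same two-argument signature could be passed as `metric`.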

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.14991 [cs.CL]
	(or arXiv:2305.14991v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.14991

Submission history

From: Taelin Karidi
[v1] Wed, 24 May 2023 10:26:13 UTC (11,764 KB)
[v2] Wed, 29 Nov 2023 10:47:58 UTC (9,028 KB)
