PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation across Languages

Emil Svoboda, Magda Sevcikova


Abstract
We present PaReNT (Parent Retrieval Neural Tool), a deep-learning-based multilingual tool performing parent retrieval and word formation classification in English, German, Dutch, Spanish, French, Russian, and Czech. Parent retrieval refers to determining the lexeme or lexemes the input lexeme was based on (e.g. “darkness” is traced back to “dark”; “waterfall” decomposes into “water” and “fall”). Additionally, PaReNT performs word formation classification, which classifies the input lexeme as a compound (e.g. “proofread”), a derivative (e.g. “deescalate”), or an unmotivated word (e.g. “dog”). The seven languages are selected from three major branches of the Indo-European language family (Germanic, Romance, Slavic). Data is aggregated from a range of word-formation resources, as well as Wiktionary, to train and test the tool. The tool is based on a custom-architecture hybrid transformer block-enriched sequence-to-sequence neural network utilizing both a character-based and a semantic representation of the input lexemes, with two output modules: one decoder-based, dedicated to parent retrieval, and one classifier-based, for word formation classification. PaReNT achieves a mean accuracy of 0.62 in parent retrieval and a mean balanced accuracy of 0.74 in word formation classification.
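The abstract reports balanced accuracy for word formation classification but does not define it. The sketch below assumes the standard definition (the mean of per-class recall), which is not confirmed by the abstract. It uses the three classes named there; the label strings and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(gold, predicted):
    """Mean of per-class recall over the classes that occur in `gold`."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty list")
    total = defaultdict(int)    # number of gold items per class
    correct = defaultdict(int)  # number of correctly predicted items per class
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    # Average the recall of each class, so that rare classes count as much
    # as frequent ones.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy labels, for illustration only.
gold = ["compound", "derivative", "derivative",
        "unmotivated", "derivative", "compound"]
pred = ["compound", "derivative", "unmotivated",
        "unmotivated", "derivative", "derivative"]

score = balanced_accuracy(gold, pred)
plain = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain:.3f}")
```

Under this definition each class contributes equally to the score regardless of how many lexemes it contains, which is presumably why a balanced metric is reported for a task where compounds, derivatives, and unmotivated words are unlikely to be equally frequent.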
Anthology ID:
2024.lrec-main.1104
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
12611–12621
URL:
https://aclanthology.org/2024.lrec-main.1104/
Cite (ACL):
Emil Svoboda and Magda Sevcikova. 2024. PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation across Languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12611–12621, Torino, Italia. ELRA and ICCL.
Cite (Informal):
PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation across Languages (Svoboda & Sevcikova, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1104.pdf
