A New Dataset for Tonal and Segmental Dialectometry from the Yue- and Pinghua-Speaking Area

Ho Wang Matthew Sung; Jelena Prokić; Yiya Chen

Abstract

Traditional dialectology, or dialect geography, is the study of the geographical variation of language. Originating in Europe and pioneered in Germany and France, the field has predominantly focused on sounds, and more specifically on segments. Similarly, quantitative approaches to language variation at the phonetic level mostly focus on segments as well. However, more than half of the world's languages have lexical tone (Yip, 2002). Despite this, tones remain underexplored in quantitative language comparison, partly because suitable data are hard to access. This paper introduces a newly digitised dataset covering over 100 dialects from the Yue- and Pinghua-speaking areas of Southern China. The dataset consists of two parts: tones and segments. We illustrate how tones can be modelled computationally to explore linguistic variation. Applying a tone distance metric to our data, we find that 1) dialects also form a continuum at the tonal level, and 2) beyond tonemic (inventory) and tonetic differences, dialects can also differ in the lexical distribution of tones. We hope the availability of this dataset will enable further exploration of the role of tones in quantitative typology and NLP research.
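To make the idea of a tone distance concrete, here is a minimal sketch. It is not the metric used in the paper; it assumes that tones are transcribed as Chao tone numerals (e.g. "55", "35", "214"), that each contour can be resampled to a fixed number of pitch points by linear interpolation, and that the distance between two tones is the mean absolute difference of those points. The function names (`contour`, `tone_distance`, `dialect_distance`) and the toy dialect data are hypothetical. Because the dialect-level score compares the tone assigned to each shared lexical item, it reflects both tonetic differences and differences in how tones are distributed over the lexicon.

```python
from typing import Dict, List


def contour(chao: str, n_points: int = 3) -> List[float]:
    """Resample a Chao tone-numeral string (hypothetical input format, e.g. '214')
    to n_points pitch values by linear interpolation; n_points must be >= 2."""
    levels = [int(c) for c in chao]
    if len(levels) == 1:
        # A single numeral is treated as a level tone.
        return [float(levels[0])] * n_points
    points = []
    for i in range(n_points):
        pos = i * (len(levels) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(levels) - 1)
        frac = pos - lo
        points.append(levels[lo] + frac * (levels[hi] - levels[lo]))
    return points


def tone_distance(a: str, b: str, n_points: int = 3) -> float:
    """Mean absolute difference between two resampled contours (assumed metric)."""
    ca, cb = contour(a, n_points), contour(b, n_points)
    return sum(abs(x - y) for x, y in zip(ca, cb)) / n_points


def dialect_distance(d1: Dict[str, str], d2: Dict[str, str]) -> float:
    """Average tone distance over the lexical items both dialects share."""
    shared = sorted(set(d1) & set(d2))
    if not shared:
        raise ValueError("dialects share no lexical items")
    return sum(tone_distance(d1[k], d2[k]) for k in shared) / len(shared)


if __name__ == "__main__":
    # Toy, made-up data: item keys stand in for lexical items.
    dialect_a = {"item1": "55", "item2": "35", "item3": "33"}
    dialect_b = {"item1": "55", "item2": "24", "item3": "33"}
    print(dialect_distance(dialect_a, dialect_b))
```

In this toy example only "item2" differs (35 vs. 24, one pitch level apart at every point), so the dialect distance is 1/3. A pairwise matrix of such distances could then be fed into standard dialectometric clustering or multidimensional scaling.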

Anthology ID: 2024.sigtyp-1.3
Volume: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues: SIGTYP | WS
Publisher: Association for Computational Linguistics
Pages: 25–36
URL: https://aclanthology.org/2024.sigtyp-1.3/
Cite (ACL): Ho Wang Matthew Sung, Jelena Prokic, and Yiya Chen. 2024. A New Dataset for Tonal and Segmental Dialectometry from the Yue- and Pinghua-Speaking Area. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 25–36, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): A New Dataset for Tonal and Segmental Dialectometry from the Yue- and Pinghua-Speaking Area (Sung et al., SIGTYP 2024)
PDF: https://aclanthology.org/2024.sigtyp-1.3.pdf