CoANZSE Audio: Creation of an Online Corpus for Linguistic and Phonetic Analysis of Australian and New Zealand Englishes

Steven Coats

Abstract

CoANZSE Audio is a searchable online version of the Corpus of Australian and New Zealand Spoken English, a 195-million-word collection of geo-located transcripts from the YouTube channels of local governments. In addition to the part-of-speech-tagged and lemmatized transcript data, CoANZSE Audio provides access to almost all of the underlying audio, as well as to forced alignments of the audio with the transcript content in Praat’s TextGrid format. This paper describes the methods used to create the corpus from open-source tools and the architecture of the CoANZSE Audio website. Two possible linguistic analyses based on CoANZSE Audio data are described: the use of double modals, a rare syntactic feature, and the raising of the mid front vowel /ɛ/ in New Zealand English. CoANZSE Audio can be considered among the first large, free, fully searchable online corpora containing data suitable for acoustic phonetic analyses as well as for analyses of lexical, grammatical, and discourse properties of Australian and New Zealand Englishes.

Anthology ID: 2024.lrec-main.302
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 3407–3412
URL: https://aclanthology.org/2024.lrec-main.302/
Cite (ACL): Steven Coats. 2024. CoANZSE Audio: Creation of an Online Corpus for Linguistic and Phonetic Analysis of Australian and New Zealand Englishes. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3407–3412, Torino, Italia. ELRA and ICCL.
Cite (Informal): CoANZSE Audio: Creation of an Online Corpus for Linguistic and Phonetic Analysis of Australian and New Zealand Englishes (Coats, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.302.pdf
