Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation

Luo, Dean; Xia, Linzhong; Guan, Mingxiang

doi:10.1007/s11036-021-01878-3

Published: 29 December 2021

Volume 27, pages 1604–1611, (2022)

Mobile Networks and Applications

Abstract

Automatic scoring based on Automatic Speech Recognition (ASR) is widely used in second-language (L2) speaking tests. This paper proposes novel noise-robust automatic scoring methods for L2 speaking tests based on Deep Neural Network (DNN) acoustic models with lattice-free Maximum Mutual Information (MMI) training and factorized adaptation. Noise-robust Goodness of Pronunciation (GOP) algorithms using lattice-free MMI are implemented to improve the reliability of automatic scoring by better exploiting the sequence-training power of lattice-free MMI models. Factorized adaptation for DNN acoustic models is introduced to further improve the performance of the proposed GOP scores in real speaking-test environments by categorizing the factors that cause mismatches between acoustic models and test data. Experimental results show that the proposed methods are noise robust and outperform conventional methods in assessing speaking tests in real classroom environments.
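
For readers unfamiliar with GOP, the following is a minimal Python sketch of one common posterior-based GOP formulation. It is not the authors' lattice-free MMI variant, whose details are not given in this abstract. It assumes that frame-level phone posteriors from some acoustic model are already available for a single forced-aligned phone segment; the function name gop_score and the toy posterior values are hypothetical.

```python
import math


def gop_score(posteriors, canonical_phone, eps=1e-10):
    """Posterior-based GOP for one aligned phone segment (illustrative only).

    posteriors: list of frames, each a list of phone posteriors
                (assumed to come from an acoustic model; hypothetical input).
    canonical_phone: index of the phone the speaker was expected to produce.

    Returns the average log posterior of the canonical phone minus the
    average log posterior of the best-scoring phone over the same frames.
    The result is <= 0, and 0 means the canonical phone is the best match.
    """
    if not posteriors:
        raise ValueError("segment must contain at least one frame")
    num_frames = len(posteriors)
    num_phones = len(posteriors[0])
    # Average log posterior of every phone over the segment;
    # eps guards against log(0).
    avg_log = [
        sum(math.log(max(frame[q], eps)) for frame in posteriors) / num_frames
        for q in range(num_phones)
    ]
    return avg_log[canonical_phone] - max(avg_log)


# Toy segment: two frames, three phones (made-up numbers).
segment = [[0.7, 0.2, 0.1],
           [0.6, 0.3, 0.1]]
well_pronounced = gop_score(segment, canonical_phone=0)
mispronounced = gop_score(segment, canonical_phone=1)
```

In this sketch a score near zero suggests the expected phone was the acoustic model's best match, while a strongly negative score suggests a likely mispronunciation. A practical system would typically map such phone-level scores to human ratings with a trained regressor or threshold.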

References

Cheng J (2011) Automatic assessment of prosody in high-stakes english tests. In: Twelfth annual conference of the international speech communication association
Luo D, Gu W, Luo R, Wang L (2016) Investigation of the effects of automatic scoring technology on human raters’ performances in L2 speech proficiency assessment. In: 10th international symposium on chinese spoken language processing, ISCSLP 2016, October 17-20, 2016. IEEE, Tianjin, China, pp 1–5
Kanters S, Cucchiarini C, Strik H (2009) The goodness of pronunciation algorithm: a detailed performance study
Sudhakara S, Ramanathi MK, Yarra C, Ghosh PK (2019) An improved goodness of pronunciation (gop) measure for pronunciation evaluation with dnn-hmm system considering hmm transition probabilities. In: INTERSPEECH, pp 954–958
Zheng J, Huang C, Chu M, Soong FK, Ye W-P (2007) Generalized segment posterior probability for automatic mandarin pronunciation evaluation. In: 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP’07, vol 4, IEEE, pp IV–201
van Doremalen JJHC, Cucchiarini C, Strik H (2010) Using non-native error patterns to improve pronunciation verification
Luo D, Shimomura N, Minematsu N, Yamauchi Y, Hirose K (2008) Automatic pronunciation evaluation of language learners’ utterances generated through shadowing. In: Ninth annual conference of the international speech communication association
Hu W, Qian Y, Soong FK (2013) A new dnn-based high quality pronunciation evaluation for computer-aided language learning (call). In: Interspeech, pp 1886–1890
Li K, Qian X, Meng H (2016) Mispronunciation detection and diagnosis in l2 english speech using multidistribution deep neural networks. IEEE/ACM Transactions on Audio Speech and Language Processing 25(1):193–207
Luo D, Qiao Y, Minematsu N, Yamauchi Y, Hirose K (2009) Analysis and utilization of mllr speaker adaptation technique for learners’ pronunciation evaluation. In: Tenth annual conference of the international speech communication association
Luo D, Qiao Y, Minematsu N, Yamauchi Y, Hirose K (2010) Regularized-mllr speaker adaptation for computer-assisted language learning system. In: Eleventh annual conference of the international speech communication association
Luo D, Guan M, Xia L (2020) Automatic scoring of l2 english speech based on dnn acoustic models with lattice-free mmi. In: International conference on machine learning and intelligent communications. Springer, pp 113–122
Witt SM, Young SJ (2000) Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication 30(2–3):95–108
Li L, Zhao Y, Jiang D, Zhang Y, Wang F, Gonzalez I, Valentin E, Sahli H (2013) Hybrid deep neural network–hidden markov model (dnn-hmm) based speech emotion recognition. In: 2013 humaine association conference on affective computing and intelligent interaction. IEEE, pp 312–317
Ravanelli M, Omologo M (2017) Contaminated speech training methods for robust dnn-hmm distant speech recognition. arXiv:1710.03538
Bahl L, Brown P, De Souza P, Mercer R (1986) Maximum mutual information estimation of hidden markov model parameters for speech recognition. In: ICASSP’86. IEEE international conference on acoustics, speech, and signal processing, vol 11. IEEE, pp 49–52
Povey D, Peddinti V, Galvez D, Ghahremani P, Manohar V, Na X, Wang Y, Khudanpur S (2016) Purely sequence-trained neural networks for asr based on lattice-free mmi. In: Interspeech, pp 2751–2755
Yu S-Z, Kobayashi H (2003) An efficient forward-backward algorithm for an explicit-duration hidden markov model. IEEE Signal Processing Letters 10(1):11–14
Fainberg J, Renals S, Bell P (2017) Factorised representations for neural network adaptation to diverse acoustic environments. In: INTERSPEECH, pp 749–753
Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y, Schwarz P, et al (2011) The kaldi speech recognition toolkit. In: IEEE 2011 workshop on automatic speech recognition and understanding, number CONF. IEEE Signal Processing Society
Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an asr corpus based on public domain audio books. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5206–5210
Liu X, Zhang X (2019) Noma-based resource allocation for cluster-based cognitive industrial internet of things. IEEE Transactions on Industrial Informatics 16(8):5379–5388
Liu X, Zhai XB, Lu W, Wu C (2019) Qos-guarantee resource allocation for multibeam satellite industrial internet of things with noma. IEEE Transactions on Industrial Informatics 17(3):2052–2061
Liu X, Zhang X (2018) Rate and energy efficiency improvements for 5g-based iot with simultaneous transfer. IEEE Internet of Things Journal 6(4):5971–5980
Liu X, Zhang X, Jia M, Fan L, Lu W, Zhai X (2018) 5g-based green broadband communication system design with simultaneous wireless information and power transfer. Physical Communication 28:130–137
Li F, Lam K-Y, Liu X, Wang J, Zhao K, Wang L (2017) Joint pricing and power allocation for multibeam satellite systems with dynamic game model. IEEE Transactions on Vehicular Technology 67(3):2398–2408

Acknowledgements

This work is supported by the Department of Education of Guangdong Province (Number: 2020KTSCX301).

Author information

Authors and Affiliations

School of Information and Communications Technology, Shenzhen Institute of Information Technology, Shenzhen, 518000, China
Dean Luo & Linzhong Xia
School of Electronic Communication Technology, Shenzhen Institute of Information Technology, Shenzhen, 518000, China
Mingxiang Guan

Corresponding author

Correspondence to Dean Luo.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luo, D., Xia, L. & Guan, M. Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation. Mobile Netw Appl 27, 1604–1611 (2022). https://doi.org/10.1007/s11036-021-01878-3

Accepted: 03 September 2021
Published: 29 December 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11036-021-01878-3
