Abstract
Automatic scoring based on Automatic Speech Recognition (ASR) is widely used in second-language (L2) speaking tests. This paper proposes noise-robust automatic scoring methods for L2 speaking tests based on Deep Neural Network (DNN) acoustic models with lattice-free Maximum Mutual Information (MMI) training and factorized adaptation. Noise-robust Goodness of Pronunciation (GOP) algorithms using lattice-free MMI are implemented to improve the reliability of automatic scoring by making better use of the sequence-level training of lattice-free MMI models. Factorized adaptation of the DNN acoustic models is introduced to further improve the performance of the proposed GOP scores in real speaking test environments by categorizing the factors that cause mismatches between the acoustic models and the test data. Experimental results show that the proposed methods are noise robust and outperform conventional methods in assessing speaking tests in real classroom environments.
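For readers unfamiliar with GOP, the following is a minimal sketch of the commonly used DNN-posterior form of the score: the average log posterior of the canonical phone over the frames aligned to it. It is an illustration only and is not the lattice-free MMI variant proposed in this paper. The function name `gop_score`, the dictionary layout of the posteriors, and the example values are all assumptions made for the illustration.

```python
import math

def gop_score(frame_posteriors, phone):
    """Average log posterior of `phone` over the frames of one aligned segment.

    frame_posteriors: list of dicts mapping phone label -> posterior probability,
    one dict per frame (hypothetical layout, chosen for readability).
    """
    if not frame_posteriors:
        raise ValueError("segment has no frames")
    eps = 1e-10  # floor to avoid log(0) when a phone has zero posterior
    total = sum(math.log(max(frame.get(phone, 0.0), eps))
                for frame in frame_posteriors)
    return total / len(frame_posteriors)

# Made-up posteriors for a three-frame segment aligned to the phone "AE".
segment = [
    {"AE": 0.7, "EH": 0.2, "AH": 0.1},
    {"AE": 0.6, "EH": 0.3, "AH": 0.1},
    {"AE": 0.8, "EH": 0.1, "AH": 0.1},
]
score = gop_score(segment, "AE")
```

Scores are at most zero, and values closer to zero indicate that the acoustic model is more confident the canonical phone was produced. In this sketch, scoring the same segment against "EH" gives a lower value than scoring it against "AE".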

Acknowledgements
This work is supported by the Department of Education of Guangdong Province (Number: 2020KTSCX301).
Cite this article
Luo, D., Xia, L. & Guan, M. Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation. Mobile Netw Appl 27, 1604–1611 (2022). https://doi.org/10.1007/s11036-021-01878-3
DOI: https://doi.org/10.1007/s11036-021-01878-3