Abstract
Mass detection of COVID-19 infections has proven to be a very hard problem. In this work, we describe our systems for diagnosing COVID-19 cases from coughing sounds and speech. We propose a hybrid configuration that combines a Convolutional Neural Network (CNN), a Time Delay Neural Network (TDNN), and a Long Short-Term Memory (LSTM) network for extracting cough-sound and speech embeddings. The proposed framework further employs SpecAugment-based on-the-fly data augmentation and multi-level statistics pooling to map frame-level information into utterance-level embeddings. For the final decision, we employ classical classifiers, namely support vector machines, random forests, AdaBoost, decision trees, and logistic regression, to determine whether a given feature comes from a COVID-19-negative or COVID-19-positive patient. We also adopt an end-to-end approach that applies a ResNet model with a one-class softmax loss function to high-resolution hand-crafted features for the positive-versus-negative decision. Experiments are carried out on two subsets of the Cambridge COVID-19 Sound database, denoted COVID-19 Speech Sounds (CSS) and COVID-19 Cough Sounds (CCS), and results are reported on their development and test sets. Our approach outperforms the baselines provided by the challenge organizers on the development set. It suggests that speech could help remotely detect early COVID-19 infections, and eventually other respiratory diseases, opening a promising, cheap, and scalable pre-diagnosis avenue for better handling pandemics.
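The multi-level statistics pooling mentioned above can be sketched as follows: frame-level outputs from several intermediate network levels are each summarized by their per-dimension mean and standard deviation, and the results are concatenated into one utterance-level embedding. This is a minimal illustrative sketch only; the layer shapes and the `stats_pool`/`multi_level_stats_pool` names are assumptions, not the paper's implementation.

```python
import numpy as np

def stats_pool(frames):
    """Map a (num_frames, feat_dim) matrix to a (2 * feat_dim,)
    vector by concatenating the per-dimension mean and std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def multi_level_stats_pool(layer_outputs):
    """Pool each level's frame-level output, then concatenate the
    pooled statistics into a single utterance-level embedding."""
    return np.concatenate([stats_pool(f) for f in layer_outputs])

rng = np.random.default_rng(0)
# Frame-level outputs from three hypothetical network levels
# (200 frames each, with illustrative feature dimensions).
layers = [rng.normal(size=(200, d)) for d in (64, 128, 256)]
embedding = multi_level_stats_pool(layers)
print(embedding.shape)  # (896,) = 2 * (64 + 128 + 256)
```

Pooling across several levels, rather than only the last one, lets the utterance-level embedding retain both low-level acoustic and higher-level contextual statistics before classification.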
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fathan, A., Alam, J., Kang, W.H. (2021). An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science(), vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)