Abstract
Mass detection of COVID-19 infections has proven to be a very hard problem. In this work, we describe our systems for diagnosing COVID-19 cases from coughing sounds and speech. We propose a hybrid configuration that combines a Convolutional Neural Network (CNN), a Time Delay Neural Network (TDNN), and a Long Short-Term Memory (LSTM) network for extracting cough-sound and speech embeddings. The proposed framework further employs SpecAugment-based on-the-fly data augmentation and multi-level statistics pooling to map frame-level information into utterance-level embeddings. For the final decision, we employ classical classifiers, namely support vector machines, random forests, AdaBoost, decision trees, and logistic regression, to determine whether a given feature comes from a COVID-19-negative or COVID-19-positive patient. We also adopt an end-to-end approach that applies a ResNet model with a one-class softmax loss function to high-resolution hand-crafted features for the positive-versus-negative decision. Experiments are carried out on two subsets of the Cambridge COVID-19 Sound database, denoted COVID-19 Speech Sounds (CSS) and COVID-19 Cough Sounds (CCS), and results are reported on their development and test sets. Our approach outperforms the baselines provided by the challenge organizers on the development set. It suggests that speech could help remotely detect early COVID-19 infections, and eventually other respiratory diseases, opening a promising, cheap, and scalable pre-diagnosis avenue for better handling pandemics.
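The multi-level statistics pooling mentioned above can be sketched as follows: frame-level outputs from several intermediate network levels are each summarized by their per-dimension mean and standard deviation, and the results are concatenated into one utterance-level embedding. This is a minimal illustrative sketch only; the layer shapes and the `stats_pool`/`multi_level_stats_pool` names are assumptions, not the paper's implementation.

```python
import numpy as np

def stats_pool(frames):
    """Map a (num_frames, feat_dim) matrix to a (2 * feat_dim,)
    vector by concatenating the per-dimension mean and std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def multi_level_stats_pool(layer_outputs):
    """Pool each level's frame-level output, then concatenate the
    pooled statistics into a single utterance-level embedding."""
    return np.concatenate([stats_pool(f) for f in layer_outputs])

rng = np.random.default_rng(0)
# Frame-level outputs from three hypothetical network levels
# (200 frames each, with illustrative feature dimensions).
layers = [rng.normal(size=(200, d)) for d in (64, 128, 256)]
embedding = multi_level_stats_pool(layers)
print(embedding.shape)  # (896,) = 2 * (64 + 128 + 256)
```

Pooling across several levels, rather than only the last one, lets the utterance-level embedding retain both low-level acoustic and higher-level contextual statistics before classification.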
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fathan, A., Alam, J., Kang, W.H. (2021). An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science(), vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)