Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

Motlicek, Petr; Hermansky, Hynek; Ganapathy, Sriram; Garudadri, Harinath

doi:10.1007/978-3-540-74628-7_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

We describe a novel speech/audio coding technique designed to operate at medium bit-rates. Unlike classical state-of-the-art coders, which are based on short-term spectra, our approach uses relatively long temporal segments of the audio signal in critical-band-sized sub-bands. We apply an auto-regressive model to approximate the Hilbert envelopes in the frequency sub-bands. The residual signals (Hilbert carriers) are demodulated, and thresholding functions are applied in the spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing a speech/audio coder that provides broadcast-radio-like audio quality at around 15-25 kbps. Objective quality measures obtained on standard speech recordings were compared with those of the state-of-the-art 3GPP-AMR speech coding system.
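One common way to fit an auto-regressive model whose output approximates a temporal (Hilbert) envelope is frequency-domain linear prediction (FDLP): ordinary LPC is applied to the DCT of a long segment instead of to the time samples, so that by time-frequency duality the all-pole model describes energy over time rather than over frequency. The sketch below illustrates only that envelope step under stated assumptions. It uses NumPy alone, the names `dct_ii`, `levinson_durbin` and `fdlp_envelope` are hypothetical, and the model order (20) and segment length (1000 samples) are illustrative choices. It omits the critical-band filtering, carrier demodulation, spectral thresholding and quantization described above, and it is not the authors' implementation.

```python
import numpy as np


def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II via an explicit cosine matrix (O(N^2), adequate for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c = basis @ x
    c[0] *= np.sqrt(1.0 / n)
    c[1:] *= np.sqrt(2.0 / n)
    return c


def levinson_durbin(r: np.ndarray, order: int):
    """Solve the autocorrelation normal equations.

    Returns the prediction polynomial A(z) (with a[0] = 1) and the residual energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # RHS is evaluated before assignment
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x: np.ndarray, order: int = 20) -> np.ndarray:
    """Approximate the squared temporal envelope of segment x with an all-pole model.

    LPC is fitted to the DCT sequence of x (hypothetical helper, assumed FDLP variant).
    """
    n = len(x)
    c = dct_ii(x)
    # Biased autocorrelation of the DCT coefficients for lags 0..order.
    r = np.array([np.dot(c[: n - lag], c[lag:]) for lag in range(order + 1)])
    a, err = levinson_durbin(r, order)
    # DCT-II "frequency" omega_t = pi * (t + 0.5) / n corresponds to time index t.
    omega = np.pi * (np.arange(n) + 0.5) / n
    a_resp = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return err / np.abs(a_resp) ** 2


# Usage: a noise burst centred at sample 750 stands in for one sub-band segment.
rng = np.random.default_rng(0)
t = np.arange(1000)
x = rng.standard_normal(1000) * np.exp(-0.5 * ((t - 750) / 60.0) ** 2)
env = fdlp_envelope(x, order=20)
print(int(np.argmax(env)))  # expected to fall near the burst centre
```

Fitting the model on the DCT rather than the DFT avoids the circular wrap-around of the DFT, because the DCT implicitly uses an even-symmetric extension of the segment. In a full coder of the kind described above, the carrier would then be obtained by dividing the sub-band signal by the envelope; that step is not shown here.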

This work was partially supported by grants from ICSI Berkeley, USA; the Swiss National Center of Competence in Research (NCCR) on “Interactive Multi-modal Information Management (IM)2”, managed by the IDIAP Research Institute on behalf of the Swiss Federal Authorities; and the European Commission 6th Framework DIRAC Integrated Project.

Author information

Authors and Affiliations

IDIAP Research Institute, Rue du Simplon 4, CH-1920, Martigny, Switzerland
Petr Motlicek, Hynek Hermansky & Sriram Ganapathy
Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno, 612 66, Czech Republic
Petr Motlicek & Hynek Hermansky
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Hynek Hermansky & Sriram Ganapathy
Qualcomm Inc., San Diego, California, USA
Harinath Garudadri

Editor information

Václav Matoušek, Pavel Mautner

About this paper

Cite this paper

Motlicek, P., Hermansky, H., Ganapathy, S., Garudadri, H. (2007). Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_46

DOI: https://doi.org/10.1007/978-3-540-74628-7_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)