Abstract
We describe novel speech/audio coding technique designed to operate at medium bit-rates. Unlike classical state-of-the-art coders that are based on short-term spectra, our approach uses relatively long temporal segments of audio signal in critical-band-sized sub-bands. We apply auto-regressive model to approximate Hilbert envelopes in frequency sub-bands. Residual signals (Hilbert carriers) are demodulated and thresholding functions are applied in spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing speech/audio coder to provide broadcast radio-like quality audio around 15 − 25kbps. Obtained objective quality measures, carried out on standard speech recordings, were compared to the state-of-the-art 3GPP-AMR speech coding system.
This work was partially supported by grants from ICSI Berkeley, USA; the Swiss National Center of Competence in Research (NCCR) on “Inter active Multi-modal Information Management (IM)2”; managed by the IDIAP Research Institute on behalf of the Swiss Federal Authorities, and by the European Commission 6th Framework DIRAC Integrated Project.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Spanias, A.S.: Speech Coding: A Tutorial Review. Proc. of IEEE 82(10) (October 1994)
Makhoul, J.: Linear Prediction: A Tutorial Review. Proc. of IEEE 63(4) (April 1975)
Motlicek, P., Hermansky, H., Garudadri, H., Srinivasamurthy, N.: Speech Coding Based on Spectral Dynamics. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188, Springer, Heidelberg (2006)
Quackenbush, S.R., Barnwell, T.P., Clements, M.A.: Objective Measures of Speech Quality. Advanced Reference Series. Prentice-Hall, Englewood Cliffs, NJ (1988)
ITU-T Rec. P.862: Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-end Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs, ITU, Geneva, Switzerland (2001)
Herre, J., Johnston, J.H.: Enhancing the performance of perceptual audio coders by using temporal noise shaping (TNS), in 101st Conv. Aud. Eng. Soc. (1996)
Athineos, M., Hermansky, H., Ellis, D.P.W.: LP-TRAP: Linear predictive temporal patterns. In: Proc. of ICSLP, Jeju, S. Korea, pp. 1154–1157 (October 2004)
Schimmel, S., Atlas, L.: Coherent Envelope Detector for Modulation Filtering of Speech. In: Proc. of ICASSP, Philadelphia, USA, vol. 1, pp. 221–224 (May 2005)
Fisher, W.M., et al.: The DARPA speech recognition research database: specifications and status. In: Proc. DARPA Workshop on Speech Recognition, pp. 93–99 (February 1986)
Hansen, J.H.L., Pellom, B.: An Effective Quality Evaluation Protocol for Speech Enhancement Algorithms. In: Proc. of ICSLP, Sydney, Australia, vol. 7, pp. 2819–2822 (December 1998)
3GPP TS 26.071: AMR speech CODEC, General description, http://www.3gpp.org/ftp/Specs/html-info/26071.htm
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Motlicek, P., Hermansky, H., Ganapathy, S., Garudadri, H. (2007). Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science(), vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_46
Download citation
DOI: https://doi.org/10.1007/978-3-540-74628-7_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer ScienceComputer Science (R0)