Abstract
Speech is the natural mode of communication and the easiest way of expressing human emotions. Emotional speech is characterized by features such as the f0 contour, intensity, speaking rate, and voice quality; together these features are called prosody. Prosody is generally modified by pitch and time scaling. Unlike voice conversion, where spectral conversion is the main concern, emotional speech conversion is more sensitive to prosody. Several techniques, both linear and nonlinear, have been used for transforming speech. Our hypothesis is that the quality of emotional speech conversion can be improved by estimating a nonlinear relationship between the neutral and emotional speech feature vectors. In this work, a quadratic multivariate polynomial (QMP) model is explored for transforming neutral speech into emotional target speech. Both subjective and objective analyses were carried out to evaluate the transformed emotional speech, using the comparison mean opinion score (CMOS), mean opinion score (MOS), identification rate, root-mean-square error, and Mahalanobis distance. For the Toronto emotional database, the CMOS analysis indicates that the transformed speech can partly be perceived as the target emotion, except for neutral/sad conversion. Moreover, the MOS scores and spectrograms indicate good quality of the transformed speech. For the German database, except for neutral/boredom conversion, the proposed technique obtains a better CMOS score than the gross and initial–middle–final methods, though a lower score than the syllable method. Nevertheless, the QMP technique is simple and easy to implement, yields transformed speech of good quality, and can estimate the transformation function from a limited number of training utterances.
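As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the snippet below fits a quadratic multivariate polynomial mapping between time-aligned neutral and emotional feature frames by ordinary least squares and scores the result with RMSE and Mahalanobis distance. In the paper the frames would be STRAIGHT-derived spectral and prosodic parameters; here the data are synthetic and all function names are illustrative assumptions.

```python
import numpy as np

def qmp_features(X):
    """Quadratic multivariate polynomial expansion of row feature vectors:
    a constant term, all linear terms, and all pairwise products
    (squares included)."""
    n, d = X.shape
    iu, ju = np.triu_indices(d)      # index pairs i <= j
    quad = X[:, iu] * X[:, ju]       # x_i * x_j cross and square terms
    return np.hstack([np.ones((n, 1)), X, quad])

def fit_qmp(X_neutral, Y_emotional):
    """Least-squares estimate of the QMP mapping from time-aligned
    neutral feature frames (rows of X) to emotional frames (rows of Y)."""
    Phi = qmp_features(X_neutral)
    W, *_ = np.linalg.lstsq(Phi, Y_emotional, rcond=None)
    return W

def apply_qmp(X_neutral, W):
    """Transform neutral frames with a previously fitted QMP mapping."""
    return qmp_features(X_neutral) @ W

# Toy usage with synthetic 5-dimensional "feature" frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))                                 # neutral
Y = 0.6 * X + 0.2 * X**2 + 0.05 * rng.standard_normal((200, 5))   # emotional
W = fit_qmp(X, Y)
Y_hat = apply_qmp(X, W)

# Objective scores of the kind used in the paper: RMSE and the average
# Mahalanobis distance of converted frames from the target distribution.
rmse = np.sqrt(np.mean((Y - Y_hat) ** 2))
mu, cov_inv = Y.mean(axis=0), np.linalg.inv(np.cov(Y, rowvar=False))
mahal = np.mean([np.sqrt((y - mu) @ cov_inv @ (y - mu)) for y in Y_hat])
print(f"RMSE: {rmse:.4f}  Mahalanobis: {mahal:.4f}")
```

Note that the quadratic expansion grows as d(d+1)/2 in the feature dimension, which is one reason such a mapping can be estimated from a limited number of training utterances when the feature vectors are compact.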

References
M. Abe, S. Nakamura, K. Shikano, H. Kuwabara, Voice conversion through vector quantization. J. Acoust. Soc. Jpn. (E) 11(2), 71–76 (1990)
Y. Adachi, S. Kawamoto, S. Morishima, S. Nakamura, Perceptual similarity measurement of speech by combination of acoustic features, in Proceedings IEEE International Conference Acoustics, Speech and Signal Processing, (2008), pp. 4861–4864
R. Aihara, R. Takashima, T. Takiguchi, Y. Ariki, GMM-based emotional voice conversion using spectrum and prosody features. Am. J. Signal Process. 2, 134–138 (2012)
R. Aihara, R. Ueda, T. Takiguchi, Y. Ariki, Exemplar-based emotional voice conversion using non-negative matrix factorization, in Proceedings IEEE Asia-Pacific Signal and Information Processing Association, (2014), pp. 1–7
M. Bulut, et al., Investigating the role of phoneme-level modifications in emotional speech resynthesis, in Proceedings INTERSPEECH, (2005), pp. 801–804
F. Burkhardt, W.F. Sendlmeier, Verification of acoustical correlates of emotional speech using formant synthesis, in Tutorial and Research Workshop on Speech and Emotion, (2000), pp. 151–156
F. Burkhardt, N. Campbell, Emotional speech synthesis, in Oxford Handbook of Affective Computing, ed. by R.A. Calvo, S.K. D'Mello, J. Gratch, A. Kappas (Oxford University Press, 2014), p. 286
F. Burkhardt, A. Paeschke, M. Rolfes, W.F. Sendlmeier, B. Weiss, A database of German emotional speech, in Proceedings INTERSPEECH, (2005), pp. 1517–1520
L. Cen, P. Chan, M. Dong, H. Li, Generating emotional speech from neutral speech, in Proceedings 7th International Symposium on Chinese Spoken Language Processing, (2010), pp. 383–386
R.R. Chang, X.Q. Yu, Y.Y. Yuan, W.G. Wan, Emotional analysis and synthesis of human voice based on STRAIGHT. Appl. Mech. Mater. 536, 105–110 (2014)
Y. Chen, M. Chu, E. Chang, J. Liu, R. Liu, Voice conversion with smoothed GMM and MAP adaptation, in Eurospeech, (2003), pp. 2413–2416
R. Cowie et al., Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18, 32–80 (2001)
E.A. Cudney et al., An evaluation of Mahalanobis–Taguchi system and neural network for multivariate pattern recognition. J. Ind. Syst. Eng. 1, 139–150 (2007)
S. Desai, A.W. Black, B. Yegnanarayana, K. Prahallad, Spectral mapping using artificial neural networks for voice conversion. IEEE Trans. Audio Speech Lang. Process. 18, 954–964 (2010)
K. Dupuis, M.K. Pichora-Fuller, Toronto Emotional Speech Set (TESS) (Psychology Department, University of Toronto, Toronto, 2010)
T. En-Najjary, O. Rosec, T. Chonavel, A voice conversion method based on joint pitch and spectral envelope transformation, in Proceedings INTERSPEECH (2004)
D. Erro, A. Moreno, A. Bonafonte, Voice conversion based on weighted frequency warping. IEEE Trans. Audio Speech Lang. Process. 18, 922–931 (2010)
H. Fujisaki, Information, prosody, and modeling with emphasis on tonal features of speech, in Speech Prosody, (2004), pp. 1–10
K.I. Funahashi, On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989)
D. Govind, S.R.M. Prasanna, B. Yegnanarayana, Neutral to target emotion conversion using source and suprasegmental information, in Proceedings INTERSPEECH, (2011), pp. 2969–2972
R.C. Guido et al., A neural-wavelet architecture for voice conversion. Neurocomputing 71, 174–180 (2007)
A. Haque, K.S. Rao, Analysis and modification of spectral energy for neutral to sad emotion conversion, in Proceedings IEEE 8th International Conference on Contemporary Computing, (2015), pp. 263–268
A. Haque, K.S. Rao, Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech. Int. J. Speech Technol. (2016), pp. 1–11
E. Helander, T. Virtanen, J. Nurminen, M. Gabbouj, Voice conversion using partial least squares regression. IEEE Trans. Audio Speech Lang. Process. 18, 912–921 (2010)
W.J. Holmes, J.N. Holmes, M.W. Judd, Extension of the bandwidth of the JSRU parallel-formant synthesizer for high quality synthesis of male and female speech, in Proceedings IEEE International Conference Acoustics, Speech, and Signal Processing, (1990), pp. 313–316
A. Iida, N. Campbell, S. Iga, F. Higuchi, M. Yasumura, A speech synthesis system with emotion for assisting communication, in Tutorial and Research Workshop on Speech and Emotion, (2000), pp. 167–172
T. Irino, Y. Minami, T. Nakatani, M. Tsuzaki, H. Tagawa, Evaluation of a speech recognition/generation method based on HMM and STRAIGHT, in Proceedings INTERSPEECH (2002)
H. Kawahara, M. Morise, Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework. Sadhana Acad. Proc. Eng. Sci. 36, 713–727 (2011)
H. Kawahara, I. Masuda-Katsuse, A. de Cheveigné, Restructuring speech representations using a pitch adaptive time frequency smoothing and an instantaneous frequency based f0 extraction: possible role of repetitive structure in sounds. Speech Commun. 27, 187–207 (1999)
L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition (Pearson Education India, Delhi, 2008)
P.K. Lehana, P.C. Pandey, Transformation of short-term spectral envelope of speech signal using multivariate polynomial modeling, in Proceedings National Conference on Communications, NCC, (2011)
P.K. Lehana, Spectral mapping using multivariate polynomial modeling for voice conversion, Ph.D. Thesis, Department of Electrical Engineering, IIT Bombay, India (2013)
Z.H. Ling, L. Deng, D. Yu, Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process. 21, 2129–2139 (2013)
K. Liu, J. Zhang, Y. Yan, High quality voice conversion through phoneme-based linear mapping functions with STRAIGHT for mandarin, in Proceedings 4th International Conference on Fuzzy Systems and Knowledge Discovery, (2007), pp. 410–414
Z. Luo, J. Chen, T. Nakashika, T. Takiguchi, Y. Ariki, Emotional voice conversion using neural networks with different temporal scales of f0 based on wavelet transform, in Proceedings 9th ISCA Speech Synthesis Workshop (2016), pp. 140–145
Z. Luo, T. Takiguchi, Y. Ariki, Emotional voice conversion using deep neural networks with MCC and F0 features, in Proceedings IEEE 15th International Conference Computer and Information Science, (2016), pp. 1–5
P.C. Mahalanobis, On the generalized distance in statistics, in Proceedings of the National Institute of Sciences of India, (1936), pp. 49–55
T. Masuko, K. Tokuda, T. Kobayashi, S. Imai, Voice characteristics conversion for HMM-based speech synthesis system. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 3, 1611–1614 (1997)
A. Mouchtaris, S.S. Narayanan, C. Kyriakakis, Multichannel audio synthesis by subband-based spectral conversion and parameter adaptation. IEEE Trans. Speech Audio Process. 13, 263–274 (2005)
T. Nakashika, R. Takashima, T. Takiguchi, Y. Ariki, Voice conversion in high-order eigen space using deep belief nets, in Proceedings INTERSPEECH, (2013), pp. 369–372
J. Nirmal, M. Zaveri, S. Patnaik, P. Kachare, Voice conversion using general regression neural network. Appl. Soft Comput. 24, 1–12 (2014)
H.K. Palo, M.N. Mohanty, M. Chandra, Efficient feature combination techniques for emotional speech classification. Int. J. Speech Technol. 19, 135–150 (2016)
B.S. Pathak, M. Sayankar, A. Panat, Emotion transformation from neutral to 3 emotions of speech signal using DWT and adaptive filtering techniques, in Proceedings IEEE 11th India Conference: Emerging Trends and Innovation in Technology, (2014)
K.R. Scherer, Vocal communication of emotion: a review of research paradigms. Speech Commun. 40, 227–256 (2003)
M. Schröder, Emotional speech synthesis: a review, in Proceedings INTERSPEECH, (2001), pp. 561–564
J.B. Singh, R. Khanna, P. Lehana, Effect of MFCC based features for speech signal alignments. Int. J. Nat. Lang. Comput. 2 (2013)
Y. Stylianou, O. Cappé, E. Moulines, Continuous probabilistic transform for voice conversion. IEEE Trans. Speech Audio Process. 6, 131–142 (1998)
D. Sundermann, A. Bonafonte, H. Ney, A study on residual prediction techniques for voice conversion. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1, 1–13 (2005)
T. Toda, H. Saruwatari, K. Shikano, Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2, 841–844 (2001)
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 3, 1315–1318 (2000)
O. Türk, M. Schröder, A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis, in Proceedings INTERSPEECH, (2008), pp. 2282–2285
O. Türk, L.M. Arslan, Voice conversion methods for vocal tract and pitch contour modification, in Proceedings INTERSPEECH (2003)
O. Türk, Cross-lingual voice conversion. Ph.D. dissertation, Boğaziçi University, (2007)
H. Valbret, E. Moulines, J.P. Tubach, Voice transformation using PSOLA technique. Speech Commun. 11, 175–187 (1992)
C. Veaux, X. Rodet, Intonation conversion from neutral to expressive speech, in Proceedings INTERSPEECH (2011), pp. 2765–2768
F. Villavicencio, A. Röbel, X. Rodet, Extending efficient spectral envelope modeling to Mel-frequency based representation, in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, (2008), pp. 1625–1628
Z. Wu, Spectral mapping for voice conversion, Ph.D. dissertation, School of Computer Engineering, Nanyang Technological University, (2015)
J. Yadav, K.S. Rao, Prosodic mapping using neural networks for emotion conversion in Hindi language. Circuits Syst. Signal Process. 35, 139–162 (2016)
H. Zen, K. Tokuda, A.W. Black, Statistical parametric speech synthesis. Speech Commun. 51, 1039–1064 (2009)
Acknowledgements
The authors would like to thank Prof. Hideki Kawahara, Wakayama University, for his assistance with STRAIGHT.
Cite this article
Singh, J.B., Lehana, P. STRAIGHT-Based Emotion Conversion Using Quadratic Multivariate Polynomial. Circuits Syst Signal Process 37, 2179–2193 (2018). https://doi.org/10.1007/s00034-017-0660-0