Scale Transform in Speech Analysis
Abstract— In this paper, we study the scale transform of the spectral envelope of speech utterances by different speakers. This study is motivated by the hypothesis that the formant frequencies of different speakers are approximately related by a scaling constant for a given vowel. The scale transform has the fundamental property that the magnitudes of the scale transform of a function X(f) and of its scaled version √α X(αf) are the same. The methods presented here are useful in reducing variations in acoustic features. We show that F-ratio tests indicate better separability of vowels using scale-transform based features than mel-transform based features. The data used in the comparison of the different features consist of 200 utterances of four vowels extracted from the TIMIT data base.

Index Terms— Formants, scale cepstrum, speaker normalization, speech analysis, speech front-end.

I. INTRODUCTION

… of each other, i.e.,

F_i' = \alpha F_i   (2)

where α is the scale factor. While the uniform tube model is not the best model for the vocal tract, it illustrates the general features investigated within this paper, namely that there is scaling in the frequency domain, as our previous work has shown [8].

The paper is organized as follows. In the next section we discuss some of the properties of the scale transform that are relevant to this paper. In Section III we detail a method to obtain a smoothed estimate of the formant envelope. Subsequently, the discrete implementation of the scale-cepstrum is explained. In Section V, we describe the simulations that we have performed to compare the separability of vowels when scale-cepstral and mel-cepstral coefficients are used as features.
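The invariance property quoted in the abstract, that |D(c)| is unchanged when X(f) is replaced by √α X(αf), can be checked numerically. The sketch below is ours, not code from the paper: the test function, band limits, and grid size are arbitrary illustrative choices, and `scale_transform` is a hypothetical helper that evaluates the transform by sampling uniformly in t = ln f.

```python
import numpy as np

def scale_transform(x_of_f, c_values, f_lo=1e-2, f_hi=1e3, n=4096):
    """Numerically approximate D(c) = (2*pi)^(-1/2) * integral of
    x(f) * f^(-jc - 1/2) df by sampling uniformly in t = ln f."""
    t = np.linspace(np.log(f_lo), np.log(f_hi), n)
    dt = t[1] - t[0]
    f = np.exp(t)
    # with f = e^t the integrand becomes x(e^t) * e^{t/2} * e^{-jct}
    g = x_of_f(f) * np.exp(t / 2.0)
    return (dt / np.sqrt(2.0 * np.pi)) * np.array(
        [np.sum(g * np.exp(-1j * c * t)) for c in c_values])

x = lambda f: np.exp(-np.log(f) ** 2)        # a smooth test "spectral envelope"
alpha = 1.7
x_scaled = lambda f: np.sqrt(alpha) * x(alpha * f)

c = np.linspace(-5.0, 5.0, 11)
D1 = np.abs(scale_transform(x, c))
D2 = np.abs(scale_transform(x_scaled, c))
print(np.max(np.abs(D1 - D2)))               # close to zero: the magnitudes agree
```

Analytically, the scale transform of √α x(αf) is α^{jc} D(c), and |α^{jc}| = 1 for real c, which is why the two magnitude vectors coincide up to discretization error.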
of the scale transform. We have previously used the scale transform to study the effect of pitch variation and the resulting broadening of pitch harmonics [10], [11].

Before we describe the scale-transform based procedure, we discuss the method used to obtain a smoothed estimate of the formant envelope.

III. ESTIMATION OF FORMANT ENVELOPE

According to the source-filter model of speech production, vowels are produced by the vocal-tract filter driven by the pitch excitation. In the spectral domain this corresponds to the product of the spectrum of the vocal-tract filter and the spectrum of the pitch, i.e.,

S(f) = H(f)P(f)   (7)

where S(f), H(f), and P(f) are the observed spectrum, the frequency response of the vocal tract, and the spectrum of the pitch excitation function, respectively.

Vowels are almost completely described by the first three resonances (formants) of the vocal-tract frequency representation H(f). These resonances are affected by the length of the pharyngeal-oral tract, the constriction along the tract, and the narrowness of the tract.

Since we are interested only in the vocal-tract response, we would like to remove the effects of pitch excitation. In this paper, the following procedure proposed by Nelson [12] is used to suppress the effects of pitch; the method is similar to averaged-periodogram techniques [13]. Each frame of speech is segmented into overlapping subframes, and each subframe is Hamming windowed. For the purposes of this paper, we have chosen the subframes to be 96 samples long (for speech sampled at 16 kHz), and the overlap between subframes is 64 samples. We estimate the sample autocorrelation function for each subframe and average over the available subframes. This averaged autocorrelation estimate is then Hamming windowed and used to compute the scale-cepstrum described in the next section. We denote the windowed average autocorrelation estimate as R_w(\tau).

In the proposed analysis method, pitch is effectively suppressed since the duration of each subframe is less than the expected pitch interval. For every subframe that contains an individual pitch pulse there is a broadband energy contribution to the spectrum of that subframe, but not to any other subframe. The result is that the averaged spectrum contains all of the formant structure but almost none of the pitch structure.

The scale-cepstrum is obtained by computing the scale transform of \ln|X(f)|, where X(f) is the Fourier transform of R_w(\tau), the windowed averaged autocorrelation estimate; we denote the result by D(c). In the calculation of the scale-cepstrum, the analytic spectrum is used rather than the symmetric spectrum, since the scale properties are not valid for a symmetric log/mel-warped spectrum. The reason for using the logarithm operation is that it provides a more parsimonious representation in the scale-cepstral domain. Note that the logarithm operation affects only the magnitude of the spectral components. Therefore, formant envelopes that are frequency-scaled versions of each other continue to remain so even after the logarithm is taken. The magnitude of the scale-cepstrum is then used as a feature vector.

IV. DISCRETE IMPLEMENTATION OF SCALE-CEPSTRUM

Since the sampling frequency of the TIMIT data base is 16 kHz, for the computations in this paper we assume that the signal is bandlimited to 100–7000 Hz. The scale-cepstrum may therefore be represented as

D(c) = \frac{1}{\sqrt{2\pi}} \int_{100}^{7000} \ln|X(f)|\, f^{-jc - 1/2}\, df.   (8)

Using the substitution of variables f = e^t, we have

D(c) = \frac{1}{\sqrt{2\pi}} \int_{\ln 100}^{\ln 7000} \ln|X(e^t)|\, e^{t/2}\, e^{-jct}\, dt   (9)

which is the conventional Fourier transform of \ln|X(e^t)|\, e^{t/2}. For digital implementation, we sample in the t = \ln f domain and obtain an expression which can be easily implemented using the fast Fourier transform (FFT), i.e.,

D(c_k) \approx \frac{\Delta t}{\sqrt{2\pi}}\, e^{-j c_k \ln 100} \sum_{n=0}^{N-1} \ln|X(e^{t_n})|\, e^{t_n/2}\, e^{-j 2\pi k n / N}   (10)

where t_n = \ln 100 + n\,\Delta t, \Delta t = \ln(7000/100)/N, and c_k = 2\pi k/(N\,\Delta t). The phase term e^{-j c_k \ln 100} can be ignored, since it does not contribute to the magnitude of D(c_k). X(e^{t_n}) can be easily computed from the time-lag samples of the smoothed formant envelope as

X(e^{t_n}) = \sum_{m} R_w(mT)\, e^{-j 2\pi e^{t_n} m T}   (11)

where T is the sampling period in the time-lag domain. The magnitudes of the scale-cepstral coefficients, i.e., |D(c_k)|, are used as features that describe the various vowels.

V. COMPARISON OF FEATURES

In this section, we compare the separability of vowel classes when scale-cepstral and mel-cepstral coefficients are used as features. We point out that when we refer to scale-cepstral coefficients as features, we assume that the magnitudes |D(c_k)| are used in the feature vector. In comparing the separability afforded by the different cepstral features, a generalized F-ratio method is used [14], [15]. In deriving the F-ratio separability, let \mu_i and \Sigma_i denote the mean feature vector and sample covariance matrix, respectively, of the ith phoneme class. We assume equal probability of the phoneme classes. Let \mu = (1/L)\sum_{i=1}^{L}\mu_i, where L denotes the number of phoneme classes being compared.
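The envelope-estimation and discrete scale-cepstrum steps of Sections III and IV can be read as a single pipeline. The sketch below is our own reconstruction under stated assumptions (96-sample Hamming subframes with 64-sample overlap, the log spectrum resampled uniformly in ln f over 100–7000 Hz, and a plain FFT realizing the scale transform); all function and variable names are ours, and a synthetic harmonic frame stands in for real TIMIT data.

```python
import numpy as np

def smoothed_envelope_spectrum(frame, sub_len=96, hop=32, n_fft=1024):
    """Averaged-periodogram style envelope estimate: Hamming-window
    overlapping subframes, average their sample autocorrelations,
    Hamming-window the average, then take its one-sided spectrum."""
    win = np.hamming(sub_len)
    acfs = []
    for start in range(0, len(frame) - sub_len + 1, hop):
        s = frame[start:start + sub_len] * win
        # sample autocorrelation at nonnegative lags 0..sub_len-1
        acfs.append(np.correlate(s, s, mode='full')[sub_len - 1:])
    r_avg = np.mean(acfs, axis=0) * np.hamming(2 * sub_len - 1)[sub_len - 1:]
    return np.abs(np.fft.rfft(r_avg, n_fft))

def scale_cepstrum(frame, fs=16000, f_lo=100.0, f_hi=7000.0, n_warp=512):
    """Magnitude of the scale transform of the log envelope, computed by
    resampling the log spectrum on a grid uniform in t = ln f so that
    an ordinary FFT realizes the transform."""
    spec = smoothed_envelope_spectrum(frame)
    freqs = np.linspace(0.0, fs / 2.0, len(spec))
    t = np.linspace(np.log(f_lo), np.log(f_hi), n_warp)
    log_env = np.log(np.interp(np.exp(t), freqs, spec) + 1e-12)
    # weight by e^{t/2}, as required by the scale-transform kernel
    return np.abs(np.fft.fft(log_env * np.exp(t / 2.0)))[:n_warp // 2]

# usage: a synthetic voiced-like frame (harmonics of a 120-Hz pitch)
fs = 16000
time = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * 120 * k * time) / k for k in range(1, 30))
feat = scale_cepstrum(frame)
```

The overall constant \Delta t/\sqrt{2\pi} and the ignorable phase term are omitted here, since only the magnitudes enter the feature vector.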
42 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 1, JANUARY 1999
Fig. 1. Separability between phonemes using scale-cepstral (indicated by “- -”) and mel-cepstral (indicated by “--”) coefficients, based on 200 utterances of clean speech. The data are taken from dialect region seven of the TIMIT training set and are spoken by 22 male and 13 female talkers. It is seen that the scale-cepstrum provides better separability for most vowels. The few cases in which it is less than the mel-cepstrum may be explained by recalling that the scale-cepstrum is based on the assumption of linear scaling, while the mel-cepstrum uses a warping based on psychoacoustics. It is hoped that the use of such warping functions will further improve the scale-cepstrum.
We then compute the within-class and between-class scatter matrices, W and B, respectively, as

W = \frac{1}{L} \sum_{i=1}^{L} \Sigma_i   (12)

and

B = \frac{1}{L} \sum_{i=1}^{L} (\mu_i - \mu)(\mu_i - \mu)^T.   (13)

The separability criterion is then given by

\mathcal{F} = \mathrm{tr}(W^{-1} B).   (14)

Simulation Results: The data consist of a total of 200 utterances of each vowel from 22 male and 13 female speakers from dialect region seven of the TIMIT training set. /ae/, /iy/, /ih/, and /ow/ are the four vowels considered for the comparison of the different cepstra. Each utterance is chosen so that the corresponding phoneme is relatively stationary over at least 768 samples, and the middle 512 samples are used in the computation of the different cepstra. Each utterance therefore corresponds to 32 ms of speech, since the TIMIT data are sampled at 16 kHz. The scale-cepstral and mel-cepstral coefficients of clean and noisy utterances are computed. The noisy utterances are simulated by adding artificially generated white Gaussian noise. The signal-to-noise ratio (SNR) is defined as the ratio of the energy in the utterance to the noise energy.
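The scatter matrices and separability criterion of (12)–(14) amount to a few lines of linear algebra. A minimal sketch, assuming equal class priors and the trace criterion; the Gaussian toy classes are ours, not TIMIT data:

```python
import numpy as np

def f_ratio(classes):
    """Generalized F-ratio separability: within-class scatter W and
    between-class scatter B with equal priors, criterion tr(W^{-1} B).
    `classes` is a list of (n_i, d) arrays of feature vectors."""
    L = len(classes)
    mus = [c.mean(axis=0) for c in classes]
    mu = np.mean(mus, axis=0)                        # grand mean, equal priors
    W = sum(np.cov(c, rowvar=False, bias=True) for c in classes) / L
    B = sum(np.outer(m - mu, m - mu) for m in mus) / L
    return np.trace(np.linalg.solve(W, B))

# usage: widely separated classes yield a much larger criterion
rng = np.random.default_rng(0)
near = [rng.normal(0.0, 1.0, (200, 5)), rng.normal(0.2, 1.0, (200, 5))]
far = [rng.normal(0.0, 1.0, (200, 5)), rng.normal(3.0, 1.0, (200, 5))]
print(f_ratio(near), f_ratio(far))   # the second value is much larger
```

A larger criterion means the class means are far apart relative to the within-class spread, which is exactly what the comparisons in Figs. 1–3 measure.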
UMESH et al.: SPEECH ANALYSIS 43
Fig. 2. Separability between phonemes using scale-cepstral (indicated by “- -”) and mel-cepstral (indicated by “--”) coefficients, based on 200 utterances at 18 dB SNR. The data are taken from dialect region seven of the TIMIT training set and are spoken by 22 male and 13 female talkers. Note that the scale-cepstrum provides better separability than the mel-cepstrum.
In our experiments, we used SNR’s of 6 and 18 dB.

The scale-cepstrum is computed using the following values of the parameters: , , , and . Note that is interpolated by a factor of two.

The mel-cepstrum is implemented using the program in the signal processing information base [16] written by Slaney. Forty triangular filters were used. The center frequencies of the first 13 linearly spaced filters are 66.67 Hz apart, starting at 133.34 Hz. The center frequencies of the other 27 filters are chosen to have a ratio of 1.0711703 between successive filters. This covers the frequency band up to 8 kHz. Each filter’s magnitude frequency response has a triangular shape that is maximum at the center frequency and linearly decreases to zero at the center frequencies of the two adjacent filters. The vector of energies is computed by weighting the discrete Fourier transform (DFT) coefficients by the magnitude response of the filterbank. The mel-cepstral coefficients are then obtained by computing the inverse cosine transform of the vector of log energies.

In all cepstra, the zeroth coefficient is not used, since it is roughly a measure of the spectral energy. Coefficients 1–20 are used to measure the separability between the different phoneme classes. We remind the reader that the magnitudes of the scale-cepstral coefficients are used as elements of the feature vector. We normalize the vector of coefficients to unit energy. The size of the feature vector is varied from 1 to 20 coefficients and the separability measure is computed. Figs. 1–3 show the separability measure as a function of the number of coefficients for clean and noisy speech.
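The filterbank described above can be sketched directly from the stated parameters. This is our approximation of Slaney's construction, not his code: in particular, the handling of the two edge filters is our own choice, and the inverse cosine transform is implemented directly from the DCT-II definition.

```python
import numpy as np

def mel_filterbank(n_fft=512, fs=16000, n_lin=13, n_log=27,
                   f0=133.34, df=66.67, ratio=1.0711703):
    """Triangular filterbank: 13 linearly spaced centers 66.67 Hz apart
    from 133.34 Hz, then 27 centers in geometric progression."""
    centers = [f0 + i * df for i in range(n_lin)]
    while len(centers) < n_lin + n_log:
        centers.append(centers[-1] * ratio)
    # filter i rises from centers[i-1] to centers[i], falls to centers[i+1];
    # the outermost edges below are our assumed extrapolation
    edges = [centers[0] - df] + centers + [centers[-1] * ratio]
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    fb = np.zeros((n_lin + n_log, len(freqs)))
    for i in range(n_lin + n_log):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        up = (freqs - lo) / (ctr - lo)
        down = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(up, down))   # triangular shape
    return fb

def mel_cepstrum(frame, n_fft=512):
    fb = mel_filterbank(n_fft)
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    log_e = np.log(fb @ mag + 1e-12)
    # inverse cosine transform (DCT-II) of the log filterbank energies
    n = len(log_e)
    k = np.arange(n)
    dct = np.cos(np.pi * np.outer(k, 2 * np.arange(n) + 1) / (2 * n))
    return dct @ log_e

coeffs = mel_cepstrum(np.sin(2 * np.pi * 1000 * np.arange(512) / 16000.0))
```

As in the text, a feature vector would then drop the zeroth coefficient and keep coefficients 1–20.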
Fig. 3. Separability between phonemes using scale-cepstral (indicated by “- -”) and mel-cepstral (indicated by “--”) coefficients, based on 200 utterances at 6 dB SNR. The data are taken from dialect region seven of the TIMIT training set and are spoken by 22 male and 13 female talkers. It is observed that the scale-cepstrum is robust to noise, and the degradation in separability compared to clean speech is small.
From the figures, it is clear that the scale-cepstrum provides better separability than the mel-cepstrum for most vowels. Further, it is also seen to be robust to noise. It is also intuitively satisfying that vowels closer together in the vowel triangle are much more difficult to separate (e.g., /iy/ and /ih/) than those that are far apart (e.g., /iy/ and /ow/). For clean speech, the separability for a few vowels is less than with the mel-cepstrum; this may be partly explained by recalling that for the scale-cepstrum we have made a linear frequency-scaling assumption, i.e., α is independent of frequency. Our recent studies show that α is actually frequency dependent and that the corresponding warping function is similar to mel-warping [17]. It is hoped that using such a warping function may further improve the features.

VI. CONCLUSION

We have studied the scale transform of the formant envelopes of utterances of vowels by different speakers. This study was motivated by the hypothesis that the formant frequencies between different speakers are approximately related by a scaling constant for any given phoneme. We have described a procedure for the discrete implementation of the scale transform on the formant envelope, which we estimate using averaged-periodogram techniques. Our results on vowels indicate that scale-cepstral coefficients provide better separation between vowels than mel-cepstral coefficients. The
data used in the comparison were 200 utterances of four vowels from the TIMIT data base, and the generalized F-ratio was used as the criterion for comparing the features. We have shown that the scale-cepstral methods are useful in devising features that take into account variations due to scaling in the frequency domain.

REFERENCES

[1] S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 357–366, Aug. 1980.
[2] C. Jankowski, H.-D. H. Vo, and R. P. Lippmann, “A comparison of signal processing front ends for automatic word recognition,” IEEE Trans. Speech Audio Processing, vol. 3, pp. 286–293, July 1995.
[3] D. J. Nelson, “Wavelet based analysis on speech signals,” in Proc. IEEE Conf. Automatic Speech Recognition, Snowbird, UT, Dec. 1993, pp. 89–90.
[4] ——, “Alternate spectral analysis methods in speech analysis,” in Proc. 13th Ann. Speech Research Symp., Johns Hopkins Univ., Baltimore, MD, June 1993, pp. 304–315.
[5] T. Kamm, G. Andreou, and J. Cohen, “Vocal tract normalization in speech recognition: Compensating for systematic speaker variability,” in Proc. 15th Ann. Speech Research Symp., Johns Hopkins Univ., Baltimore, MD, June 1995, pp. 175–178.
[6] P. Bamberg, “Vocal tract normalization,” Tech. Rep., Verbex, 1981.
[7] E. Eide and H. Gish, “A parametric approach to vocal tract length normalization,” in Proc. IEEE ICASSP’96, Atlanta, GA, 1996, pp. 346–349.
[8] L. Cohen, N. Marinovic, S. Umesh, and D. Nelson, “Scale-invariant speech analysis,” in Proc. Int. Soc. Optical Engineering, San Diego, CA, 1995, vol. 2569, pp. 522–537.
[9] L. Cohen, “The scale representation,” IEEE Trans. Signal Processing, vol. 41, pp. 3275–3292, Dec. 1993.
[10] N. Marinovic, L. Cohen, and S. Umesh, “Scale and harmonic type signals,” in Proc. Int. Soc. Optical Engineering, San Diego, CA, 1994, vol. 2303, pp. 411–418.
[11] ——, “Joint representation in time and frequency scale for harmonic type signals,” in Proc. IEEE Int. Symp. Time-Frequency and Time-Scale Analysis, Philadelphia, PA, 1994, pp. 84–87.
[12] D. Nelson, “Correlation based speech formant recovery,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Munich, Germany, Apr. 1997, pp. 1643–1646.
[13] A. H. Nuttall and G. C. Carter, “Spectral estimation using combined time and lag weighting,” Proc. IEEE, vol. 70, pp. 1115–1125, Sept. 1982.
[14] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic, 1990.
[15] T. Parsons, Voice and Speech Processing. New York: McGraw-Hill, 1987.
[16] D. H. Johnson and P. N. Shami, “The signal processing information base,” IEEE Signal Processing Mag., pp. 36–42, Oct. 1993.
[17] S. Umesh, L. Cohen, and D. Nelson, “Frequency warping and psychoacoustic frequency scales,” in Proc. Int. Soc. Optical Engineering, San Diego, CA, 1996, vol. 2825, pp. 530–539.

Leon Cohen (SM’91–F’98) was born in Barcelona, Spain, on March 23, 1940. He received the B.S. degree from The City College of New York in 1958 and the Ph.D. degree from Yale University, New Haven, CT, in 1966. He has contributed to the fields of signal analysis, astronomy, physics, and mathematics.

Nenad Marinovic was born in Belgrade, Yugoslavia. He received the Dipl.-Ing. degree in electrical engineering from the University of Belgrade in 1975 and the Ph.D. degree in electrical engineering from the City University of New York in 1986. For almost 20 years, until 1996, he did signal processing research in both industry and academia. His main interests were signal theory, statistical signal processing, and their applications in biomedical imaging, computer vision, radar, sonar, underwater acoustics, and speech. From 1986 to 1996, he was a Professor in the Department of Electrical Engineering, City College of the City University of New York, where his research contributing to this paper was carried out. He has since moved to the financial industry, where he is currently involved in stochastic modeling of securities markets for the measurement and management of financial risks.

Douglas J. Nelson (M’98) was born in Minneapolis, MN, on November 5, 1945. He received the B.A. degree in mathematics from the University of Minnesota, Minneapolis, in 1967, and the Ph.D. degree in mathematics from Stanford University, Stanford, CA, in December 1972. He was a Visiting Assistant Professor of Mathematics at Carnegie Mellon University, Pittsburgh, PA, from September 1972 to September 1975. Since November 1975, he has been with the National Security Agency, Fort Meade, MD, where he has been involved in the development of signal processing algorithms for radar, communication signals, and speech.