Publication page: https://www.researchgate.net/publication/228941277

Speaker Identification using Power Distribution in Frequency Spectrum

Article · February 2010


Authors: Hemant B Kekre, Vaishali Kulkarni (Narsee Monjee Institute of Management Studies)



Journal of Sci., Engg. & Tech. Mgt. Vol 2 (1), January 2010

Speaker Identification using Power Distribution in Frequency Spectrum


Dr. H. B. Kekre¹, Vaishali Kulkarni²
¹ Senior Professor, Department of Computer Engineering
² Assistant Professor, Department of Electronics
MPSTME, NMIMS, Mumbai - 400056
Email: hbkekre@yahoo.com, vaishalikulkarni6@yahoo.com

Abstract
This paper presents a brief overview of the speaker recognition process, its trends and its applications. Further, a simple technique based on Euclidean distance comparison is proposed. The technique is applied to both text-dependent and text-independent identification. Text-dependent identification reaches 100% accuracy, whereas text-independent identification gives a matching accuracy above 80% (at most about 84.6% for 26 speakers).

Keywords: Speaker recognition, Speaker identification, Speaker verification, Spectrogram, Euclidean distance

Introduction
Human beings can identify a speaker from his or her voice with fairly good precision. With a large number of applications like voice dialing, phone banking, teleshopping, database access services, information services, voice mail, security systems and remote access to computers, automated systems need to perform as well as, or even better than, humans. A lot of work has been done in this regard, but there is still a lack of understanding of the characteristics of the speech signal that can uniquely identify a speaker. The speech signal carries several levels of information. Primarily, it conveys the words or message being spoken; on a secondary level, it also gives information about the identity of the speaker. The goal of speaker recognition is to extract the identity of the person speaking.

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals. It can be divided into speaker identification and speaker verification. Speaker identification determines which registered speaker, from amongst a set of known speakers, provided a given utterance (also known as closed-set identification). Speaker verification accepts or rejects the identity claim of a speaker (also known as open-set identification). The speaker identification task can be further classified into text-dependent and text-independent tasks. In the former case, the utterance presented to the system is known beforehand.

Fig. 1. Classification of Speaker Recognition systems.

In the latter case, no assumption is made about the text being spoken, but the system must model the general underlying properties of the speaker’s vocal spectrum. In general, text-dependent systems are more accurate, since both the content and the voice can be compared.

Work on automatic speaker recognition started in the 1960s. Pruzansky at Bell Labs [1] was among the first to initiate this research, using filter banks and correlating two digital spectrograms for a similarity measure. Doddington at Texas Instruments [2] replaced filter banks by formant analysis. For text-independent methods, various parameters were extracted by averaging over a long enough duration or by extracting statistical or predictive parameters like averaged auto-correlation [3], instantaneous spectra covariance matrix [4],
spectrum and fundamental frequency histograms [5], linear prediction coefficients [6] and long-term averaged spectra [7]. As the performance of text-independent systems was limited, various text-dependent methods [8, 9] were also implemented in the 1970s. Hidden Markov Model (HMM) and Vector Quantization (VQ) based methods were developed in the 1980s. Text-dependent speaker recognition systems based on the HMM architecture generally used multi-word phrases for the training phase and stored the models for the entire phrase [10]. VQ/HMM based methods were developed for text-independent identification: a set of short-time training feature vectors of a speaker can be efficiently compressed to a small set of representative points, a VQ codebook [11, 12 and 21]. Rose et al. [13] proposed a single-state HMM, now called the Gaussian mixture model (GMM), as a robust parametric model. In the 1990s, robust text-prompted methods were developed. Matsui et al. [14] proposed a text-prompted speaker recognition method, in which the key sentences are completely changed every time the system is used. For reducing intra-speaker variation, likelihood ratio and a posteriori probability based techniques were investigated [15, 16, and 17]. Methods based on score normalization were introduced more recently, in the 2000s [18]. Various high-level features like word idiolect, pronunciations, phone usage, prosody, etc. have been successfully used in text-independent speaker verification [19].

Recognition systems have been developed for a wide range of applications. Although many new techniques have been invented and developed, there are still a number of practical limitations that prevent the widespread deployment of applications and services. It is still true that humans can recognize speech and speakers more efficiently than machines [20]. There is now increasing interest in finding ways to reduce this performance gap.

The Recognition Process
The speaker identification system is composed of the following modules:
Front-end processing - the "signal processing" part, which converts the sampled speech signal into a set of feature vectors that characterize the properties of speech that can separate different speakers. Front-end processing is performed in both the training and the recognition phases.
Speaker modeling - this part reduces the feature data by modeling the distributions of the feature vectors.
Speaker database - the speaker models are stored here.
Decision logic - makes the final decision about the identity of the speaker by comparing unknown feature vectors to all models in the database and selecting the best-matching model.

Fig 2 Speaker identification system

Basics of Speech Signal
The speech samples used in this work were recorded using Sound Forge 4.5. The sampling frequency is 8000 Hz (8-bit, mono PCM samples). Table 1 shows the database description. All the samples are scaled to the same time scale. The samples are collected from different speakers. Samples are taken from each speaker in two sessions so that training and testing data can be created. Also, 4 different texts are recorded so that both text-independent and text-dependent speaker identification can be done. Fig 3 shows the speech signal for sample 1. Fig 4 shows different features of the sample 1 speech signal, such as the spectrogram, intensity, pitch and formants.
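The recording format described above can be illustrated with a minimal Python sketch that decodes an 8-bit mono PCM WAV file at 8000 Hz and brings every sample to a common length. The paper does not state how the time scaling was performed; linear interpolation with `numpy.interp`, and the helper names `load_pcm8` and `time_scale`, are assumptions made only for illustration.

```python
import wave

import numpy as np

FS = 8000  # sampling frequency used in the paper (Hz)


def load_pcm8(source) -> np.ndarray:
    """Read an 8-bit mono PCM WAV file (path or binary file object) into floats in [-1, 1)."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 1 or w.getnchannels() != 1:
            raise ValueError("expected 8-bit mono PCM")
        if w.getframerate() != FS:
            raise ValueError(f"expected {FS} Hz, got {w.getframerate()} Hz")
        raw = w.readframes(w.getnframes())
    # 8-bit WAV samples are unsigned bytes centred on 128.
    return (np.frombuffer(raw, dtype=np.uint8).astype(np.float64) - 128.0) / 128.0


def time_scale(x: np.ndarray, seconds: float, fs: int = FS) -> np.ndarray:
    """Stretch or compress x to exactly seconds * fs samples.

    Assumption: linear interpolation over a normalised time axis; the paper
    does not specify its time-scaling method.
    """
    target = int(round(seconds * fs))
    old_t = np.linspace(0.0, 1.0, num=len(x))
    new_t = np.linspace(0.0, 1.0, num=target)
    return np.interp(new_t, old_t, x)
```

For example, scaling an utterance to 6 sec at 8000 Hz yields 6 × 8000 = 48000 samples, and scaling to 8 sec yields 64000 samples, so every stored signal in a given set has the same length before its FFT is taken.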

Table 1 Database Description

Parameter               Sample characteristics
Language                English
No. of speakers         26
Speech type             Read speech
Recording conditions    Normal (a silent room)
Sampling frequency      8000 Hz
Resolution              8 bits per sample
Training speech         6 sec, 8 sec (scaled)
Evaluation speech       6 sec, 8 sec (scaled)

Fig 3 Speech signal for sample 1

Fig 4 Various parameters of sample 1: spectrogram, pitch, intensity, formants

Experimental
For creating the database, all the time-scaled samples of either 6 sec or 8 sec are considered. The FFT of each sample is computed, and the sum of the FFT magnitudes is found for different groupings of the FFT coefficients. These sums form the feature vector, and the feature vectors are stored in the database. For the identification of the speaker, the input sample is processed in the same way and matched with the feature vectors stored in the database. The stored feature vector that gives the minimum Euclidean distance from the input sample's feature vector is declared the match (speaker identified). Fig 5 (a) shows the FFT of sample 1 shown in Fig 3. Fig 5 (b) shows the feature vector for sample 1, obtained by grouping 250 samples and selecting the first 16 as feature vectors, exploiting the symmetry of the DFT.

Fig 5 (a) FFT of sample 1; (b) sum of the magnitudes of the FFT for 16 feature vectors

Results and Discussions
In this section, the results obtained by applying the technique discussed in the previous section to a sample set of 26 speakers of varying age groups (between 10 and 70 years of age) are presented for both text-dependent and text-independent speaker identification.

Fig 6 shows the curves obtained for text-dependent as well as text-independent identification by varying the number of feature vectors for a sample set of 26 speakers. As seen from the figure, for text-dependent samples, about 80 feature vectors are sufficient to get 100% accuracy. With text-independent speech, however, the maximum accuracy is about 84.61%. Fig 7 shows the curve obtained for text-independent identification by varying the number of speakers in the database. As expected, the accuracy decreases as the number of speakers increases, but it remains above 80%.
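The feature extraction and matching described in the Experimental section can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it keeps the first half of the FFT magnitude spectrum (the DFT of a real signal is conjugate-symmetric), sums the magnitudes over consecutive groups of `group_size` coefficients, and keeps the first `n_features` group sums. The parameter names, the defaults of 250 coefficients per group and 16 features (taken from the Fig 5 example), the helpers `power_features`, `identify` and `accuracy`, and the synthetic sine-wave "speakers" are all hypothetical.

```python
import numpy as np


def power_features(x: np.ndarray, group_size: int = 250, n_features: int = 16) -> np.ndarray:
    """Front end (assumed layout): sums of |FFT| over consecutive groups of coefficients."""
    mag = np.abs(np.fft.fft(x))
    # The DFT of a real signal is conjugate-symmetric, so only the first half is kept.
    half = mag[: len(mag) // 2]
    needed = group_size * n_features
    if len(half) < needed:
        raise ValueError(f"signal too short: need at least {2 * needed} samples")
    # Row k holds coefficients k*group_size .. (k+1)*group_size - 1; sum each row.
    return half[:needed].reshape(n_features, group_size).sum(axis=1)


def identify(query: np.ndarray, database: dict[str, np.ndarray]) -> str:
    """Decision logic: label of the stored vector with minimum Euclidean distance to query."""
    return min(database, key=lambda label: float(np.linalg.norm(database[label] - query)))


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Percentage of test utterances assigned to the correct speaker."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(6 * fs) / fs  # a 6 sec sample at 8000 Hz: 48000 samples
    # Speaker database: one stored feature vector per toy (synthetic) speaker.
    db = {
        "A": power_features(np.sin(2 * np.pi * 100 * t)),
        "B": power_features(np.sin(2 * np.pi * 300 * t)),
    }
    unknown = power_features(0.8 * np.sin(2 * np.pi * 100 * t))
    print(identify(unknown, db))  # prints: A
```

In this sketch the dictionary plays the role of the speaker database and `identify` is the decision logic; because stored feature vectors are compared directly, there is no separate statistical speaker model. As a consistency check on the reported results, 22/26 = 0.8461..., so the maximum text-independent accuracy of 84.61% for 26 speakers is consistent with 22 of the 26 speakers being identified correctly.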

Fig 6 Variation in the number of feature vectors (plot "matching percentage": accuracy (%) of text-dependent and text-independent identification versus no. of feature vectors: 16, 20, 40, 80, 200, 400)

Fig 7 Variation in the number of speakers (plot: accuracy (%) versus no. of speakers: 8, 12, 16, 20, 24, 26)

Conclusion
A very simple technique based on the power distribution in the frequency spectrum has been introduced. It gives good results for both text-dependent and text-independent systems: 100% accuracy for text-dependent identification and above 80% for text-independent identification. The results also show that accuracy increases as the number of feature vectors in the database for each sample increases. The present study is still ongoing and may involve different techniques for finding the feature vectors and comparing them.

References
[1] S. Pruzansky, “Pattern-matching procedure for automatic talker recognition”, J.A.S.A., 35, pp. 354-358, 1963.
[2] G. R. Doddington, “A method of speaker verification”, J.A.S.A., 49, 139 (A), 1971.
[3] P. D. Bricker, et al., “Statistical techniques for talker identification”, B.S.T.J., 50, pp. 1427-1454, 1971.
[4] K. P. Li, et al., “Experimental studies in speaker verification using an adaptive system”, J.A.S.A., 40, pp. 966-978, 1966.
[5] B. Beek, et al., “Automatic speaker recognition system”, Rome Air Development Center Report, 1971.
[6] M. R. Sambur, “Speaker recognition and verification using linear prediction analysis”, Ph.D. Dissertation, M.I.T., 1972.
[7] S. Furui, et al., “Talker recognition by long time averaged speech spectrum”, Electronics and Communications in Japan, 55-A, pp. 54-61, 1972.
[8] S. Furui, “Cepstral analysis technique for automatic speaker verification”, IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-29, pp. 254-272, 1981.
[9] A. E. Rosenberg and M. R. Sambur, “New techniques for automatic speaker verification”, IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-23, 2, pp. 169-176, 1975.
[10] J. M. Naik, et al., “Speaker verification over long distance telephone lines”, Proc. ICASSP, pp. 524-527, 1989.
[11] F. K. Soong, et al., “A vector quantization approach to speaker recognition”, AT&T Technical Journal, 66, pp. 14-26, 1987.
[12] A. E. Rosenberg and F. K. Soong, “Evaluation of a vector quantization talker recognition system in text independent and text dependent models”, Computer Speech and Language 22, pp. 143-157, 1987.
[13] R. Rose and R. A. Reynolds, “Text independent speaker identification using automatic acoustic segmentation”, Proc. ICASSP, pp. 293-296, 1990.
[14] T. Matsui and S. Furui, “Concatenated phoneme models for text variable speaker recognition”, Proc. ICASSP, pp. II-391-394, 1993.
[15] A. Higgins, et al., “Speaker verification using randomized phrase prompting”, Digital Signal Processing, 1, pp. 89-106, 1991.
[16] T. Matsui and S. Furui, “Similarity normalization method for speaker verification based on a posteriori probability”, Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, pp. 59-62, 1994.
[17] D. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models”, Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, pp. 27-30, 1994.
[18] F. J. Bimbot, et al., “A tutorial on text-independent speaker verification”, EURASIP Journal on Applied Signal Processing, pp. 430-451, 2004.
[19] G. R. Doddington, “Speaker recognition based on idiolectal differences between speakers”, Proc. Eurospeech, pp. 2521-2524, 2001.
[20] S. Furui, “50 years of progress in speech and speaker recognition research”, ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[21] M. Grimaldi and F. Cummins, “Speaker identification using instantaneous frequencies”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 6, August 2008.
[21] H. B. Kekre and T. K. Sarode, “Speech data compression using vector quantization”, WASET International Journal of Computer and Information Science and Engineering (IJCISE), Vol. 2, No. 4, pp. 251-254, Fall 2008. http://www.waset.org/ijcise.

Author Biographies

Dr. H. B. Kekre received a B.E. (Hons.) in Telecomm. Engg. from Jabalpur University in 1958, an M.Tech (Industrial Electronics) from IIT Bombay in 1960, an M.S.Engg. (Electrical Engg.) from the University of Ottawa in 1965 and a Ph.D. (System Identification) from IIT Bombay in 1970. He worked for over 35 years as a faculty member in Electrical Engineering and then as HOD of Computer Science and Engg. at IIT Bombay. For the last 13 years, he worked as a Professor in the Department of Computer Engg. at Thadomal Shahani Engineering College, Mumbai. He is currently Senior Professor with the Mukesh Patel School of Technology Management and Engineering, SVKM’s NMIMS University, Vile Parle (W), Mumbai, India. He has guided 17 Ph.D.s, 150 M.E./M.Tech projects and several B.E./B.Tech projects. His areas of interest are Digital Signal Processing, Image Processing and Computer Networks. He has more than 250 papers in National / International Conferences / Journals to his credit. Recently, six students working under his guidance have received best paper awards. Currently he is guiding ten Ph.D. students.

Vaishali Kulkarni received a B.E. in Electronics Engg. from Mumbai University in 1997 and an M.Tech (Electronics and Telecom) from Mumbai University in 2006. Presently she is pursuing a Ph.D. from NMIMS University. She has a teaching experience of more than 7 years. She is Assistant Professor in the Telecom Department in MPSTME, NMIMS University. Her areas of interest include speech processing: speech and speaker recognition.
