Speaker Identification Using Power Distribution in
Abstract
This paper presents a brief overview of the speaker recognition process, its trends and applications.
Further, a simple technique based on Euclidean distance comparison is proposed. The technique is
applied to both text-dependent and text-independent identification. Text-dependent identification
gives excellent results, whereas text-independent identification gives almost 80% matching accuracy.
Introduction
Human beings can identify a speaker based on his or her voice with fairly good precision. With a
large number of applications like voice dialing, phone banking, teleshopping, database access
services, information services, voice mail, security systems and remote access to computers,
automated systems need to perform as well as, or even better than, humans. A lot of work has been
done in this regard, but there is still a lack of understanding of the characteristics of the
speech signal that can uniquely identify a speaker. The speech signal gives various levels
of information. Firstly, it conveys the words or message being spoken; on a secondary level, it
also gives us information about the identity of the speaker. The goal of speaker recognition is to
extract the identity of the person speaking.
Speaker Recognition is the process of automatically recognizing who is speaking on the basis of
individual information included in speech signals. It can be divided into Speaker Identification
and Speaker Verification. Speaker identification determines which registered speaker provides a
given utterance from amongst a set of known speakers (also known as closed set identification).
Speaker verification accepts or rejects the identity claim of a speaker (also known as open set
identification).
The speaker identification task can be further classified into text-dependent and text-independent
tasks. In the former case, the utterance presented to the system is known beforehand. In the latter
case, no assumption about the text being spoken is made, but the system must model the general
underlying properties of the speaker’s vocal spectrum. In general, text-dependent systems are more
accurate, since both the content and voice can be compared.
Fig. 1. Classification of Speaker Recognition systems.
Work on automatic speaker recognition started in the 1960s. Pruzansky at Bell Labs [1] was among
the first to initiate the research, using filter banks and correlating the two digital
spectrograms for a similarity measure. Doddington at Texas Instruments [2] replaced filter banks by
formant analysis. For text-independent methods, various parameters were extracted by averaging over
a long enough duration or by extracting statistical or predictive parameters like averaged
auto-correlation [3], instantaneous spectra covariance matrix [4],
Journal of Sci., Engg. & Tech. Mgt. Vol 2 (1), January 2010
Fig 3 Speech signal for sample 1
Experimental
For creating the database, all the time-scaled samples of either 6 sec or 8 sec are considered.
The FFT of the samples is found, and the sum of the magnitudes of the FFT for different groupings
is found. This formed the feature vector. The feature vectors were stored in the database. For
the identification of the speaker, the input sample is similarly
Fig 4 Various parameters of sample 1: spectrogram, pitch, intensity, formants
Fig 5 (a) FFT of sample 1 (b) sum of the magnitudes of FFT for 16 feature vectors
Results and Discussions
In this section, the results obtained by applying the technique discussed in the previous section
to a sample set of 26 speakers of varying age groups (between 10 and 70 years of age) are
presented for both text-dependent and text-independent speaker identification.
Fig 6 shows the curves obtained for text-dependent as well as text-independent identification by
varying the number of feature vectors for a sample set of 26 speakers. As seen from the figure,
for text-dependent samples, about 80 feature vectors are sufficient to get 100% accuracy. With
text-independent speech, however, the maximum accuracy is about 84.61%.
Fig 7 shows the curve obtained for text-independent identification by varying the number of
speakers in the database. It can be seen that as the number of speakers increases, the accuracy
decreases, as expected, but it is still above 80%.
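The feature extraction and matching described in the Experimental section (FFT magnitudes summed over groupings, compared by the Euclidean distance named in the abstract) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the use of NumPy, the one-sided FFT, the contiguous equal-width frequency groups, and the function names `power_distribution_features`, `build_database` and `identify_speaker` are all hypothetical choices. The parameter `n_groups` appears to correspond to what the text calls the number of feature vectors.

```python
import numpy as np


def power_distribution_features(signal, n_groups):
    """Sum FFT magnitudes over n_groups contiguous frequency groups."""
    # One-sided magnitude spectrum of the real-valued signal (assumption:
    # the source does not say whether the full or one-sided FFT is used).
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    # Assumption: groups are contiguous and of (nearly) equal width.
    groups = np.array_split(spectrum, n_groups)
    return np.array([group.sum() for group in groups])


def build_database(samples, n_groups):
    """Map each speaker name to the feature vector of that speaker's sample."""
    return {name: power_distribution_features(sig, n_groups)
            for name, sig in samples.items()}


def identify_speaker(signal, database, n_groups):
    """Return the enrolled name whose features are nearest in Euclidean distance."""
    query = power_distribution_features(signal, n_groups)
    distances = {name: np.linalg.norm(query - feats)
                 for name, feats in database.items()}
    return min(distances, key=distances.get)
```

In this sketch, `samples` is a dictionary from speaker name to an equal-length array of audio samples. Equal length matters because the raw magnitude sums are not normalized, which matches the source's use of time-scaled samples of fixed duration.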
Fig 6 Variation in the number of feature vectors
[Plot: text dependent and text independent curves; x-axis: no. of feature vectors (16, 20, 40, 80, 200, 400)]
[Plot: Variation in no. of speakers; axis: Accuracy (%)]
verification using linear prediction analysis, Ph. D. Dissert., M.I.T., 1972.
[7] S. Furui, et al., “Talker recognition by long time averaged speech spectrum”, Electronics and Communications in Japan, 55-A, pp. 54-61, 1972.
[8] S. Furui, “Cepstral analysis technique for automatic speaker verification”, IEEE Trans. Acoustic, Speech, Signal Processing, ASSP-29, pp. 254-272, 1981.
[9] A. E. Rosenberg and M. R. Sambur, “New
[18] F. J. Bimbot, et al., “A tutorial on text-independent speaker verification”, EURASIP Journ. on Applied Signal Processing, pp. 430-451, 2004.
[19] G. R. Doddington, “Speaker recognition based on idiolectal differences between speakers”, Proc. Eurospeech, pp. 2521-2524, 2001.
[20] S. Furui, “50 years of progress in speech and speaker recognition research”, ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[21] Marco Grimaldi and Fred Cummins, “Speaker Identification using Instantaneous Frequencies”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 6, August 2008.
[22] H. B. Kekre, Tanuja K. Sarode, “Speech Data Compression using Vector Quantization”, WASET International Journal of Computer and Information Science and Engineering (IJCISE), Fall 2008, Volume 2, Number 4, pp. 251-254, 2008. http://www.waset.org/ijcise.
Author Biographies
working under his guidance have received best paper awards. Currently he is guiding ten Ph.D. students.
Vaishali Kulkarni received a B.E. in Electronics Engg. from Mumbai University in 1997 and an M.Tech (Electronics and Telecom) from Mumbai University in 2006. Presently she is pursuing a Ph.D. from NMIMS University. She has teaching experience of more than 7 years. She is an Assistant Professor in the Telecom Department at MPSTME, NMIMS University. Her areas of interest include speech processing: speech and speaker recognition.