Design A Text-Prompt Speaker Recognition System Using LPC-Derived Features
Abstract: Humans are integrated ever more closely with computers, and computers are taking over many services that used to be based on face-to-face contact between humans. This has prompted active development in the field of biometric systems. The use of biometric information is widely known for both person identification and security applications. This paper is concerned with the use of speaker features for protection against unauthorized access. A speaker recognition system for 6304 speech samples is presented that relies on LPC-derived features. A vocabulary of 46 speech samples is built for 10 speakers, where each authorized person is asked to utter every sample 10 times. Two different modes are considered for recognizing individuals according to their speech samples. In the closed-set speaker identification mode, all tested LPC-derived features are found to outperform the raw LPC coefficients, and identification rates of 84% to 97% are achieved. Applying preprocessing steps to the speech signals (preemphasis, DC-offset removal, frame blocking, overlapping, normalization and windowing) improves the representation of the speech features, and up to 100% identification rate is obtained using weighted Linear Predictive Cepstral Coefficients (LPCC). In the open-set speaker verification mode of the proposed system model, the system randomly selects a pass phrase of eight samples from its database each time a speaker is presented to the system. Up to 213 text-prompt trials from 23 different speakers (authorized and unauthorized), i.e., 1704 samples, are recorded in order to study the system behavior and to generate the optimal threshold at which speakers are accepted or rejected when compared with the training references of the authorized speakers constructed in the first mode. The best obtained speaker verification rate is greater than 99%.
Keywords: Speaker Recognition, Speaker Identification, Speaker Verification, Biometric, Text-prompt, LPC-derived features, LSF.
1. Introduction
As everyday life becomes more and more computerized, automated security systems are becoming more and more important. Today, most personal banking tasks can be performed over the Internet, and soon they will also be performed on mobile devices such as cell phones and PDAs. The key task of an automated security system is to verify that users are in fact who they claim to be [1]. As the level of security breaches and transaction fraud increases, the need for highly secure identification and personal verification technologies becomes apparent. Biometric-based solutions are able to provide confidential financial transactions and personal data privacy [2]. The need for biometrics can be found in federal, state and local governments, in the military, and in commercial applications [1, 3]. A biometric system is essentially a pattern recognition system that establishes the authenticity of a specific physiological or behavioral characteristic possessed by a user. Such systems are typically based on a single biometric feature of humans, but several hybrid systems also exist [2, 4, 5, 1, 6]. The human voice can serve as a key for any secured object, and it is not easy to lose or forget it. This technique can be used to verify the identity claimed by people accessing systems; that is, it enables control of access to various services by voice [3, 7]. Speaker recognition has received the attention of researchers working in the field of signal processing for many years. The technology has been developed to the point where it can be used in a number of applications, such as voice dialing, banking over a telephone network, person authentication, remote access to computers, command and control systems, network security and protection, entry and access control systems, data access/information retrieval, monitoring, etc. [8, 5, 9, 10, 11].
Table-1: The recorded speech samples

  Data Sets     | Speech Samples
  1) Digits     | 0 ... 9
  2) Characters | A ... Z
  3) Words      | Accept, Reject, Open, Close, Help, Computer, Yes, No, Copy, Paste
For practical purposes, these data sets are very interesting because the similarities between several samples (especially the letters) give rise to important problems in speech recognition. In the closed-set speaker identification mode, up to 4600 samples were collected from different persons, whereas 1704 samples were recorded in the open-set speaker verification mode.
Figure (1): Block-Diagram of the proposed Speaker Recognition System Model
In the open-set speaker verification mode of the proposed system model, the system randomly selects a pass phrase of eight samples from its database each time a speaker is presented to the system. Up to 213 text-prompt trials from different speakers (i.e., authorized and unauthorized) are recorded (i.e., 1704 samples) in order to study the system behavior. In fact, each element of a generated text-prompt sentence, as shown in Table-2, is a random number between 1 and 46 that corresponds to a sample in the vocabulary of Table-1. This is done in order to study the system behavior and to generate the optimal threshold at which speakers are accepted or rejected when compared with the training references of authorized speakers constructed in the first mode [1].
Table-2: Examples of Randomly Text-Prompt Sentences generated by the System
Table-2 illustrates five examples of text-prompt sentences generated by the system, where column Si (i = 1, 2, ..., 8) stands for sample number i; the eight samples in each row compose one sentence [1].
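For illustration, the prompt-generation step can be sketched in a few lines of Python (a minimal sketch under our own naming; the paper does not state whether a sample may repeat within one prompt, so this sketch simply draws eight independent random indices):

```python
import random

VOCABULARY_SIZE = 46  # digits, characters and words of Table-1
PROMPT_LENGTH = 8     # samples S1 ... S8 of one pass phrase

def generate_prompt():
    """Draw one text-prompt sentence as eight random numbers between
    1 and 46, each indexing a sample of the vocabulary (cf. Table-2)."""
    return [random.randint(1, VOCABULARY_SIZE) for _ in range(PROMPT_LENGTH)]
```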
3.3. Preprocessing
The basic idea behind speech preprocessing is to generate a signal whose fine structure is as close as possible to that of the original speech signal, while providing data reduction and easing the subsequent analysis [11]. A number of processing techniques adopted in this system model are applied in the following sequence:
$$\text{No. of Test Samples} = (10 - R) \times \text{No. of Samples} \times \text{No. of Speakers} \qquad (3)$$

where R is the number of repetitions of each sample used for training.
Preemphasis
Usually the digital speech signal s[n] is preemphasized first. This is achieved by passing the signal through a high-pass filter, which emphasizes the high frequencies relative to the low frequencies and hence compensates for the band-limiting of the input signal by a low-pass filter in the recording process. The most commonly used preemphasis filter is given by the following transfer function [12, 13, 10, 14]:

$$H(z) = 1 - \alpha z^{-1} \qquad (4)$$

where \(\alpha\), which controls the slope of the filter, typically lies in the range 0.9 < \(\alpha\) < 1.0. The filter is simply implemented as a first-order differentiator:

$$\tilde{s}[n] = s[n] - \alpha\, s[n-1] \qquad (5)$$

For the proposed system model, \(\alpha\) is set to 0.95 [1].

The Removal of DC Offset
DC offset occurs when hardware, such as a sound card, adds DC current to a recorded audio signal. This current produces a recorded waveform that is not centered on the baseline. Removing the DC offset is therefore the process of forcing the mean of the input signal to the baseline by adding a constant value to the samples in the sound file. An illustrative example of removing the DC offset from a waveform file is shown in Fig. (2) [1].
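As a minimal Python/NumPy sketch of these two steps (the function names and the NumPy-based implementation are ours; the paper does not prescribe an implementation):

```python
import numpy as np

def remove_dc(signal: np.ndarray) -> np.ndarray:
    """Force the signal mean back to the baseline by subtracting it,
    i.e., adding a constant offset to every sample."""
    return signal - np.mean(signal)

def preemphasize(signal: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """First-order differentiator of Eq. (5): s~[n] = s[n] - alpha * s[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```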
Frame-Blocking
This is the process of blocking, or splitting, the input speech samples into equal durations of N samples in order to carry out frame-wise analysis. The selection of the frame length is a crucial parameter for successful spectral analysis, owing to the trade-off between time and frequency resolution: the frame should be long enough for adequate frequency resolution, but short enough to capture the local spectral properties. Typically a frame length of 10-30 milliseconds is used. The signal for the i-th frame is given by [15, 14, 10, 12]:

$$x_i[n] = s[iM + n], \quad 0 \le n \le N-1 \qquad (6)$$

where M is the frame shift in samples. In this work, a frame length of N = 256 samples, with a duration of 23.2 milliseconds, is used [1].

Overlapping
Usually adjacent frames are overlapped: each frame is shifted forward along the signal by a fixed amount, typically 30-50% of the frame length. The purpose of the overlapping is to avoid losing information, since each speech sound of the input sequence is then approximately centered at some frame [1, 15, 13, 16].

Normalization
The frames of speech are normalized to make their power equal to unity. This step is very important, since the extracted frames have different intensities due to speaker loudness, speaker distance from the microphone and the recording level. The normalization is done by dividing each sample by the square root of the sum of squares of all samples in the segment:

$$s_{norm}[n] = \frac{s[n]}{\sqrt{\sum_{k=0}^{N-1} s^2[k]}} \qquad (7)$$
where s[n] is the speech sample, N is the number of samples in the segment (256 here), and the subscript norm refers to normalization [1].

Windowing
The purpose of windowing is to reduce the effect of spectral leakage (a type of distortion in spectral analysis) that results from the framing process. Windowing involves multiplying a speech signal x(n) by a finite-duration window w(n), which yields a set of speech samples weighted by the shape of the window, as stated by the following equation [1, 15, 13, 17, 12]:

$$x_w(n) = x(n)\, w(n), \quad 0 \le n \le N-1 \qquad (8)$$

where N is the size of the window or frame. Many different windowing functions exist; Table-3 lists the window functions used in our experiments, and their shapes are illustrated in Fig. (3) [1].
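The framing, normalization and windowing steps can likewise be sketched as follows (again our own naming; the Kaiser beta parameter is our assumption, since the paper does not state it):

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 256, overlap: float = 0.5) -> np.ndarray:
    """Split the signal into overlapping frames, Eq. (6); 50% overlap here."""
    step = int(frame_len * (1.0 - overlap))
    n_frames = 1 + max(0, (len(signal) - frame_len) // step)
    return np.stack([signal[i * step : i * step + frame_len] for i in range(n_frames)])

def normalize_frame(frame: np.ndarray) -> np.ndarray:
    """Scale the frame to unit power, Eq. (7)."""
    return frame / np.sqrt(np.sum(frame ** 2))

def make_window(kind: str, N: int = 256) -> np.ndarray:
    """The window functions of Table-3."""
    if kind == "rectangular":
        return np.ones(N)
    if kind == "hamming":
        return np.hamming(N)
    if kind == "kaiser":
        return np.kaiser(N, 6.0)  # beta = 6.0 is our assumption
    raise ValueError(kind)

# Windowing, Eq. (8): multiply each normalized frame by the window, e.g.
# windowed = normalize_frame(frame) * make_window("hamming")
```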
Table-3: Rectangular, Hamming and Kaiser Window-Function
Figure (2): Removal of DC offset from a Waveform file (a) Exhibits DC offset, (b) After the removal of DC offset

Figure (3): Rectangular, Hamming and Kaiser Window-Function of 256 Samples Length [1]
In linear predictive (LP) analysis, the current speech sample is approximated as a linear combination of past samples:

$$\hat{s}[n] = \sum_{k=1}^{p} a[k]\, s[n-k] \qquad (9)$$

where \(\hat{s}[n]\) is the approximation of the present output, s[n-k] are past outputs, p is the prediction order, and {a[k]}, k = 1, ..., p, are the model parameters, called the predictor coefficients, that need to be determined so that the average prediction error (or residual) is as small as possible [10, 19].

The prediction error for the nth sample is given by the difference between the actual sample and its predicted value [1, 13, 20, 10]:

$$e[n] = s[n] - \sum_{k=1}^{p} a[k]\, s[n-k] \qquad (10)$$

Equivalently,

$$s[n] = \sum_{k=1}^{p} a[k]\, s[n-k] + e[n] \qquad (11)$$

When the prediction residual e[n] is small, the predictor of Eq. (9) approximates s[n] well. The total squared prediction error is given by

$$E = \sum_{n} e^2[n] = \sum_{n} \Big( s[n] - \sum_{k=1}^{p} a[k]\, s[n-k] \Big)^2 \qquad (12)$$
Minimization of the error is achieved by setting the partial derivatives of E with respect to the model parameters {a[k]} to zero:

$$\frac{\partial E}{\partial a[k]} = 0, \quad k = 1, \ldots, p \qquad (13)$$
By writing out Eq. (13) for k = 1, ..., p, the problem of finding the optimal predictor coefficients reduces to solving the so-called Yule-Walker (AR) equations. Depending on the choice of the error-minimization interval in Eq. (12), there are two methods for solving the AR equations: the covariance method and the autocorrelation method [13, 10, 19]. The two methods do not differ greatly, but the autocorrelation method is preferred, since it is computationally more efficient and always guarantees a stable filter. In matrix form, the AR equations read:
$$\begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix} \qquad (14)$$
where R is a special type of matrix called a Toeplitz matrix (symmetric, with all elements along each diagonal equal; this structure facilitates the solution of the Yule-Walker equations for the LP coefficients {a_k} through computationally fast algorithms such as the Levinson-Durbin algorithm), a is the vector of the LPC coefficients, and v is the autocorrelation vector on the right-hand side. Both the matrix R and the vector v are completely defined by the p + 1 autocorrelation samples R(0), ..., R(p). The autocorrelation sequence of s[n] is defined as [1, 21, 10, 19, 13]:
$$R[k] = \frac{1}{N} \sum_{n=0}^{N-1-k} s[n]\, s[n+k] \qquad (15)$$
Due to the redundancy in the Yule-Walker (AR) equations, there exists an efficient algorithm for finding the solution, known as the Levinson-Durbin recursion [1, 10, 19, 20, 13]:

$$E^{(0)} = R(0) \qquad (16)$$

$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}}, \quad 1 \le i \le p \qquad (17)$$

$$a_i^{(i)} = k_i, \qquad a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \qquad (18)$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)} \qquad (19)$$

where k_i are the Partial Correlation (PARCOR) coefficients, a_j^{(i)} is the jth predictor (LPC) coefficient after i iterations, and E^{(i)} is the prediction error after i iterations. The Levinson-Durbin procedure takes the autocorrelation sequence as its input and produces the coefficients a[k], k = 1, ..., p. The time complexity of the procedure is O(p^2), as opposed to the standard Gaussian elimination method, whose complexity is O(p^3). Equations (16)-(19) are solved recursively for i = 1, 2, ..., p, where p is the order of the LPC analysis, and the final solution is given as [13, 1, 10, 20, 19]:

$$a_j = a_j^{(p)}, \quad 1 \le j \le p \qquad (20)$$

Partial Correlation Coefficients (PARCOR)
Several alternative representations can be derived from the LPC coefficients when the autocorrelation method is used. The Levinson-Durbin algorithm produces the quantities {k_i}, i = 1, 2, ..., p (lying in the range -1 <= k_i <= 1), which are known as the reflection or PARCOR coefficients [13, 1].

Log Area Ratio (LAR)
A further parameter set can be derived from the PARCOR coefficients by taking the logarithm of the area ratio, yielding the log area ratios (LARs) {g_i}, defined as [19, 20, 22, 10, 1, 13]:

$$g_i = \log\left(\frac{1 - k_i}{1 + k_i}\right), \quad 1 \le i \le p \qquad (21)$$

Arcsin Reflection Coefficients (ASRC)
An alternative to the log area ratios are the arcsin reflection coefficients, computed simply by taking the inverse sine of the reflection coefficients [10, 1, 13]:

$$\arcsin_i = \sin^{-1}(k_i), \quad 1 \le i \le p \qquad (22)$$

Linear Predictive Cepstral Coefficients (LPCC)
An important fact is that the cepstrum can also be derived directly from the LPC parameter set. The relationship between the cepstrum coefficients c_n and the prediction coefficients a_k is given by the following recursion [1, 9, 13]:

$$c_1 = a_1, \qquad c_n = a_n + \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) a_k\, c_{n-k}, \quad 1 < n \le p \qquad (23)$$

where p is the prediction order. It is usually said that the cepstrum derived in this way represents a smoothed version of the spectrum. As in LPC analysis, increasing the number of coefficients yields more detail [10, 4]. Because the low-order cepstral coefficients are sensitive to the overall spectral slope and the high-order cepstral coefficients are sensitive to noise (and other forms of noise-like variability), it has become a standard technique to weight the cepstral coefficients with a tapered window so as to minimize these sensitivities and improve the performance of these coefficients [19, 14, 13, 1]. To achieve robustness for large values of n, a more general weighting of the form

$$\hat{c}_n = w_n\, c_n \qquad (24)$$

must be considered, where

$$w_n = 1 + \frac{P}{2} \sin\left(\frac{\pi n}{P}\right), \quad 1 \le n \le P \qquad (25)$$

This weighting function truncates the computation and de-emphasizes c_n around n = 1 and around n = P [19].

Line Spectral Frequencies (LSFs)
Another representation of the LP parameters of the all-pole spectrum is the set of line spectral frequencies (LSFs), or line spectrum pairs (LSPs) [23, 21]. Originally proposed for the compression of speech and other audio signals, and the most widely used representation of LPC parameters for quantization and coding, they have also been applied with good results to speaker recognition [23, 24, 10, 1]. The LSFs are the roots of the following polynomials:

$$P(z) = B(z) + z^{-(p+1)} B(z^{-1}) \qquad (26)$$

$$Q(z) = B(z) - z^{-(p+1)} B(z^{-1}) \qquad (27)$$

where B(z) = 1/H(z) = 1 - A(z) is the inverse LPC filter. The roots of P(z) and Q(z) are interleaved and occur in complex-conjugate pairs, so that only p/2 roots are retained for each of P(z) and Q(z) (p roots in total). Moreover, the root magnitudes are known to be unity, so only their angles (frequencies) are needed. Each root of B(z) corresponds to one root in each of P(z) and Q(z). Therefore, if the frequencies of such a pair of roots are close, the original root of B(z) likely represents a formant; otherwise, it represents a wide-bandwidth feature of the spectrum. These correspondences provide an intuitive interpretation of the LSP coefficients [13].
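To make the chain from autocorrelation to the LPC-derived feature sets concrete, the following Python/NumPy sketch implements Eqs. (15)-(27). It is an illustration under our own naming, not the authors' code:

```python
import numpy as np

def autocorrelation(frame: np.ndarray, p: int) -> np.ndarray:
    """Biased autocorrelation R[0..p] of one frame, Eq. (15)."""
    N = len(frame)
    return np.array([np.dot(frame[:N - k], frame[k:]) / N for k in range(p + 1)])

def levinson_durbin(R: np.ndarray, p: int):
    """Solve the Yule-Walker equations, Eqs. (16)-(20).
    Returns the LPC coefficients a_1..a_p and the PARCOR coefficients k_1..k_p."""
    a = np.zeros(p + 1)                               # a[j] holds a_j; a[0] unused
    k = np.zeros(p + 1)
    E = R[0]                                          # Eq. (16)
    for i in range(1, p + 1):
        acc = R[i] - np.dot(a[1:i], R[i - 1:0:-1])    # R(i) - sum_j a_j R(i-j)
        k[i] = acc / E                                # Eq. (17)
        a_new = a.copy()
        a_new[i] = k[i]                               # Eq. (18)
        for j in range(1, i):
            a_new[j] = a[j] - k[i] * a[i - j]
        a = a_new
        E = (1.0 - k[i] ** 2) * E                     # Eq. (19)
    return a[1:], k[1:]                               # Eq. (20)

def lpc_to_cepstrum(a: np.ndarray) -> np.ndarray:
    """LPCC recursion, Eq. (23)."""
    p = len(a)
    c = np.zeros(p + 1)                               # c[0] unused
    for n in range(1, p + 1):
        c[n] = a[n - 1] + sum((1 - k / n) * a[k - 1] * c[n - k] for k in range(1, n))
    return c[1:]

def weight_cepstrum(c: np.ndarray) -> np.ndarray:
    """Raised-sine cepstral lifter, Eqs. (24)-(25)."""
    P = len(c)
    n = np.arange(1, P + 1)
    return (1 + (P / 2) * np.sin(np.pi * n / P)) * c

def lar(k: np.ndarray) -> np.ndarray:
    """Log area ratios, Eq. (21)."""
    return np.log((1 - k) / (1 + k))

def asrc(k: np.ndarray) -> np.ndarray:
    """Arcsin reflection coefficients, Eq. (22)."""
    return np.arcsin(k)

def lsf(a: np.ndarray) -> np.ndarray:
    """Line spectral frequencies, Eqs. (26)-(27): with B(z) = 1 - sum a_k z^{-k},
    form P(z) and Q(z) and keep the root angles in (0, pi)."""
    b = np.concatenate(([1.0], -a))
    P_poly = np.concatenate((b, [0.0])) + np.concatenate(([0.0], b[::-1]))
    Q_poly = np.concatenate((b, [0.0])) - np.concatenate(([0.0], b[::-1]))
    angles = np.concatenate((np.angle(np.roots(P_poly)), np.angle(np.roots(Q_poly))))
    return np.sort(angles[(angles > 0) & (angles < np.pi)])
```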
Table-4: Identification Rate (%) for the LP-based Coefficients

  Feature | Euclidean Distance (E.D.) | City-block Distance (C.D.)
  LPC     | 84.173                    | 84.217
  PARCOR  | 95.173                    | 95.260
  LAR     | 94.000                    | 94.652
  ASRC    | 94.826                    | 95.608
  LPCC    | 97.087                    | 97.521
  LSF     | 95.695                    | 95.782
The distance between a test feature vector and a reference vector is measured either by the Euclidean distance

$$E.D. = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2} \qquad (28)$$

or by the city-block distance

$$C.D. = \sum_{i=1}^{N} |a_i - b_i| \qquad (29)$$
where A and B are two vectors, such that A = [a_1 a_2 ... a_N] and B = [b_1 b_2 ... b_N].
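A sketch of the two distance measures and the nearest-neighbor decision rule follows (our own naming; `references` is assumed to map each enrolled speaker to a stored reference vector):

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance, Eq. (28)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def city_block(a: np.ndarray, b: np.ndarray) -> float:
    """City-block (Manhattan) distance, Eq. (29)."""
    return float(np.sum(np.abs(a - b)))

def identify(test_vec: np.ndarray, references: dict, distance=city_block):
    """Nearest-neighbor decision rule: return the enrolled speaker whose
    reference vector lies closest to the test vector."""
    return min(references, key=lambda spk: distance(test_vec, references[spk]))
```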
It is clear from Table-4 and its corresponding chart, Fig. (4), that all tested LPC-derived features outperform the raw LPC coefficients, which yield only about 84% identification rate.
4. Experimental Results
Many experiments and test conditions were carried out to measure the performance of the proposed system under different criteria concerning preemphasis, frame overlapping, LPC order, window type, cepstral weighting and the text-prompt speaker verification. The identification rate is defined as the ratio of correctly identified speakers to the total number of test samples, which corresponds to a nearest-neighbor decision rule:
$$\text{Identification Rate} = \frac{\text{No. of Correctly Identified Speakers}}{\text{Total No. of Samples Tested}} \times 100\% \qquad (30)$$
Figure (5) clearly shows the improvement in identification rates across all LPC-based systems, to the range of 93% to 98%, after applying the preemphasis step to the speech signal.
Table-6: Identification Rates (%) for LP-based Coefficients with different Window Types

  Feature | Rectangular | Hamming
  LPC     | 97.5652     | 94.9130
  PARCOR  | 99.8696     | 99.4783
  LAR     | 99.3478     | 99.4783
  ASRC    | 99.6957     | 99.4783
  LPCC    | 99.9565     | 99.9565
  LSF     | 99.8261     | 99.9130
Figure (6): Effect of LPC Predictor Order (P =15, 30, 45) on Identification Rates
It is clearly seen from the results of Table-5 and Fig. (6) that increasing the predictor order P, combined with overlapping successive frames, positively influences most identification rates. Therefore, the predictor order P is set to 45 for the subsequent experiments.
The successful-decision rate in Table-7 corresponds to accepting registered persons and rejecting non-registered ones over all trials. The variation of the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) with different threshold values is shown in Fig. (7), where the Crossover Error Rate (CER) is reached at a threshold of approximately 17.15, which is the most suitable security threshold, giving a 99.53% successful-decision rate.
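The threshold sweep behind Fig. (7) can be reproduced with a sketch of the following form (a minimal illustration under our own naming; `genuine_scores` and `impostor_scores` are assumed to be the distances of authorized and unauthorized trials to their claimed references):

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, thresholds):
    """For each threshold t, accept a trial when its distance falls below t.
    FAR = fraction of impostor trials accepted; FRR = fraction of genuine
    trials rejected. The crossover (CER) is where the two curves meet."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    far = np.array([np.mean(impostor < t) for t in thresholds])
    frr = np.array([np.mean(genuine >= t) for t in thresholds])
    return far, frr

# Crossover threshold, cf. the value of about 17.15 reported above:
# t_cer = thresholds[np.argmin(np.abs(far - frr))]
```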
Figure (7): FAR and FRR Performance Curve for different threshold levels using city-block distance
5. Conclusion
A speaker recognition system for 6304 speech samples, relying on LPC-derived features, has been presented, and acceptable results have been obtained. In the closed-set speaker identification mode, all tested LPC-derived features outperform the raw LPC coefficients, with identification rates of 84% to 97%. An improvement of the identification rates of the LPC-based systems to the range of 97% to 99% is obtained by applying the preprocessing steps (preemphasis, DC-offset removal, frame blocking, 50% overlap of successive frames, normalization and windowing) to the speech signal and by increasing the predictor order P. According to the speaker identification tests performed, LPCC exhibits the best results among the LPC-based coefficients, and its accuracy can be further improved by weighting the cepstral coefficients, giving identification rates close to 100%. The open-set speaker verification mode was also evaluated on 213 trials (text-prompt sentences randomly generated by the system) from 23 persons (1704 samples). The obtained verification rate, greater than 99%, shows that the proposed system model is quite suitable.
References
[1] Mustafa D. Al-Hassani, "Identification Techniques using Speech Signals and Fingerprints", Ph.D. Thesis, Department of Computer Science, Al-Nahrain University, Baghdad, Iraq, September 2006.
[2] Tiwalade O. Majekodunmi and Francis E. Idachaba, "A Review of the Fingerprint, Speaker Recognition, Face Recognition and Iris Recognition Based Biometric Identification Technologies", Proceedings of the World Congress on Engineering (WCE), Vol. II, London, U.K., 2011.
[3] M. Eriksson, "Biometrics: Fingerprint based identity verification", M.Sc. Thesis, Department of Computer Science, Umeå University, August 2001.
[4] Yuan Yujin, Zhao Peihua and Zhou Qun, "Research of Speaker Recognition Based on Combination of LPCC and MFCC", Electronic Information Engineering, Training and Experimental Center, Handan College, China, 2010.
[5] Anil K. Jain and Arun Ross, "Introduction to Biometrics", Springer Science+Business Media, LLC, USA, 2008.
[6] S. Gunnam, "Fingerprint Recognition and Analysis System", mini-thesis presented to Dr. David P. Beach, Department of Electronics and Computer Technology, Indiana State University, Terre Haute, April 2004.
[7] E. Hjelmås, "Biometric Systems: A Face Recognition Approach", Department of Informatics, University of Oslo, Oslo, Norway, 2000.
[8] Valentin Andrei, Constantin Paleologu and Corneliu Burileanu, "Implementation of a Real-Time Text Dependent Speaker Identification System", University Politehnica of Bucharest, Romania, 2011.
[9] E. Karpov, "Real-Time Speaker Identification", M.Sc. Thesis, Department of Computer Science, University of Joensuu, Finland, January 2003.
[10] T. Kinnunen, "Spectral Features for Automatic Text-Independent Speaker Recognition", Ph.D. Thesis, Department of Computer Science, University of Joensuu, Finland, December 2003.
[11] T. Chen, "The Past, Present, and Future of Speech Processing", IEEE Signal Processing Magazine, No. 5, May 1998.
[12] Biswajit Kar, Sandeep Bhatia and P. K. Dutta, "Audio-Visual Biometric Based Speaker Identification", International Conference on Computational Intelligence and Multimedia Applications, India, 2007.
[13] Antonio M. Peinado and Jose C. Segura, "Speech Recognition Over Digital Channels: Robustness and Standards", John Wiley & Sons Ltd, University of Granada, Spain, 2006.
[14] B. R. Wildermoth, "Text-Independent Speaker Recognition using Source Based Features", M.Sc. Thesis, Griffith University, Australia, January 2001.
[15] Ch. Srinivasa Kumar and P. Mallikarjuna Rao, "Design of an Automatic Speaker Recognition System Using MFCC, Vector Quantization and LBG Algorithm", International Journal on Computer Science and Engineering (IJCSE), Vol. 3, No. 8, August 2011.
[16] Ciira wa Maina and John MacLaren Walsh, "Log Spectra Enhancement Using Speaker Dependent Priors for Speaker Verification", Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, 2011.
[17] Ning Wang, P. C. Ching and Tan Lee, "Robust Speaker Verification Using Phase Information of Speech", Department of Electronic Engineering, The Chinese University of Hong Kong, 2010.
[18] Wai C. Chu, "Speech Coding Algorithms: Foundation and Evolution of Standardized Coders", John Wiley & Sons, Inc., California, USA, 2003.
[19] L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition", Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1993.
[20] Yasir A.-M. Taleb, "Statistical and Wavelet Approaches for Speaker Identification", M.Sc. Thesis, Department of Computer Engineering, Al-Nahrain University, Iraq, June 2003.
[21] N. Batri, "Robust Spectral Parameter Coding in Speech Processing", M.Sc. Thesis, Department of Electrical Engineering, McGill University, Montreal, Canada, May 1998.
[22] J. P. Campbell, "Speaker Recognition: A Tutorial", Proceedings of the IEEE, Vol. 85, No. 9, 1997.
[23] A. K. Khandani and F. Lahouti, "Intra-frame and Inter-frame Coding of Speech LSF Parameters Using a Trellis Structure", Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, June 2000.
[24] J. Rothweiler, "A Root Finding Algorithm for Line Spectral Frequencies", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99), March 15-19, U.S.A., 1999.
[25] S. E. Umbaugh, "Computer Vision and Image Processing", Prentice-Hall, Inc., U.S.A., 1998.
[26] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Second Edition, Prentice-Hall, Inc., New Jersey, U.S.A., 2002.