Identification of Speaker From Disguised Voice Using MFCC Feature Extraction, Chi Square and Classification Technique
https://doi.org/10.1007/s11277-024-11542-0
Mahesh K. Singh
Department of ECE, Aditya University, Surampalem, India
mahesh.092002.ece@gmail.com
Abstract
The purpose of this manuscript is to show that certain acoustic features can be used to recognize the disguised speech of unknown speakers. As the name implies, forensic speaker identification entails the use of scientific techniques to ascertain an unknown speaker's identity during an inquiry. This study aims to provide a voice recognition method that works well. To distinguish between speech and background noise in each frame, chi-square tests are utilized; the estimated background noise is continuously updated for this purpose. Chi-square noise estimates are obtained once background noise has initially been reduced. The observed signal distribution and the estimated noise distribution are then compared using a second chi-square test. For a frame to be labelled as noise, the two chi-square test scores must be close together. Mel-frequency cepstrum coefficient (MFCC) features are grouped as three-dimensional features, and the correlation coefficient characteristics of speech are coupled with the MFCC feature extraction technique. Feature-based classification is performed with support vector machine (SVM) classifiers and the k-nearest neighbor (k-NN) classification technique. Classification results show that applying these unique features in an SVM classifier boosts classification accuracy.
It is assumed that a caller who uses a disguised voice is attempting to hide their genuine identity. Because of the risk disguise poses to investigators, forensic investigations involving Indian suspects are more likely to encounter this issue. Such behavior may diminish a person's sense of value [1]. By changing his voice in a number of different ways, a speaker might deceive a listener or an automated system; voice disguise is regarded as one of the most restricting factors for speaker recognition. Some crimes are more likely to be perpetrated in broad daylight than others. A speaker is additionally expected to adopt a disguise if the suspect wants to prevent a listener who is familiar with their speech from recognizing them. In circumstances like kidnapping, extortion, or harassing phone calls, a disguise is more likely [2, 3].
The ability of human auditors to correctly identify the speakers in recorded conversations has been evaluated. Confident listeners performed well on the disguise test in both relaxed and tense situations, scoring 79% and 98%, respectively. In both relaxed and tense circumstances, individuals who were unfamiliar with the speaker performed 0.5 points worse than those in covert listening situations [4, 5]. Although individuals who did not know the speakers or their language scored lower, the data patterns did not change. Examples of disguise include covering one's mouth with a handkerchief or changing the pitch of one's voice. Additionally, formant frequencies can be changed by moving articulators like the tongue or lips [6]. The dictionary describes a disguise as altered or distorted speech, regardless of the reason. One author further differentiates between electronic and non-electronic disguises, both of which are referred to as "deliberate." Altering one's accent or voice quality and phonatory alterations like whispering and falsetto are examples of non-electronic deceptions [7]. Another typical non-electronic technique is to obstruct the movement of the voice organs by inserting a pencil between one's teeth. Health, age, emotions, and the distortions caused by recording and transmission technology are just a few of the variables that might affect a person's natural voice. Although these effects are not considered disguises, they could nonetheless cause a person's voice to be mistakenly identified [8].
Technical disguises can be produced using voice scrambling and voice modification tools, which allow precise pitch and frequency adjustment of the speech stream. A technique based on mean values and correlation coefficients was put forth to make it easier to detect voices that have been electronically disguised [9]. After rigorous testing, this system was able to recognize more than 90% of the voices in diverse speech datasets that had been disguised in different ways. Like many other forms of recognition, speaker recognition systems consist of two distinct parts: feature extraction and classification. Classification itself has two main parts, pattern matching and decision-making, as shown in Fig. 1.
To provide speaker-specific information, a combination of phonological, phonetic, semantic, and acoustic alterations must be considered [10]. As a speaker's communicative intent and dialogue engagement change, changes are made to the speech signal at the semantic level. A speaker's word choice and sentence structure can reveal something about their financial class or educational background. Ultimately, the phonological representation of the communication objective is what matters most: much about someone's original language and geographic area can be gleaned from their voice and sentence length. The articulators of the vocal tract, which include the jaw, tongue, velum, and vocal cords, collaborate to produce a phonetic representation [11]. To produce the same phoneme, for instance, a speaker can employ different combinations of articulator movements. The spectral characteristics of audible speech are the primary emphasis at the acoustic level. The length and shape of the vocal folds, for example, affect the fundamental and resonant frequencies of the voice [12].
One study described a technique for identifying a person's voice even while they hide their identity with a mask. Without any additional presentation or aids, only 65% of participants correctly identified speakers with disguised voices, compared to 90% accuracy with undisguised voices. Falsetto phonation as a form of voice masking has also been put to the test [13]: falsetto phonation was shown to be far less effective for distinguishing people from one another than natural speech, with recognition rates declining from 97% to 4% when falsetto was used to disguise speech. Research has also examined the impact of common vocal techniques, such as changing one's voice tone or pinching one's nose, on forensic speaker identification systems; falsetto speech might be a contributing issue here. The results showed that when the reference populations contained speech samples with similar voice disguises, using three alternative voice disguises had no impact on a system's performance [14]. The impacts of the three types of disguise under examination were more severe and diverse if the control group contained only people who spoke normally [6].
The speaker identification task is to identify, from a list of previously enrolled speakers, the speaker who recorded the input speech sample. A system working with a collection of recognized voices operates in one of two ways: open-set mode or closed-set mode. When working in closed-set mode, the system assumes that the voice to be identified must come from a group of voices that are previously known [15]; it operates in open-set mode under all other conditions. If the closed-set speaker's identity is known, identification might be considered a multiple-class categorization problem [16]. In open-set mode, speakers who are not part of the recognized voices must also be handled. This formulation can be used, for example, to identify a criminal from an enormous pool of previously known suspects; speech evidence is an example of this kind of use. An investigation of the differences in listening capacities between phonetic experts and non-experts is described in [17]. The main objective of the inquiry was to classify the speaker. Participants in both groups were requested to select the speaker's voice from a selection of five foils in an activity known as "direct identification."
According to the findings of that study, people who had previously studied phonetic speaker identification performed significantly better than those who had never studied the subject [18]. Verifying the identity of a speaker is made possible through the use of voice samples; this procedure is used to verify the authority of a speaker [19, 20] and is referred to as speaker verification. The situation can be described as a true-false decision. Because of the open-set nature of the task at hand, the open-set difficulty is commonly mentioned when discussing this method, since the goal is to verify the claimed speaker's voice. For the time being, verification is the most profitable activity, and voice recognition systems rely heavily on it [1].
The technology of speaker recognition has several practical applications [8], for example:
Security: Speaker recognition systems can be used for many different purposes, such as transaction authentication and phone verification for banking access.
Personalization: Personalized caller greetings are now a feature of intelligent answering machines thanks to improvements in voice recognition technology. Conversational systems can be designed that are specific to each user's requirements; based on their profile, these systems can recognize the user and point them toward their goal in the shortest period of time [15].
The term "speaker recognition" refers to a variety of techniques used to recognize the source of a speech sample. It is possible to analyze what someone says by looking at their own distinctive vocal qualities and the words they use to express their thoughts and feelings [1]. Samples from the same speaker, however, show a wide range of differences, because a speaker is unable to recite the same words identically over and over again; just as a person's signature differs from trial to trial, it also differs from person to person. The work of a speaker recognition engineer can be categorized into three distinct specialties, depending on the specifics of the research at hand [4, 5].
Speaker recognition is sometimes considered a catch-all term for its various subcategories of speaker identification and verification. Overall, it describes how to recognize someone from their speech by assessing these qualities alone [7]. To identify the speaker, the system does not analyze the language, remember what the person looks like, or use any other method. The term can be employed when it is not clear whether the process is verification or identification [9, 10].
Open-set speaker identification combines the verification task of an open-set setting and the speaker identification task of a closed-set setting into a single challenge. In a closed-set environment, the system can identify enrolled speakers, but in an open set it must also be able to reject "unregistered speakers." The use of speaker verification to restrict access to financial services over the phone is one possible security application. It is essential to use methods that segment and group speakers when several talkers are present [12, 13]. Voice recognition and speaker recognition applications commonly assume that a single specific speaker's speech is being processed. Before the recognition process can begin, the speech must therefore be split into chunks containing the voice of each of the several speakers, to guarantee that the intended speaker's voice is not confused with the voices of other speakers. This technique is called segmentation: the goal is to figure out who the speakers are in the incoming audio and then partition the audio into homogeneous pieces [6, 15].
Multi-speaker audio has recently become more prominent in popular web searches and consumer electronics products, which has heightened interest in this activity. Audio archives can be indexed using speaker segmentation and grouping, making it easier to find the recordings you need. Text-independent methods are also available for automatic speaker recognition [18, 19]. For a text-dependent system to learn, users must repeatedly enter the same text, but with text-independent recognition they can speak any phrase, even one that has not been used before. Typically, the "target speaker," the one who speaks for the model in question, is the one who makes the identity assertion. Test samples are compared to stored speech models in order to identify the speaker of a given phrase or sentence. In the speaker verification process, only the model of the claimed identity is examined. The decision may also differ depending on the system in use: an open-set recognizer can reject the user if the test sample does not match one of the previously recorded speech models, whereas verification tasks accept or reject a person's assertion of identity [15, 17].
In a closed-set identification task, the identity of the model that most closely fits a test sample is selected; a simple comparison suffices. In open-set applications, a threshold may additionally be necessary to ensure that the match is genuine [12, 14]. For an open-set application, the cost of an error must be taken into account throughout the selection process to account for the exclusion of some speakers. It costs a bank less to inconvenience a genuine customer with a false rejection than to deal with the aftermath of accepting an impostor who wants to withdraw money. The effectiveness of a speaker recognition system can vary with the setting [3, 7]. Comparing the system's identification of speakers against those who have already been recognized is one way of judging whether or not a system is accurate. A false acceptance of a non-target speaker and a false rejection of a target voice are the two sorts of faults that can occur in speaker verification systems [8, 17].
It has been shown that poor recording quality and a lack of comparable vocabulary were the most common causes of conclusions with no confidence or low confidence; decisions were also influenced by disguised and high-pitched voices. Far-field microphones are under investigation to see whether they may increase speaker recognition reliability, with the aim of reducing the number of errors that occur throughout the speaker recognition procedure due to the instrument [7, 9]. According to a number of experts, the application of speaker recognition in forensic situations should be approached with utmost caution, and speaker recognition researchers have a vital role to play in disseminating this information.
A method for automatic speaker recognition must use acoustic parameters closely associated with the speech qualities that differentiate speakers. The connections that have previously been established between the speech signal and the shapes and movements of the vocal tract must be taken into account when determining which parameters to utilize. Most methods of speaker identification require the extraction of information from the speaker's speech in order to function properly. Besides phonetics, prosody, and lexical information, high-level information includes attributes like dialect, accent, and the manner and context in which the speaker speaks; only humans are capable of recognizing and analyzing these characteristics at this time [5, 7]. Good parameters should show a lot of variation across speakers, while variation within a speaker should be kept to a minimum [6, 9].
The data samples were collected at the Signal Processing (Acoustics) Lab of Thapar University, Patiala, Punjab, India, from students aged 20-25 years. For the study, 400 people of various genders, races, religions, and ages, mostly of north Indian origin, were used as test subjects and controls. Each voice sample was recorded using a digital recorder of the highest quality. A variety of acoustical and perceptual factors influenced the recorded voice samples used to create the disguised audio; therefore, each speaker's voice samples were meticulously collected. Three control samples were also taken from each participant to see if there was a noticeable difference between the person's disguised voice and their natural voice. The disguise techniques listed below are all feasible depending on the conditions of disguise that various persons selected: placing a hand or towel over the mouth, normal voice, differences in voice pitch, state of being extremely cold, sore throat, paan or tobacco chewing, voice box constriction, and pinching the nose. The block diagram for acoustic analysis of non-electronic disguised voice is shown in Fig. 2.
The following equipment was used:
Audacity software
Computerized Speech Lab
Premium headphones
Data cable
1. Prior to the collection of a voice sample, each subject was provided with a pre-recorded sample of standard speech.
2. Every speaker was given the transcript and told to recite it four times: once in a controlled state and three times in a disguised state of their choosing. Each of the 400 speakers thus provided four samples.
3. Audio samples were gathered using the Audacity recorder.
4. A signed consent form from each participant, as well as a recording of their voice, was acquired for further investigation. Additionally, each speaker signed a declaration to ensure that their voice samples would be protected and could be used for research.
5. Each speaker's full name, date of birth, gender, and place of residence were meticulously recorded, and these data have been preserved to this day.
6. Subsequently, various software was used to compare the auditory and perceptual similarities and differences across all of the recorded samples from each participant.
By applying a variety of voice masking techniques to the same speaker, researchers were able to uncover differences within the speaker's own speech. Investigations like these show that acoustic characteristics can be utilised to identify a disguised speaker even if the method of disguise is unknown. To conduct this part of the research, a group of ten speakers from the same age range was selected and given a text to read aloud. Using the method outlined above, each student was required to recite the same passage 17 times using a range of vocal disguises. The acoustic variance of each speaker was determined by comparing their disguised recordings to their corresponding control samples.
There are a variety of recording devices, each of which has its own format for audio files. Files in an unsuitable format must be converted to the appropriate format before they can be used for spectrographic analysis. The recordings used here have the following properties:
File format: '.wav', recorded with the Audacity software
Bit depth: 32 bits
Channel: mono
Sampling rate: 8000 Hz
Twenty terms that appeared in both the disguised and normal speech samples of the respective speakers were found after comparing the aural similarities between the two. The procedure was as follows: collect the speech samples using the Audacity software; open the disguised and control voice files in separate windows to examine their attributes; convert all files to the same format to ensure consistency; and listen to each file at least three or four times to identify the clue words common to all four recordings. For spectrographic analysis, at least two windows of the same programme must contain all the clue words from each file, as shown in Fig. 3.
The spectrographic method of speaker recognition employs a device that displays voice signals. To convert sound into visuals, Potter of the Bell Telephone Laboratory developed an electromechanical acoustic spectrograph: an apparatus that can monitor the fluctuating energy-frequency distribution as a speech wave moves through the atmosphere. To identify people by their voiceprints, spectrographic impressions of their utterances are employed, much like fingerprints, and law enforcement can use them to help identify suspicious callers. It was once thought to be an impenetrable method of verifying an individual's identity, yet the "voiceprint" approach of spectrographic analysis-based voice identification has been in legal ambiguity for quite some time. By examining the features and bandwidth, a qualified examiner may be able to determine the resemblance of two samples.
This section illustrates feature extraction using the MFCC technique in operation. Pre-emphasis, which is a high-pass filter, should be applied first. The following time-domain equation represents the input t[n] with the pre-emphasis coefficient $\alpha$, which ranges from 0.8 to 1.0:

$k[n] = t[n] - \alpha\, t[n-1]$  (1)
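As a minimal sketch of Eq. (1), assuming NumPy and an illustrative coefficient of 0.95 from the stated 0.8-1.0 range (the exact value used in the paper is not specified):

```python
import numpy as np

def pre_emphasize(t, alpha=0.95):
    """Apply the pre-emphasis high-pass filter k[n] = t[n] - alpha*t[n-1] (Eq. 1)."""
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(t[0], t[1:] - alpha * t[:-1])
```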
A frame shift is the difference in milliseconds (often 25 ms) between the left margins of two consecutive frames. To produce the resultant signal at time n, the voice signal t[n] is multiplied by the window u[n]:

$k[n] = t[n] \cdot u[n]$  (2)
One of the simplest types of window is the rectangular window, whose signal suddenly stops at its edges. That abruptness, however, causes discontinuities, which make Fourier analysis more challenging to apply. When collecting MFCC features, a Hamming window is therefore utilised to prevent discontinuities; this window tapers the signal values toward zero at the boundary positions. The rectangular window of length L is depicted mathematically as

$u_L[n] = \begin{cases} 1 & 0 \le n \le L-1 \\ 0 & \text{otherwise} \end{cases}$  (3)

and the Hamming window as

$G[n] = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi n}{L}\right) & 0 \le n \le L-1 \\ 0 & \text{otherwise} \end{cases}$  (4)
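A minimal sketch of framing and Hamming windowing (Eqs. 2-4), assuming the 8000 Hz sampling rate and 25 ms frames mentioned in this paper; the 10 ms frame shift and the use of NumPy's hamming window (which divides by L − 1 rather than L) are assumptions:

```python
import numpy as np

def frame_and_window(signal, fs=8000, frame_ms=25, shift_ms=10):
    """Split a signal into overlapping frames and apply a Hamming window (Eq. 4).

    Assumes len(signal) >= one frame length.
    """
    frame_len = int(fs * frame_ms / 1000)      # 200 samples at 8 kHz
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // shift)
    window = np.hamming(frame_len)             # 0.54 - 0.46*cos(2*pi*n/(L-1))
    frames = np.stack([signal[i * shift : i * shift + frame_len]
                       for i in range(n_frames)])
    return frames * window                     # Eq. 2: k[n] = t[n] * u[n]
```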
The FFT offers several computational benefits over direct evaluation of the DFT; one constraint of the radix-2 FFT, however, is that the frame length N must be a power of two. A mathematical representation of the DFT is:

$V_i(k) = \sum_{n=1}^{N} v_i(n)\, e^{-j2\pi kn/N}, \quad 1 \le k \le K$  (5)
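To make Eq. (5) concrete, the sketch below evaluates the DFT sum directly and checks it against NumPy's FFT; indices run from 0 to N − 1 here, the usual programming convention:

```python
import numpy as np

def dft(v):
    """Direct evaluation of Eq. (5): V(k) = sum_n v(n) * exp(-j*2*pi*k*n/N)."""
    N = len(v)
    n = np.arange(N)
    return np.array([np.sum(v * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

v = np.random.randn(256)                   # 256 is a power of two, as the FFT prefers
assert np.allclose(dft(v), np.fft.fft(v))  # the FFT gives the same result, faster
```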
A Mel filter bank has 10 linear filters below 1000 Hz and logarithmically spaced filters above 1000 Hz. Each filter collects the energy within its band. The filters are triangular bandpass filters, and their placement is determined by the Mel scale frequency mapping:

$\mathrm{mel}(f) = 2595 \times \log_{10}\!\left(1 + \frac{f}{700}\right)$  (7)
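A sketch of a triangular Mel filter bank derived from Eq. (7); the choice of 26 filters and a 512-point FFT is illustrative, not taken from the paper:

```python
import numpy as np

def hz_to_mel(f):
    # mel(f) = 2595 * log10(1 + f/700)  (Eq. 7)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, fs=8000):
    """Triangular bandpass filters spaced uniformly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                 # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```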
The log Mel spectrum is converted back to the time domain by applying the Discrete Cosine Transform (DCT). The MFCCs can be obtained with the following formula:

$c[n] = \sum_{k=0}^{N-1} \log\!\left(\left|\sum_{m=0}^{N-1} x[m]\, e^{-j2\pi km/N}\right|\right) \cos\!\left(\frac{k(n-0.5)\pi}{N}\right)$  (8)
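Combining the previous steps, a minimal sketch of Eq. (8) computed as log Mel energies followed by a DCT; retaining 13 coefficients is a common assumption, not a value stated in the paper:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_frames(frames, fbank, n_ceps=13):
    """Windowed frames -> magnitude spectrum -> log Mel energies -> DCT (Eq. 8)."""
    spectrum = np.abs(np.fft.rfft(frames, n=512, axis=1))   # magnitude spectrum
    energies = np.maximum(spectrum @ fbank.T, 1e-10)        # floor to avoid log(0)
    return dct(np.log(energies), type=2, axis=1, norm='ortho')[:, :n_ceps]
```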
The overall energy of a frame can be expressed as the sum of the powers of all the frame's samples taken within a given window of time, say from sample t1 to sample t2:

$\mathrm{Energy} = \sum_{t=t_1}^{t_2} x^2(t)$  (9)
The delta and double-delta features are determined by analyzing the differences between frames. For a cepstral value c(t), the delta feature d(t) at time instant t is:

$d(t) = \frac{\sum_{n=1}^{N} n\,\bigl(c(t+n) - c(t-n)\bigr)}{2\sum_{n=1}^{N} n^2}$  (10)
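A sketch of the delta computation in Eq. (10), with a window of N = 2 frames on each side (an assumed but common choice); frame edges are handled by repeating the first and last frames:

```python
import numpy as np

def delta(ceps, N=2):
    """Delta features d(t) per Eq. (10); ceps has shape (frames, coefficients)."""
    padded = np.pad(ceps, ((N, N), (0, 0)), mode='edge')   # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    num = sum(n * (padded[N + n : N + n + len(ceps)]       # c(t + n)
                   - padded[N - n : N - n + len(ceps)])    # c(t - n)
              for n in range(1, N + 1))
    return num / denom
```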
Since MFCC features vary with utterance length, statistical moments are used to recover speech vectors of the same dimension. For an N-frame speech signal, the MFCC vectors are written with j as the feature component index and L as the number of components:

$O_j = \{o_{1j}, o_{2j}, \ldots, o_{Nj}\}, \quad j = 1, 2, \ldots, L$  (11)
Two statistical moments are employed in this study. The correlation coefficient between distinct MFCC components is calculated after subtracting the means $F_j$ of the related MFCC features $O_j$, as given in Eqs. (12) and (13):

$F_j = F(O_j), \quad j = 1, 2, \ldots, L$  (12)

$CR_{jj'} = \frac{\mathrm{cov}(O_j, O_{j'})}{\sqrt{\mathrm{var}(O_j)}\,\sqrt{\mathrm{var}(O_{j'})}}, \quad 1 \le j < j' \le L$  (13)
The correlation coefficients are then combined into the statistical-moment feature vector of the MFCC vectors:

$U_{\mathrm{MFCC}} = (CR_{12}, CR_{13}, \ldots, CR_{L-1\,L})$  (14)
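A sketch of Eqs. (12)-(14): np.corrcoef subtracts the means and normalizes by the standard deviations internally, so it yields exactly the CR coefficients of Eq. (13):

```python
import numpy as np

def correlation_moments(mfcc):
    """U_MFCC = (CR_12, CR_13, ..., CR_{L-1,L}) per Eq. (14).

    mfcc has shape (N frames, L components); output length is L*(L-1)/2.
    """
    L = mfcc.shape[1]
    corr = np.corrcoef(mfcc.T)   # pairwise correlation between components
    return np.array([corr[j, k] for j in range(L) for k in range(j + 1, L)])
```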
A text-dependent speaker identification method is one of the fundamental components of this technology. This can be rationally explained by the fact that the acoustic feature coefficients of the disguised voices are identical to those of the actual voices.

When applied to the speech data in the current study, the chi-square test is performed on the correlation frame derived by MFCC feature extraction, testing the hypothesis p (a speech signal is present in the frame) for each sub-band k. The noise histogram roughly approximates the noise probability density function from the prior frame and yields the vector of expected values e:

$e = (e_1, e_2, \ldots, e_N)$ for $N$ values  (15)
The same method and the identical value of N are used to obtain the vector of observed values o from the present signal:

$o = (o_1, o_2, \ldots, o_N)$ for $N$ values  (16)
The chi-square test is then conducted on these bins, with the following chi-square statistic:

$X^2 = \sum_{i=1}^{N} \frac{(o_i - e_i)^2}{e_i}$  (17)
The threshold for the generated chi-square statistic is chosen according to the permitted error probability; this value, against which the outcome of the chi-square statistic is compared, can be found in standard chi-square tables. If the computed statistic turns out to be higher than the threshold value, the hypothesis is rejected; if not, it is accepted.
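A minimal sketch of the speech/noise decision described above: Eq. (17) is computed over the histogram bins and compared against a threshold. The threshold must be looked up in a chi-square table for the chosen error probability and degrees of freedom, so it is an input here rather than a value from the paper:

```python
import numpy as np

def chi_square_stat(observed, expected):
    """X^2 = sum_i (o_i - e_i)^2 / e_i  (Eq. 17)."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return np.sum((o - e) ** 2 / e)

def frame_is_noise(observed_hist, noise_hist, threshold):
    """Label a frame as noise when its observed distribution stays close to the
    estimated noise distribution, i.e. the statistic stays below the threshold."""
    return chi_square_stat(observed_hist, noise_hist) <= threshold
```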
The subjects had to modify their normal voices for one of the voice samples. Across the 400 test speakers, the following disguise techniques were employed: whispering (6%), pinching nostrils (10%), protruding lips (6%), obstacle in mouth (8%), mimicry (11%), state of cold (6%), pretending anger (16%), covering mouth (12%), changing accent (10%), raising pitch (8%), and lowering pitch (7%), as shown in Table 1. Participants most often chose to conceal themselves by pretending anger or by covering their mouths externally with a hand or other object, followed by raising or lowering the pitch of their voice relative to typical values. Changing one's natural accent or tone, protruding the lips, a sore throat, and a cold were among the least chosen techniques.
The use of voice disguise hampered speech quality. Both males and females were able to maintain a good level of speech quality under normal voice conditions. A high negative correlation of −0.932 (males) and −0.952 (females) was established between the speech quality of respondents when speaking in a disguised versus a control voice. Female subjects exhibited more diversity in speech quality while using voice disguise than male subjects. Disguised voice samples from both men and women, including those produced by modifying pitch, pinching the nostrils, a cold, constricting the vocal tract, covering the mouth, protruding the lips, throat infection, and whispering, usually resulted in lower speech quality. This indicates a significant difference between the two samples in speech quality. For both male and female voice samples, a substantial link in speech quality was found between the samples disguised by tugging the cheeks, feigning anger, or modifying the accent/tone and their control counterparts, as shown in Table 2.
The chi-square value for speech quality was calculated from the correlation coefficients of all of the normal and disguised voice samples, leading to rejection of the null hypothesis in favor of the alternative, which holds that the differences in voice quality can be traced back to the sample used to record the audio. As shown in Table 3, the chi-square test was significant when used to test for disguise through a constricted tract, lowered pitch, pinching of the nose, and mouth covering. The test rejected the null hypothesis and confirmed that changes in speech quality are strongly dependent on the type of speech sample used. The chi-square values were calculated by Eq. (17), with the correlation coefficient of the normal voice taken as the expected coefficient (e) and that of the disguised voice as the observed coefficient (o).
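As a worked check against Table 3, take whispering: the normal-voice coefficient e = 1.00 is the expected value and the disguised coefficient o = 0.89 the observed one, so

$X^2 = \frac{(0.89 - 1.00)^2}{1.00} = 0.0121$,

which matches the tabulated value.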
The calculation process uses MFCC statistical moments such as correlation coefficients, acoustic features, and vectors. The classifiers used to recognize voices affected by non-natural disguise are based on these acoustic data. An SVM and a k-NN classifier are used to study the speaker recognition system, and Table 4 presents the findings. Figure 4 shows the detection rates of the classifiers: the SVM classifier identifies speakers better than any other, while the k-NN classifier algorithm also aids speaker recognition.
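A minimal sketch of the classification stage, assuming one correlation-moment vector (Eq. 14) per utterance with speaker labels; the data below are random placeholders, and the RBF kernel, C = 10, and k = 5 are illustrative choices rather than the paper's settings:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 400 utterances, 78 correlation moments (13 MFCCs -> 13*12/2),
# 40 hypothetical speakers with 10 utterances each; real features come from Eq. (14).
X = np.random.randn(400, 78)
y = np.repeat(np.arange(40), 10)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

for name, clf in [("SVM", SVC(kernel="rbf", C=10.0)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```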
Table 1 Disguise methods used by different speakers (total speakers = 400)

S. No.  Disguise method     No. of speakers  Speakers (%)
1       Whispering          24               6
2       Pinching nostrils   40               10
3       Protruding lips     24               6
4       Obstacle in mouth   32               8
5       Mimicry             44               11
6       State of cold       24               6
7       Pretending anger    64               16
8       Covering mouth      48               12
9       Changing accent     40               10
10      Raising pitch       32               8
11      Lowering of pitch   28               7
Table 3 Statistical analysis using the chi-square test on normal and disguised voices

S. No.  Disguised method    Correlation coefficient  Chi-square value
1       Normal voice        1.00                     0
2       Whispering          0.89                     0.0121
3       Pinching nostrils   −0.56                    2.4336
4       Protruding lips     0.78                     0.0484
5       Obstacle in mouth   0.89                     0.0121
6       Mimicry             0.93                     0.0049
7       State of cold       −0.59                    2.5281
8       Pretending anger    0.92                     0.0064
9       Covering mouth      −0.48                    2.1904
10      Changing accent     0.79                     0.0441
11      Raising pitch       −0.87                    3.4969
12      Lowering of pitch   0.66                     0.1156
Table 4 Comparison of the proposed classification efficiency (%) with the existing technique

Disguised method   Neural network (existing) [1]   SVM (proposed)   k-NN (proposed)

Fig. 4 Existing and proposed classification methods for the different disguise methods

5 Conclusion

The usage of voice disguise is one of the most common problems that continues to be a challenge for specialists. When dealing with normal or ideal voice recognition, that is, a voice that is constant and free from any type of distortion brought on by noise, changes in emotional and physical state, intentional disguise, and other factors, the identification of the speaker is easier and produces a more conclusive opinion. When trying to identify a person based on a disguised voice sample, difficulties arise. The chi-square test is the inspiration for this novel approach: rather than relying on typical heuristic concepts, it deviates from the norm by looking for deviations from the noise distribution when making its speech/noise determination. Over a wide variety of SNRs and noise types, the proposed method was found to produce the most accurate speech/noise categorization.
Declarations
Conflict of interest The authors declare that they have no conflict of interest.
References
1. Nair, A. M., & Savithri, S. P. (2021). Classification of pitch and gender of speakers for forensic speaker
recognition from disguised voices using novel features learned by deep convolutional neural net-
works. Traitement du Signal, 38(1).
2. Zhang, C., & Tan, T. (2008). Voice disguise and automatic speaker recognition. Forensic Science
International, 175(2–3), 118–122.
3. Singh, M. K., Singh, A. K., & Singh, N. (2018). Multimedia analysis for disguised voice and classifi-
cation efficiency. Multimedia Tools and Applications, 78(20), 29395–29411.
4. Ahmed, B., & Holmes, P. H. (2004). A voice activity detector using the chi-square test. In 2004 IEEE
international conference on acoustics, speech, and signal processing (Vol. 1, pp. I-625). IEEE.
5. Perrot, P., & Chollet, G. (2008). The question of disguised voice. Journal of the Acoustical Society of
America, 123(5), 3878.
6. Singh, M. K. (2023). A text independent speaker identification system using ANN, RNN, and CNN
classification technique. Multimedia Tools and Applications, 1–13.
7. Rodman, R. (1998). Speaker recognition of disguised voices: A program for research. In Proceedings
of the consortium on speech technology in conjunction with the conference on speaker by man and
machine: Direction for forensic applications (pp. 9–22). COST 250.
8. Singh, M. K. (2023). Feature extraction and classification efficiency analysis using machine learning
approach for speech signal. Multimedia Tools and Applications, 1–16.
9. Wu, H., Wang, Y., & Huang, J. (2014). Identification of electronic disguised voices. IEEE Transactions
on Information Forensics and Security, 9(3), 489–500.
10. Reich, A. R., Moll, K. L., & Curtis, J. F. (1976). Effects of selected vocal disguises upon spectro-
graphic speaker identification. The Journal of the Acoustical Society of America, 60(4), 919–925.
11. Singh, M. K., Singh, A. K., & Singh, N. (2018). Multimedia analysis for disguised voice and classification efficiency. Multimedia Tools and Applications, 78(20), 29395–29411.
12. Nandan, D., Singh, M. K., Kumar, S., & Yadav, H. K. (2022). Speaker identification based on physical
variation of speech signal. Traitement du Signal, 39(2).
13. Farrús, M. (2018). Voice disguise in automatic speaker recognition. ACM Computing Surveys (CSUR),
51(4), 1–22.
14. Wolf, J. J. (1972). Efficient acoustic parameters for speaker recognition. The Journal of the Acoustical
Society of America, 51(6B), 2044–2056.
15. Liang, H., Lin, X., Zhang, Q., & Kang, X. (2017). Recognition of spoofed voice using convolu-
tional neural networks. In 2017 IEEE global conference on signal and information processing
(GlobalSIP) (pp. 293–297). IEEE.
16. Wang, L., Liang, H., Lin, X., & Kang, X. (2018). Revealing the processing history of pitch-shifted
voice using CNNs. In 2018 IEEE international workshop on information forensics and security
(WIFS) (pp. 1–7). IEEE.
17. Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure
analysis. Psychometrika, 66(4), 507–514.
18. Yao, L. (2020). A compressed deep convolutional neural networks for face recognition. In 2020 IEEE
5th international conference on cloud computing and big data analytics (ICCCBDA) (pp. 144–149).
IEEE.
19. Lakshmi, P. A., Veerapandu, G., Gamini, S., & Singh, M. K. (2022). CNN classification of multi-scale ensemble OCT for macular image analysis. International Journal of Electrical and Electronics Research, 10(4), 858–861. https://doi.org/10.37391/IJEER.100417
20. Yang, H., Yang, Z., & Huang, Y. (2019). Steganalysis of VoIP streams with CNN-LSTM network. In Proceedings of the ACM workshop on information hiding and multimedia security (pp. 204–209).
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.