Digital Voice Analysis
Prepared by
GAURAV MISHRA
BHUMIKA DWIVEDI
AKASH RAJAN RAI
KARTIC KUMAR
Table of Contents
INTRODUCTION
SPEAKER VERIFICATION
FEATURE EXTRACTION
FUTURE WORK
CONCLUSION
DIGITAL VOICE ANALYSIS - INTRODUCTION
Voice analysis is the study of speech sounds for purposes other than extracting their
linguistic content (which is the goal of speech recognition).
Such studies mostly involve medical analysis of the voice, i.e. phoniatrics, but also
include speaker identification.
Speaker recognition
Process of identifying a person from a spoken phrase
Provides a secure way to authenticate speakers
Applications include:
voice dialing, banking over a telephone network, security control for confidential information, etc.
Challenges
A voice can be imitated to a certain degree
Features that discriminate between speakers must be captured
Emotional and physical states affect voice quality
SPEECH PROCESSING TAXONOMY
[Figure: Recognition branches into Speaker Identification and Speaker Verification]
SPEAKER IDENTIFICATION
GENERAL THEORY OF SPEAKER VERIFICATION SYSTEM
[Figure: Speaker verification system. The input speech ("My Name is Mishra") and an identity claim (Mishra) enter the system. Feature extraction feeds both Mishra's "voiceprint" (speaker model) and the impostor "voiceprints" (impostor model); the two model outputs are combined (Σ) and a decision stage outputs ACCEPT or REJECT.]
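The decision block in the figure can be sketched as a likelihood-ratio test. This is a minimal sketch under stated assumptions, not the authors' implementation: each voiceprint is modelled as a single diagonal Gaussian (a hypothetical simplification; Gaussian mixture models are a common choice in practice), the Σ node is read as averaging the per-frame differences between the claimed speaker's log-likelihood and the impostor model's log-likelihood, and `threshold` is a hypothetical tuning parameter.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """log N(x; mean, diag(var)) for one feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def verify(frames, speaker_model, impostor_model, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio exceeds threshold.

    speaker_model and impostor_model are hypothetical (mean, var) pairs.
    """
    llr = sum(diag_gauss_loglik(x, *speaker_model) - diag_gauss_loglik(x, *impostor_model)
              for x in frames) / len(frames)
    return ("ACCEPT" if llr > threshold else "REJECT"), llr

# Toy 2-D "features": the claimed speaker's model sits near the origin.
speaker = ([0.0, 0.0], [1.0, 1.0])
impostor = ([5.0, 5.0], [1.0, 1.0])
decision, score = verify([[0.1, -0.2], [0.0, 0.3]], speaker, impostor)
```

Raising `threshold` trades more false rejections of genuine speakers for fewer false acceptances of impostors.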
Any speaker verification system has two distinct phases:
Enrollment phase: enrollment speech for each speaker (e.g. Bhumika) → feature extraction → voiceprints (models) for each speaker
Verification phase: speech with a claimed identity (Bhumika) → feature extraction → verification decision → Accepted!
TRAINING PHASE
[Figure: Speaker Database → Speaker Models]
COMPONENTS OF SPEAKER IDENTIFICATION SYSTEM
There are three main components of a speaker identification (SI) system:
Front-end Processing
Speaker Modeling
Pattern Matching and Classification
FRONT-END PROCESSING
PRE-PROCESSING
WINDOWING
After the signal has been framed, window each individual frame so as
to minimize the signal discontinuities at the beginning and end of each
frame.
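Framing and windowing can be sketched as follows. The slide does not name a window function, so the Hamming window used here is an assumption (it is a common choice for speech); the frame length, hop size, sampling rate and test tone are hypothetical values chosen only for illustration.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window w[i] = 0.54 - 0.46*cos(2*pi*i/(n-1)), i = 0..n-1 (assumes n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def window_frames(frames):
    """Multiply every frame sample-by-sample by a Hamming window of the same length,
    tapering the frame edges toward zero to reduce discontinuities."""
    w = hamming(len(frames[0]))
    return [[x * wi for x, wi in zip(frame, w)] for frame in frames]

# Hypothetical example: 1000 samples of a 440 Hz tone at 8 kHz,
# 256-sample frames with a 100-sample hop.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1000)]
frames = frame_signal(signal, 256, 100)
windowed = window_frames(frames)
```

Overlapping frames (hop smaller than the frame length) compensate for the samples that the window attenuates near each frame's edges.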
MFCC
[Figure: MFCC pipeline: FFT → Spectrum → Mel Filter Bank → Mel Weighted Spectrum → Log Compression → DCT → Mel Cepstral Coefficients]
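The Fast Fourier Transform (FFT) converts each frame of N samples from the time domain into the frequency
domain.
It is a fast algorithm for computing the discrete Fourier transform (DFT), which is defined on the set of N samples {x_n} as follows: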
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \quad k = 0, 1, 2, \ldots, N-1$$
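The DFT formula above can be evaluated directly in O(N²) operations; the FFT produces the same values in O(N log N). The sketch below implements both so they can be compared; the radix-2 Cooley-Tukey variant shown is one FFT algorithm among several and assumes N is a power of two. The 8-sample frame is a hypothetical example.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_{n=0}^{N-1} x_n * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

frame = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
X_slow = dft(frame)
X_fast = fft(frame)
```

For this frame, X_0 is the plain sum of the samples (10) and X_4 is the alternating sum 1 - 2 + 3 - 4 = -2.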
Mel-frequency Warping
[Figure: Mel-spaced filterbank; x-axis: Frequency (Hz), 0 to 7000]
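A mel-spaced filterbank like the one in the figure can be built from triangular filters whose centres are equally spaced on the mel scale. This is a sketch under assumptions: it uses the mel formula 2595·log10(1 + f/700) (one common variant), unit-peak triangles, and hypothetical parameters (20 filters, 512-point FFT, 16 kHz sampling); the deck does not state its own values.

```python
import math

def hz_to_mel(f):
    """Mel scale, common variant: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters over the n_fft//2 + 1 FFT bins, centres equally spaced in mel,
    so filters are narrow at low frequencies and wide at high frequencies."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (num_filters + 1)
    hz_points = [mel_to_hz(low + i * step) for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]  # frequency of each FFT bin
    bank = []
    for i in range(1, num_filters + 1):
        left, centre, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        filt = []
        for f in bin_hz:
            if left <= f <= centre:
                filt.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                filt.append((right - f) / (right - centre))  # falling edge
            else:
                filt.append(0.0)
        bank.append(filt)
    return bank

def apply_filterbank(power_spectrum, bank):
    """Mel-weighted spectrum: one energy value per filter."""
    return [sum(p * w for p, w in zip(power_spectrum, filt)) for filt in bank]

bank = mel_filterbank(20, 512, 16000)
energies = apply_filterbank([1.0] * 257, bank)
```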
Cepstrum
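The last two stages of the MFCC pipeline (log compression, then DCT) turn the mel-weighted spectrum into cepstral coefficients. A minimal sketch, assuming an unnormalised DCT-II, a small floor to avoid log(0), and 13 retained coefficients (a typical but hypothetical choice):

```python
import math

def dct_ii(x):
    """Unnormalised DCT-II: c_k = sum_n x_n * cos(pi * k * (n + 0.5) / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

def mfcc_from_mel_energies(mel_energies, num_coeffs=13, floor=1e-10):
    """Log-compress the mel-weighted spectrum, then take the DCT to get cepstral coefficients."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies)[:num_coeffs]

# A perfectly flat mel spectrum puts all its energy in coefficient 0.
c = mfcc_from_mel_energies([math.e] * 20)
```

The DCT decorrelates the log filter energies, so a few low-order coefficients summarise the spectral envelope.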
PATTERN MATCHING AND CLASSIFICATION
The classifiers used for speaker identification can be grouped into two major types:
template-based and stochastic-model-based classifiers.
Template-based classifiers are considered the simplest. Two examples:
Dynamic Time Warping (useful for text-dependent speaker recognition)
Vector Quantization (useful for text-independent speaker recognition)
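Dynamic Time Warping compares a test utterance against a stored template by aligning the two feature sequences non-linearly in time, which is why it suits text-dependent recognition. A minimal sketch, assuming Euclidean frame distances and the standard three-way recurrence; the one-dimensional "feature" sequences and speaker labels are hypothetical toy data.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    D[i][j] = d(a_i, b_j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]), Euclidean d.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_identify(test_seq, templates):
    """Return the enrolled speaker whose template is closest to test_seq under DTW."""
    return min(templates, key=lambda name: dtw_distance(test_seq, templates[name]))

templates = {"speaker_A": [[0.0], [1.0], [2.0]], "speaker_B": [[2.0], [1.0], [0.0]]}
# Same contour as speaker_A, spoken twice as slowly.
test_seq = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
```

Because the warping path may repeat template frames, the slower utterance still matches speaker_A with zero cost.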
SPEAKER MODELING-VQ
Vector Quantization
It is impractical to use all the feature vectors of a given speaker from the training
data as the speaker's model, because each speaker produces too many feature vectors.
VQ instead maps them to a small set of representative vectors, the speaker's codebook.
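VQ speaker modelling can be sketched with plain k-means: each speaker's many training vectors are compressed into a small codebook, and a test utterance is assigned to the speaker whose codebook gives the lowest average distortion. This is a sketch, not the deck's method; the LBG algorithm often used for VQ codebooks is a splitting variant of this idea and is not shown. Codebook size, iteration count, seed and the 2-D toy data are hypothetical.

```python
import math
import random

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means: compress many training vectors into `size` codewords."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda c: math.dist(v, codebook[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword (lower = better match)."""
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def vq_identify(test_vectors, codebooks):
    """Pick the speaker whose codebook quantizes the test vectors with least distortion."""
    return min(codebooks, key=lambda name: distortion(test_vectors, codebooks[name]))

# Toy training data: two speakers with well-separated 2-D features.
A = [[0.1 * i, 0.1 * j] for i in range(3) for j in range(3)]
B = [[10 + 0.1 * i, 10 + 0.1 * j] for i in range(3) for j in range(3)]
codebooks = {"speaker_A": train_codebook(A, 2), "speaker_B": train_codebook(B, 2)}
```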
FUTURE WORK
We will test and verify the performance of all the algorithms. For
this purpose, the collected data will be divided into training and testing sets
(70% training, 30% testing).
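The planned 70/30 split can be sketched per speaker, so that every speaker contributes utterances to both the training and the testing set. Splitting per speaker is an assumption (the plan above only gives the overall ratio), and the utterance identifiers and seed are hypothetical.

```python
import random

def split_per_speaker(utterances_by_speaker, train_fraction=0.7, seed=0):
    """Shuffle each speaker's utterances and split them train_fraction / rest."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = {}, {}
    for speaker, utts in utterances_by_speaker.items():
        shuffled = list(utts)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train[speaker], test[speaker] = shuffled[:cut], shuffled[cut:]
    return train, test

data = {"s1": [f"s1_{i}" for i in range(10)], "s2": [f"s2_{i}" for i in range(20)]}
train, test = split_per_speaker(data)
```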
CONCLUSION