Digital Voice Analysis

This document discusses digital voice analysis and speaker verification. It begins with an introduction to voice analysis and speaker recognition. It then describes the key components of a speaker verification system, including feature extraction through methods like MFCCs, and pattern matching for classification. Finally, it outlines future directions, like improving feature extraction and modeling techniques to build more robust systems.

Uploaded by

vivek gangwar


DIGITAL VOICE ANALYSIS

Prepared by
GAURAV MISHRA
BHUMIKA DWIVEDI
AKASH RAJAN RAI
KARTIC KUMAR
Table of Contents

INTRODUCTION

SPEAKER VERIFICATION

FEATURE EXTRACTION

FUTURE WORK

CONCLUSION

DIGITAL VOICE ANALYSIS - INTRODUCTION

Voice analysis is the study of speech sounds for purposes other than linguistic
content (linguistic content being the concern of speech recognition).

Such studies include mostly medical analysis of the voice (phoniatrics), but also
speaker identification.

Speaker recognition
• The process of identifying a person from a spoken phrase
• Allows for a secure method of authenticating speakers
• Applications include voice dialing, banking over a telephone network, security
control for confidential information, etc.

Challenges
• A voice can be imitated to a certain degree
• Discriminating features need to be captured
• Emotional and physical states affect voice quality

SPEECH PROCESSING TAXONOMY

Recognition
    Speech Recognition
    Speaker Recognition
        Speaker Identification
            Text-dependent (closed-set)
            Text-independent (closed-set)
        Speaker Verification
            Text-dependent (closed-set)
            Text-independent (open-set)
    Language Recognition

SPEAKER VERIFICATION

• Determine whether a person is who they claim to be

• The user makes an identity claim: one-to-one mapping

• The unknown voice could come from a large set of unknown speakers;
this is referred to as open-set verification

Example: "Is this Kartic's voice?"

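GENERAL THEORY OF SPEAKER VERIFICATION SYSTEM

[Figure: block diagram of a speaker verification system. The input speech ("My name is Mishra") passes through feature extraction. The extracted features are scored against the speaker model (Mishra's "voiceprint") and against the impostor model (impostor "voiceprints"). The two scores are combined (Σ) and passed to a decision stage that outputs ACCEPT or REJECT. Identity claim: Mishra.]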
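
The slides do not say how the Σ stage combines the speaker-model score and the impostor-model score. A common choice, assumed here, is to subtract the impostor log-likelihood from the claimed speaker's log-likelihood and compare the average with a threshold. The function name `verify`, the threshold value, and the example scores are all hypothetical.

```python
import numpy as np

def verify(speaker_loglik, impostor_loglik, threshold=0.0):
    """Accept the identity claim if the mean log-likelihood ratio exceeds the threshold.

    speaker_loglik / impostor_loglik: per-frame log-likelihoods of the same
    feature vectors under the claimed speaker model and the impostor model.
    """
    speaker = np.asarray(speaker_loglik, dtype=float)
    impostor = np.asarray(impostor_loglik, dtype=float)
    llr = float(np.mean(speaker - impostor))  # the assumed role of the sum node
    return ("ACCEPT" if llr > threshold else "REJECT"), llr

# Made-up per-frame scores: the claimed speaker's model fits better than the impostor model.
decision, score = verify([-4.1, -3.8, -4.0], [-5.2, -5.0, -4.9])
print(decision, round(score, 3))
```

Raising the threshold makes false acceptances less likely at the cost of more false rejections, so in practice it would be tuned on held-out data.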

Two distinct phases to any speaker verification system

• Enrollment phase: enrollment speech for each speaker (e.g. Akash, Bhumika) → feature extraction → model training → voiceprints (models) for each speaker
• Verification phase: input speech with a claimed identity (e.g. Bhumika) → feature extraction → verification decision (e.g. "Accepted!")

TRAINING PHASE

• The first phase of a speaker identification system (SIS) is enrollment, also known as the training phase.
• During the training phase, the SIS generates a speaker model that is based on the speaker's characteristics.

[Figure: sessions from Speaker 1, Speaker 2 and Speaker 3 → front-end processing → feature vectors → speaker modeling → speaker models, stored in the speaker database]

COMPONENTS OF SPEAKER IDENTIFICATION SYSTEM
There are three main components of a speaker identification (SI) system:

Front-end Processing

Speaker Modeling

Pattern Matching and Classification

FRONT-END PROCESSING

Front-end processing generally consists of three sub-processes.

Preprocessing
• Removal of noise / silence from speech
• Frame blocking
• Windowing

Feature extraction
• The "curse of dimensionality": the number of training/test vectors needed for a
classification problem grows exponentially with the dimension of the input vector,
so feature extraction is needed.
• Transforms the speech signal into a compact, effective representation
• The representation is more stable and discriminative than the original signal

PRE-PROCESSING

• The speech signal is a slowly time-varying (quasi-stationary) signal: when it is
examined over a short period of time (5-100 ms), it is fairly stationary.
• Speech signals are therefore often analyzed in short time segments, which is
referred to as short-time spectral analysis. Frames are typically 20-30 ms long
and overlap each other by 30-50%. The overlap ensures that no information is
lost due to the windowing.
• For a sampling frequency of 11025 Hz, each frame lasts 23 ms and a new frame
contains the last 11.5 ms of the previous frame's data.
• For a sampling frequency of 8000 Hz, each frame lasts 16 ms and a new frame
contains the last 8 ms of the previous frame's data.
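
The frame blocking described above can be sketched as follows, using the 8000 Hz figures from the slide (16 ms frames, 8 ms overlap). The helper name `frame_signal` and the one-second ramp used as input are hypothetical. At 11025 Hz, 23 ms is about 254 samples; a power of two such as 256 samples is a common choice, although the slides do not state one.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Samples at the end that do not fill a complete frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

fs = 8000                        # sampling frequency in Hz (from the slide)
frame_len = round(0.016 * fs)    # 16 ms -> 128 samples
hop = frame_len // 2             # each new frame reuses the last 8 ms (64 samples)
x = np.arange(fs, dtype=float)   # one second of placeholder "speech"
frames = frame_signal(x, frame_len, hop)
print(frames.shape)
```

With a hop of half the frame length, the second half of each frame is identical to the first half of the next one, which is the 50% overlap the slide describes.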

WINDOWING

• After the signal has been framed, each individual frame is windowed so as to
minimize the signal discontinuities at the beginning and end of the frame.

• Each frame is multiplied by a window function w(n) of length N, where N is the
length of the frame.

• Typically the Hamming window is used. It preserves higher-order harmonics and
avoids problems due to truncation of the signal.
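
The slide does not write out the window itself. The standard Hamming definition is w(n) = 0.54 - 0.46 cos(2πn / (N - 1)) for 0 ≤ n ≤ N - 1, and the sketch below applies it to a single frame. The frame length of 128 and the all-ones frame are illustrative assumptions.

```python
import numpy as np

def hamming(N):
    """Hamming window of length N: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

N = 128                       # assumed frame length in samples
frame = np.ones(N)            # placeholder frame; real input would be one speech frame
windowed = frame * hamming(N)

# The window tapers the frame edges down to 0.08, reducing edge discontinuities.
print(windowed[0], windowed[N // 2])
```

NumPy ships the same window as `np.hamming(N)`, so the explicit formula is only there to show what the multiplication does.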

MFCC

[Figure: MFCC feature extraction pipeline. Continuous speech → frame blocking → windowing → FFT (spectrum) → mel filter bank (mel-weighted spectrum) → log compression → DCT → mel cepstral coefficients (the extracted feature coefficients)]

Fast Fourier Transform (FFT)

• Converts each frame of N samples from the time domain into the frequency
domain.
• Defined on the set of N samples {x_n} as follows:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad k = 0, 1, 2, \ldots, N-1$$
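
As a quick check of the definition above, the sum can be evaluated directly and compared with a library FFT. The sketch below assumes NumPy; the function name `dft` and the random test frame are illustrative.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

x = np.random.default_rng(0).standard_normal(128)  # stand-in for one windowed frame
X_direct = dft(x)
X_fft = np.fft.fft(x)   # same transform, computed with the fast algorithm

print(np.allclose(X_direct, X_fft))
```

The FFT gives the same coefficients as the direct sum but in O(N log N) operations instead of O(N^2), which matters when every frame of an utterance is transformed.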

Mel-frequency Wrapping

[Figure: mel-spaced filterbank. Horizontal axis: Frequency (Hz), 0 to 7000; vertical axis: filter gain, 0 to 2]
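
The slides show the mel-spaced filterbank only as a plot. A minimal sketch of how such a filterbank can be built follows. It assumes the widely used mapping mel(f) = 2595 log10(1 + f/700) and unit-height triangular filters, neither of which is stated in the slides; the choices of 20 filters and a 256-point FFT are also assumptions.

```python
import numpy as np

def hz_to_mel(f):
    """Common Hz-to-mel mapping (an assumption; the slides give no formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with centres equally spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank

fbank = mel_filterbank(n_filters=20, n_fft=256, fs=11025)
print(fbank.shape)
```

Multiplying `fbank` by the power spectrum of a frame gives one value per filter, the mel-weighted spectrum that the next stage compresses with a logarithm.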

Cepstrum

• In the final step, the log mel spectrum is converted back to the time domain.

• The result is called the mel frequency cepstrum coefficients (MFCC).
• Given the mel power spectrum coefficients that result from the previous step,
the MFCCs are calculated by applying the discrete cosine transform (DCT) to
their logarithms.
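
The equation itself did not survive on this slide. A common form of this step, assumed here rather than recovered from the slides, is c_n = Σ_{k=1}^{K} log(S_k) cos[n (k - 1/2) π / K], where S_k are the K mel power spectrum coefficients. The symbols S_k, c_n and K, the function name, and the choice of 12 coefficients are all illustrative.

```python
import numpy as np

def mfcc_from_mel_power(mel_power, n_coeffs):
    """DCT of the log mel power spectrum (assumed form of the cepstrum step).

    c_n = sum_{k=1..K} log(S_k) * cos(n * (k - 1/2) * pi / K),  n = 1..n_coeffs
    """
    mel_power = np.asarray(mel_power, dtype=float)
    K = len(mel_power)
    log_s = np.log(mel_power + 1e-12)          # small offset avoids log(0)
    k = np.arange(1, K + 1)
    n = np.arange(1, n_coeffs + 1).reshape(-1, 1)
    basis = np.cos(n * (k - 0.5) * np.pi / K)  # shape (n_coeffs, K)
    return basis @ log_s

# A perfectly flat mel spectrum has no spectral shape, so its coefficients vanish.
flat = np.full(20, 2.0)
coeffs = mfcc_from_mel_power(flat, n_coeffs=12)
print(np.allclose(coeffs, 0.0, atol=1e-9))
```

Starting at n = 1 leaves out the zeroth coefficient, which only carries the mean log energy and is often excluded because it says little about the speaker.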

PATTERN MATCHING AND CLASSIFICATION

• The classifiers used for speaker identification can be grouped into two major
types: template-based and stochastic model-based classifiers.
• Template-based classifiers are considered to be the simplest classifiers:
    - Dynamic Time Warping (useful for text-dependent speaker recognition)
    - Vector Quantization (useful for text-independent speaker recognition)
• Stochastic models provide more flexibility and better results. They include the
Gaussian Mixture Model (useful for text-independent speaker recognition), the
Hidden Markov Model (useful for text-dependent speaker recognition), and neural
networks, all of which are used to model a speaker's acoustic space.
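
To illustrate the template-based approach, here is a minimal sketch of Dynamic Time Warping between two sequences of feature vectors. It uses the basic three-neighbour recursion with Euclidean frame distances; the function name and the tiny one-dimensional sequences are made up for the example.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors (one per row)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

template = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
slow = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]])  # same shape, spoken slower
other = np.array([[2.0], [2.0], [2.0], [2.0], [2.0]])                      # different shape

print(dtw_distance(template, slow), dtw_distance(template, other))
```

The stretched sequence matches the template at zero cost, which is why DTW suits text-dependent recognition, where the same phrase is spoken at varying speeds.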

SPEAKER MODELING - VQ

Vector Quantization
• It is not possible to use all the feature vectors of a given speaker occurring in
the training data to form the speaker's model, because there are too many
feature vectors for each speaker.

• A method of reducing the number of training vectors is therefore required. It
forms a codebook consisting of a small number of highly representative vectors
that efficiently represent the speaker-specific characteristics.

• VQ is the process of mapping feature vectors in a vector space into a finite
number of regions in that space. Each region is called a cluster and each cluster
is represented by its centroid. The collection of all centroids is called the
codebook.
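
A codebook of this kind can be sketched with plain k-means clustering. The slides do not name a training algorithm, so k-means is an assumption here, and the function names, codebook size, and synthetic 12-dimensional "feature vectors" are hypothetical.

```python
import numpy as np

def train_codebook(vectors, n_codewords, n_iter=20, seed=0):
    """Build a VQ codebook (cluster centroids) with plain k-means."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with randomly chosen training vectors.
    codebook = vectors[rng.choice(len(vectors), n_codewords, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)              # nearest centroid per vector
        for c in range(n_codewords):
            members = vectors[labels == c]
            if len(members) > 0:                   # keep the old centroid if a cluster is empty
                codebook[c] = members.mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    vectors = np.asarray(vectors, dtype=float)
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

# Synthetic stand-ins for the MFCC vectors of two speakers.
rng = np.random.default_rng(1)
speaker_a = rng.normal(loc=0.0, scale=0.5, size=(200, 12))
speaker_b = rng.normal(loc=3.0, scale=0.5, size=(200, 12))
codebook_a = train_codebook(speaker_a, n_codewords=8)
codebook_b = train_codebook(speaker_b, n_codewords=8)

test_a = rng.normal(loc=0.0, scale=0.5, size=(50, 12))  # unseen speech from speaker A
print(distortion(test_a, codebook_a) < distortion(test_a, codebook_b))
```

At recognition time the unknown utterance is assigned to the speaker whose codebook gives the lowest average distortion.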

FUTURE WORK

• We plan to implement selected speaker identification algorithms in MATLAB. The
implementation will be modular and designed with a real-time implementation in
mind, although a complete real-time implementation is outside the scope of the
project.

• We will test and verify the performance of the algorithms. For this purpose,
the collected data will be divided into training and testing sets
(70% training, 30% testing).
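
The 70%/30% division can be sketched as a random split of the recorded utterances. The project plans to use MATLAB; Python is used here only for illustration, and the file names and the helper `split_train_test` are hypothetical.

```python
import numpy as np

def split_train_test(items, train_fraction=0.7, seed=0):
    """Randomly split a list into training and testing subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_train = int(round(train_fraction * len(items)))
    train = [items[i] for i in order[:n_train]]
    test = [items[i] for i in order[n_train:]]
    return train, test

recordings = [f"utterance_{i:02d}.wav" for i in range(20)]  # placeholder file names
train_set, test_set = split_train_test(recordings)
print(len(train_set), len(test_set))
```

For speaker recognition it is usually sensible to apply the split per speaker, so that every enrolled speaker appears in both the training and the testing data.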

• For a hardware implementation, we would use either interfacing or a digital
signal processor.

CONCLUSION

• Speaker verification is one of the few recognition areas where machines can
outperform humans

• Speaker verification technology is a viable technique currently available for
applications

• Speaker verification can be augmented with other authentication techniques to
add further security
