NLP Unit V
The goal of speech recognition is for a machine to be able to "hear," "understand," and "act upon" spoken
information. The earliest speech recognition systems were attempted in the early 1950s at Bell Laboratories,
where Davis, Biddulph, and Balashek developed an isolated-digit recognition system for a single speaker.
The goal of automatic speaker recognition is to analyze, extract, characterize, and recognize information about the
speaker's identity. A speaker recognition system may be viewed as working in four stages:
1. Analysis
2. Feature extraction
3. Modeling
4. Testing
Speech data contains different types of information that reveal a speaker's identity. This includes speaker-specific
information due to the vocal tract, the excitation source, and behavioral features. The information about behavioral
features is also embedded in the signal and can be used for speaker recognition. The speech analysis stage deals
with selecting a suitable frame size for segmenting the speech signal for further analysis and feature extraction.
Speech analysis is carried out using the following three techniques.
1.1 Segmental analysis - In this case speech is analyzed using a frame size of 20 ms with a shift of 2.5 ms to
extract speaker information. Such studies are mainly used to extract the vocal tract information of the speaker for recognition.
1.2 Sub-segmental analysis - Speech analyzed using a frame size of 5 ms with a shift of 2.5 ms is known as sub-segmental
analysis. This technique is mainly used to analyze and extract the characteristics of the excitation source.
1.3 Supra-segmental analysis - In this case, speech is analyzed using a frame size of 250 ms with a shift of 6.25
ms. This technique is used to analyze the characteristics due to the behavioral traits of the speaker.
Features of speech play a vital part in distinguishing one speaker from others. Feature extraction reduces the
dimensionality of the speech signal without causing any damage to its discriminative power.
Before the features are extracted, a sequence of preprocessing phases is first carried out. The first
preprocessing step is pre-emphasis. This is achieved by passing the signal through a first-order finite impulse
response (FIR) filter. This is succeeded by frame blocking, a method of partitioning the
speech signal into frames; it removes the acoustic interference present at the start and end of the speech signal.
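A minimal sketch of these two preprocessing steps in Python, assuming NumPy; the 0.97 coefficient and the 25 ms frame / 10 ms shift values are common illustrative choices, not prescribed by the text:

import numpy as np

def pre_emphasis(x, alpha=0.97):
    # First-order FIR pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_blocking(x, sr, frame_ms=25, shift_ms=10):
    # Partition the signal into overlapping frames of frame_ms milliseconds
    frame_len = int(sr * frame_ms / 1000)
    shift = int(sr * shift_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // shift)
    return np.stack([x[i * shift : i * shift + frame_len]
                     for i in range(n_frames)])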
The framed speech signal is then windowed. A suitable window function is applied to minimize
discontinuities at the start and end of each frame. The two most common categories of windows are the Hamming and
rectangular windows. Windowing increases the sharpness of harmonics and eliminates signal discontinuities by tapering
the beginning and end of each frame toward zero. It also reduces the spectral distortion caused by the overlap.
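A short sketch of this windowing step, applied to frames as produced by a frame-blocking step like the one above:

import numpy as np

def apply_window(frames, kind="hamming"):
    # Taper each frame toward zero at both ends to reduce spectral
    # leakage from the discontinuities at the frame boundaries
    n = frames.shape[1]
    window = np.hamming(n) if kind == "hamming" else np.ones(n)  # rectangular
    return frames * window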
Feature extraction is the process of obtaining different features such as power, pitch, and vocal tract
configuration from the speech signal. Feature extraction involves analysis of the speech signal. Broadly,
feature extraction techniques are classified into temporal analysis and spectral analysis techniques. In
temporal analysis the speech waveform itself is used for analysis; in spectral analysis a spectral
representation of the speech signal is used for analysis.
Temporal features (time-domain features) are simple to extract and have an easy physical interpretation;
examples include the energy of the signal, zero-crossing rate, maximum amplitude, and minimum energy.
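For illustration, a sketch computing two of these time-domain features per frame (NumPy; frame layout as above):

import numpy as np

def short_time_energy(frames):
    # Energy of each frame: sum of squared samples
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    # Fraction of adjacent sample pairs whose signs differ
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)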
Spectral features (frequency-based features) are obtained by converting the time-domain signal into the
frequency domain using the Fourier transform; examples include fundamental frequency, frequency components, spectral
centroid, spectral flux, spectral density, and spectral roll-off. These features can be used to identify notes,
pitch, rhythm, and melody.
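A sketch of one such frequency-domain feature, the spectral centroid, computed from the magnitude spectrum of each frame; the exact normalization is an illustrative choice:

import numpy as np

def spectral_centroid(frames, sr):
    # Magnitude spectrum of each frame
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    # Centroid: magnitude-weighted mean frequency of the spectrum
    return (mag @ freqs) / (np.sum(mag, axis=1) + 1e-10)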
In speech recognition, cepstral analysis is used for formant tracking and pitch (F0) detection. The samples of the
cepstrum c(n) in its first 3 ms describe the vocal tract component v(n) and can be separated from the excitation. The
latter is viewed as voiced if c(n) exhibits sharp periodic pulses; the interval between these pulses is then taken as
the pitch period. If no such structure is visible in c(n), the speech is considered unvoiced.
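A minimal sketch of this voiced/unvoiced and pitch decision via the real cepstrum; the pitch search range and the peak threshold below are illustrative assumptions:

import numpy as np

def cepstral_pitch(frame, sr, fmin=50.0, fmax=400.0, threshold=0.1):
    # Real cepstrum: inverse FFT of the log magnitude spectrum
    spectrum = np.abs(np.fft.fft(frame)) + 1e-10
    cep = np.fft.ifft(np.log(spectrum)).real
    # Look for a sharp peak at quefrencies of plausible pitch periods
    lo, hi = int(sr / fmax), int(sr / fmin)
    peak = lo + int(np.argmax(cep[lo:hi]))
    if cep[peak] > threshold:     # periodic pulses present -> voiced
        return sr / peak          # pitch estimate in Hz
    return None                   # unvoiced: no periodic structure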
where U_l and L_l are the upper and lower frequency indices over which each filter is nonzero, and A_l is the energy
of the filter, which normalizes the filters according to their varying bandwidths so as to give equal energy for a flat
input spectrum. The real cepstrum associated with E_mel(n, l) is referred to as the mel-cepstrum and is computed for
the speech frame at time n as
C_mel(n, m) = (1/N) ∑_{l=0}^{N−1} log{E_mel(n, l)} cos[πm(l + 1/2)/N]
Such mel-cepstral coefficients C_mel provide an alternative representation of speech spectra which exploits auditory
principles as well as the decorrelating property of the cepstrum.
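A from-scratch sketch of this computation (NumPy). The filter count, FFT size, and the triangular filterbank construction are illustrative assumptions; in practice a library routine is typically used:

import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_cepstrum(frame, sr, n_filters=26, n_ceps=13, n_fft=512):
    # Power spectrum of one analysis frame
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2

    # Triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for l in range(1, n_filters + 1):
        left, center, right = bins[l - 1], bins[l], bins[l + 1]
        for k in range(left, center):
            fbank[l - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[l - 1, k] = (right - k) / max(right - center, 1)

    # Log filterbank energies E_mel(n, l), then DCT -> mel-cepstrum
    E = np.log(fbank @ spec + 1e-10)
    m = np.arange(n_ceps)[:, None]
    l = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * m * (l + 0.5) / n_filters)
    return (dct @ E) / n_filters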
P(n) = (1/N_s) ∑_{m=0}^{N_s−1} [w(m) s(n − N_s/2 + m)]²
where N_s is the number of samples used to compute the power, s(n) denotes the signal, w(m) denotes the
window function, and n denotes the sample index of the center of the window. In most speech
recognition systems the Hamming window is almost exclusively used. Rather than using power directly,
speech recognition systems use the logarithm of power multiplied by 10, defined as the power in decibels,
in an effort to emulate the logarithmic response of the human auditory system. It is calculated as

P_dB(n) = 10 log₁₀ P(n)
The major significance of P(n) is that it provides a basis for distinguishing voiced speech segments from
unvoiced speech segments. The values of P(n) for unvoiced segments are significantly smaller than those
for voiced segments. The power can be used to locate approximately the time at which voiced speech
becomes unvoiced and vice versa.
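A sketch of P(n), its decibel form, and a simple voiced/unvoiced decision; the window length and the −30 dB threshold below are illustrative values, not from the text:

import numpy as np

def short_time_power_db(x, n, Ns=400):
    # P(n) = (1/Ns) * sum_m [w(m) * s(n - Ns/2 + m)]^2, Hamming-windowed;
    # assumes n is at least Ns/2 samples away from either end of x
    w = np.hamming(Ns)
    seg = x[n - Ns // 2 : n - Ns // 2 + Ns]
    p = np.sum((w * seg) ** 2) / Ns
    return 10.0 * np.log10(p + 1e-12)   # power in decibels

def is_voiced(x, n, threshold_db=-30.0):
    # Voiced segments carry markedly more power than unvoiced ones
    return short_time_power_db(x, n) > threshold_db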
Classification of phonemes can be done in a wide variety of ways. The comparison between the phoneme features of an
arbitrary piece of speech and those predefined for the various phoneme classes can be made via a tree, as shown in the figure below.
[Figure: Phoneme classification tree. Phonemes split into vowels (front: IY, IH, EH, AE; mid: AA, ER, AH, AX, AO; back: UW, UH, OW), diphthongs (AY, OY, AW, EY), semivowels (liquids: L, R; glides: W, Y), and consonants (nasals: M, N, NG; stops: voiced B, D, G and unvoiced P, T, K; fricatives: voiced and unvoiced; whisper: H; affricates).]
The pattern-based approach is of great interest in speech recognition systems. We define a test pattern T as the
concatenation of spectral frames over the duration of the speech, such that

T = {t1, t2, t3, ..., tl}

where ti is the spectral vector of the input speech at time i and l is the total number of frames of speech. In
a similar manner we define a set of reference patterns {R1, R2, R3, ..., Rv}, where each reference pattern
Rj, 1 ≤ j ≤ v, is compared with the test pattern in order to identify the reference pattern that has the minimum
dissimilarity, and to associate the spoken input with this pattern.
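A sketch of this nearest-reference decision, using a simple frame-wise Euclidean dissimilarity in place of a full time-alignment procedure such as dynamic time warping; it assumes test and reference patterns have the same number of frames:

import numpy as np

def dissimilarity(T, R):
    # Average Euclidean distance between corresponding spectral frames
    return np.mean(np.linalg.norm(T - R, axis=1))

def recognize(T, references):
    # Associate the input with the reference of minimum dissimilarity
    scores = [dissimilarity(T, R) for R in references]
    return int(np.argmin(scores))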
Various techniques exist for pattern comparison in speech recognition systems.
Assume we have two feature vectors x, y defined on a vector space χ. We define a metric or distance function
d on the vector space χ as a real-valued function on the Cartesian product χ × χ such that:

1. 0 ≤ d(x, y) < ∞ for x, y ∈ χ, and d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x) for x, y ∈ χ
3. d(x, y) ≤ d(x, z) + d(z, y) for x, y, z ∈ χ

The properties in the above three equations are commonly referred to as the positive definiteness, the symmetry,
and the triangle inequality conditions, respectively. A metric with the above properties ensures a high
degree of mathematical tractability.
If a measure of difference d satisfies only the positive definiteness property, we customarily call it a distortion
measure, particularly when the vectors are representations of signal spectra.
Consider two spectral representations S(ω) and S′(ω) compared using a distance measure d(S, S′). Spectral changes
that perceptually lead to sounds judged as being phonetically different include: (a) significant differences
in formant locations, i.e., the spectral resonances of S(ω) and S′(ω) occur at very different frequencies; and
(b) significant differences in formant bandwidths, i.e., the frequency widths of the spectral resonances of S(ω)
and S′(ω) are very different.
Given a finite-length signal {x(i), i = 0, 1, ..., N−1}, the associated autocorrelation sequence is

r_n = { r_N(n),  n = 0, 1, ..., N−1
        0,       n ≥ N }
3.2 Log Spectral Distance
The log-spectral distance (LSD), also referred to as log-spectral distortion or root mean square log-spectral
distance, is a distance measure (expressed in dB) between two spectra. The log-spectral distance between spectra
P(ω) and P̂(ω) is defined as

D_LS = √{ (1/2π) ∫_{−π}^{π} [10 log₁₀ P(ω) − 10 log₁₀ P̂(ω)]² dω }

where P(ω) and P̂(ω) are the two power spectra. The log-spectral distance is symmetric.
In speech coding, the log spectral distortion for a given frame is defined as the root mean square difference between
the original LPC log power spectrum and the quantized or interpolated LPC log power spectrum. Usually the average
spectral distortion over a large number of frames is calculated and used as the measure of the performance
of quantization or interpolation.
Thus, it calculates the log spectral distance between a speech signal and a distorted version of it, and it can
also calculate this distance for a specified sub-band. This measure is used to evaluate the quality of
processed speech in comparison to the original speech.
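A sketch of this distance between two power spectra sampled on a uniform frequency grid; the integral is approximated by a mean over the spectral bins:

import numpy as np

def log_spectral_distance(P, P_hat):
    # RMS difference of the two log power spectra, in dB
    diff = 10.0 * np.log10(P + 1e-10) - 10.0 * np.log10(P_hat + 1e-10)
    return np.sqrt(np.mean(diff ** 2))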
The complex cepstrum of a signal is defined as the Fourier transform of the logarithm of the signal spectrum. For a
power spectrum (magnitude-squared Fourier transform) S(ω), which is symmetric with respect to ω = 0 and is periodic
for a sampled-data sequence, the Fourier series representation of log S(ω) can be expressed as

log S(ω) = ∑_{n=−∞}^{∞} c_n e^{−jnω}

where c_n = c_{−n} are real and are often referred to as the cepstral coefficients.
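A sketch of recovering these cepstral coefficients numerically as the inverse Fourier transform of log S(ω); c_n = c_{−n} holds because the log power spectrum is real and even:

import numpy as np

def cepstral_coefficients(frame, n_coeffs=13):
    # Log of the power spectrum (symmetric about omega = 0, periodic)
    log_S = np.log(np.abs(np.fft.fft(frame)) ** 2 + 1e-10)
    # Fourier series coefficients of log S(w); real-valued by symmetry
    c = np.fft.ifft(log_S).real
    return c[:n_coeffs]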
The weighted cepstral distance measure d_WCEP is described by the following equation:

d_WCEP = ∑_{i=1}^{p} w(i) [c_t(i) − c_r(i)]²
where w(i) is the inverse of the i-th diagonal element v_ii of the covariance matrix V. The measure d_WCEP is a
weighted Euclidean distance measure in which each individual cepstral component c(i) is variance-equalized by the
weight w(i).
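A sketch of d_WCEP, assuming the covariance matrix V has already been estimated from training cepstra:

import numpy as np

def weighted_cepstral_distance(c_t, c_r, V):
    # w(i) is the inverse of the i-th diagonal element of V,
    # variance-equalizing each cepstral component
    w = 1.0 / np.diag(V)
    return np.sum(w * (c_t - c_r) ** 2)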