http://www.siliconintelligence.com/people/binu/perception/node21.html

4. Speech Recognition
Modern approaches to large vocabulary continuous speech recognition (LVCSR) are surprisingly similar in their high-level structure [111]. The work described herein is based on the CMU Sphinx 3.2 system, but the general approach is applicable to other speech recognizers [49,74]. The explanation of LVCSR in this chapter is based on a simple probabilistic model presented in [80,111].

The human vocal apparatus has mechanical limitations that prevent rapid changes to the sound generated by the vocal tract. As a result, speech signals may be considered stationary over short intervals, i.e., their spectral characteristics remain relatively unchanged for several milliseconds at a time. DSP techniques may be used to summarize the spectral characteristics of a speech signal into a sequence of acoustic observation vectors. Typically, 100 such vectors represent one second of speech, i.e., one vector for every 10 ms. Speech recognition then becomes a statistical problem of deriving the word sequence that has the highest likelihood of corresponding to the observed sequence of acoustic vectors. This notion is captured by the equation:

$$\hat{W} = \arg\max_{W} P(W \mid Y) \qquad (4.1)$$

Here, $W = w_1, w_2, \ldots, w_n$ is a sequence of $n$ words and $Y = y_1, y_2, \ldots, y_T$ is a sequence of $T$ acoustic observation vectors. Equation 4.1 may be read as: $\hat{W}$ is the particular word sequence which has maximum a posteriori probability given the observation sequence $Y$. Using Bayes' rule, this equation may be rewritten as:

$$\hat{W} = \arg\max_{W} \frac{P(Y \mid W)\, P(W)}{P(Y)} \qquad (4.2)$$

$P(Y \mid W)$ denotes the probability of the acoustic vector sequence $Y$ given the word sequence $W$. $P(W)$ denotes the probability with which the word sequence $W$ occurs in the language. $P(Y)$ denotes the probability with which the acoustic vector sequence $Y$ occurs in the spoken language. $P(Y)$ is independent of the word sequence, therefore $\hat{W}$ can be computed without knowing $P(Y)$. Thus Equation 4.2 may be rewritten as:

$$\hat{W} = \arg\max_{W} P(Y \mid W)\, P(W) \qquad (4.3)$$

The set of DSP algorithms that convert the speech signal into the acoustic vector sequence $Y$ is commonly referred to as the front end. The quantity $P(Y \mid W)$ is generated by evaluating an acoustic model. The term $P(W)$ is generated from a language model.
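The decision rule in Equation 4.3 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three candidate word sequences, their scores, and the helper names `decode` and `posterior` are hypothetical and are not taken from Sphinx 3.2. In this sketch the acoustic model score is the probability of the observed vectors given a word sequence, the language model score is the prior probability of that word sequence, and the evidence term is approximated by summing over the small candidate set. The point is only that dividing by the evidence term rescales every candidate equally, so the best candidate is the same with or without it.

```python
import math

# Hypothetical toy scores for one fixed observation sequence Y.
# A real recognizer would obtain these from its acoustic and language models.
acoustic_likelihood = {          # stands in for P(Y | W)
    "recognize speech": 0.0020,
    "wreck a nice beach": 0.0025,
    "recognise peach": 0.0005,
}
language_prior = {               # stands in for P(W)
    "recognize speech": 0.60,
    "wreck a nice beach": 0.10,
    "recognise peach": 0.30,
}


def decode(acoustic, prior):
    """Pick the W maximizing log P(Y|W) + log P(W); P(Y) is never needed."""
    return max(acoustic, key=lambda w: math.log(acoustic[w]) + math.log(prior[w]))


def posterior(acoustic, prior):
    """Normalized P(W|Y), with P(Y) approximated over the candidate set only."""
    p_y = sum(acoustic[w] * prior[w] for w in acoustic)
    return {w: acoustic[w] * prior[w] / p_y for w in acoustic}


best = decode(acoustic_likelihood, language_prior)
post = posterior(acoustic_likelihood, language_prior)

# The unnormalized and normalized rules select the same word sequence.
print(best, max(post, key=post.get))
```

Working in the log domain, as `decode` does, is a common way to avoid numerical underflow when many small probabilities are multiplied; whether a given recognizer does this is outside what this section states.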

5/20/2002 10:13 AM

Subsections
4.1 Front End
4.2 Acoustic Model
4.3 Language Model
4.4 Overall Operation
4.5 Architectural Implications

Binu K. Mathew
