ASR For Embedded Real Time Applications: K.Kartheek, D.V.Srihari Babu

The document discusses a hardware-software coprocessing system for automatic speech recognition (ASR) optimized for embedded real-time applications. It features a Gaussian mixture model (GMM) accelerator implemented on a field-programmable gate array, achieving a real-time factor of 0.62 and a word accuracy rate of 93.33%. The system is designed to enhance performance while maintaining flexibility for various voice-controlled applications.

IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE)

ISSN: 2278-1676, Volume 3, Issue 3 (Nov. - Dec. 2012), PP 28-36

www.iosrjournals.org

ASR For Embedded Real Time Applications

K. Kartheek (1), D.V. Srihari Babu (2)
1. K. Kartheek, M.Tech, SKTRMCE,
2. D.V. Srihari Babu, M.Tech (Ph.D.), Assoc. Prof., S.K.T.R.M.C.E.,

Abstract: The proposed system consists of a standard microprocessor and a hardware accelerator for Gaussian mixture
model (GMM) emission probability calculation implemented on a field-programmable gate array. The GMM
accelerator is optimized for timing performance by exploiting data parallelism. To avoid a large memory
requirement, the accelerator adopts a double-buffering scheme for accessing the acoustic parameters and makes no
assumption about the access pattern of these parameters. Experiments on widely used benchmark data show
that the real-time factor of the proposed system is 0.62, which is about three times faster than the pure software-
based baseline system, while the word accuracy rate is preserved at 93.33%. As a part of the recognizer, a new
adaptive beam-pruning algorithm is also proposed and implemented, which further reduces the average real-
time factor to 0.54 with a word accuracy rate of 93.16%. The proposed speech recognizer is suitable for
integration in various types of voice (speech)-controlled applications.
Index Terms: Automatic speech recognition (ASR), embedded system, hardware–software codesign, real-time
system, softcore-based system.

I. Introduction
Automatic speech recognition (ASR) on embedded platforms has been gaining popularity. ASR is
widely used in human–machine interaction, for example in mobile robots, consumer electronics, manipulators in
industrial assembly lines, automobile navigation systems, and security systems. More sophisticated ASR
applications with larger vocabulary sizes and more complex knowledge sources are expected in the future. As a
result, the demand for accurate, fast, high-performance embedded ASR is increasing. At one extreme, a speech
recognizer can be implemented purely in software on a general purpose processor. This approach enables
fast deployment of ASR-based applications. However, the timing performance is constrained by the processing
power and memory bandwidth of the target platform. At the other extreme, a speech recognizer can be tailor-
made as a pure hardware-based system for good timing performance. However, in many human–machine
interaction applications, the search space for decoding speech varies dynamically depending on the user’s
response. A dedicated hardware architecture with a static search space has limited capability to deal with the
dynamic nature of ASR. In addition, the architecture becomes too application-specific and targets only ASR
applications. It is unlikely that the datapath of the hardware can be reused for applications other than ASR. As a
compromise, a hardware–software codesign approach is attractive. A typical hardware–software
coprocessing system consists of a general purpose processor and hardware units that accelerate time-critical
operations to achieve the required performance. Computationally intensive parts of the algorithm can be handled by
the hardware accelerator(s), while sequential and control-oriented parts can be run by the processor core. The
additional advantages of the hardware–software approach include the following:
1) Rapid prototyping of applications. Developers can build their applications in software without knowing
every detail of the underlying hardware architecture.
2) Flexibility in design modification. The parts of the algorithm which require future modification can be
implemented initially in software.
3) Universality in system architecture. The use of the general purpose processor core enables developers to
integrate ASR easily with other applications.

In this paper, we present the development and tradeoffs of a hardware–software coprocessing ASR
system which primarily targets embedded applications. The system includes an optimized hardware
accelerator that deals with the critical part of the ASR algorithm. The final system achieves real-time
performance with a combination of software- and hardware-implemented functionality and can be easily
integrated into applications with voice (speech) control.

Fig. 1. Data flow diagram of a typical ASR system. The input of the system is an audio speech signal. The output
is a sequence of words.

II. Automatic Speech Recognition System

In a typical hidden Markov model (HMM)-based ASR system, three main stages are involved. Fig. 1
shows the data flow within the ASR algorithm. The first stage is feature extraction. Its main purpose is to
convert a speech signal into a sequence of acoustic feature vectors, $o_1^T = \{o_1, o_2, \ldots, o_T\}$, where T is the
number of feature vectors in the sequence. The entire speech signal is segmented into a sequence of shorter
speech signals known as frames. Each frame is typically 25 ms long, with 15 ms of overlap
between two consecutive frames. Each frame is characterized by an acoustic feature vector consisting of D
coefficients. One of the widely used acoustic features is the mel frequency cepstral coefficient (MFCC).
Feature extraction continues until the end of the speech signal is reached. The next stage is the calculation of the
emission probability, which is the likelihood of observing an acoustic feature vector. The emission probability
densities are often modeled by Gaussian mixture models (GMMs). The last stage is Viterbi search, which
involves searching for the most probable word transcription based on the emission probabilities and the search
space.

The use of weighted finite state transducers (WFSTs) offers a tractable way of representing the search
space. The advantage is that the search space represented by a WFST is compact and optimal. Fig. 2 shows an
example of a search space. Basically, a WFST is a finite state machine with a number of states and transitions.
As shown in Fig. 2, each WFST transition has an input symbol, an output symbol, and a weight. The input
symbols are the triphone or biphone labels. The output symbols are the word labels. In ASR, a word is
considered as a sequence of subword units called phones. Two or three phones are concatenated to form
biphones or triphones. Each triphone or biphone label is modeled by an HMM. In other words, each WFST
transition in Fig. 2 is substituted by an HMM. The entire WFST is essentially a network of HMM states. The
WFST weights are the language model probabilities, which model the probabilistic relationship among the words
in a word sequence. Usually, a word is grouped with its preceding (n − 1) words. The n-word sequence, called an
n-gram, is considered as a probabilistic event. The WFST weights estimate the probabilities of such events.
Typical n-grams used in ASR are unigram (one word), bigram (two words, also known as word-pair grammar),
and trigram (three words).

For implementation purposes, each HMM state has a bookkeeping entity called a token,
which records the probability (score) of the best HMM state sequence ending at that state. Each token is
propagated to its succeeding HMM states according to the topology of the search space. For example, in Fig. 2,
the token in State 3 of the /k-ae+t/ HMM will replicate itself. One copy will propagate to its own state with the HMM
transition probability ($a_{33}$ in this example) added to the token’s score. The other copy will enter State
1 of the /ae-t/ HMM, and the WFST weight ($\omega_6$) will be added to its score. When two tokens meet at an HMM
state, only the token with the higher score survives and stays at the HMM state. The losing tokens are
discarded. This method of performing the Viterbi search is known as token passing. In addition to a score,
tokens also record the sequence of word labels encountered during propagation.

The pseudocode of the ASR
algorithm is shown in Fig. 3. At the beginning of the algorithm, a token is instantiated in each of the HMM states at
the start of each word (Line 2). $Q_{\text{word-start}}$ is the set of word-starting HMM states. The score of each token is
reset (Line 3). After the initialization, the algorithm begins to process each frame of speech. An acoustic feature
vector, $o_t$, is generated by feature extraction (Line 6) for each speech frame. In practice, it is intractable to
perform a complete Viterbi search over all the HMM states within the search space. Therefore, pruning is
essential for practical applications, at the cost of introducing search errors. One of the common pruning
techniques is beam pruning. A pruning threshold is determined by subtracting a certain value, called the
pruning beamwidth, from the maximum token score (Lines 8–9). A token remains active if its score is above the
pruning threshold (Line 13). Otherwise, the token is discarded. Pruning is said to be tight when the pruning
beamwidth is narrow, which reduces the number of active tokens in the search space. Since there are fewer
tokens, the decoding time is shorter. However, the word accuracy rate tends to decrease under tight pruning since
the token with the correct word transcription has a greater chance of being discarded.
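
The token-passing search with beam pruning described above can be illustrated with a minimal Python sketch. It is not the pseudocode of Fig. 3 or Fig. 4: the function name `decode`, the dictionary-based search space, and the toy states, weights, and emission scores are all hypothetical and chosen only for illustration. Scores are log-domain values, so probabilities are added rather than multiplied.

```python
import math

def decode(frame_emissions, successors, start_states, beamwidth):
    """Token passing with beam pruning over a small state graph (log-domain scores).

    frame_emissions: one dict {state: log emission probability} per speech frame
    successors:      dict {state: [(next_state, log_weight, word_or_None), ...]}
    Returns a dict {state: (score, word_history)} holding the surviving tokens.
    """
    # One token per word-start state, with its score reset to zero.
    tokens = {q: (0.0, ()) for q in start_states}
    for emission in frame_emissions:
        if not tokens:
            break
        # Pruning threshold = best token score minus the pruning beamwidth.
        threshold = max(score for score, _ in tokens.values()) - beamwidth
        new_tokens = {}
        for q, (score, words) in tokens.items():
            if score < threshold:
                continue  # token is below the threshold and is discarded
            score += emission.get(q, -math.inf)  # add the log emission probability
            for q_suc, weight, word in successors.get(q, ()):
                new_score = score + weight
                history = words + (word,) if word is not None else words
                # When two tokens meet at a state, only the better one survives.
                if q_suc not in new_tokens or new_score > new_tokens[q_suc][0]:
                    new_tokens[q_suc] = (new_score, history)
        tokens = new_tokens
    return tokens

# Hypothetical toy search space: two word-start states leading to one word-end state.
SUCCESSORS = {
    "s1": [("s1", -1.0, None), ("e", -1.0, "yes")],
    "s2": [("s2", -1.0, None), ("e", -1.0, "no")],
    "e": [("e", 0.0, None)],
}
FRAMES = [{"s1": -1.0, "s2": -5.0, "e": -1.0}] * 2

wide = decode(FRAMES, SUCCESSORS, ["s1", "s2"], beamwidth=10.0)
tight = decode(FRAMES, SUCCESSORS, ["s1", "s2"], beamwidth=3.0)
```

With the wide beam, the poorly scoring token in `s2` is still alive after two frames; with the tight beam it is pruned in the second frame, which shortens the search at the risk of discarding a path that might have turned out to be correct.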

Fig. 2. Search space represented by a WFST. Each WFST transition x : y/z has three attributes. x is an input
symbol representing a triphone or biphone label. y is an output symbol representing a word label. z is the
transition weight.

Fig. 3. Pseudocode of the speech recognition algorithm with beam pruning.

Fig. 4. Pseudocode of the Viterbi_search function.

After feature extraction and setting the pruning threshold, the algorithm iterates through all the HMM
states that have a token (Lines 12–18). If the token stays above the pruning threshold, the emission probability
of that state is calculated (Line 14). Viterbi search is then performed on that HMM state (Line 15), and token
passing takes place during this step. The search returns a set of new HMM states, V, which are occupied by the new
tokens after token passing. The new tokens are accumulated into another set, $\tilde{Q}_{t+1}$, which is prepared for the
next speech frame. Once all the speech frames have been processed, the best token is found among all the word-
end HMM states denoted by Q (Lines 21–22). The best token records its propagation path, from which the word
transcription can be determined.

Fig. 4 shows the pseudocode of the Viterbi_search() function. The for-loop
iterates through all the succeeding states of q (Lines 2–9). For each succeeding state, new_score is calculated
(Line 3), where the transition weight can be either the HMM transition probability for within-HMM transitions or
the WFST transition weight for cross-HMM transitions. If new_score is greater than the score at q_suc,
new_score replaces the score at q_suc (Line 5), and the path record of the original token at q_suc is replaced by
the path record at q (Line 6).

The pseudocode shows that there are three major levels of iteration in the ASR
algorithm: 1) iteration over the T speech frames (Line 5 in Fig. 3); 2) iteration over the HMM states in $\tilde{Q}_t$ in each frame (Line
12 in Fig. 3); and 3) iteration over the q_suc states of each active HMM state (Line 2 in Fig. 4). Since the search result
of each speech frame in the first loop depends on the previous frames, only the second and the third loops
are candidates for parallelism. However, data contention is likely to occur because an HMM state is often
a q_suc state of multiple HMM states. The impact of contention on timing performance needs to be carefully
studied if parallelism is adopted.

The performance of an ASR system is often evaluated by two metrics. The first
metric is the word accuracy rate, which is defined as follows [23]:

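$$\text{Word accuracy rate} = \frac{n - s - d - i}{n} \times 100\% \qquad (1)$$
where n is the total number of words, s is the number of word substitutions (incorrectly recognized
words), and d and i are the numbers of word deletions and word insertions, respectively. The second metric is the real-
time factor, which measures the timing performance of the ASR system. It is defined as follows:

$$\text{Real-time factor} = \frac{\text{time taken to decode the speech}}{\text{duration of the speech}} \qquad (2)$$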
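
As a small worked example of the two metrics, the following Python sketch computes them from raw counts and timings. The function names and the input numbers are hypothetical and were chosen only so that the results resemble the magnitudes reported in this paper.

```python
def word_accuracy_rate(n, s, d, i):
    """Word accuracy in percent: n reference words, s substitutions,
    d deletions, i insertions."""
    return 100.0 * (n - s - d - i) / n

def real_time_factor(decoding_seconds, speech_seconds):
    """Ratio of decoding time to speech duration; below 1 means faster than real time."""
    return decoding_seconds / speech_seconds

# Hypothetical counts: 1000 reference words, 40 substitutions, 15 deletions, 12 insertions.
accuracy = word_accuracy_rate(1000, 40, 15, 12)

# Hypothetical timing: 3.1 s of processing for a 5 s utterance.
rtf = real_time_factor(3.1, 5.0)
```

Because insertions are counted as errors, the word accuracy rate can become negative for a very poor recognizer, whereas a real-time factor below one indicates that decoding finishes faster than the speech is spoken.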

III. Hardware–Software Coprocessing System

The ASR algorithm is partitioned into three main parts: feature extraction, GMM emission probability
calculation, and Viterbi search. The speech recognizer is first implemented in software, where a 16-bit fixed-
point implementation of the recognizer is compared with a floating-point implementation. The experimental
results show that there is no degradation in recognition accuracy in the fixed-point implementation. Hence, the
fixed-point system is chosen as our baseline system for time profiling. Profiling shows that about 69% of the total
elapsed time is spent on GMM computation. The proportions of time spent on feature extraction and Viterbi
search are 7% and 24%, respectively. Since GMM computation is the most computationally intensive part, a
hardware accelerator is designed to speed up this part of the ASR algorithm.
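
The profiling figures give a rough upper bound on what accelerating only the GMM computation can achieve. The sketch below applies Amdahl's law, which is an analysis added here for illustration and not part of the original evaluation. It assumes that the non-accelerated 31% (feature extraction and Viterbi search) is unchanged and that no work overlaps; the function name is hypothetical.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: overall speedup when a fraction of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

GMM_FRACTION = 0.69  # share of elapsed time spent on GMM computation (from the profile)

for factor in (2, 4, 8, 16):
    print(f"GMM {factor:>2}x faster -> overall {overall_speedup(GMM_FRACTION, factor):.2f}x")

# Limit when the GMM time becomes negligible: 1 / (1 - 0.69)
upper_bound = overall_speedup(GMM_FRACTION, float("inf"))
```

Under these assumptions the bound is about 3.2, which is in line with the roughly threefold improvement over the software baseline reported in the abstract. Pushing beyond it would require accelerating the remaining stages as well.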

A. System Architecture
The architecture of the hardware–software coprocessing system is shown in Fig. 5. The system consists
of an Altera Nios II processor core and a GMM hardware accelerator. The Nios II processor acts as the control
unit of the entire system. Feature extraction and Viterbi search are implemented in software. When the system
needs to perform a GMM calculation, the processor instructs the accelerator to carry out the computation. The
accelerator returns the computation result to the Nios II core. The entire coprocessing system is synthesized on
an Altera Stratix II EP2S60F672C5ES field-programmable gate array (FPGA).

Fig. 5. System architecture of the hardware–software coprocessing recognizer with the GMM hardware
accelerator. The brackets show the data size and the ASR substages in which the data are accessed.
The Nios II processor performs feature extraction and Viterbi search, while the GMM accelerator is used for
GMM computation.

B. GMM Emission Probability Hardware Accelerator

1) Datapath: The GMM hardware accelerator calculates the log emission probability of an observation vector
given an HMM state. Given an observation feature vector $o_t$, the emission probability function in an HMM state
j is modeled by a sum of weighted Gaussian mixtures

$$b_j(o_t) = \sum_{m=1}^{M} b_{jm}(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm}) \qquad (3)$$

where $b_{jm}(o_t)$ is the probability density function of the weighted mth Gaussian mixture. $\mathcal{N}(\cdot)$ denotes a
Gaussian mixture. The mean vector and the covariance matrix of the Gaussian mixture are denoted by $\mu_{jm}$ and
$\Sigma_{jm}$, respectively. Since the coefficients of a feature vector are assumed to be independent, $\Sigma_{jm}$ is a diagonal
matrix. The total number of Gaussian mixtures is M per HMM state. The weight of the mth Gaussian mixture is
$c_{jm}$. The logarithm of a weighted Gaussian mixture, $\log b_{jm}(o_t)$, can be expressed by the following equation:

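$$\log b_{jm}(o_t) = C_{jm} - \sum_{d=1}^{D} \left(o_t^{(d)} - \mu_{jm}^{(d)}\right)^2 v_{jm}^{(d)} \qquad (4)$$
In the equation, $o_t^{(d)}$ is the dth dimension of the observation vector at time t. D is the dimension of the
observation vector. In many ASR applications, the typical value of D is 39, which is commonly adopted by the
research community. $\mu_{jm}^{(d)}$ is the dth dimension of the mean vector $\mu_{jm}$. $C_{jm}$, $v_{jm}^{(d)}$, and $g_{jm}$ are constants
defined as follows:

$$C_{jm} = \log c_{jm} + g_{jm} \qquad (5)$$

$$v_{jm}^{(d)} = \frac{1}{2\left(\sigma_{jm}^{(d)}\right)^2} \qquad (6)$$

$$g_{jm} = -\frac{1}{2}\left(D \log(2\pi) + \sum_{d=1}^{D} \log\left(\sigma_{jm}^{(d)}\right)^2\right) \qquad (7)$$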

where the symbol $\left(\sigma_{jm}^{(d)}\right)^2$ is the dth feature variance, which is the dth diagonal element of the
covariance matrix. The log emission probability, $\log b_j(o_t)$, can be evaluated recursively by the following
equation:

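$$\log b_j(o_t) = \log b_{j1}(o_t) \oplus \log b_{j2}(o_t) \oplus \cdots \oplus \log b_{jM}(o_t) \qquad (8)$$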
The ⊕ symbol represents the log-add operator, which has the following definition and approximation:

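$$x \oplus y = \log\left(e^{x} + e^{y}\right) = \max(x, y) + \log\left(1 + e^{-|z|}\right) \approx \max(x, y) \ \text{ if } |z| > \text{threshold} \qquad (9)$$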
where z = x − y. When |z| is greater than a threshold, the difference between exp(x) and exp(y) is large
enough that only the greater number needs to be considered. Several thresholds (8, 16, and 32) were
tested. The word accuracy rate stays at 93.33% for threshold values of 16 and 32, whereas it decreases
slightly (by about 0.03%) when the threshold is 8. The threshold value of 16 is chosen because it causes no
degradation in recognition accuracy and is a power of two. The log(1 + exp(·)) function can be calculated
offline and stored in a lookup table, with the |z| value used as the lookup index.

It can be seen from (4) that there is a summation of D interim values. Since these values are independent of each other, it is
possible to compute N of them in parallel, where 1 ≤ N ≤ D. For example, if N = D, all D interim
values are calculated in one go. If 1 < N < D, N interim values are calculated each time and it takes
⌈D/N⌉ iterations to calculate all the values. There is no parallelism if N = 1. In other words, the
degree of parallelism is governed by N, which is a design variable that needs to be chosen carefully.

To avoid pipeline stalls, the hardware accelerator adopts a double-buffering scheme as shown in Fig. 6. Each
buffer contains the GMM parameters of an HMM state. Since the Avalon bus is 32 bits wide and there are two
separate memories (SRAM and SDRAM), 8 B of parameters can be loaded into the buffer in each clock cycle.
GMM calculation and Viterbi search are performed during the retrieval of the next HMM state parameters from
the off-chip memories. The accelerator only needs to store the parameters of two HMM states, which occupy about
1280 B in the internal memory of the FPGA chip. The observation vector only needs to be loaded once for each
speech frame. The size of the observation vector buffer is 78 B.
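
The log-add operator with a lookup table, and its use in accumulating the mixtures of one HMM state, can be sketched in Python as follows. This is an illustrative floating-point model, not the fixed-point hardware datapath. The table resolution `STEPS_PER_UNIT`, the helper `precompute`, and the way the constants are grouped (the mixture weight and the Gaussian normalization folded into one constant C, and v = 1/(2σ²) per dimension) are assumptions made for this sketch.

```python
import math

THRESHOLD = 16        # |z| above this value: keep only the larger operand
STEPS_PER_UNIT = 16   # table entries per unit of |z| (assumed resolution)

# log(1 + exp(-|z|)) computed offline for |z| in [0, THRESHOLD].
LOG_ADD_TABLE = [math.log1p(math.exp(-k / STEPS_PER_UNIT))
                 for k in range(THRESHOLD * STEPS_PER_UNIT + 1)]

def log_add(x, y):
    """Approximate log(exp(x) + exp(y)) using the lookup table indexed by |x - y|."""
    z = abs(x - y)
    if z > THRESHOLD:
        return max(x, y)
    return max(x, y) + LOG_ADD_TABLE[round(z * STEPS_PER_UNIT)]

def precompute(c, mu, var):
    """Fold the mixture weight and Gaussian normalization into constants (C, mu, v)."""
    g = -0.5 * (len(mu) * math.log(2 * math.pi) + sum(math.log(s2) for s2 in var))
    return (math.log(c) + g, mu, [1.0 / (2.0 * s2) for s2 in var])

def log_emission(o, mixtures):
    """Log emission probability of observation o for one state with diagonal covariances."""
    total = None
    for C, mu, v in mixtures:
        # The D squared-difference terms are independent and could be computed in parallel.
        log_b = C - sum((od - m) ** 2 * vd for od, m, vd in zip(o, mu, v))
        total = log_b if total is None else log_add(total, log_b)
    return total

# Hypothetical two-mixture, two-dimensional state.
state = [precompute(0.6, [0.0, 0.0], [1.0, 2.0]),
         precompute(0.4, [1.0, -1.0], [0.5, 0.5])]
score = log_emission([0.5, -1.0], state)
```

Because all per-mixture constants are computed offline, the run-time work per dimension reduces to one subtraction, one squaring, and one multiplication, which is what makes the N parallel computation units straightforward to replicate.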

Fig. 6. Double-buffering inside the GMM hardware accelerator. The arithmetic unit is reading from one buffer
while another buffer is retrieving GMM parameters from off-chip memories.
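
The ordering of loads and computations in a double-buffering (ping-pong) scheme can be modeled in software as below. This is a sequential Python sketch under stated assumptions: in hardware the load into the idle buffer overlaps in time with the computation on the active buffer, whereas here the two steps simply run one after the other. The names `process_states`, `load_params`, and `compute` are hypothetical. The per-state size estimate assumes 16-bit parameters and about 80 values per mixture, which is a guess that happens to match the roughly 1280 B quoted for two states.

```python
def process_states(state_ids, load_params, compute):
    """Ping-pong buffering: compute on one buffer while the other one is refilled."""
    results = []
    if not state_ids:
        return results
    buffers = [None, None]
    active = 0
    buffers[active] = load_params(state_ids[0])  # initial fill of the first buffer
    for k in range(len(state_ids)):
        if k + 1 < len(state_ids):
            # Prefetch the next state's parameters into the idle buffer.
            buffers[1 - active] = load_params(state_ids[k + 1])
        results.append(compute(buffers[active]))
        active = 1 - active                      # swap the roles of the two buffers
    return results

# Buffer-size arithmetic with 16-bit (2-byte) values.
D = 39                              # feature dimension
M = 4                               # mixtures per state, as in the experiments
observation_bytes = D * 2           # 78 B, the stated observation buffer size
values_per_mixture = 2 * D + 2      # assumed: D means, D variance terms, 2 constants
state_bytes = M * values_per_mixture * 2
two_state_bytes = 2 * state_bytes

# Usage with stand-in functions for the memory load and the GMM computation.
demo = process_states(["a", "b", "c"], lambda s: s.upper(), lambda p: p + "!")
```

Only two parameter sets are ever resident at the same time, regardless of how many states are visited or in which order, which is why the scheme needs no assumption about the access pattern.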

The major differences between the proposed system and the other coprocessing system are as follows:
a) The GMM accelerator in the other system has only one computation unit, which calculates one dimension
at a time. We argue that this architecture is not optimized. The proposed system includes N computation
units and a parallel adder block to further exploit data parallelism.

b) The accelerator in their system computes log bjm(ot) only. The summation of Gaussian mixtures is done
by the general purpose processor in software, while the proposed accelerator includes a hardware log-add
unit and the final output is log bj(ot).
c) The accelerator in their system internally stores 128 kB of HMM parameters, which is about 20% of the
total amount. This makes the architecture infeasible for larger vocabulary tasks. In addition, the
parameters are predetermined. The parameters of the most probable HMM states, which are found by
offline profiling on the test speech data, are stored inside the accelerator. In contrast, our proposed
accelerator only stores two HMM states (1280 B). Furthermore, we do not make any assumptions on
which HMM states should be stored.

2) Timing Profile:
After synthesis and place and route, the proposed system is implemented on the target FPGA board.
The first experiment investigates the relationship between the speedup in GMM calculation and the number
of parallel computation units (N). The aim is to find the smallest number of computation units that gives the maximum
speedup. Fig. 9 shows the number of clock cycles for GMM calculation versus the number of computation
units. The task is the Resource Management (RM1) task, which consists of 1200 test utterances. The vocabulary
size is 993. Triphone HMM models with three emitting states and four Gaussian mixtures per state are trained
on 2880 utterances. Acoustic features are 39-D MFCCs with the zeroth coefficient plus their delta and delta–delta
coefficients. The language model is a word-pair grammar (bigram). In terms of word accuracy, the GMM
accelerator is an exact implementation of the algorithm. Hence, the word accuracy rate is 93.33%, which is the
same as that of the pure software-based system.
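
The tradeoff behind the choice of N can be seen from a simple iteration count. The Python sketch below is an idealized model and not a reproduction of the measured clock cycles: it counts only the ⌈D/N⌉ passes needed per mixture and ignores pipeline latency, the adder block, the log-add unit, and memory loading. The function names are hypothetical.

```python
def iterations_per_mixture(D, N):
    """Number of passes needed to cover D dimensions with N parallel units: ceil(D / N)."""
    return -(-D // N)  # integer ceiling division

def iterations_per_state(D, N, M):
    """Passes for one HMM state with M mixtures, processed one mixture after another."""
    return M * iterations_per_mixture(D, N)

D, M = 39, 4  # feature dimension and mixtures per state used in the experiments
table = {N: iterations_per_state(D, N, M) for N in (1, 2, 4, 8, 13, 20, 39)}
for N, passes in table.items():
    print(f"N = {N:>2}: {passes:>3} passes per state")
```

In this model the count falls in steps rather than smoothly. For instance, every N from 20 to 38 needs two passes per mixture, so within such a plateau the additional units consume FPGA resources without reducing the pass count. This is one reason to look for the smallest N that reaches a given speedup.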

3) Resource Usage:
Table II shows the resource usage of the GMM hardware accelerator. The Adaptive Logic Module (ALM),
which can be programmed to perform logic functions, is the building block of a Stratix II FPGA device. M4K
RAM blocks on the FPGA provide on-chip memory storage. Hardware multipliers are also embedded on the
FPGA.

IV. Adaptive Pruning

Our goal is to reduce the decoding time of those utterances which have a relatively greater real-time
factor, while keeping the recognition accuracy of the other utterances. In order to fulfil this goal, an adaptive
pruning scheme is proposed, where the pruning beamwidth is adaptive according to the number of active tokens.

A. Algorithm
Fig. 7 shows the pseudocode of the ASR algorithm with adaptive pruning. In the beginning, the
beamwidth is initialized to a value (Line 4). Before token passing, the algorithm modifies the pruning
beamwidth according to the number of active tokens, $n(\tilde{Q}_t)$. If the number of tokens is greater than a threshold,
$\tau_{\text{upper}}$, a tighter beamwidth is adopted: the beamwidth is decreased by a certain amount denoted by δ (Lines
11–12). However, if the number of active tokens is smaller than another threshold, $\tau_{\text{lower}}$, and the
beamwidth has previously been tightened, the beamwidth is relaxed and its value is increased by δ (Lines
13–16). The rest of the algorithm is the same as the one shown in Fig. 3.

The proposed pruning scheme is more
flexible than a narrow, fixed pruning scheme. The number of active tokens often varies over the
duration of an utterance. The fixed pruning scheme applies a tight beamwidth throughout the entire utterance
regardless of the number of active tokens. The adaptive scheme, on the other hand, allows relaxation of the
beamwidth in parts of the utterance where the workload is less heavy.

In terms of implementation, the proposed
adaptive scheme is simpler than histogram pruning. Implementing histogram pruning requires a sorted list of the
token scores. For each token, the recognizer needs to perform an insertion sort, which involves searching for the
token’s ranking in a sorted list of the previously iterated token scores. Maintaining the tokens in sorted order is
computationally intensive. In contrast, the adaptive pruning scheme only needs to record the number of active
tokens and to execute a few decision-making statements (if-statements) that adjust the beamwidth once for every speech
frame.
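
The beamwidth update rule described above can be sketched in a few lines of Python. This is an illustration, not the pseudocode of the paper: the function name is hypothetical, and "tightened previously" is interpreted here as the current beamwidth being below its original value. The default parameter values are the ones reported for the experiments in the Timing Profile subsection.

```python
def adapt_beamwidth(beamwidth, n_active, original_beamwidth=170,
                    tau_lower=1900, tau_upper=2300, delta=10):
    """Return the pruning beamwidth to use for the next frame."""
    if n_active > tau_upper:
        # Too many active tokens: tighten the beam.
        # No lower bound is applied, since none is described in the text.
        return beamwidth - delta
    if n_active < tau_lower and beamwidth < original_beamwidth:
        # Light workload and the beam was tightened earlier: relax it again.
        return beamwidth + delta
    return beamwidth

# Hypothetical token counts over six consecutive frames.
beam = 170
history = []
for n_active in (1500, 2400, 2500, 2100, 1800, 1700):
    beam = adapt_beamwidth(beam, n_active)
    history.append(beam)
```

The update costs one counter and two comparisons per frame, and the relaxation step never pushes the beamwidth above its original value, so accuracy on lightly loaded utterances is not affected by the scheme.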

B. Timing Profile
Fig. 12 shows the real-time factor of the coprocessing system. Fixed beam pruning and adaptive beam
pruning are compared. The beamwidth is held constant at 170 for the fixed beam-pruning scheme. In adaptive
beam pruning, the original_beamwidth variable is also set to 170. The thresholds, $\tau_{\text{lower}}$ and $\tau_{\text{upper}}$, are 1900
and 2300, respectively. The beamwidth adjustment value is 10 (δ = 10). These parameters are determined
empirically. In the fixed beam-pruning scheme, about 94% of the utterances have a real-time factor below one.

When the adaptive beam-pruning scheme is used, this percentage increases to 99.75%. Only 3 out of
1200 utterances have a real-time factor above one. Compared with the fixed beam-pruning scheme, there is a
small degradation in recognition accuracy, which decreases from 93.33% to 93.16%. We have also tried to
tighten the adaptive pruning scheme by adjusting $\tau_{\text{upper}}$ and $\tau_{\text{lower}}$ to smaller values ($\tau_{\text{upper}}$ = 1700, $\tau_{\text{lower}}$ =
1250), so that the real-time factors of all the utterances are below one. The word accuracy rate then drops to 92.62%.

Fig. 7. Speech recognition algorithm with adaptive beam pruning.

V. Conclusion
The proposed ASR system shows much better real-time factors than the other approaches without
decreasing the word accuracy rate. Other advantages of the proposed approach include rapid prototyping,
flexibility in design modifications, and ease of integrating ASR with other applications. These advantages, both
quantitative and qualitative, suggest that the proposed coprocessing architecture is an attractive approach for
embedded ASR. The proposed GMM accelerator shows three major improvements in comparison with another
coprocessing system. First, the proposed accelerator is about four times faster by further exploiting parallelism.
Second, the proposed accelerator uses a double-buffering scheme with a smaller memory footprint, thus being
more suitable for larger vocabulary tasks. Third, no assumption is made on the access pattern of the acoustic
parameters, whereas the other accelerator relies on a predetermined set of parameters. Finally, we have presented a novel
adaptive pruning algorithm which further improves the real-time factor. Compared with other conventional
pruning techniques, the proposed algorithm is more flexible to deal with the time-varying number of active
tokens in an utterance. The performance of the proposed system is sufficient for a wide range of speech-
controlled applications. For more complex applications which involve multiple tasks working with ASR, further
improvement of timing performance, for example, by accelerating the Viterbi search algorithm, might be
required.

References
[1] A. Green and K. Eklundh, "Designing for learnability in human–robot communication," IEEE Trans. Ind. Electron.,
vol. 50, no. 4, pp. 644–650, Aug. 2003.
[2] M. Imai, T. Ono, and H. Ishiguro, "Physical relation and expression: Joint attention for human-robot interaction,"
IEEE Trans. Ind. Electron., vol. 50, no. 4, pp. 636–643, Aug. 2003.
[3] B. Jensen, N. Tomatis, L. Mayor, A. Drygajlo, and R. Siegwart, "Robots meet humans-interaction in public
spaces," IEEE Trans. Ind. Electron., vol. 52, no. 6, pp. 1530–1546, Dec. 2005.
[4] H. Lam and F. Leung, "Design and training for combinational neural-logic systems," IEEE Trans. Ind. Electron., vol.
54, no. 1, pp. 612–619, Feb. 2007.
[5] A. Chatterjee, K. Pulasinghe, K. Watanabe, and K. Izumi, "A particle-swarm-optimized fuzzy-neural network for
voice-controlled robot systems," IEEE Trans. Ind. Electron., vol. 52, no. 6, pp. 1478–1489, Dec. 2005.

About the Authors

K. Kartheek,
M.Tech,
Sri Kottam Tulasi Reddy College
Of Engineering & Technology,
A.P, India

D.V.Srihari Babu,
M.Tech (Ph.D.),
Assoc. Prof.,
Sri Kottam Tulasi Reddy College
Of Engineering & Technology,
A.P, India.

17 pages
InfiniSolar Plus 5KW Manual 201501203
No ratings yet
InfiniSolar Plus 5KW Manual 201501203
54 pages
Automatic Speech Recognition: MD SHAKIR ALAM (2K18/CO/194)
No ratings yet
Automatic Speech Recognition: MD SHAKIR ALAM (2K18/CO/194)
2 pages
Voice Recognition System
No ratings yet
Voice Recognition System
4 pages
Jarvis Digital Life Assistant IJERTV2IS1237 PDF
No ratings yet
Jarvis Digital Life Assistant IJERTV2IS1237 PDF
6 pages
Kubernetes On OpenStack Ebook Final
No ratings yet
Kubernetes On OpenStack Ebook Final
27 pages
(IJCST-V4I2P62) :Dr.V.Ajantha Devi, Ms.V.Suganya
No ratings yet
(IJCST-V4I2P62) :Dr.V.Ajantha Devi, Ms.V.Suganya
6 pages
RCC21 Subframe Analysis
No ratings yet
RCC21 Subframe Analysis
10 pages
8 Dec DSA Roadmap From Beginner To Advanced With A Focus On
No ratings yet
8 Dec DSA Roadmap From Beginner To Advanced With A Focus On
22 pages
FAA Form 337 User Guide
No ratings yet
FAA Form 337 User Guide
165 pages
Speech Recognition As Emerging Revolutionary Technology
No ratings yet
Speech Recognition As Emerging Revolutionary Technology
4 pages
LIFECO Product Digital Catalogue
No ratings yet
LIFECO Product Digital Catalogue
48 pages
Completed SOW
No ratings yet
Completed SOW
3 pages
Overlap Add Save
No ratings yet
Overlap Add Save
8 pages
SAP Simple Finance Training Course Content
No ratings yet
SAP Simple Finance Training Course Content
5 pages
FOSS User Magazine - 2010 Dec
100% (1)
FOSS User Magazine - 2010 Dec
32 pages
Chapter 1
No ratings yet
Chapter 1
60 pages
Clustal
No ratings yet
Clustal
2 pages
Umesh Ratan Singh Sugara
No ratings yet
Umesh Ratan Singh Sugara
1 page
Project Phase 1
No ratings yet
Project Phase 1
29 pages
Creo Elements Direct Product Update and Roadmap en Pub
No ratings yet
Creo Elements Direct Product Update and Roadmap en Pub
23 pages
University of Gujrat: Important Instructions
No ratings yet
University of Gujrat: Important Instructions
2 pages
As 3648-1993 Specification and Methods of Test For Packaged Concrete Mixes
No ratings yet
As 3648-1993 Specification and Methods of Test For Packaged Concrete Mixes
7 pages
PDF Export
No ratings yet
PDF Export
110 pages
AIDI 1003 Presentation
No ratings yet
AIDI 1003 Presentation
9 pages
MegaStat Users Guide
No ratings yet
MegaStat Users Guide
72 pages
Mis-0114 Ds Agile Studio 2.8 Release Notes v280 r0
No ratings yet
Mis-0114 Ds Agile Studio 2.8 Release Notes v280 r0
6 pages
DBMS Lec 1
No ratings yet
DBMS Lec 1
19 pages
Computer Systems Servicing (CSS NCII)
No ratings yet
Computer Systems Servicing (CSS NCII)
8 pages
Configuring and Deploying The ODI Console
100% (1)
Configuring and Deploying The ODI Console
16 pages
Dot Net Programming Objective Type Questions Unit 3 and Unit 4
No ratings yet
Dot Net Programming Objective Type Questions Unit 3 and Unit 4
15 pages
Ecg PDF
No ratings yet
Ecg PDF
9 pages
IVMS-4200 v2.5.0.5 Release Notes
No ratings yet
IVMS-4200 v2.5.0.5 Release Notes
7 pages
Efficient Memory Optimization for IoT Intrusion Detection
From Everand
Efficient Memory Optimization for IoT Intrusion Detection
Ethan Evelyn
No ratings yet
Audio Visual Speech Recognition: Advancements, Applications, and Insights
From Everand
Audio Visual Speech Recognition: Advancements, Applications, and Insights
Fouad Sabry
No ratings yet




II. Automatic Speech Recognition System

Fig. 3. Pseudocode of the speech recognition algorithm with beam pruning.

Fig. 4. Pseudocode of the Viterbi_search function.

III. Hardware–Software Coprocessing System

B. GMM Emission Probability Hardware Accelerator

IV. Adaptive Pruning

Fig. 7. Speech recognition algorithm with adaptive beam pruning.

About the Authors
