
Ibrahim 2020

This document summarizes a study on automatic speech recognition systems. It discusses how speech recognition allows interaction between humans and machines through speech signals. The document reviews different models of speech recognition, including isolated word recognition where single words are recognized separately, and continuous speech recognition which aims to recognize natural speech. Applications of speech recognition are also discussed, such as for identification purposes by recognizing unique speech characteristics.

A Study on Automatic Speech Recognition Systems

Habib Ibrahim
Information and Communication Technology
University of Education, Winneba
Winneba, Ghana
habiib30@gmail.com

Asaf Varol
Department of Software Engineering
Firat University
Elazig, Turkey
avarol@firat.edu.tr

Abstract—Speech recognition is a technique that enables machines to automatically identify the human voice through speech signals. In other words, it helps create a communication link between machines and humans. Speech recognition uses acoustic and language modeling algorithms to execute its task, allowing interaction between computer interfaces and a natural human voice. Speech recognition is one of the current topics of discussion in the twenty-first century. Technological gadgets have become widespread in modern society through the vigorous efforts of scientists to develop algorithms that allow machines to interact with human beings. What was previously thought of as fiction has evidently been achieved, and several applications now make use of speech recognition. This paper studies some models of the speech recognition system, its classification of speech, its significance, and its applications.

Keywords—speech recognition, speech signals, techniques, algorithm.

I. INTRODUCTION

Speech as a medium of communication for human beings is a natural phenomenon, and it is unique to every individual. One of the ways that someone can be identified is through speech. This has been the case since time immemorial. The invention and widespread use of the telephone, audio media, and radio technology have given further importance to speech communication and speech processing. Achieving reliable speech recognition is a difficult task that needs a combination of many techniques; however, because of advancements in technology, scientists have been able to achieve an impressive degree of accuracy in developing algorithms for computer-human interaction [1]. Advances in digital signal processing technology have led to the use of speech processing in many different applications, such as automatic banking, speech compression, enhancement, synthesis, and recognition [2].

Speech recognition is an emerging technology in the twenty-first century. Its applications have become part of daily living, making life more comfortable and effective. For example, with mobile phones, instead of typing the name of the person being searched for, the user can speak the name into the phone, and it will automatically call that person [3]. Speech recognition is also applicable to text messaging: instead of typing a message, users can simply speak to the mobile phone and their voice will be recognized. The idea that machines can recognize the human voice has long appeared in fiction, from George Orwell's 1984 to Star Trek [4].

Automatic speech recognition is a technology that establishes an interaction between humans and the machine system in a flexible way that is convenient and easy to deal with. The advent of speech recognition technology has made life comfortable for people with learning or physical difficulties. In a technological environment where the risk of cyber-security attacks is very common, there is a need to safeguard our identity. There are various authorization mechanisms for identification, such as alphanumeric password identification, facial identification, and biometric identification. Hence, speech recognition has emerged as one of the techniques that can be used for identification, since each person has distinctive speech characteristics. A speaker can be identified from the information in speech waves [5]. The diagram in Fig. 1 shows the process of human communication and how it can be translated into human–machine interaction.

Fig. 1. Diagram of the voice production/perception process [6].

The motive behind the automatic speech recognition (ASR) system is to create a system that can process the human voice and give corresponding feedback; in other words, one that can understand human language and translate speech into a different medium, such as text, for easy interpretation and understanding.

The methodology of this study involves two steps. The first step comprises an analysis of the literature on research into automatic speech recognition systems. The second step covers the application and significance of the speech recognition system.

II. LITERATURE REVIEW

The Concept of Automatic Speech Recognition
Speech recognition is the ability of a machine to identify words and phrases in spoken language, and it has the capacity

978-1-7281-6939-2/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: University of Exeter. Downloaded on June 20,2020 at 15:55:27 UTC from IEEE Xplore. Restrictions apply.
to recognize a given voice from a variety of users. Scientists made several efforts to design a machine with a graphical interface that could interact with the human voice, but these appeared futile because of the complexity of the task and inadequate technological tools. However, the first speech recognition system was developed in 1952 by Davis et al. [7]. It contained a device that recognized single spoken digits. As technology developed, new ideas and algorithms were introduced, and many applications began to emerge. In the 1990s, much effort went into the development of software tools that allowed research programs all over the world to produce innovations enabling machines to read the human voice and process it into a machine-readable format.

Related work.
As the use of technology increased, many large-vocabulary systems were developed. The Cambridge University team led by Steve Young built a system called the Hidden Markov Model Tool Kit (HTK) [8]. It was one of the most widely adopted software tools for automatic speech recognition research. It applies a statistical classification technique to speech recognition. The motive is to develop a model of the speech generation process for each word, rather than storing samples of its output.

Deep neural networks, as one of the most current methods used in speech recognition, have contributed massively to the development of speech recognition systems. A multitask system for performance prediction was proposed by Elloumi et al. [9]. It was based on a convolutional neural network (CNN) and was contrasted with an approach based on engineered features and predefined traits. French-language programs were used as sample data in this experiment, which included data from the ETAPE project, ESTER 1, ESTER 2, and the REPERE evaluation campaign [10], [11]. The reported results were better than those predicted by the CNN with regard to mean absolute error (MAE).

Another study, on automatic speech recognition errors and how to identify them in order to improve spoken language understanding (SLU) systems, was conducted by Simonnet et al. [12]. They recommended a methodology designed to enrich the set of semantic tags with specific error tags.

Dominique et al. [13] presented neural networks as the most commonly used tool in artificial intelligence, and they analyzed the acoustic and linguistic models of the automatic speech recognition system. This model comprises deep neural networks (DNN) and the Hidden Markov Model (HMM).

Speech recognition methods can be divided into different types, which can be grouped into text-independent and text-dependent modes. In a text-independent system, speaker models capture characteristics of a person's speech that show up irrespective of what is being said. In a text-dependent system, however, the recognition of the speaker's identity is based on him or her saying one or more specific phrases, like passwords, card numbers, or PINs [14].

Speech recognition applications are designed for a specific task, such as recognizing the digits zero through nine or distinguishing "yes" from "no" over the telephone; once the task is defined, an algorithm is designed to execute it [4].

Models in Automatic Speech Recognition.
There are different models available for speech recognition systems. However, most speech recognition systems are categorized based on utterances, vocabulary size, and speaker dependency. With regard to utterances, the classifications are given below:

• Isolated words: In this process, a single utterance is accepted at a time, with a short break between each spoken word. Isolated word recognition comprises two stages: a training stage and a testing stage. During the training stage, a dictionary of acoustic models is built, one for each word that the system needs to identify. In an isolated word recognition system, the response is good for single words but poor for multiple-word input [15]. A Hindi isolated word speech recognizer was developed by Kumar and Aggarwal [16] using HTK, which runs on the Linux platform. It recognizes isolated words using an acoustic word model.
• Continuous speech: This is where a natural voice is input by the user and is recognized by the machine. Identification of continuous speech is quite problematic in the sense that a special method is needed for its execution. In a continuous speech recognition system, connected speech is processed rather than speech that has been isolated by pauses.
• Connected words: In a connected words model, single utterances are allowed to run together but with a marginal pause between them. It is similar to the isolated words system.
• Spontaneous speech: This is another difficult task in the field of automatic speech recognition. Spontaneous speech is characterized by disfluencies such as filled pauses, repetitions, and false starts, unlike prepared speech, in which utterances contain well-formed sentences close to those found in written documents. Several studies have focused on the detection and correction of these disfluencies.

Categorization Based on Vocabulary Size.
In this classification, the accuracy of the system depends on its complexity and processing requirements. Some applications are designed to process a few words while others require a large vocabulary. The various categories are briefly discussed below [16]:
• Small-size vocabulary: This contains tens of words, meaning that the system can recognize only a limited number of words.
• Medium-size vocabulary: This consists of hundreds of words. In this category, a sizeable number of words are recognized by the speech recognition system [17].

• Large-size vocabulary: This comprises thousands of words. In this category, a huge number of words can be recognized.
• Very large-size vocabulary: This encompasses millions of words, an unknown number.

Categorization Based on Speaker Dependency.

• Speaker-dependent: This is where the speech recognition system picks out the unique characteristics of a single speaker's voice, which parallels voice recognition. Here, the system is designed around the available dataset and may use templates.
• Speaker-independent: In a speaker-independent speech recognition system, the “stored word pattern must be representative of the collection of speakers expected to use the system” [18]. Word templates are formed after obtaining a sizeable number of sample patterns from a variety of speakers. The templates are then grouped to form a representative pattern for each word.
• Speaker-adaptive: Here, a number of techniques are used to improve the original speaker-independent system while the speaker uses it. Both speaker-adaptive and speaker-independent techniques, which may have a great influence on speaker normalization, are considered [18, 19].

Stages in Automatic Speech Recognition Systems (ASR).
The main idea behind the speech recognition system is the device's capability to accept and interpret spoken words or acoustic information. Speech recognition uses different techniques to identify the human voice. Its main function is to convert the digital audio signals released by the sound card into recognized speech. The signals pass through different channels where various mathematical and statistical methods are applied to work out what has been said [20]. These processes can be identified as the following stages (Fig. 2):
• Incoming signal
• Signal processing
• Feature extraction
• Language models
• Recognition stage

The diagram (Fig. 2) shows the various stages the speech goes through before it comes out as recognized speech. The first stage converts the speech into a digital wave after receiving the signal through an input device such as a microphone; this process is known as signal processing. The signal then moves to the second stage, known as feature extraction, where computation takes place. During this process, the wavelet transform is used to obtain a frequency-domain representation, which gives accurate time-frequency localization that can be used to track abrupt changes in speech signals.

After the second stage, a grammar environment (dictionary) is developed for neural learning to take place; this is known as the neural network. This is where the dictionary of words is formed, comprising billions of words, and connections between the words are created with the assistance of the pronunciation dictionary. Finally, the speech is recognized and comes out as an output.

Fig. 2. Identification in the speech recognition system [20].

III. APPLICATION OF SPEECH RECOGNITION SYSTEMS

Automatic speech recognition has developed significantly over the past few years due to its technological capability in translating speech to text. Speech recognition uses an algorithm that relies on acoustic and language modeling for its execution. Acoustic modeling serves as the intermediary between the linguistic units of speech and audio signals, while language modeling matches the sounds produced with word sequences to help differentiate between similar-sounding words.

As technology advanced, the idea of artificial intelligence was applied to the design of speech comprehension systems [21]. Researchers were motivated to think critically about this area by difficulties in speaking and listening, and they therefore tried to develop hearing aids for people with such disabilities. With these characteristics, speech recognition has been given a wide range of functions in various fields, as listed below [22, 23]:
• Health sector
• Telecommunication industry
• Robotic industry
• Educational sector
• Military camps
• Automated identification
• Forensics and law enforcement
• Home automation

A summary of the applications of ASR listed above is given below (Table I).
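
The feature-extraction stage described under "Stages in Automatic Speech Recognition Systems" notes that a wavelet transform can localize abrupt changes in a speech signal. The following is a minimal sketch of that idea only, under stated assumptions: the paper does not say which wavelet is used, so a single-level Haar transform is assumed here as the simplest case, and the function names (`haar_dwt`, `locate_abrupt_change`) and the toy signal are hypothetical, introduced purely for illustration.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists. The input length
    must be even; this restriction is a simplification for the sketch.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    # Approximation: scaled pairwise sums (smooth, low-frequency content).
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]
    # Detail: scaled pairwise differences (sharp, high-frequency content).
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]
    return approx, detail


def locate_abrupt_change(signal):
    """Return the start index of the sample pair with the largest detail magnitude."""
    _, detail = haar_dwt(signal)
    k = max(range(len(detail)), key=lambda i: abs(detail[i]))
    return 2 * k


# Hypothetical toy "signal": silence followed by a sudden onset.
# Samples 0..6 are 0.0 and samples 7..15 are 1.0.
toy_signal = [0.0] * 7 + [1.0] * 9
onset_pair_start = locate_abrupt_change(toy_signal)
```

In this toy case every detail coefficient is zero except the one for the pair that straddles the jump, which is how the transform ties a change to a position in time. One level of the Haar transform only compares non-overlapping pairs, so a jump that falls exactly on a pair boundary would show up only at a coarser level; a practical front end would use several levels or a different wavelet.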

TABLE I. APPLICATION OF AUTOMATIC SPEECH RECOGNITION

Sector | Application
Healthcare industry | In the healthcare industry, a speech recognition system can be used by a radiologist to capture chest examination reports instead of typing them by hand, which takes much time. It can also be used by medical transcriptionists.
Education | Helping students with learning disabilities. Helping students in language learning. Helping to improve students' reading skills. For people with physical disabilities, speech recognition systems can be used to convert the human voice into action.
Automated identification | This can be used in financial institutions to authenticate the identity of customers to help combat fraud.
Telecommunications | Telecommunication companies serve their clients through customer care services. The software asks various questions to establish the caller's needs and then directs the caller to the appropriate operator for assistance.
Military | Many military applications with embedded ASR technology are used to support a Human Machine Interface (HMI), to lessen the workload on the pilot in advanced aircraft.
Forensics and law enforcement | ASR technology is used in the forensic sector for judicial and law enforcement purposes, for the verification of a suspected speech sample.

IV. SIGNIFICANCE OF AUTOMATIC SPEECH RECOGNITION SYSTEMS

Speech recognition systems are a vital part of technology in the twenty-first century. Every category of people, especially those with disabilities, benefits from this powerful tool. The end-user does not need any special skill or qualification to interact with the system, since speech is a natural phenomenon. Additionally, speech is the easiest and fastest form of communication. Automatic speech recognition systems are very flexible in allowing users to issue a command while doing other activities. Economically, the application of the ASR system in our daily lives saves time and money.

V. CONCLUSION

Speech recognition technology is booming across the globe. The technology has been in existence for several decades, and numerous technological advancements have taken place in this field. In this paper, different speech recognition models were discussed. The paper also discussed the application of the speech recognition system and its significance. The paper reviewed related research into the speech recognition system and automated speech recognition. Through the analysis of the literature, it was found that the HTK, a system made by the Cambridge University team led by Steve Young, is among the most commonly used software tools for automatic speech recognition.

REFERENCES
[1] S. K. Hasnain, A. Beg, and S. Awan, “Frequency Analysis of Urdu Spoken Numbers Using MATLAB and Simulink,” Journal of Science & Technology PAF KIET, ISSN 1994-862x, pp. 46–48, Dec. 2007.
[2] D. Mandalia and P. Gareta, “Speaker Recognition Using MFCC and Vector Quantization Model.”
[3] R. Paul, R. Beniwal, R. Kumar, and R. Saini, “A Review on Speech Recognition Methods,” International Journal on Future Revolution in Computer Science & Communication Engineering, vol. 4, no. 2, pp. 292–298, 2018.
[4] R. N. Bracewell, The Fourier Transform and its Applications. McGraw-Hill, 2000.
[5] H. Sakoe and S. Chiba, “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” IEEE Trans. Acoustics, Speech and Signal Proc., vol. ASSP-26, no. 1, pp. 43–49, Feb. 1978.
[6] Towards Data Science, “Speech recognition is hard-Part 1.” https://towardsdatascience.com/speech-recognition-is-hard-part-1-258e813b6eb7 (Accessed December 20, 2019).
[7] K. H. Davis, R. Biddulph, and S. Balashek, “Automatic Recognition of Spoken Digits,” J. Acoust. Soc. Am., vol. 24, p. 637, Nov. 1952.
[8] Towards Data Science, “Speech recognition is hard.” https://towardsdatascience.com/speech-recognition-is-hard-part-1-258e813b6eb7 (Retrieved: November 20, 2019).
[9] E. Zied, L. Benjamin, G. Olivier, and B. Laurent, “Prédiction de performance des systèmes de reconnaissance automatique de la parole à l’aide de réseaux de neurones convolutifs,” HAL Id: hal-01976284, TAL, vol. 59, no. 2/2018, 2019.
[10] G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel, and O. Galibert, “The ETAPE corpus for the evaluation of speech-based TV content processing in the French language,” LREC 2012, Eighth International Conference on Language Resources and Evaluation, 2012.
[11] J. Kahn, O. Galibert, L. Quintard, M. Carré, A. Giraudel, and P. Joly, “A presentation of the REPERE challenge,” 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), IEEE, pp. 1–6, 2012.
[12] S. Edwin, G. Sahar, C. Nathalie, E. Yannick, and D. Renato, “ASR error management for improving spoken language understanding,” arXiv:1705.09515v1 [cs.CL], 2017.
[13] F. Dominique, M. Odile, and I. Irina, “New Paradigm in Speech Recognition: Deep Neural Networks,” ContNomina project, supported by the French National Research Agency (ANR), 2017.
[14] M. Saundade and P. Kurle, “Speech Recognition using Digital Signal Processing,” International Journal of Electronics, Communication & Soft Computing Science and Engineering, ISSN: 2277-9477, vol. 2, no. 6, 2013.
[15] “Developing an Isolated Word Recognition System in MATLAB.” https://www.mathworks.com/company/newsletters/articles/developing-an-isolated-word-recognition-system-in-matlab.html (Accessed December 15, 2019).
[16] K. Kumar and R. K. Aggarwal, “Hindi Speech Recognition System Using HTK,” International Journal of Computing and Business Research, ISSN (Online): 2229-6166, vol. 2, no. 2, May 2011.
[17] M. A. Anusuya and S. K. Katti, “Speech Recognition by Machine: A Review,” International Journal of Computer Science and Information Security (IJCSIS), vol. 6, no. 3, pp. 181–205, 2009.
[18] P. K. Kurzekar, R. R. Deshmukh, V. B. Waghmare, and P. P. Shrishrimal, “A Comparative Study of Feature Extraction Techniques for Speech Recognition System,” International Journal of Innovative Research in Science, Engineering and Technology, ISSN: 2319-8753, vol. 3, no. 12, pp. 18006–18016, Dec. 2014.
[19] R. K. Aggarwal and M. Dave, “Acoustic Modelling Problem for Automatic Speech Recognition System: Conventional Methods (Part I),” International Journal of Speech Technology, vol. 14, pp. 297–308, 2011.
[20] E. Chandra, “A review on Speech and Speaker Authentication System using Voice Signal feature selection and Extraction,” 2009 IEEE International Advance Computing Conference (IACC 2009), Patiala, India, March 6–7, 2009.

[21] J. Vajpai et al., “Industrial Applications of Automatic Speech Recognition Systems,” Int. Journal of Engineering Research and Applications, www.ijera.com, ISSN: 2248-9622, vol. 6, no. 2, (Part 6), February 2016.
[22] A. Vajpai and A. Bora, “Industrial Applications of Automatic Speech Recognition Systems,” Department of Electrical Engineering, M.B.M. Engg. College, Jodhpur, India, ISSN: 2248-9622, vol. 6, no. 3, (Part 1), pp. 88–95, March 2016.
[23] X. D. Huang, “A Study on Speaker-Adaptive Speech Recognition,” School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. https://www.aclweb.org/anthology/H91-1054.pdf (Retrieved: December 15, 2019).
