
Real-Time Conversion for Sign-to-Text and Text-to-Speech Communication using Machine Learning

Dr. Rachna Jain


Department of CSE
JSS Academy of Technical Education
Noida, India
rachnajain@jssaten.ac.in

Harshit Garg
Department of CSE
JSS Academy of Technical Education
Noida, India
harshitgarg2309@gmail.com

Pratham Dubey
Department of CSE
JSS Academy of Technical Education
Noida, India
prathamd67@gmail.com

Shaurya Gupta
Department of CSE
JSS Academy of Technical Education
Noida, India
guptashaurya1659@gmail.com

Abstract

"Real-Time Conversion for Sign to Text and Text to Using Machine Learning" aims to use
machine learning to create a system that can effortlessly translate sign language gestures into
text and convert text into natural-sounding speech in real time. This groundbreaking
development seeks to address the long-standing issue of communication accessibility for the
deaf and hard-of-hearing communities. By harnessing cutting-edge machine learning
techniques that integrate natural language processing and computer vision, this initiative aims
to break down the barriers and provide a two-way communication channel. This channel will
not only interpret sign language gestures but also transmit information through synthesized
speech and written text. To lay the foundation for this study, a comprehensive review of the
literature is conducted, exploring the progression of text generation, sign language
recognition, and text-to-speech synthesis over time. Building upon this knowledge, the
subsequent sections delve into the system architecture and techniques employed for text-to-
speech synthesis and sign language recognition.

Keywords: Artificial Neural Network (ANN) · Convolutional Neural Network (CNN) · TensorFlow · Keras · OpenCV

1 Introduction

Effective communication forms the foundation of human relationships, creating connection and understanding. However, individuals who rely on sign language face significant barriers
to communication, leading to ongoing challenges in their daily interactions [1]. Thankfully,
the integration of machine learning into instant messaging has the potential to bridge this gap,
enabling conversational communication between sign language users and others.

This research paper introduces a technology for real-time sign-to-text and text-to-speech conversion through machine learning. Our primary goal is the development of a
system that can effortlessly translate text into natural speech while also instantaneously
interpreting and translating hand gestures into text. By addressing persistent communication
barriers, we aim to facilitate effective interactions between sign language users and the
general public [2].

The significance of this study lies in its capacity to support the deaf community by providing numerous avenues for easy communication. Leveraging the latest machine learning techniques, such as natural language processing and computer vision, enables sign gestures to be recognized and rendered as written text.

Additionally, the incorporation of text-to-speech synthesis ensures bidirectional communication, promoting wide-ranging and inclusive discourse among all participants
involved [3]. We strive to reduce the communication gap between individuals with and
without knowledge of sign language, ensuring productive conversations and empowering
both parties. Sign language translation is a swiftly growing field of study, offering individuals
with hearing loss the most natural means of communication possible.

This study delves into the creation of robust real-time sign language recognition models, drawing on computer vision and deep learning [4]. In parallel, language processing tools transform written language into spoken language, facilitating spoken communication.

By thoroughly reviewing existing literature, this article presents a comprehensive framework encompassing text generation, sign language recognition, and text-to-speech synthesis. The subsequent sections cover the system architecture, sign language recognition and text-to-speech methods, implementation ideas, and
performance assessment [5]. Through the enhancement of machine learning capabilities, our
research aims to improve communication and foster increased community involvement in
collaborative efforts.
Fig. 1 Sign Language to Text Conversion Application
2 Related Work

Feature extraction and representation [8] involve transforming an image into a three-
dimensional matrix. This matrix has dimensions equal to the image's height and width, with a
depth value assigned to each pixel. In the case of RGB images, there are three depth values,
while grayscale images have just one. These pixel values play a crucial role in helping
Convolutional Neural Networks (CNNs) extract useful features.
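As a small illustration of this representation, the sketch below loads the same picture as an RGB and as a grayscale matrix with OpenCV and reports the depth dimension directly; the file name is a hypothetical placeholder.

import cv2  # OpenCV for image I/O

# Hypothetical file path; any captured gesture photo would do.
rgb = cv2.imread("gesture_A.jpg", cv2.IMREAD_COLOR)       # shape: (height, width, 3)
gray = cv2.imread("gesture_A.jpg", cv2.IMREAD_GRAYSCALE)  # shape: (height, width)

print(rgb.shape, gray.shape)   # e.g. (480, 640, 3) and (480, 640)
print(rgb[0, 0])               # three depth values (B, G, R) for one pixel
print(gray[0, 0])              # a single intensity value for the same pixel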

An Artificial Neural Network (ANN) is a network of neurons that imitates the structure of
the human brain. Information is transmitted from one neuron to another through connections.
[1] The first layer of neurons receives inputs, processes them, and passes them on to the
hidden layers. After going through several levels of the hidden layers, the information
reaches the final output layer.

Fig. 2 Artificial Neural Network Architecture

To work effectively, neural networks require training. Several learning strategies exist:
1. Unsupervised learning
2. Supervised learning
3. Reinforcement learning

Convolutional Neural Network (CNN): Unlike traditional neural networks, CNNs arrange their neurons in three dimensions: width, height, and depth. [18] Unlike fully
connected layers, where each neuron is linked to every other neuron in the layer, CNN layers
connect only to a small portion of the preceding layer, known as the window size. This
arrangement allows CNNs to efficiently process images. At the end of the CNN architecture,
the entire image is reduced to a single vector of class scores, usually determined by the
number of classes.
Fig. 3 Convolutional Neural Network
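A minimal Keras sketch of such an architecture is shown below. The layer sizes, the 3 x 3 window size, and the assumed class count (26 letters plus a blank symbol) are illustrative choices, not the exact model described later in this paper.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 27  # assumed: 26 ASL letters plus a blank symbol

model = models.Sequential([
    # Each Conv2D neuron looks only at a small window (3 x 3) of the previous layer.
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # The whole image is finally reduced to a single vector of class scores.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()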

TensorFlow: TensorFlow is a comprehensive open-source machine learning platform that supports the advancement of the field. Researchers can explore its extensive ecosystem
of tools, libraries, and community resources. For developers, TensorFlow simplifies the
creation and integration of machine learning-powered applications.[28]

By leveraging the high-level Keras API, model development and training become more
accessible.[33] TensorFlow also offers eager execution, allowing for instantaneous iteration
and intuitive debugging. Additionally, the Distribution Strategy API helps distribute training
across different hardware configurations for large-scale machine learning tasks without
altering the model definition.

Keras, a Python library, serves as a wrapper around TensorFlow and facilitates the rapid
building and testing of neural networks with minimal code.[28] It assists with various data
types, such as text and images, and provides implementations of commonly used neural
network components like layers, objectives, activation functions, and optimizers.
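The short sketch below illustrates two of these points, eager execution and the high-level Keras layer API; the tensors and layer sizes are arbitrary examples, not values from this work.

import tensorflow as tf
import numpy as np

# Eager execution (the default in TF 2.x): operations run immediately,
# which makes iteration and debugging intuitive.
x = tf.constant([[1.0, 2.0, 3.0]])
print(tf.nn.softmax(x).numpy())   # inspect the result right away

# The Keras API provides common neural network components in a few lines.
dense = tf.keras.layers.Dense(4, activation="relu")
print(dense(np.ones((1, 3), dtype="float32")).numpy())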

OpenCV: OpenCV, also known as Open-Source Computer Vision, is a programming function library used for real-time computer vision. [19]

It offers functionalities like image processing, video recording, and feature analysis,
including object and face recognition. While bindings for Python, Java, and
MATLAB/OCTAVE exist, the primary interface of OpenCV is written in C++.
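A minimal sketch of real-time video capture and a simple processing step with the Python bindings is shown below; the window name and exit key are arbitrary choices.

import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # simple image processing step
    cv2.imshow("webcam", gray)
    if cv2.waitKey(1) & 0xFF == ord("q"):           # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()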

3 Motivation Behind the Work

 A language barrier arises when it comes to communication between normal people and those who are deaf and mute (D&M), as sign language differs from regular text.
Unlike verbal communication, D&M individuals rely on visual-based interactions
[11].
 To address this issue, a common interface has been developed that translates sign
language into text. This allows non-D&M individuals to easily understand the
gestures [9]. Consequently, efforts have been made to create a vision-based
interface system, which would enable D&M individuals to communicate without
the need to understand each other's spoken language.
 The ultimate objective is to establish a user-friendly human-computer interface
(HCI) that can comprehend human sign language. This would facilitate smooth and
effective communication between D&M individuals and computers.
 Sign languages span across the globe, with American Sign Language (ASL) [15],
French Sign Language, British Sign Language (BSL), Indian Sign Language,
Japanese Sign Language, and various other languages being just a few examples.
Extensive research has been conducted to develop sign language recognition
systems that cater to these diverse languages.

4 Literature Survey

Paper: Sign Language Translator for Deaf and Dumb Using Machine Learning [1]
Authors: B. Suneetha, J. Mrudula, et al.
Advantages: Enables two-way communication between deaf-dumb and ordinary individuals; uses machine learning models for sign-language-to-speech and speech-to-sign-language conversion.
Disadvantages: Limited to the signs included in the dataset; relies on webcam and microphone input, so it may have environmental dependencies; requires a visual sign word library for accurate speech-to-sign-language conversion.

Paper: American Sign Language Recognition and its Conversion from Text to Speech [3]
Authors: Aditi Bailur, Yesha Limbachia, et al.
Advantages: Real-time sign-language-to-text translation; innovative use of CNN with Inception V3 for accurate ASL gesture recognition; converts ASL words into text and further into audible speech.
Disadvantages: Relies on a camera as the data source, so it may have environmental dependencies; specific to American Sign Language and may not generalize well to other sign languages.

Paper: Sign Language Detection and Conversion to Text and Speech Conversion [5]
Authors: Ameer Khan B, Chandru M, et al.
Advantages: Real-time method for fingerspelling-based ASL; utilizes neural networks for hand gesture recognition; achieves a high accuracy of 98.00% for the alphabet.
Disadvantages: Challenges in achieving high accuracy in noisy or challenging environments; potential latency in real-time recognition.

Paper: A Machine Learning Framework and Method to Translate Speech to Real-Time Sign Language for AR Glasses [8]
Author: Rahul Solleti
Advantages: Addresses communication challenges for individuals with hearing disabilities; utilizes advanced technologies like AR glasses for real-time sign language translation.
Disadvantages: Relies on the availability and accuracy of regional sign language datasets; implementation may require extensive collaboration with the Deaf community for dataset curation.

Paper: Sign Language to Speech Conversion [10]
Authors: Prof. M.T. Dangat, Rudra Chandgude, et al.
Advantages: Addresses communication challenges for deaf individuals; uses flex sensors on a glove for sign language recognition.
Disadvantages: Limited to American Sign Language (ASL); accuracy may vary based on individual gestures.

Paper: Sign Language to Text Conversion in Real Time [13]
Authors: Shubham Thakar, Samveg Shah, et al.
Advantages: High accuracy (98.7%) achieved with transfer learning compared to CNN (94%).
Disadvantages: Assumes a smooth background in images; future scope includes diversifying the model for different sign languages and improving robustness to diverse image backgrounds.

Paper: Sign Language to Text and Speech Conversion Using CNN [15]
Authors: Shreyas Viswanathan, Saurabh Pandey, et al.
Advantages: Affordable and efficient solution using Raspberry Pi; hand gesture recognition for American Sign Language.
Disadvantages: Limited to 11 ASL alphabets due to processing power constraints; challenges in diverse lighting conditions.

Paper: Sign Language Fingerspelling Recognition Using Depth Information and Deep Belief Networks [18]
Authors: Mary Jane C. Samonte, Carl Jose M. Guingab, et al.
Advantages: Accurate recognition of ASL fingerspelling.
Disadvantages: Limited to fingerspelling in ASL.

Paper: Sign Language to Text-Speech Translator Using Machine Learning [21]
Authors: Akshatha Rani K, Dr. N Manjanaik
Advantages: Bridges the communication gap between deaf-mute individuals and others; utilizes efficient hand tracking with MediaPipe; converts recognized signs to speech; recognizes almost all letters in ASL.
Disadvantages: Achieves 74% accuracy.

Paper: Sign Language Recognition and Response via Virtual Reality [23]
Authors: S. Kumara Krishnan, V. Prasanna Venkatesan, et al.
Advantages: Utilizes a virtual reality headset for immersive sign language learning; employs Leap Motion controller features for real-time gesture recognition.
Disadvantages: Dependency on hardware; limitation to alphabets; cost implications with increased sensors.
5 Hand Gesture Techniques

In recent years, there has been extensive research on hand gesture recognition. Through a
literature review, we have identified the fundamental stages involved in this process.

Firstly, let's discuss data collection. One method involves using sensory apparatus, such as
electromechanical devices, to provide precise hand configuration and position.[5] However,
this approach is not user-friendly and can be quite costly.

Alternatively, a vision-based approach utilizes the computer webcam as an input device to capture hand and/or finger information. This method eliminates the need for additional
hardware, thus reducing costs and enabling seamless communication between people and
computers. The main challenges in vision-based hand detection include accounting for the
various appearances of the human hand due to different movements, skin tones, and camera
viewpoints.[7]

Next, we move on to data pre-processing and feature extraction for the vision-based
approach. A combination of background subtraction and threshold-based color detection is
used for hand detection.[1] Additionally, the AdaBoost face detector helps differentiate
between hands and faces, which have similar skin tones. Gaussian blur, also known as
Gaussian smoothing, is applied to extract the required training image. By utilizing open
computer vision (OpenCV), we can easily apply this filter. Using instrumented gloves further
aids in obtaining accurate and concise data, while reducing computation time for pre-
processing.[8]
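A hedged sketch of this pre-processing stage is given below; the HSV skin-colour range and the blur kernel size are illustrative assumptions rather than values reported in the cited works.

import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2()  # background subtraction

def extract_hand(frame_bgr):
    # Threshold-based colour detection in HSV; this skin range is illustrative only.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin_mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
    # Combine with the foreground mask so the static background is discarded.
    fg_mask = subtractor.apply(frame_bgr)
    hand_mask = cv2.bitwise_and(skin_mask, fg_mask)
    # Gaussian blur (Gaussian smoothing) suppresses noise before training images are saved.
    return cv2.GaussianBlur(hand_mask, (5, 5), 0)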

To improve the segmentation of images, color segmentation techniques have been explored.
However, the reliance on lighting conditions and the similarity between certain gestures pose
challenges. To address these issues, we decided to keep the hand's background as a stable
single color. This eliminates the need for segmentation based on skin color and enhances
accuracy for a large number of symbols.[22]

Now, let's talk about gesture classification. Hidden Markov Models (HMM) [15] are utilized
to categorize gestures, specifically addressing their dynamic components. By tracking skin-
color blobs corresponding to the hand, gestures can be extracted from a sequence of video
images. Differentiating between symbolic and deictic classes of gestures is the primary aim.
Statistical objects called blobs are employed in identifying homogeneous regions by
gathering pixels with skin tones.[28] For static hand gesture recognition, the Naïve Bayes
Classifier is employed, which categorizes gestures based on geometric-based invariants
extracted from segmented image data.[33] This method is independent of skin tone and
captures gestures in every frame of the video. Additionally, the K-nearest-neighbour algorithm,
assisted by the distance weighting algorithm (KNNDW), is utilized to classify gestures and
provide data for a locally weighted Naïve Bayes classifier.[29]

Researchers from the Institute of Automation Technology, National Taipei University of Technology, have developed a skin model to extract hands from an image. They apply a
binary threshold to the entire image and calibrate it around the principal axis. This calibrated
image is then fed into a convolutional neural network model, which learns and predicts the
results.[37] With their trained model, they achieved an impressive accuracy of approximately
95% for seven hand gestures.
In conclusion, hand gesture recognition has undergone significant advancements in recent
years. Through data collection, pre-processing, and classification, researchers have achieved
remarkable results in accurately interpreting hand gestures. These developments have the
potential to revolutionize human-computer interaction and open up new possibilities for
seamless communication.

6 Methodology

The method used by our system is based on vision. In this approach, there is no need for
artificial devices to aid in interaction, as all signs can be read using hand gestures.

Data Set Creation:

[18] In our quest to find ready-made datasets for the project, we scoured multiple sources but
couldn't find any that met our requirements in terms of raw image formats. We did manage to
locate RGB value datasets, though. Given this situation, we made the decision to create our
own data set. Here are the steps we followed: Utilizing the Open Computer Vision (OpenCV)
library, we captured around 800 pictures of each symbol in American Sign Language (ASL)
for training purposes. Additionally, we took approximately 200 pictures of each symbol for
testing purposes.
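The following sketch shows how such a capture loop could be written with OpenCV; the directory layout, key bindings, and symbol name are assumptions made for illustration.

import cv2
import os

SYMBOL = "A"                                         # ASL symbol currently being captured
OUT_DIR = os.path.join("dataset", "train", SYMBOL)   # assumed directory layout
os.makedirs(OUT_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while count < 800:                                   # ~800 training images per symbol
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):                              # press 'c' to save the current frame
        cv2.imwrite(os.path.join(OUT_DIR, f"{count}.jpg"), frame)
        count += 1
    elif key == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()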

After capturing the image, we applied a Gaussian blur filter to extract various features. The
image, post-Gaussian blur, had the following appearance:

Fig.4 Gaussian Blur Image


Gesture Classification:

To predict the final symbol made by the user, our method utilizes two layers of algorithms.

Fig. 5 Gesture Classification

Algorithm Layer 1:

 We apply the Gaussian Blur filter and threshold to the image obtained from
OpenCV, in order to extract features.
 The processed image is then fed into the CNN model for prediction. If a letter is
identified across more than 50 frames [18], it is printed and used to form a word.
 The blank symbol represents the space between words.

Algorithm Layer 2:

 We identify different sets of symbols that yield similar detection results.


 For each set, we use classifiers specifically designed to classify between those sets (a combined sketch of both layers appears below).
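The sketch below combines both layers under stated assumptions: cnn_model is a trained Keras classifier, sub_classifiers maps each ambiguous symbol group to a dedicated model with a hypothetical predict_symbol method, and the ambiguous groups listed are placeholders rather than the paper's actual sets.

import numpy as np

FRAME_THRESHOLD = 50          # a letter must be identified across more than 50 frames
# Hypothetical groups of symbols the layer-1 CNN tends to confuse.
AMBIGUOUS_SETS = [{"D", "R", "U"}, {"M", "N", "S", "T"}]

frame_counts = {}             # letter -> number of frames in which it was detected

def classify(processed_img, cnn_model, sub_classifiers, labels):
    # Layer 1: the main CNN proposes a candidate symbol for this frame.
    probs = cnn_model.predict(processed_img[np.newaxis, ..., np.newaxis], verbose=0)[0]
    letter = labels[int(np.argmax(probs))]
    # Layer 2: if the candidate belongs to an ambiguous set, a dedicated
    # classifier for that set makes the final call (hypothetical interface).
    for group in AMBIGUOUS_SETS:
        if letter in group:
            letter = sub_classifiers[frozenset(group)].predict_symbol(processed_img)
            break
    frame_counts[letter] = frame_counts.get(letter, 0) + 1
    if frame_counts[letter] > FRAME_THRESHOLD:
        frame_counts.clear()
        return letter          # stable across enough frames to print
    return None                # not yet confident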

Implementing Finger Spelling Sentence Formation:

 When a detected letter count exceeds a predefined value, which is not within a
threshold distance from any other letter, we print the letter and append it to the
current string. In our code, we set the value at 50 and the difference threshold at 20.
[22]
 [11] In case an incorrect letter is predicted, we discard the current dictionary
containing the number of detections of the current symbol.
 If the current buffer is empty, no space is detected. However, if the count of the blank (plain background) symbol exceeds a certain value, the current word is appended to the sentence and a space is printed to mark the end of the word (a sketch of this logic follows below).
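A minimal sketch of this counting logic is shown next, using the values quoted above (50 detections, a difference threshold of 20); the dictionary of counts and the name of the blank symbol are assumptions.

DETECTION_THRESHOLD = 50   # letter count value from the text
DIFFERENCE_THRESHOLD = 20  # the winning letter must lead every rival by this margin
BLANK = "blank"            # assumed label for the plain-background symbol

def update_sentence(counts, word, sentence):
    # counts maps each predicted symbol (including BLANK) to its detection count.
    best = max(counts, key=counts.get)
    rivals = [c for s, c in counts.items() if s != best]
    lead = counts[best] - max(rivals) if rivals else counts[best]
    if best == BLANK and counts[best] > DETECTION_THRESHOLD:
        if word:                              # end of word: append it and print a space
            sentence = (sentence + " " + word).strip()
            word = ""
        counts.clear()
    elif counts[best] > DETECTION_THRESHOLD and lead > DIFFERENCE_THRESHOLD:
        word += best                          # confident letter: add to the current word
        counts.clear()                        # discard the detection dictionary
    return counts, word, sentence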

AutoCorrect Feature:
For every incorrectly input word, we utilize the Python library Hunspell_suggest to suggest
suitable alternatives. This allows us to present the user with a list of words matching the
current word, from which they can select a replacement to add to the sentence.[13] This not
only reduces spelling errors but also aids in predicting complex words.[2]
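A small sketch using the pyhunspell bindings is shown below; the dictionary file paths are system-dependent assumptions, not part of the original description.

import hunspell  # pyhunspell bindings

# Dictionary paths are assumptions; adjust to the local Hunspell installation.
checker = hunspell.HunSpell("/usr/share/hunspell/en_US.dic",
                            "/usr/share/hunspell/en_US.aff")

def suggest_word(word):
    # Return the word itself if spelled correctly, else candidate replacements.
    if checker.spell(word):
        return [word]
    return checker.suggest(word)   # the user picks one to add to the sentence

print(suggest_word("helo"))        # e.g. ['hole', 'help', 'hello', ...]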

Training and Testing:

To minimize unnecessary noise, we convert the input images from RGB to grayscale and apply a Gaussian blur. After resizing the photos to 128 x 128 pixels, we use adaptive thresholding to separate the hand from the background.
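A compact OpenCV sketch of this pre-processing pipeline follows; the blur kernel and adaptive-threshold parameters are illustrative assumptions.

import cv2

def preprocess(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)   # RGB/BGR -> grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # suppress noise
    resized = cv2.resize(blurred, (128, 128))            # 128 x 128 pixels
    # Adaptive thresholding separates the hand from the background.
    return cv2.adaptiveThreshold(resized, 255,
                                 cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 11, 2)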

[17] Once the input images have been pre-processed, we perform all the necessary operations
on our model and feed it into the training and testing phases. The prediction layer makes an
informed guess regarding which class the image belongs to.

To ensure that the values across all classes sum to 1, the output is normalized between 0 and 1 using the SoftMax function.

The output from the prediction layer may deviate slightly from the actual value. To improve accuracy, the network is trained using labelled data. One performance metric used in the classification process is cross-entropy, a continuous function that equals zero when the prediction matches the labelled value and increases as the prediction diverges from it. [25]

The aim is to minimize cross-entropy as much as possible. This is achieved by modifying the
neural network weights in the network layer. TensorFlow provides an integrated function for
computing cross-entropy. Once the cross-entropy function has been determined, we use
gradient descent, specifically the Adam Optimizer, to optimize it.[11]
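The sketch below shows how this is typically expressed with TensorFlow's built-in cross-entropy loss and the Adam optimizer; model, train_images, train_labels, test_images, test_labels, the learning rate, epoch count, and batch size are assumptions carried over from the earlier sketches (with one-hot encoded labels), not the paper's exact settings.

import tensorflow as tf

# Built-in cross-entropy loss over the SoftMax outputs, minimized with Adam.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_images, train_labels,
                    validation_data=(test_images, test_labels),
                    epochs=10, batch_size=32)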

7 Conclusion

A cognitive system that would be especially helpful for the deaf and mute community could be developed by implementing the system as an image processing-based sign language translator. Adding more words from a wider range of signers to the dataset would help create a neural network-based system that is far more dependable.

After implementing the two layers of the algorithm, the confirmed and predicted symbols are more likely to agree. As a result, we accurately identify almost every symbol, provided it is displayed correctly, there is no background noise, and the lighting conditions are satisfactory.

As machine learning and artificial intelligence continue to advance, we can expect even more sophisticated and accurate speech-to-text and sign-to-text conversion software in the future. This technology allows people with hearing and speech impairments to communicate more effectively with others, and it helps bridge the communication gap between people who use sign language and those who do not. The use of voice-to-text and sign-to-text technologies will also have a significant impact on the education sector, allowing students with hearing and speech disabilities to participate more actively in classroom conversations and lectures and helping teachers communicate more effectively with students. Still, it is critical that this technology be accessible to everyone, regardless of their socio-economic status. In summary, the future of speech-to-text and sign-to-text technology is very bright, and we can expect even more innovative and sophisticated solutions ahead.

References

[1] Machine Learning-Based Sign Language Interpreter for the Deaf and Dumb, ISSN: 0970-2555
52nd June Issue, June 2023
[2] Recognition of American Sign Language and its Translation from Text to Speech, Volume
11, Issue IX, September 20, 2023
[3] Recognition of Sign Languages and Their Translation to Text and Speech, Volume: 07,
Issue: 10 | October – 2023

[4] A Framework for Machine Learning and Technique for Converting Speech to Instantaneous
Sign Language for AR Glasses, Vol. 03, Issue 10, October 2023
[5] Sign Language to Speech Conversion, October 20, 2023, Volume 11, Issue X

[6] Real-Time Translation from Sign Language to Text via Transfer Learning, December
2022
[7] Translation from Sign Language to Text and Speech using CNN, Volume:03/Issue:05/May-
2021

[8] International Conference on Industrial Engineering and Operations Management Proceedings: Sign Language Fingerspelling Recognition Using Depth Information and Deep Belief Networks. Istanbul, Turkey, March 7–10, 2022
[9] Machine Learning-Based Sign Language to Text-Speech Translator, Volume 09, No. 7, July
2021
[10] A. Tayade and A. Haldera. Vol (2) Issue (5), pp. 9–17, 2021; Real-time Vernacular Sign
Language Recognition using MediaPipe and Machine Learning.
[11] Lalitha, A. Thodupunoori, A. Muppidi. Real Time Sign Language Detection for the Dumb and
Deaf, August 06, 2022, Volume 11, pp. 153-157.
[12] Prashant G. Ahire et al. "Two-way communication between the able-bodied and the
hearing-impaired." IEEE, 2015, pp. 641-644, 2015 International Conference on
Computer Communication Control and Automation (ICCUBEA).
[13] D. Agarwal (2018). Sentiment analysis: Insights into techniques, applications, and
challenges. International Journal of Computer Sciences and Engineering, 6(5), 697-703.
DOI: 10.26438/ijcse/v6i5.697703
[14] D. Agarwal, V. Balio, A. Agarwal, K. Pozwal, M. Gupta, A. Gupta (2021). Analysis of
sentiment in tweets using term frequency based supervised machine learning
techniques.
[15] D. Aggarwal, K. Banerjee, R. Jain, S. Agrawal, S. Mittal and V. Bhatt, "An Insight into
Android Applications for Safety of Women: Techniques and Applications," 2022 IEEE
Delhi Section Conference (DELCON), 2022, pp. 1-6.
[16] Sign Language Recognition and Response via Virtual Reality, Volume 5, Issue 2,
March-April 2023.
[17] Furkan, Ms. N. Sengar, Real-Time Sign Language Recognition System for Deaf and Dumb People, Volume 9, June 2021, pp. 390-394.
[18] KoSign Sign Language Translation Project: Introducing The NIASL2021 Dataset, Language
Resources and Evaluation Conference (LREC 2022), Marseille, 20-25 June 2022
[19] Sign language recognition system for communicating to people with disabilities,
Volume 216, 2023
[20] J. Kaur, C. R. Krishna. An Efficient Indian Sign Language Recognition System using SIFT Descriptor, Volume-8, Issue-6, pp. 1456-1461, August 2019.
[21] J. Kim and P. O'Neill-Brown. Improving American Sign Language Recognition with Synthetic Data, Volume 1, pp. 151-161, August 2019.
[22] K. Y. Lum, Y. H. Goh, Y. B. Lee. American Sign Language Recognition Based on MobileNetV2, 2020, Vol. 5, No. 6, pp. 481-488.
[23] Kumari, Sonal, and S.K. Mitra. "Human action recognition using DFT." Computer
Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2011 Third
National Conference on. IEEE, pp. 239-242, October 15,2022.
[24] L. K. S. Tolentino and R. O. S. Juan. Static Sign Language Recognition Using Deep Learning, pp. 821-827, December 2019.
[25] Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong. Word-level.Deep
Sign Language Recognition from Video: A New Large-scale Dataset and Methods
Comparison, 2020, pp. 1459-1469.
[26] Machine translation from text to sign language: a systematic review, 03 July 2021
[27] Ali, A. H., Abbas, H. H., & Shahadi, H. I. (2022). Real-time sign language recognition system. International Journal of Health Sciences, 6(S4), pp. 10384-10407, 27 July 2022.
[28] Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, and Xuanjing
Huang. 2020. Extractive Summarization as Text Matching. arXiv preprint
arXiv:2004.08795 (2020).
[29] N. Inamdar, Z. Inamdar. A Survey Paper on Sign Language Recognition, Vol. 1, Issue 4, pp. 1696-1699, April 2022.
[30] R. Nagar, D. Aggarwal, U. R. Saxena and V.Bali, “Early Prediction and Diagnosis for
Cancer Based on Clinical and Non-Clinical Parameters: A Review”, International
Journal of Grid and Distributed Computing, vol. 13, no. 1, (2020), pp. 548-557.
[31] R.Patil, V.Patil and A.Bahuguna. Indian Sign Language Recognition using
Convolutional Neural Network 2021.
[32] R.A. Kadhim, M.Khamees. A Real-Time American Sign Language Recognition System
using Convolutional Neural Network for Real Datasets, Volume 9, Issue 3, Pages 937-
943, ISSN 2217-8309, DOI: 10.18421/TEM93-14, August 2020.
[33] R.Nagar, D.Aggarwal, Urvashi Rahul Saxena, V.Bali. (2020). Cancer Prediction Using
Machine Learning Techniques Based on Clinical & Non-Clinical Parameters.
International Journal of Advanced Science and Technology, 29(04), 8281 -8293.
[34] Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing.
[35] Zhu, C. et al. (2021). Recent advances in text-to-speech synthesis: From concatenated
approaches to parametric approaches. IEEE Signal Processing Magazine, 38(3), 51-66.
[36] "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-
to-Speech," by Kim J., Jong, & Son J., "International Conference on Machine Learning,
PMLR, 2021
[37] Hayashi, Inaguma, Ozaki, Yamamoto, R. Takeda, Aizawa ESPnet-TTS Unified,
reproducible, and integratable open-source end-to-end text-to-speech toolkit ASRU
2021
[38] S.S.Kumar and A.Asha. A Review on Indian Sign Language Recognition, pp. 3147-
3159, IJSRR, 8(2) June., 2019.
[39] Donahue, J., et al., End-to-End Adversarial Text-to-Speech. arXiv preprint arXiv:2006.03575, 2020.
[40] Biswas N, Uddin KM, Rikta ST, Dey SK. A comparative analysis of machine learning classifiers for stroke prediction: a predictive analysis. Healthcare Analytics. November 1, 2022; 2:100116.
[41] Tyagi, S., Bonafonte, A., Lorenzo-Trueba, J. and Latorre, J., 2021. Proteno: Text normalization with limited data for rapid deployment in text-to-speech systems. arXiv preprint arXiv:2104.07777.
[42] Ro, J.H., Stahlberg, F., Wu, K. and Kumar, S., 2022. Transformer-based Models of Text Normalization for Speech Applications. arXiv preprint arXiv:2202.00153.
