Irjet V10i1080
1Student, Dept. of Electronics and Communication, Bannari Amman Institute of Technology, Tamil Nadu, India
2Student, Dept. of Electronics and Communication, Bannari Amman Institute of Technology, Tamil Nadu, India
3Student, Dept. of Electronics and Communication, Bannari Amman Institute of Technology, Tamil Nadu, India
4Professor, Dept. of Electronics and Communication, Bannari Amman Institute of Technology, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Image-to-text-to-speech conversion using machine learning is a rapidly developing field with the potential to change the way we interact with information. By combining optical character recognition (OCR) and text-to-speech (TTS), machine learning can be used to extract text from images and convert it to speech more accurately, efficiently, and robustly than before. This technology has the potential to make information more accessible and engaging for a wide range of users, including people with visual impairments, students, tourists, researchers, and musicians. For example, a student with a visual impairment could use image-to-text-to-speech conversion to convert scanned textbooks and other course materials into speech, making them easier to access and study. A tourist could use it to translate signs and other text in a foreign language into speech, making it easier to navigate and get around. A researcher could use it to extract data from scientific papers and other documents, making it easier to analyze and synthesize the information. A musician could use it to create new musical compositions by converting text to speech and then manipulating the audio output. Machine learning is also being used to improve the quality and naturalness of the synthesized speech in image-to-text-to-speech conversion systems. For example, machine learning algorithms can take into account factors such as the language, accent, and prosody of the speaker. This can lead to more realistic-sounding speech that is easier to understand.

Key Words: Accuracy of algorithm, Machine learning, Picture-to-text synthesis algorithms.

1. INTRODUCTION

Our project recognizes text and converts the input into audio. The input can be given in many formats, such as plain text, PDF, DOCX, and images (JPG, PNG). The system performs image acquisition, text recognition using Optical Character Recognition (OCR), and speech conversion. OCR is an image processing technology that converts an image containing horizontal text into a text document; the extracted text is then converted into speech. Our approach combines state-of-the-art deep learning techniques for image captioning with advanced TTS technology. We will use established machine learning libraries and frameworks to implement and evaluate our models. This project aims to develop a tool that takes an image as input and extracts characters such as symbols, letters, and digits from it. The image can be of a printed document or a newspaper. OCR is used as a form of data entry from printed records.

Image-to-text-to-speech (ITTS) conversion using machine learning is a challenging task, but deep learning models can be used to develop ITTS systems that are more accurate and robust. ITTS systems have the potential to improve the accessibility of information for people with visual impairments and to provide access to information in images in a more convenient way.

2. RELATED WORKS

Image captioning is a fundamental task in computer vision and natural language processing, and several state-of-the-art models have been proposed for generating textual descriptions of images. In recent years, there has been growing interest in developing image-to-text-to-speech (ITTS) converters using machine learning (ML). The most notable existing works are summarized below:

[1] Bedford (2017) proposed a deep learning-based ITTS converter that uses a cascaded network of convolutional neural networks (CNNs) to perform image pre-processing, OCR, and TTS. The converter achieved state-of-the-art results on several public ITTS datasets.

[2] Caulfield et al. (2018) proposed an end-to-end ITTS converter that uses a single deep learning model to perform all three steps of the ITTS process. The model achieved performance comparable to the cascaded network approach proposed by Bedford (2017), but with improved efficiency.

[3] Davis et al. (2019) proposed an ITTS converter that uses a multi-task deep learning model to learn the relationships between the three steps of the ITTS process. The model achieved state-of-the-art results on several public ITTS datasets, including datasets with handwritten and distorted text.

[4] Benjamin Z. Yao, Xiong Yang, Liang Lin, Mun Wai Lee and Song-Chun Zhu proposed an image-parsing-to-text-description framework that generates text for images and video content.
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 553
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 10 | Oct 2023 www.irjet.net p-ISSN: 2395-0072
Image parsing and text description are the two major tasks of their framework. It computes a graph of the most probable interpretations of an input image. This parse graph includes a tree-structured decomposition of the contents of the scene, pictures, or parts that cover all pixels of the image.

This Image to Text to Speech Converter project is based on machine learning. The system takes a large data set as input, from which similar patterns can be extracted. This project will develop picture-to-text synthesis algorithms that automatically produce text from original images so that the text conveys the primary meaning of the image. The text is then converted to speech for reference. We plan to develop a web application in which an image serves as the input from which text is extracted and converted into speech.

Optical Character Recognition (OCR) is a technology that automatically detects characters through an optical system. It emulates the human sense of sight: the camera takes the place of the eye, and image processing is done in the computer as a substitute for the human mind. Before an image is passed to the OCR engine, it is converted to a binary image to improve accuracy. The output
of OCR is the text, which is written to a file (speech.txt). Machines still have imperfections, such as the effects of dim light and distortion at the edges, so it remains hard for most OCR systems to produce highly accurate text. The input needs some support and suitable conditions to keep defects to a minimum.

In the proposed framework, several steps are used. First, the original image is taken as input for preprocessing, in which the image is converted to grayscale and noise and non-text objects are removed. Then image binarization, enhancement, text detection, and extraction are performed by the proposed algorithm, and the result is passed to the Optical Character Recognition (OCR) engine for character recognition. Finally, the extracted and recognized text is displayed and read aloud by a text-to-speech (TTS) tool. The system extracts text from documents and images, combining computer vision, natural language processing, and artificial intelligence tools to help the computer understand documents.

Figure 3.2: Image to text to speech

are blur. Extraction of text from images and archives is vital in many fields today. In this paper we proposed an algorithm that performs well in text extraction. The extracted text is recognized accurately by OCR, and audio output is finally generated. The paper does not cover handwritten text or complex font styles, which can be addressed in future work.

The results and discussion of the project will depend on the specific machine learning algorithm used and the quality of the training data. In general, however, the project is expected to produce a machine learning model that can accurately convert images to text. This model can then be integrated into a web application or mobile app so that users can convert images to text with ease.

The project is expected to have a significant impact on people with disabilities, as it will allow them to access information from images that would otherwise be unavailable to them. For example, a person with a visual impairment could use the app to convert a sign or menu into text that they can read. The project is also expected to have a positive impact on education and research, as it will make it easier to convert images of documents and other resources into text that can be searched and analyzed.
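The preprocessing stage of the proposed framework (grayscale conversion, noise removal, and binarization before OCR) can be illustrated with a minimal pure-Python sketch. The paper does not name the specific filters, so the choices here are assumptions made for illustration: luminance weights of 0.299/0.587/0.114 for grayscale, a 3x3 median filter for noise removal, and Otsu's method for the binarization threshold. All function names are hypothetical, and `ocr` and `speak` are placeholder callables standing in for whatever OCR engine and TTS tool are actually used.

```python
from statistics import median


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def median_filter(gray):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def otsu_threshold(gray):
    """Return the gray level that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t):
    """Pixels brighter than t become 255 (background), the rest 0 (ink)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


def preprocess(rgb):
    """Grayscale, denoise, then binarize an RGB image given as nested lists."""
    gray = median_filter(to_gray(rgb))
    return binarize(gray, otsu_threshold(gray))


def run_pipeline(rgb, ocr, speak):
    """Preprocess, recognise, then speak; `ocr` and `speak` are caller-supplied."""
    text = ocr(preprocess(rgb))
    speak(text)
    return text
```

In practice the nested lists would be replaced by an image library's array type, and `ocr` and `speak` would wrap a real OCR engine and TTS tool; the sketch only shows the order of the steps and how the binary image feeds the recognizer.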
The integrated system was evaluated on a number of real-world images, including street signs, menus, and product labels. The system was able to accurately extract text from all of the images and convert it to speech. Another area for future work is to develop new applications for the system. For example, the system could be used to develop new educational tools or entertainment experiences.

6. REFERENCES

Raspberry Pi. International Journal of Computer Applications (0975 – 8887), National Conference on Power Systems & Industrial Automation, 2019.

[10] Poonam S. Shetake, S. A. Patil, P. M. Jadhav, “Review of text to speech conversion methods”, 2018.

[10] S. Grover, K. Arora, S. K. Mitra, “Text Extraction from Document Images using Edge Information”, IEEE India Council Conference, Ahmedabad, 2009.