Tamil Textual Image Reader

The Tamil Textual Image Reader is a mobile application that uses optical character recognition (OCR) and text-to-speech (TTS) conversion to scan images of Tamil text and read them aloud, helping students study efficiently. The mobile interface is built with React Native, and a Flask/Connexion web API provides the image-to-text and text-to-speech services, using Tesseract OCR for recognition and Indic TTS for synthesis. Recurrent neural networks are employed for their ability to model the sequential dependencies important in language processing. Future work aims to support more languages and offline use.



Introduction
Ever since the invention of mobile phones, making them speak has been a standing challenge. So I set out to make them speak in my native language, Tamil, and, for the benefit of students, to offer an alternative to the eyestrain of reading lengthy texts and articles. The solution was a mobile application that reads lengthy texts aloud for students, and so the “Tamil Textual Image Reader” was built.
The “Tamil Textual Image Reader” is a mobile application that scans an image of Tamil text and reads the text aloud. It combines optical character recognition and text-to-speech conversion for the Tamil language. The application is useful for students who learn by listening, and it also supports uploading and converting existing image files.
Literature Review
For the last few decades, textual image processing and Optical Character Recognition (OCR) have been leading research topics in the field of machine learning. Recognition of machine-printed or handwritten documents is an essential part of applications such as intelligent scanning machines, text-to-speech converters and automatic language-to-language translators. “The purpose of document image analysis is to recognize the text and graphical components in the paper document and to extract the intended information, as human beings do. The two components of document image analysis are textual processing and graphical processing. Textual processing deals with the text component of the document image. Graphical processing deals with the non-textual line and symbol components that make up line diagrams, the delimiting straight lines between text sections, company logos, etc. In the current context, the work is limited to the textual processing part.”

This system also implements text-to-speech using machine learning. Research is being done throughout the world to improve the human-computer interface, and one of the promising areas is text-to-speech (TTS) conversion. “The term text-to-speech refers to the conversion of input text into a spoken utterance. The input text may consist of words, sentences, paragraphs, numbers and even abbreviations. The TTS conversion process should identify the text without any ambiguity and generate the corresponding sound output with acceptable clarity. This means that the quality of the output of the TTS engine should be as close to natural speech as possible.”

“In general, TTS conversion can be carried out in three ways: formant-based, parameter-based, and concatenation-based. In this work, the concatenation-based technique has been chosen to develop a TTS engine. Since preliminary studies have already been carried out by several researchers, this work focused on Tamil. The concatenation method has two phases, namely an “offline phase” and an “online phase”. The offline phase includes basic unit selection, identification of language rules (phonetic rules and prosodic rules) and creation of the sound database.” The online phase splits the input text into basic units and converts them into speech after applying the Tamil language rules. The system takes an arbitrary text file, processes the contents letter by letter, and passes them through the stages of text analysis and parsing (i.e. identification of basic units), application of language rules, and finally concatenation and synthesis to produce the speech output.
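The “identification of basic units” step of the online phase can be sketched as follows. This is a simplified illustration, not the project’s actual unit inventory: it groups each Tamil base letter with its dependent vowel signs and the virama (pulli) using Unicode character categories.

```python
import unicodedata

def split_basic_units(text):
    """Group each Tamil base character with its dependent signs.

    Dependent vowel signs and the virama (pulli) carry Unicode
    categories Mn/Mc, so they attach to the preceding base letter.
    """
    units = []
    for ch in text:
        if units and unicodedata.category(ch) in ("Mn", "Mc"):
            units[-1] += ch          # attach vowel sign / pulli to the base
        elif not ch.isspace():
            units.append(ch)         # start a new basic unit
    return units

print(split_basic_units("தமிழ்"))   # three units: த, மி, ழ்
```

In a full engine each resulting unit would then be looked up in the sound database built during the offline phase.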
Usage of the Application

This mobile application can be used by students who prefer studying by listening rather than by reading, and by students who are visually impaired. They can install the application on their phones and take a picture of a text, and the app will read the text aloud so that they can note the necessary points and study. The application includes an audio player with pause and resume controls. Users can also save the audio file and listen to it whenever they want.

The system can be divided into two parts: the mobile application and the web-hosted system. When the user submits a captured image, or an existing image from storage, the image is posted to a cloud-hosted web system, where the image processing begins.

First, the image is converted into a readable text document; the document is then processed to read the words and converted into an audio file. The audio file is returned to the mobile application, which plays it in the embedded audio player. A save feature lets users keep the audio file and listen to it later.
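The server-side round trip described above can be sketched as a single function. The `ocr` and `tts` callables here are hypothetical stand-ins for the real services (a pytesseract wrapper and the Tamil TTS engine), injected so the sketch stays self-contained:

```python
def process_image(image_bytes, ocr, tts):
    """Pipeline: image bytes -> recognized Tamil text -> audio bytes.

    `ocr` and `tts` are injected callables standing in for the real
    image-to-text and text-to-voice services.
    """
    text = ocr(image_bytes)      # image-to-text service
    if not text.strip():
        raise ValueError("no Tamil text recognized in image")
    return tts(text)             # text-to-voice service

# Stub services standing in for the real ones:
fake_ocr = lambda img: "வணக்கம்"
fake_tts = lambda txt: b"WAV:" + txt.encode("utf-8")
audio = process_image(b"<image bytes>", fake_ocr, fake_tts)
```

The real deployment would run this behind the web API, with the audio bytes sent back in the HTTP response to the mobile application.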

Used Technologies
Mobile application: I used the React Native JavaScript library to build the mobile application so that it can run on both Android and iOS devices. The library is easy to use and well suited to building user-friendly mobile applications.

Web system: I used the Flask and Connexion Python libraries to build the web API that runs the services converting an image to text and then the text to voice.
There are two services in the web API.
1. Image-to-Text Converter Service
2. Text-to-Voice Converter Service
The Image-to-Text Converter Service uses the pytesseract Python library to identify the textual content of the image and convert it to Tamil text. The pytesseract library uses the Tesseract OCR engine to recognize the characters.
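Connexion wires HTTP routes to Python handler functions through an OpenAPI specification. A hypothetical fragment for the two services might look like the following; the module and operation names are illustrative, not the project’s actual ones:

```yaml
openapi: "3.0.0"
info:
  title: Tamil Textual Image Reader API
  version: "1.0"
paths:
  /image-to-text:
    post:
      # Connexion calls the Python function named by operationId
      operationId: services.image_to_text.convert
      responses:
        "200":
          description: Recognized Tamil text
  /text-to-voice:
    post:
      operationId: services.text_to_voice.convert
      responses:
        "200":
          description: Synthesized Tamil audio file
```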
The Text-to-Voice Converter Service uses the Tamil TTS library to identify the sound for each word and concatenates the sounds into a human-understandable spoken sentence. The Tamil TTS library uses a large database of phonemes for Tamil words and characters. These audio segments are joined together to produce the Tamil vocal output for the input Tamil text.
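At its core, concatenative synthesis is the joining of stored audio segments. A minimal sketch with Python’s standard wave module, using synthetic tones as stand-ins for recorded phoneme segments (the real system would draw on the Tamil TTS phoneme database):

```python
import io, math, struct, wave

RATE = 16000  # samples per second

def tone(freq_hz, dur_s):
    """Raw 16-bit PCM sine tone, standing in for a recorded segment."""
    n = int(RATE * dur_s)
    return b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / RATE)))
        for i in range(n)
    )

# Hypothetical unit database: basic unit -> audio segment.
UNIT_DB = {"த": tone(220, 0.1), "மி": tone(330, 0.1), "ழ்": tone(440, 0.1)}

def synthesize(units):
    """Concatenate the stored segments for a unit sequence into WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        for u in units:
            w.writeframes(UNIT_DB[u])
    return buf.getvalue()

wav_bytes = synthesize(["த", "மி", "ழ்"])
```

A production engine additionally smooths the joins and applies prosodic rules, which simple concatenation like this does not attempt.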
Tesseract OCR
Tesseract was developed as proprietary software by Hewlett-Packard Labs. In 2005, it was open-sourced by HP in collaboration with the University of Nevada, Las Vegas. Since 2006 it has been actively developed by Google and many open-source contributors.
“Tesseract acquired maturity with version 3.x when it started supporting many image formats and
gradually added a large number of scripts (languages). Tesseract 3.x is based on traditional computer
vision algorithms. In the past few years, Deep Learning based methods have surpassed traditional
machine learning techniques by a huge margin in terms of accuracy in many areas of Computer
Vision. Handwriting recognition is one of the prominent examples. So, it was just a matter of time
before Tesseract too had a Deep Learning based recognition engine.”

In version 4, Tesseract has implemented a Long Short-Term Memory (LSTM) based recognition
engine. LSTM is a kind of Recurrent Neural Network (RNN).

Indic TTS

This is a project on developing text-to-speech (TTS) synthesis systems for Indian languages, improving the quality of synthesis, as well as small-footprint TTS integrated with disability aids and various other applications. “This is a consortium-based project funded by the Department of Electronics and Information Technology (DeitY), Ministry of Communication and Information Technology (MCIT), Government of India, involving 13 institutions, SMT, IITM being one of them. The project comprises Phase I and Phase II. Phase I of the project used Festival-based speech synthesis for Bengali, Hindi, Tamil, Telugu, Malayalam and Marathi. Phase II commenced in 2012, employing HTS-based statistical speech synthesis for 13 Indian languages. Neural networks are not used for speech segmentation in the TTS framework for Indian languages even though they are widely used in speech recognition. In this work, the GMMs in the HMM-GMM framework for phoneme segmentation in TTS systems are replaced by DNNs and CNNs for better phoneme segmentation. Acoustic models are built by training the neural networks with the GMM-HMM monophone alignment (also known as HMM-based phone alignment) as the initial alignment. The DNN-HMM/CNN-HMM models are then trained iteratively to obtain accurate final phone boundaries.”

AI Techniques Used
Recurrent Neural Networks
“This type of neural network is an essential kind of neural network, mostly used in natural language processing. In conventional neural networks an input is processed through a number of layers to produce an output, under the assumption that two successive inputs are independent of each other.”
“We cannot make this assumption in most real-life scenarios. Take, for example, predicting a stock price at a given time, or predicting the next word in a sentence: in both scenarios the result depends on previous values, and this dependency must be taken into account. The term “recurrent” means that the network performs the same operation for each element of a sequence, with the output depending on the previous computations. Put another way, RNNs have a “memory” that holds information about the calculations made so far. Theoretically, recurrent neural networks can use information from arbitrarily long sequences, but in practice they are limited to looking back only a few steps. In a recurrent neural network the computed results flow through a loop: at each step the network takes into consideration the current input and the information it has learnt from the inputs already received.”
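The “memory” described above is simply a hidden state updated at every step. A minimal one-unit sketch of the recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}), with made-up (untrained) weights:

```python
import math

def rnn_steps(inputs, w_x=0.5, w_h=0.8):
    """Run a one-unit RNN over a sequence.

    The hidden state h carries information from earlier inputs
    forward; that is the network's "memory". The weights are
    illustrative, not trained.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)  # same operation at every step
        states.append(h)
    return states

# Feeding the same input three times yields different outputs,
# because the hidden state remembers the earlier inputs.
states = rnn_steps([1.0, 1.0, 1.0])
```

An LSTM such as Tesseract 4’s recognition engine refines this basic recurrence with gates that control what the memory keeps and forgets.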
Future Scope
The future scope of this application is to add more languages to the system and to support offline functionality.

Conclusions
The package “Tamil Text Image Reader” has been tested on different fonts. Attempts have been made to make the spoken output match the input document as closely as possible. The overall recognition rate is around 94%, even in the presence of some special characters and numerals. A hierarchical classification scheme has been followed.
