Blind Aid Report
BLIND - AID
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
by
Aushapur (V), Ghatkesar (M), Medchal (Dist.)
2019-2020
CERTIFICATE
This is to certify that the technical seminar report titled “BLIND AID” is being
submitted by K. Sri Lakshmi Naga Sahithi (16P61A0575) of B. Tech IV-I semester,
Computer Science & Engineering, and is a record of bonafide work carried out by
her. The results embodied in this report have not been submitted to any other
university for the award of any degree.
ACKNOWLEDGEMENT
Self-confidence, hard work, commitment and planning are essential to carry out any task.
Possessing these qualities is a sheer waste if an opportunity does not exist. So, we
whole-heartedly thank Dr. G. Amarender Rao, Principal, and Dr. K. Sreenivasa Rao,
Head of the Department, Computer Science and Engineering, for their encouragement
and support.
We thank our seminar in-charge for guiding us in completing our seminar successfully.
We would also like to express our sincere thanks to all the staff of Computer Science and
Engineering, VBIT, for their kind cooperation and timely help during the course of our
seminar. Finally, we would like to thank our parents and friends who have always stood by
us whenever we were in need of them.
CONTENTS
ABSTRACT
CHAPTER 1
INTRODUCTION
1.1 Problem Statement
1.2 Existing System
1.3 Proposed System
CHAPTER 2
WORKFLOW PROCESS
CHAPTER 3
MODULES
CHAPTER 4
APPLICATIONS
CHAPTER 5
CONCLUSION
BIBLIOGRAPHY
ABSTRACT
“Boundary – an often imaginary line that marks the edge or limit of something.” Blindness is
one of the largest boundaries that can be “drawn” between people and the modern world. It is
sight that conveys more information than any other human sense. According to the
International Council of Ophthalmology, there are 45 million blind people in the world and 135
million more with significant loss of vision. Unfortunately, it is currently impossible to build a
system that will make a blind person see. However, it is possible to design one that will read
printed text for them. Therefore, we undertook the development of Blind Aid – a personal,
portable text-reading system. The system comprises a video camera (mounted inside
sunglasses), a processing device, a text-to-speech converter and an earphone. The process of
reading includes extracting text from the video stream and synthesizing it into human-like
speech, which can be heard through the earphone. Our system is designed to read printed texts
(documents, books, magazines, newspapers, posters, information signs, etc.). It is also able to
perform a secondary task: saving the extracted text to memory to play it back later.
CHAPTER I
INTRODUCTION
Accessing printed text in a mobile context is a major challenge for the blind. The scope of
this project is to provide a technical solution that assists visually impaired people in
accessing various text resources and enhancing their knowledge. The project deals with a
device that helps blind people read printed text in real time. A camera module captures a
real-time image of the product and passes it to the main module. The main module is a
Raspberry Pi, which is itself a mini-computer, and it processes the image captured by the
camera. The Raspberry Pi runs the image-processing code and applies an optical character
recognition (OCR) technique to the image. The image is processed internally on the
Raspberry Pi hardware to separate the text from the captured image using the OpenCV
(Open Source Computer Vision) library. The desired letters in the label are identified using
the Tesseract OCR engine. When the program is executed, the system captures the image
placed in front of the web camera, which is connected to the Raspberry Pi through USB. The
captured image then undergoes OCR. OCR is the identification of printed characters using
computer software. It converts images of typed, handwritten or printed text into
machine-encoded text, whether from a scanned document or from subtitle text superimposed
on an image. It also allows scanned images of printed text or symbols to be converted into
text or information that can be understood or edited by a computer program. In our system,
the Tesseract library provides the OCR capability. The camera acts as the main vision in
detecting the image of the paper; the image is processed internally, the text region is
separated from the image using the OpenCV library, and finally the text is identified and
pronounced through voice.
The Raspberry Pi is a small, barebones computer developed by the Raspberry Pi Foundation.
Its small size makes for an easy-to-hide computer that sips power and can be mounted behind
a display with an appropriate case. The Raspberry Pi is meant to be used as a final product
and can operate as a traditional desktop computer. It is designed around the idea of producing
a computer that is “capable enough” as cheaply as possible. The Raspberry Pi is a low-cost,
credit-card-sized computer that plugs into a computer monitor or TV, uses a standard
keyboard and mouse, and is commonly programmed in Python. The Raspberry Pi 3 is the
third-generation Raspberry Pi. It replaced the Raspberry Pi 2 Model B in February 2016.
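The capture-recognize-speak pipeline described above can be sketched in Python. This is a minimal illustration, not the project's actual code: it assumes the opencv-python and pytesseract packages and an espeak command-line synthesizer are installed on the Raspberry Pi, and the helper function names are our own.

```python
def extract_text(image_path):
    """Grayscale and binarise the captured frame, then run Tesseract OCR.
    Assumes the opencv-python and pytesseract packages are installed."""
    import cv2
    import pytesseract
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu binarisation separates dark text from a light background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)

def clean_text(raw):
    """Tidy raw OCR output before speech synthesis: join words that were
    hyphenated across line breaks and collapse runs of whitespace."""
    return " ".join(raw.replace("-\n", "").split())

def speak(text):
    """Hand the recognised text to a synthesizer (espeak assumed installed)."""
    import subprocess
    subprocess.run(["espeak", text])

# Intended use on the Pi: speak(clean_text(extract_text("capture.jpg")))
```

The heavy libraries are imported lazily inside the functions, so the text-cleaning step can be exercised on its own.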
Human communication today is mainly via speech and text. To access information in a text,
a person needs vision. However, those who are deprived of vision can gather information
using their hearing. Reading is very important in today's world.
Blind people are an integral part of our society, yet their disability has forced them to
depend on others for daily activities such as shopping and reading sign posts. It has also left
them with less access to computers and the internet than people with clear vision.
Consequently, they have not been able to improve their own knowledge or to have a
significant influence and impact on society. Today there are more than 30 crore visually
impaired people in the world, of whom more than 4 crore are blind. According to the
National Census of India, there are around 2.2 crore disabled people in India, of whom more
than 1.5 crore are blind. These numbers tell us that blind people outnumber people with
other disabilities in India, and the number keeps increasing rapidly. Blind people are unable
to perform visual tasks. For instance, text reading requires the use of a braille reading
system or a digital speech synthesizer. The majority of published printed works do not
include braille or audio versions, and digital versions are still a minority. Blind people are
also unable to read the simple warnings on walls or the signs that surround us. Thus, the
development of a portable device that can perform image-to-speech conversion has great
potential and utility. Some blind students use guide dogs that are specifically trained and
usually well disciplined. Most of the time the guide dog lies quietly under or beside the
table or desk. The greatest disruption a faculty member might expect may be an occasional
yawn, stretch, or low moan at the sound of a siren. As tempting as it might be to pet a guide
dog, it is important to remember that the dog is responsible for guiding its owner and should
not be distracted from that duty while in harness.
1.2 EXISTING SYSTEM
The existing systems for the blind are partially conventional but not wholesome. The most
commonly used system is the Braille system. Braille is a system of raised dots that can be
read with the fingers by people who are blind or who have low vision. People who are not
visually impaired ordinarily read braille with their eyes. Braille is not a language; rather, it
is a code by which many languages such as English, Spanish, Arabic, Chinese, and dozens of
others may be written and read. Braille is used by thousands of people all over the world in
their native languages, and provides a means of literacy for all. The specific code used in the
United States has been English Braille, American Edition, but as of 2016 the main code for
reading material is Unified English Braille, a code used in seven other English-speaking
countries. Braille symbols are formed within units of space known as braille cells. A full
braille cell consists of six raised dots arranged in two parallel rows each having three dots.
The dot positions are identified by the numbers one through six. Sixty-four combinations
are possible using one or more of these six dots. A single cell can be used to represent an
alphabet letter, number, punctuation mark, or even a whole word. A braille alphabet and
numbers chart illustrates what a cell looks like and how each dot is numbered. When every
letter of every word is expressed in braille, it is referred to as uncontracted braille. Some
books for young children are written in uncontracted braille, although it is less widely used
for reading material meant for adults. However, many newly blinded adults find
uncontracted braille useful for labelling personal or kitchen items when they are first
learning braille.
A variety of tools for both reading and writing are used by the blind. These might include
the following:
Perkins brailler (also referred to as a braille writer): Similar in appearance and function to
an old-fashioned manual typewriter, the braille writer has six keys used to emboss (press)
dots on the page to form braille.
Slate and stylus: A portable tool for writing braille, the slate and stylus is often used like a
notepad to write down short messages, such as a telephone number, telephone message, or
shopping list, or to produce labels for items such as DVDs or cereal boxes. It is typically
introduced to children.
Accessible PDA: Also known as a portable note taker or electronic note taker, a PDA is
similar to a laptop computer without a screen. Using this device, the visually impaired can
write with either a standard keyboard or a braille keyboard, and can read material on the
PDA either by listening to it spoken aloud via synthetic speech or by reading braille on a
refreshable braille display.
Audio books:
When there is a large volume of material to be read, blind people may find it beneficial
to listen to the material. Audio texts may be available on tape or CD or, increasingly, in
digital formats downloadable to a computer, PDA, or other device.
1.3 PROPOSED SYSTEM
The proposed system helps the visually impaired to read text. The existing system for the
visually impaired is the Braille method, which is traditionally written on embossed paper,
but it has limitations: for example, visually impaired people cannot read text printed on
normal paper. To overcome such difficulties, we propose a wearable device with a camera
that captures the text; the captured text is first detected and extracted using the MSER
algorithm. Then an OCR method is employed, which converts the images of typed or printed
text into machine-encoded text. The OCR output is checked for errors using a post-processing
algorithm. The corrected text is converted into a speech signal using a text-to-speech (TTS)
algorithm, and the speech is read out through the earphones of the visually impaired person.
The software employed in this system is Python, and the entire system is implemented on a
Raspberry Pi 3 board. This system enables a blind person to lead an independent day-to-day
life. A key constraint motivates the design: inserting assistive enhancements into a blind
person's shoes or cane adds more weight, influencing their handling and usage adversely.
The project aims at removing such constraints by helping the visually challenged to read
independently without restricting his or her movements. It also uses commercial
off-the-shelf (COTS) technologies, ensuring cost-effectiveness of the product. The technical
objective of this device, in the context of reading from a distance, is to allow a
cost-effective, independent reading experience for the blind. Our prototype incorporates the
following:
1. A camera.
2. OCR software.
3. A microcontroller board.
4. A Braille glove.
The detection and recognition of text from natural scene images constitute one of the main
tasks that must be fulfilled in order to proceed with the project. A global method like
Otsu's technique is not quite suitable for camera-captured images, since it often leads to loss
of textual information against the background. The camera used for this purpose was the
Microsoft LifeCam Studio webcam. For text detection from the image, the Optical Character
Recognition engine ABBYY FineReader was used. The pre-processed image was fed into the
OCR engine and the detected text was written to a .txt or .doc file. Optical Character
Recognition, or OCR, is a technology that enables one to convert different types of
documents or images captured by a digital camera into editable and searchable data.
ABBYY FineReader is optical character recognition (OCR) software that performs text
conversion, creating editable, searchable files and e-books from scans of paper documents,
PDFs and digital photographs.
The three basic principles that allow humans to recognize objects are:
1. Integrity
2. Purposefulness
3. Adaptability (IPA).
a. The program first analyzes the structure of the document image.
b. It divides the page into elements such as blocks of text, tables, images, etc.
c. The lines are divided into words and then into characters.
After processing a huge number of such probabilistic hypotheses, the program finally takes a
decision and presents the recognized text. Using ABBYY FineReader OCR is easy. The
process generally consists of three stages:
1. Open or scan the document.
2. Recognize it.
3. Save the result in a convenient format (DOC, TXT, PDF, etc.).
The entire process of data conversion from an original paper document, image or PDF takes
less than a minute, and the final recognized document looks just like the original. Advanced,
powerful OCR software allows one to save a lot of time and effort when creating, processing
and repurposing various documents.
1. Using a computer/laptop:
At first, the project was developed using a laptop as the processing medium. The components
exclusively needed for development on a laptop are:
a. Webcam or digital camera: a digital camera with 5-megapixel resolution or higher,
equipped with a flash-disable mode, optical zoom, an anti-shake feature and autofocus.
i) FineReader:
With ABBYY Screenshot Reader one can take image screenshots or text screenshots. Image
screenshots: easily create screenshots and save them as images; only a selected area of the
screen, a complete window (print screen) or the entire desktop can be captured.
To grab some text from an image file, website, presentation, or PDF, one can quickly turn
text areas into truly editable text that can be pasted directly into an open application, edited,
or saved as Microsoft Word or Excel documents. Screenshot Reader converts the image of
the screenshot into text.
c. Python Programming:
Using Smartphone:
The entire project was shifted to smartphone from laptop primarily due to portability issues.
The components used are:
This powerful software development kit (SDK) enables images and photographs to be
transformed into searchable and editable document formats, and supports all of the most
popular mobile platforms and devices.
Step 3: OCR – includes the options of business card processing or barcode recognition.
We designed an .apk Android file which sends the data from the Android device to the
Arduino using a USB cable. Effective user-interface support for the architecture is
important, as an embedded operating system provides structure and low-level functionality.
This is the reason the Arduino Uno microcontroller development board was chosen.
Arduino:
Arduino Uno is a microcontroller board based on the ATmega328. It has 14 digital
input/output pins, 6 analog inputs, a 16 MHz ceramic resonator, a USB connection, a power
jack, an ICSP header, and a reset button.
a. Memory:
b. Communication:
The ATmega328 provides UART TTL (5V) serial communication, which is available on
digital pins 0 (RX) and 1 (TX).
c. Programming: The Arduino Uno can be programmed with the Arduino software. The
ATmega328 on the Arduino Uno comes pre-burned with a bootloader that allows one to
upload new code to it without the use of an external hardware programmer. It communicates
using the original STK500 protocol.
Steps:
1. The analog serial pins are first assigned as six output pins.
3. For each character – alphabetic, numeric and a few alphanumeric – a specific set of output
pins is driven high, following the Braille convention.
4. The program is terminated.
CHAPTER 2
WORKFLOW PROCESS
BRAILLE GLOVE:
All over the world, persons with visual handicaps have used Braille as the primary means of
reading information. Standard Braille is an approach to creating documents which can be
read through touch. This is accomplished through the concept of a Braille cell consisting of
raised dots on a thick sheet of paper. A cell consists of six dots arranged in the form of a
rectangular grid of two dots horizontally and three dots vertically. With six dots arranged this
way, one can obtain sixty-three different patterns of dots. A visually handicapped person is
taught Braille by training him or her to discern the cells by touch, accomplished through the
fingertips.
A printed sheet of Braille normally contains upwards of twenty-five rows of text with forty
cells in each row. The physical dimensions of a standard Braille sheet are approximately 11
inches by 11 inches. The dimensions of the Braille cell are also standardized, though they
may vary slightly from country to country. The six dots forming the cell permit sixty-three
different patterns of dot arrangements. Strictly, it is sixty-four patterns, but the last one is a
cell without any dots and thus serves the purpose of a space. A Braille cell is thus the
equivalent of a six-bit character code, if we view it in the light of text representation in a
computer! However, it is not related to any character code in use with computers. In standard
English Braille, many of the sixty-three cells correspond to a letter of the Roman alphabet or
a punctuation mark. A few cells represent short words or syllables that are frequently
encountered in English. This is done so that the number of cells required to show a sentence
may be reduced, which helps minimize the space requirements when printing Braille.
The six dots forming the cell permit sixty-three different patterns of dot arrangements, which
are matched with the alphabets, numbers and special symbols of the English language. The
braille glove contains six vibration motors, fixed on five fingers and the centre of the palm.
The basic technique used in the hand glove is to retrieve the value of the English letter typed
by the user on the keyboard, convert it into a Braille value, and activate the corresponding
motors. Based on the position of the vibration, the blind person can understand the value of
the letter. For example, if the user types the letter “r”, it is converted into the Braille value
1,2,3,5 and this value activates the corresponding motors in the Braille hand glove. The
conversion program is written in the Hi-Tech C language and is recorded in the
microcontroller of the hand glove. Any blind person can wear this glove on the right hand
and understand the English letters through vibration instead of touching a Braille sheet.
Similarly, a whole word or sentence is converted into Braille vibrations and sent to the blind
person. Based on this method, a sighted person and a deaf-blind person can communicate
effectively.
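The letter-to-motor conversion described above can be sketched as follows. The dot numbers follow standard English Braille, but the motor placement table is a hypothetical wiring chosen for illustration; the actual glove firmware is written in C for the microcontroller.

```python
# Dot numbers (1-6) per letter, following standard English Braille;
# only a few letters are shown for brevity.
BRAILLE_DOTS = {"a": (1,), "b": (1, 2), "c": (1, 4), "r": (1, 2, 3, 5)}

# Hypothetical placement: dot number -> vibration motor on the glove.
MOTOR_AT = {1: "thumb", 2: "index", 3: "middle", 4: "ring", 5: "little", 6: "palm"}

def motors_for(letter):
    """Return the motors to activate for one typed letter."""
    return [MOTOR_AT[dot] for dot in BRAILLE_DOTS[letter.lower()]]
```

Typing “r”, for instance, maps to dots 1, 2, 3 and 5 and hence to the thumb, index, middle and little-finger motors, matching the example in the text.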
The Braille hand glove comprises the following key components:
1. Microcontroller (AT89C51)
2. RS-232C interface
3. Vibration motors
4. Power supplies
MICROCONTROLLER:
Microcontroller is a general purpose device, which integrates a number of the components of a
microprocessor system onto a single chip. It has inbuilt CPU, memory and peripherals to
make it as a mini computer. A microcontroller is integrated with
1. CPU Core
The vibration hand glove contains a microcontroller AT89C51. It is the 40 pins, 8 bit
Microcontroller manufactured by Atmel group. It is the flash type reprogrammable memory.
Advantage of this flash memory is we can erase the program within a few minutes. It has 4KB
on chip ROM and 128 bytes internal RAM and 32 I/O pin as arranged as port 0 to port 3 each
has 8 bit bin .Port 0 contains 8 data line (D0-D7) as well as low order address line(AO-A7).
The position identification and motor control are programmed in the Hi-Tech C language and
loaded into the microcontroller.
1) Crystal:
The heart of the microcontroller is the circuitry that generates the clock pulse. The
microcontroller provides two pins, XTAL1 and XTAL2, for connecting an external crystal
resonator along with capacitors. The crystal frequency is the basic clock frequency of the
microcontroller, and based on this frequency the microcontroller controls the run time of the
vibration motors inside the hand glove.
2) Reset:
The memory locations of the 89C51 run from 0000H to 0FFFH; whenever the supply is
switched on, execution starts from location 0000H. The 89C51 microcontroller provides its
9th pin for the reset function. Here the reset circuitry consists of a 10 µF capacitor in series
with a 10 kΩ resistor. When the supply is switched on, the capacitor charges and discharges,
giving a high-to-low pulse to the 9th pin through a 7414 inverter. We interface an LCD
display to the microcontroller via ports 0 and 2: the LCD control lines are connected to port
2 and the data lines to port 0. Whenever the motor speed struggles, the reset is used to restart
the program.
3) LCD:
A liquid crystal display has 16 pins, of which the first three and the 15th are used for the
power supply. The 4th pin is RS (register select): when it is low the command register is
selected, and when it is high the data register. The 5th pin is R/W: when it is low, a write
operation is performed. The 6th pin acts as enable, and the remaining pins are data lines.
In the vibration hand glove, RS-232 is used for communication. RS-232 is a standard for
serial binary data interconnection between a DTE (Data Terminal Equipment) and a DCE
(Data Circuit-terminating Equipment), commonly used in computer serial ports. Here,
ASCII values are converted into binary signals and sent to the vibration glove to activate the
vibration motors.
Details of the character format and transmission bit rate are controlled by the serial port
hardware, often a single integrated circuit called a UART that converts data from parallel to
serial form. A typical serial port includes specialized driver and receiver integrated circuits
to convert between internal logic levels and RS-232-compatible signal levels. A relay is an
electrically operated switch. Current flowing through the coil of the relay creates a magnetic
field which attracts a lever and changes the switch contacts. The coil current can be on or
off, so relays have two switch positions, and most are double-throw (changeover) switches.
The coil of a relay passes a relatively large current, typically 30 mA for a 12 V relay, but as
much as 100 mA for relays designed to operate from lower voltages. Most ICs (chips) cannot
provide this current, so a transistor is usually used to amplify the small IC current to the
larger value required for the relay coil. The maximum output current of the popular 555
timer IC is 200 mA, so these devices can supply relay coils directly without amplification.
CHAPTER 3
MODULES
EXTRACTION ALGORITHM
Step 1: Detect Candidate Text Regions Using MSER. The MSER feature detector works well
for finding text regions, because the consistent colour and high contrast of text lead to stable
intensity profiles. Use the detectMSERFeatures function to find all the regions within the
image and plot the results. Notice that many non-text regions are detected alongside the text.
MSER Regions
Step 2: Remove Non-Text Regions Based on Basic Geometric Properties. Although the
MSER algorithm picks out most of the text, it also detects many other stable regions in the
image that are not text. A rule-based approach can remove non-text regions: for example,
geometric properties of text can be used to filter out non-text regions using simple
thresholds. Alternatively, a machine-learning approach can train a text vs. non-text
classifier. Typically, a combination of the two approaches produces better results. This
example uses a simple rule-based approach to filter non-text regions based on geometric
properties. Several geometric properties are good for discriminating between text and
non-text regions, including aspect ratio, eccentricity, Euler number, extent, and solidity.
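A minimal sketch of such a rule-based geometric filter in Python. The threshold values and the region-property dictionaries are illustrative, not taken from the project code; in practice the properties would come from a region-analysis routine and the thresholds would be tuned on real data.

```python
def is_text_like(region, max_aspect=3.0, max_ecc=0.995,
                 min_extent=0.2, min_solidity=0.3):
    """Rule-based filter on a candidate region's geometric properties.
    All thresholds are illustrative."""
    return (1.0 / max_aspect < region["aspect_ratio"] < max_aspect
            and region["eccentricity"] < max_ecc     # reject thin slivers
            and region["extent"] > min_extent        # must fill part of its bbox
            and region["solidity"] > min_solidity)   # reject very concave shapes

candidates = [
    {"aspect_ratio": 0.6, "eccentricity": 0.90, "extent": 0.50, "solidity": 0.8},   # letter-like
    {"aspect_ratio": 12.0, "eccentricity": 0.999, "extent": 0.05, "solidity": 0.2},  # edge-like
]
text_regions = [r for r in candidates if is_text_like(r)]
```

Only the letter-like region survives the filter; the long, thin, hollow region is discarded.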
Step 3: Remove Non-Text Regions Based on Stroke Width Variation. Another common
metric used to discriminate between text and non-text is stroke width: a measure of the
width of the curves and lines that make up a character. Text regions tend to have little
stroke width variation, whereas non-text regions tend to have larger variations. To
understand how stroke width can be used to remove non-text regions, estimate the stroke
width of one of the detected MSER regions. This can be done using a distance transform and
a binary thinning operation. In a typical stroke width image of a character, there is very
little variation over most of the region. This indicates that the region is likely to be a text
region, because the lines and curves that make up the region all have similar widths, which
is a common characteristic of human-readable text.
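The stroke-width test can be expressed as a simple coefficient-of-variation check. The sketch below is illustrative: it assumes the stroke widths have already been sampled inside the region (e.g. from a distance transform of the thinned binary region), and the 0.35 threshold is an assumption, not a value from the project.

```python
def stroke_width_ok(widths, max_cv=0.35):
    """Accept a region whose stroke widths vary little.
    widths: stroke-width samples taken inside the region; max_cv is an
    illustrative limit on the coefficient of variation (std dev / mean)."""
    mean = sum(widths) / len(widths)
    variance = sum((w - mean) ** 2 for w in widths) / len(widths)
    return (variance ** 0.5) / mean <= max_cv

letter_stroke = [3, 3, 4, 3, 3]   # near-constant width: keep
foliage_blob = [1, 6, 2, 9, 3]    # widths all over the place: discard
```

The letter-like sample passes (its widths barely vary), while the irregular sample is rejected.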
Step 4: Merge Text Regions for Final Detection Result. At this point, all the detection results
are composed of individual text characters. To use these results for recognition tasks, such
as OCR, the individual text characters must be merged into words or text lines. This enables
recognition of the actual words in an image, which carry more meaningful information than
the individual characters. Compare, for example, recognizing the string 'EXIT' with the set
of individual characters {'X','E','T','I'}, where the meaning of the word is lost without the
correct ordering. One approach for merging individual text regions into words or text lines
is to first find neighbouring text regions and then form a bounding box around these regions.
To find neighbouring regions, expand the bounding boxes computed earlier with regionprops.
This makes the bounding boxes of neighbouring text regions overlap, such that text regions
that are part of the same word or text line form a chain of overlapping bounding boxes. The
overlapping bounding boxes can then be merged together to form a single bounding box
around individual words or text lines. To do this, compute the overlap ratio between all
bounding-box pairs. This quantifies the distance between all pairs of text regions, so that it
is possible to find groups of neighbouring text regions by looking for non-zero overlap
ratios. Once the pair-wise overlap ratios are computed, use a graph to find all the text
regions "connected" by a non-zero overlap ratio.
Step 5: Recognize Detected Text Using OCR. After detecting the text regions, use the OCR
function to recognize the text within each bounding box. Note that without first finding the
text regions, the output of the OCR function would be considerably noisier.
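The merging step above can be sketched in plain Python: compute pair-wise overlap ratios, connect boxes with non-zero overlap, and merge each connected group into one box. Boxes are (x1, y1, x2, y2) tuples and the graph traversal is a simple depth-first search; this is an illustration of the idea, not the project's code.

```python
def overlap_ratio(a, b):
    """Intersection area over union area of two boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    if inter == 0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def group_boxes(boxes):
    """Connect boxes with non-zero overlap and merge each connected
    group into a single enclosing bounding box."""
    n = len(boxes)
    adj = {i: [j for j in range(n)
               if j != i and overlap_ratio(boxes[i], boxes[j]) > 0]
           for i in range(n)}
    seen, merged = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []          # depth-first search over the graph
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            comp.append(k)
            stack.extend(adj[k])
        xs1, ys1, xs2, ys2 = zip(*(boxes[k] for k in comp))
        merged.append((min(xs1), min(ys1), max(xs2), max(ys2)))
    return merged
```

Two overlapping character boxes collapse into one word box, while a distant box stays separate.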
The character recognition flow is: Document → Gray-scale conversion → Filtering → Stored
templates; Input character → Gray-scale conversion → Filtering → Feature extraction →
Pattern recognition.
The recognition rate of these algorithms depends on the choice of features. Most existing
algorithms involve extensive processing of the image before the features are extracted,
which increases computational time. Here we discuss a pattern-matching-based method for
character recognition that effectively reduces the image-processing time while maintaining
efficiency and versatility. The parallel computational capability of a neural network ensures
a high speed of recognition, which is critical in a commercial environment. The key factors
in the implementation are an optimal selection of features that categorically defines the
details of the characters, the number of features, and a low image-processing time.
POST PROCESSING:
One proposed approach corrects an OCR error word using the vocabulary and grammar
characteristics surrounding it. Another proposed system is a statistical method for
auto-correction of OCR errors; this approach uses a dictionary to generate a list of
correction candidates based on an n-gram model. All words in the OCR text are grouped into
a frequency matrix that records the existing sequences of characters and their counts. The
correction candidate with the highest count in the frequency matrix is then selected to
replace the error word.
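A toy version of this frequency-based correction is sketched below. The dictionary and its counts are illustrative stand-ins; a real system would build the frequency statistics from its OCR corpus and use a fuller edit-distance model.

```python
# Illustrative word counts; a real system derives these from its corpus.
FREQ = {"exit": 50, "exist": 12, "text": 40, "test": 30}

def within_one_edit(a, b):
    """True if a equals b, or differs by one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # same length: exactly one substituted character
        return sum(x != y for x, y in zip(a, b)) == 1
    # lengths differ by one: deleting one char from the longer must match
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def correct(word):
    """Replace an OCR error word with its most frequent near-match."""
    candidates = [w for w in FREQ if within_one_edit(word, w)]
    return max(candidates, key=FREQ.get) if candidates else word
```

An error such as “ex1t” is pulled back to “exit”; a word with no near-match is left unchanged.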
The structure of the text-to-speech synthesizer can be broken down into two major modules:
Natural Language Processing (NLP) module: It produces a phonetic transcription of the text
to be read, together with prosody.
Digital Signal Processing (DSP) module: It transforms the symbolic information it receives
from the NLP module into audible and intelligible speech.
A text-to-speech system is composed of two parts: a front-end and a back-end. The
front-end has two major tasks. First, it converts raw text containing symbols like numbers
and abbreviations into the equivalent written-out words. This process is often called text
normalization, pre-processing, or tokenization. The front-end then assigns phonetic
transcriptions to each word, and divides and marks the text into prosodic units such as
phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is
called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and
prosody information together make up the symbolic linguistic representation that is output
by the front-end. The back-end, often referred to as the synthesizer, then converts the
symbolic linguistic representation into sound. In certain systems, this part includes the
computation of the target prosody (pitch contour, phoneme durations), which is then
imposed on the output speech.
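The front-end's text-normalization task can be illustrated with a toy Python function. The abbreviation and digit tables below are deliberately tiny stand-ins for the full lexicons a real synthesizer uses.

```python
# Tiny illustrative lookup tables for the normalization step.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
DIGITS = {"2": "two", "3": "three"}

def normalize(text):
    """Front-end text normalization: write out abbreviations and digits
    so the phonetic stage only ever sees ordinary words (toy version)."""
    out = []
    for word in text.split():
        out.append(ABBREVIATIONS.get(word, DIGITS.get(word, word)))
    return " ".join(out)
```

After this pass, a phrase like “Dr. Smith has 2 dogs” reads as ordinary words, ready for grapheme-to-phoneme conversion.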
CAMERA
A digital camera is a camera that encodes images and videos digitally and stores them. It
uses an optical system, typically a lens with a variable diaphragm, to focus light onto an
image pickup device. The diaphragm and shutter admit the correct amount of light to the
imager, just as with film, but the image pickup device is electronic rather than chemical. A
digital camera records and stores photographic images in digital form. Many current models
are also able to capture sound and video in addition to still images.
PRINCIPLE OF CAMERAS:
The camera captures light through a small lens at the front using a tiny grid of microscopic
light detectors built into an image-sensing microchip (either a charge-coupled device (CCD)
or, more likely these days, a CMOS image sensor). A simple webcam setup consists of a
digital camera attached to your computer, typically through the USB port. The camera part
of the webcam setup is just a digital camera – there is really nothing special going on there.
The webcam nature of the camera comes with the software.
USB PORT
Video data from DTV standards such as ATSC and DVB consists of encapsulated MPEG
data streams, which are passed to a decoder and output as uncompressed video data, which
can be high-definition. This video data is then encoded into TMDS for digital transmission
over HDMI. HDMI also includes support for 8-channel uncompressed digital audio.
Beginning with version 1.2, HDMI supports up to 8 channels of one-bit audio, the format
used on Super Audio CDs.
RASPBERRY PI
The Raspberry Pi is a single-board computer of credit-card size that can perform many of
the tasks of a desktop computer. The main purpose of designing the Raspberry Pi board was
to encourage learning, experimentation, and innovation among school-level students. The
Raspberry Pi board is portable and low cost. The ARM processors at the heart of Raspberry
Pi boards are also widely used in mobile phones. In the 21st century, the growth of mobile
computing technologies has been very high, with a huge segment of this growth driven by
the mobile industry; roughly 98% of mobile phones use ARM-based processors.
ULTRASONIC SENSOR
An ultrasonic sensor is a device that measures the distance to an object using sound waves.
It sends out a sound pulse at a specific frequency and listens for that pulse to bounce
back. By recording the elapsed time between the pulse being emitted and its echo returning,
the distance between the sensor and the object can be calculated as
distance = (speed of sound × elapsed time) / 2, the factor of two arising because the pulse
travels to the object and back.
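The time-of-flight calculation can be sketched as follows. The sensor model (the common HC-SR04 module) and the temperature assumed for the speed of sound are illustrative; on real hardware the echo time would come from the sensor's GPIO echo pin rather than a literal value.

```python
# Distance computation for an ultrasonic sensor (e.g. an HC-SR04,
# assumed here). The round-trip echo time is halved because the
# pulse travels out to the obstacle and back again.
SPEED_OF_SOUND = 343.0  # metres per second in air at ~20 degrees C

def distance_m(echo_seconds):
    """Distance to the obstacle in metres from a round-trip echo time."""
    return SPEED_OF_SOUND * echo_seconds / 2.0

# A 0.01 s round trip corresponds to an obstacle about 1.7 m away.
print(round(distance_m(0.01), 3))  # -> 1.715
```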
CHAPTER 4
APPLICATIONS
● People with learning disabilities: Some people have difficulty reading large amounts of
text due to dyslexia and other learning disabilities. Offering them an easier option for
experiencing website content is a great way to engage them.
● People who have literacy difficulties: Some people have basic literacy levels. They often
get frustrated trying to browse the internet because so much of it is in text form. By
offering them an option to hear the text instead of reading it, they can get valuable
information in a way that is more comfortable for them.
● People who speak the language but do not read it: Having a speech option for the
foreign-born will open up your audience to this under-served population. Many people who
come to a new country learn to speak and understand the native language effectively, but
may still have difficulty reading in a second language. Though they may be able to read
content with a basic understanding, text-to-speech technology allows them to take in the
information in a way they are more comfortable with, making your content easier to
comprehend and retain.
● People who multitask: A busy life often means that people do not have time to do all the
reading they would like to do online. Having a chance to listen to the content instead of
reading it allows them to do something else at the same time. With the prevalence of
smartphones and tablets, it also provides an option for content consumption on the go,
taking content away from the computer screen and into any environment that's
convenient for the consumer.
● People with visual impairment: Text to speech can be a very useful tool for the mildly
or moderately visually impaired. Even for people with the visual capability to read, the
process can often cause too much strain to be of any use or enjoyment. With text to
speech, people with visual impairment can take in all manner of content in comfort
instead of strain.
● People who access content on mobile devices: Reading a great deal of content on a small
screen is not always easy. Having text-to-speech software do the work is much easier. It
allows people to get the information they want without a great deal of scrolling and
aggravation.
● People with different learning styles: Some people are auditory learners, some are
visual learners, and some are kinaesthetic learners; most learn best through a combination
of the three. Universal Design for Learning is a plan for teaching which, through the use
of technology and adaptable lesson plans, aims to help the maximum number of learners
comprehend and retain information by appealing to all learning styles.
CHAPTER 5
CONCLUSION
The results obtained from the procedure described above are indicated in the figures below.
The preprocessed image is given to the Tesseract OCR engine to extract the text in the
image. However, due to the low resolution of the webcam, the output obtained is not 100%
accurate; the accuracy can be improved by using an HD camera or a mobile camera.

This project provides a novel concept for text reading for the blind, utilizing a
local-sequential scan. The system includes a text tracking algorithm that extracts words
from a close-up camera view. Text-to-speech synthesis is a rapidly growing aspect of
computer technology and is playing an increasingly important role in the way we interact
with systems and interfaces across a variety of platforms. The proposed system gives a
very simple method for text-to-speech conversion. Text inputs such as alphabets, words,
sentences, and numbers are given to the system, and text-to-speech conversion is achieved
with a result that is audible and clear. Such a system is widely used in web applications,
email reading, mobile applications, and other intelligent speaking systems. The suggested
system, as an independent program, is fairly cheap, and it can be installed onto a
smartphone held by a blind person, giving blind people easy access to the program. This
project is a standalone application developed in Python which can be installed on any
system free of cost. The motivation for the development of this algorithm was the simple
fact that English alphabets are fixed glyphs that will not change.

In this project, we have described a system to read printed text and hand-held objects for
assisting blind people. To extract text regions from complex backgrounds, a novel text
localization algorithm based on models of stroke orientation and edge distributions using
the Canny algorithm is proposed. Block patterns project the proposed feature maps of an
image patch into a feature vector. Adjacent character grouping is performed to calculate
candidate text patches prepared for text classification. OCR is used to perform word
recognition on the localized text regions and transform them into audio output for blind
users.

The camera acts as the input for the system. As the Raspberry Pi board is powered, the
camera starts streaming, and the streaming data is displayed on the screen. Speech
recognition technology is of particular interest due to its direct support of communication
between humans and computers. Using the Tesseract library, the image is converted into
text, and the text detected from the image is shown on the status bar. The obtained text
is then pronounced through the earphones. An image-to-speech conversion technique using
the Raspberry Pi is thus implemented. The simulation results have been successfully
verified and the hardware output has been tested using different samples. The algorithm
successfully processes the image and reads it out clearly, providing significant help for
people with disabilities.

This is an economical as well as efficient device for visually impaired people. We have
applied our algorithm to many images and found that it successfully performs the
conversion. The device is compact and helpful to society. The main advantages of this
project are that it requires less time to recognize and read text, has lower operational
costs, and can recognize text of different fonts. The project can also be used by
partially blind people and elderly people with various eyesight problems, and it plays a
significant role for visually impaired students in their education. Logically, if
listening gets a reader through text more quickly, then it must be considered more
efficient when time is a concern. Other advantages include greater flexibility, high
accuracy, suitability for different illumination conditions, and ease of execution.

There are a few limitations to this project. Font sizes below 20 cannot be recognized,
and the camera does not auto-focus. The major challenge is that it is hard to adjust the
distance between the camera and the book. Speech recognizers are not perfect listeners;
they make mistakes, and a big challenge in designing speech applications is working with
imperfect speech recognition technology.

The problem of adjusting the distance between the book and the camera can be solved by
designing a robotic table that flips the pages automatically. By providing a battery
backup to the Raspberry Pi, the main aim of portability can be achieved. Future work will
concentrate on developing an efficient portable product that can extract text from any
image, enabling blind people to read text present on products, banners, books, etc. This
project can effectively distinguish the object of interest from the background or other
objects in the camera view; in future, it can be implemented in hardware to detect and
recognize objects and vehicles on the road, so that it can warn the user not to cross the
road during vehicle movement. The algorithm can also be extended to handle non-horizontal
text strings. Future work will extend the localization algorithm to process text strings
with fewer than three characters and to design more robust block patterns for text feature
extraction. The alignment of the camera can be adjusted, and more OCR functions can be
used to enhance the application. With such enhancements, electronic labels and vehicle
numbers can be scanned and processed for traffic monitoring.
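The capture-recognize-speak loop described above can be sketched in Python. In the real system the OCR stage would be Tesseract (e.g. `pytesseract.image_to_string`) and the speech stage a TTS engine (e.g. `pyttsx3`); to keep this sketch self-contained and runnable without hardware, those stages are injected as plain functions, with stubs standing in for the camera frame and the engines.

```python
# Minimal sketch of the image-to-speech pipeline: OCR the captured
# frame, clean up the recognized text, and hand it to the speech stage.
# The ocr and speak callables stand in for Tesseract and a TTS engine.

def image_to_speech(image, ocr, speak):
    """Run OCR on a captured frame and speak the recognized text."""
    text = ocr(image)                 # e.g. pytesseract.image_to_string
    cleaned = " ".join(text.split())  # collapse stray whitespace/newlines
    if cleaned:
        speak(cleaned)                # e.g. pyttsx3: engine.say(...)
    return cleaned

# Stubs standing in for the camera frame, the OCR engine, and the TTS
# engine, purely to demonstrate the data flow:
spoken = []
result = image_to_speech(
    image="frame",                      # placeholder camera frame
    ocr=lambda img: "  HELLO\nWORLD ",  # pretend OCR output
    speak=spoken.append,
)
print(result)  # -> HELLO WORLD
print(spoken)  # -> ['HELLO WORLD']
```

Injecting the engines this way also makes the pipeline easy to test without a camera or speakers attached to the Raspberry Pi.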
BIBLIOGRAPHY
1. Apple, "VoiceOver for OS X", available online in Apple's accessibility documentation.
2. V. Ajantha Devi, Dr. Santhosh Baboo, "Embedded optical character recognition on Tamil
text image using Raspberry Pi", International Journal of Computer Science Trends and
Technology (IJCST), Volume 2, Issue 4, Jul-Aug 2014.
3. Bindu Philip and R. D. Sudhaker Samuel, "Human machine interface – a smart OCR for the
visually challenged", International Journal of Recent Trends in Engineering, Vol. 3,
November 2009.
4. N. Ezaki, M. Bulacu, and L. Schomaker, "Text detection from natural scene images:
towards a system for visually impaired persons", in ICPR, 2004.
5. Gopinath, Aravind, Pooja et al., "Text to speech conversion using MATLAB",
International Journal of Emerging Technology and Advanced Engineering, Volume 5, Issue 1,
January 2015.
6. Khushali Desai, Jaiprakash Verma, "Image to sound conversion", International Journal of
Advance Research.
7. J. Peters, S. Thillou, "Embedded reading device for blind people: a user-centered
design", in ISIT, 2004.
8. V. Bhope, Prachi Khilari, "Online speech to text engine", International Journal of
Innovative Research in Science, Engineering and Technology, Issue 7, July 2015.
WEB REFERENCES
➔ https://www.researchgate.net/publication/282270189_A_TEXT_READING_SYSTEM_FOR_THE_VISUALLY_DISABLED
➔ https://www.researchgate.net/publication/321883136_A_device_to_assist_blind_in_reading_text
➔ https://www.ijitee.org/wp-content/uploads/papers/v8i6s3/F10350486S319.pdf