Degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE ENGINEERING

CERTIFICATE
This is to certify that the Project Report entitled "IOT002 VIRTUAL REALITY
NAVIGATION ASSISTANT FOR THE VISUALLY IMPAIRED" is being submitted by

K. ELAKIYAN (512722104010)
V. JAIGANESH (512722104018)
P. YUVASANKAR (512722104059)
L. B. SRIKANTH (512722104043)

in partial fulfillment of the requirements for the award of the degree of B.E. in Computer Science
Engineering at SRI BALAJI CHOCKALINGAM ENGINEERING COLLEGE, ARNI.
Virtual Assistant and Navigation for Visually
Impaired using Deep Neural Network and Image
Processing
ABSTRACT
White canes are a cheap and useful tool for detecting obstructions on the ground, but they
can miss obstructions above the ground, such as signs or low-hanging branches. A white cane
also requires considerable training to use correctly, and it may not warn the user of obstructions
early enough to prevent mishaps. Tactile ground surface indicators (TGSIs), guide dogs, ultrasound
systems, and infrared systems are examples of earlier known technologies, each with limitations.
Although audio feedback systems can be useful for spotting adjacent impediments, they may
not give enough detail about an obstacle's type or location to allow safe avoidance.
Additionally, tasks such as text reading and navigation place further demands on audio feedback.
This journal is based on a project called Virtual Assistant and Navigation for Visually
Impaired using Deep Neural Network and Image Processing, which examines how modern
technology can meet the requirements of the blind. The development of deep neural
networks and virtual assistants has created new opportunities for enhancing the quality of life
of those who have visual impairments. The main focus of this publication is the development
of a system that can offer virtual aid, navigation, and text reading for the blind. Smart glasses
with object detection can give the blind access to real-time environmental information: using
a camera and deep learning algorithms, the glasses can recognize items and people in the
user's environment and provide audio or tactile feedback.
The visually impaired can use this to move through busy spaces or avoid obstructions.
A pair of smart glasses can also serve as a reading aid: using OCR (Optical Character
Recognition) technology, the glasses can read text from books, signs, or other written items
and convert it to speech or braille. As a result, the visually impaired may be able to read and
access information with greater independence.
The Journal aims to bring together researchers, developers, and practitioners from a variety of
fields, such as computer science, engineering, psychology, and medicine. By exchanging
knowledge and expertise, we hope to accelerate progress in this crucial and exciting area of
research, and subsequently improve the lives of millions of visually impaired people worldwide.

II. Existing System

As this work focuses on obstacle detection techniques and virtual reading assistants, related
work in those domains is evaluated in this section.

Image processing is the analysis, modification, and alteration of digital photographs in order
to extract important information or enhance their visual appeal. It typically proceeds in stages:
acquisition of images; pre-processing; processing, in which techniques like filtering,
segmentation, feature extraction, object recognition, or compression may be used, along with
edge detection and thresholding; and analysis with post-processing. A minimal sketch of these
stages is given below.
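To make the pipeline concrete, here is a minimal Python/OpenCV sketch of the acquisition,
pre-processing, and processing stages. The input file name "scene.jpg" and the filter and
threshold parameters are assumptions for illustration, not values taken from the systems
surveyed here.

```python
import cv2

# Image acquisition: load a captured frame ("scene.jpg" is an assumed name).
img = cv2.imread("scene.jpg")

# Pre-processing: grayscale conversion and noise filtering.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Processing: segmentation by Otsu thresholding, plus edge detection.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(gray, 50, 150)

# Analysis / post-processing would then operate on `binary` and `edges`,
# e.g. feature extraction or object recognition.
```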
N. Chen et al. [1] present a mobile assistive reading system for visually impaired people that
uses deep learning techniques for text recognition and text-to-speech conversion, including
deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term
Memory (LSTM) networks for text recognition and speech synthesis, respectively. They
present the implementation of the system using the TensorFlow framework and evaluate its
performance using several metrics such as accuracy, recall, precision, and F1 score.

P. Sivakumar and A. Santhakumar [2] used a Raspberry Pi as the hardware platform and
implemented the image processing techniques using the Python programming language and
the OpenCV library. The system takes input from a camera module, and the captured image
is processed through various image processing techniques to extract the text information.
The image processing techniques used in the proposed system include 1) image preprocessing,
2) character segmentation, 3) character recognition, and 4) text-to-speech conversion.

Tesseract OCR (Optical Character Recognition) is an open-source engine used to extract text
from digital photographs. It analyses photos and finds patterns that match letters, numbers,
and other characters using machine learning methods. Tesseract's OCR engine functions in
phases. The image is first preprocessed to reduce noise, normalize text size, and boost
contrast. After that, character segmentation is used to separate out individual characters from
the image. The engine then uses feature extraction algorithms to examine each character
individually and extract pertinent features, such as the character's height, width, and shape.
The characters are then compared to a pre-existing lexicon of characters and words using
these attributes. Finally, the engine uses statistical models to assess the probability that
various character sequences could combine to form words, and prioritizes the most likely
string of characters that corresponds to the supplied image using language models.

A. Prasad and S. P. Singh [3] proposed a portable reading assistant for visually impaired
people. The system comprises a Raspberry Pi computer, a Pi camera, a speaker, and a tactile
button. The software used in this system includes OpenCV, Tesseract OCR, and the eSpeak
speech synthesizer. The Pi camera is used to take a picture of the text, which is then processed
by OpenCV to create a better image. The processed image is then given to the Tesseract OCR
engine to extract the text from it. The eSpeak speech synthesizer is then used to turn the
recognized text into speech, and the audio output is played through the speaker. The system
is controlled by a tactile button that the user presses to take a picture and initiate the OCR
operation. A simplified sketch of this capture-OCR-speech loop follows.
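The loop in [3] can be sketched as below. This is a simplified illustration rather than the
authors' code: the camera index, the direct function call standing in for the GPIO tactile
button, and the eSpeak invocation are assumptions.

```python
import subprocess

import cv2
import pytesseract  # Python wrapper around the Tesseract OCR engine

camera = cv2.VideoCapture(0)  # Pi camera assumed to appear as device 0

def read_aloud_once():
    """One button press: capture a frame, OCR it, and speak the result."""
    ok, frame = camera.read()
    if not ok:
        return
    # OpenCV clean-up before OCR: grayscale plus light denoising.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)
    # Tesseract extracts the text from the processed image.
    text = pytesseract.image_to_string(gray)
    if text.strip():
        # eSpeak turns the recognized text into audio on the speaker.
        subprocess.run(["espeak", text])

# In the real device this call would be wired to the tactile button.
read_aloud_once()
camera.release()
```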
Object detection, a computer vision task, involves finding and classifying things inside an
image or a video. Many computer vision applications, such as self-driving cars, security
systems, and image search engines, depend heavily on object detection. Object detection
methods often involve numerous phases of processing, including picture preprocessing,
feature extraction, object proposal development, object categorization, and object
localization. The region-based convolutional neural network (R-CNN) family, You Only
Look Once (YOLO), and the Single Shot Detector (SSD) are three well-known object
detection techniques.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi [4] present a unified, real-time object
detection system based on a deep convolutional neural network. The proposed algorithm,
named YOLO, divides an image into a grid and predicts bounding boxes and class
probabilities for each grid cell. The system is capable of detecting objects in real time with
high accuracy and efficiency. They trained the YOLO model on the PASCAL VOC 2012
dataset and tested it on the COCO dataset. The sketch below shows how such grid-cell
predictions can be decoded.
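As an illustration of YOLO's grid-based output, the following sketch runs a Darknet YOLO
model through OpenCV's DNN module and decodes each grid-cell row into a box and a class
score. The configuration and weight file names and the 0.5 confidence threshold are
assumptions; this is not the code used in [4].

```python
import cv2
import numpy as np

# Assumed Darknet files; any YOLO cfg/weights pair works the same way.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

frame = cv2.imread("scene.jpg")
h, w = frame.shape[:2]

# YOLO expects a square, normalized RGB input.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)

for output in net.forward(layer_names):
    for det in output:               # one row per grid-cell prediction
        scores = det[5:]             # per-class probabilities
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:         # assumed threshold
            # Boxes are predicted relative to the image; rescale to pixels.
            cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
            print(class_id, confidence,
                  (cx - bw / 2, cy - bh / 2, bw, bh))
```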
W. Liu et al. [5] presented a new object detection algorithm based on deep learning. The
proposed method, called SSD, uses a single convolutional neural network to directly predict
object bounding boxes and class probabilities in a single pass. This approach is designed to
be both highly accurate and efficient, allowing for real-time object detection on a wide range
of platforms. They used a computer with a GPU for training and testing their model, along
with standard deep learning libraries such as Caffe and TensorFlow.

Ren et al. [6] propose a faster region-based convolutional neural network (Faster R-CNN)
for real-time object detection. The authors introduce a Region Proposal Network (RPN) that
shares convolutional layers with the detection network, allowing for efficient end-to-end
object detection. The RPN generates region proposals and assigns objectness scores to each
proposal, which are then passed to the detection network for classification and bounding box
regression. Faster R-CNN achieves state-of-the-art performance on the PASCAL VOC and
MS COCO datasets. Their setup included a 12-core Intel Xeon CPU, an NVIDIA Titan X
GPU, the Caffe deep learning framework, and the PASCAL VOC and MS COCO datasets.
A comparable, hedged inference sketch is given below.
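This sketch uses torchvision's pretrained Faster R-CNN rather than the Caffe setup the
authors describe; the model choice, the test image name, and the score threshold are
assumptions for illustration.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Faster R-CNN (ResNet-50 FPN backbone) from torchvision.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = to_tensor(Image.open("scene.jpg").convert("RGB"))
with torch.no_grad():
    # Internally the RPN proposes regions; the detection head then
    # classifies each proposal and regresses its bounding box.
    predictions = model([image])[0]

for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.5:  # assumed confidence threshold
        print(int(label), float(score), box.tolist())
```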
Many contemporary systems, despite multiple tries, are unable to fully balance factors such
as accuracy and computational resources. For those who are blind or visually impaired, the
suggested reading virtual assistant system provides a more effective and user-friendly
solution. It also has the added advantage of object detection capabilities with increased
efficiency, improved user experience, and contextual awareness.

III. Proposed System

Our primary contribution to this project was the development of a straightforward,
transportable, hands-free ETA (electronic travel aid) prototype with text-to-speech
conversion features for basic, everyday indoor and outdoor use.

In this article, we suggest a cutting-edge visual aid solution for people who are absolutely
blind. The following are the distinctive qualities that determine the originality of the
suggested design:

FIGURE ( I )
FIGURE ( V )    TABLE ( II )

FIG.VI. OBJECT DETECTION OF PERSON AND BOTTLE.

GRAPH ( II )
GRAPH.II. COMPARISON OF VARIOUS MODELS WITH FRAMES PER SECOND (Y-AXIS: FRAMES
PER SECOND, 0 TO 160; X-AXIS: MODELS; REPORTED VALUES: 140, 22, 9, 7, 6 FPS).

FIGURE ( VII )
S = segment(I')             (14)
F = extract_features(S)     (15)
C = classify(F)             (16)

where C is the set of classified characters or words.

Post-processing: The output of the OCR engine is post-processed to correct errors and
improve the accuracy of the recognized text. The post-processing operation may include
spell-checking, grammar checking, and formatting, and can be represented by the following
equation:

T' = postprocess(T)         (17)

where T is the recognized text and T' is the corrected text.

Overall, OCR involves a combination of image processing, feature extraction, machine
learning, and post-processing techniques to recognize text in an image. The accuracy of the
OCR engine depends on the quality of the input image, the segmentation algorithm, the
feature extraction algorithm, and the classification algorithm used. A sketch of how these
stages compose is given below.
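The following minimal sketch shows how Eqs. (14)-(17) might compose in code. The
function bodies are simplified stand-ins (contour-based segmentation, toy geometric
features, a placeholder classifier), not the actual internals of an OCR engine.

```python
import cv2

def segment(binary_img):
    """Eq. (14): S = segment(I'), splitting a binarized image into
    per-character crops via external contours."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours),
                   key=lambda b: (b[1], b[0]))  # top-to-bottom, left-to-right
    return [binary_img[y:y + h, x:x + w] for x, y, w, h in boxes]

def extract_features(segments):
    """Eq. (15): F = extract_features(S), toy geometric features
    (height, width, aspect ratio) per segment."""
    return [(s.shape[0], s.shape[1], s.shape[1] / max(s.shape[0], 1))
            for s in segments]

def classify(features):
    """Eq. (16): C = classify(F), a placeholder; a real engine matches
    features against a lexicon of glyphs."""
    return ["?" for _ in features]

def postprocess(text):
    """Eq. (17): T' = postprocess(T), here just whitespace cleanup,
    standing in for spell-checking and formatting."""
    return " ".join(text.split())

# Composition on a binarized page image I':
#   T = "".join(classify(extract_features(segment(I_prime))))
#   T_corrected = postprocess(T)
```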
FIGURE ( VIII )
FIG.VIII. READING OF THE WARNING USING READING ASSISTANT.

FIG.X. READING OF PUBLIC SIGN BOARD USING READING ASSISTANT.

Here is the accuracy comparison of various text-recognition models used ahead of
text-to-speech:

TABLE ( III )

Model                              Accuracy
Tesseract 4.0                      85.50%
Kraken OCR                         98.30%
OCRopus                            92.90%
Tesseract 5.0 (Proposed System)    98.52%

TABLE.III. COMPARISON OF VARIOUS MODELS WITH THEIR ACCURACY.

GRAPH ( III )
GRAPH.III. ACCURACY OF THE MODELS IN TABLE III (Y-AXIS: ACCURACY, 75.00% TO
100.00%; X-AXIS: MODELS).
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD:
Single Shot MultiBox Detector," in Proceedings of the European Conference on Computer
Vision, 2016.

[6] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks," in Advances in Neural Information Processing
Systems, 2015.

[7] J. S. Rao, K. Choragudi, S. Bansod, S. C. Paidipalli V V, S. K. Singh and P. Pal, "AI, AR
Enabling on Embedded Systems for Agricultural Drones," 2022 International Conference on
Futuristic Technologies (INCOFT), Belgaum, India, 2022, pp. 1-4, doi:
10.1109/INCOFT55651.2022.10094383.

[8] R. H. Krishnan, B. A. Naik, G. G. Patil, P. Pal and S. K. Singh, "AI Based Autonomous
Room Cleaning Bot," 2022 International Conference on Futuristic Technologies (INCOFT),
Belgaum, India, 2022, pp. 1-4, doi: 10.1109/INCOFT55651.2022.10094492.

[14] A. Chaurasiya et al., "Realization of OpenCL based CNN Implementation on FPGA
using SDAccel Platform," 2022 International Conference on Futuristic Technologies
(INCOFT), Belgaum, India, 2022, pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094321.

[15] A. P. S. Shekhawat, A. Chaurasiya, P. Chaurasiya, P. K. Patel, P. Pal and S. K. Singh,
"Realization of Smart and Highly Efficient IoT-based Surveillance System using Facial
Recognition on FPGA," 2022 International Conference on Futuristic Technologies
(INCOFT), Belgaum, India, 2022, pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094500.

[16] P. Bhosle, P. Pal, V. Khobragade, S. K. Singh and P. Kenekar, "Smart Navigation
System Assistance for Visually Impaired People," 2022 International Conference on
Futuristic Technologies (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094458.

[17] R. B. Dushing, S. A. Jagtap, P. Kumar, G. G. Patil, P. Pal and S. K. Singh, "Swarm
Robotics for Ultra-Violet Sterilization Robot," 2022 International Conference on Futuristic
Technologies (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094477.