Degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE ENGINEERING

CERTIFICATE
This is to certify that the Project Report entitled "IOT002 VIRTUAL REALITY
NAVIGATION ASSISTANT FOR THE VISUALLY IMPAIRED" is being submitted by

K. ELAKIYAN (512722104010)
V. JAIGANESH (512722104018)
P. YUVASANKAR (512722104059)
L. B. SRIKANTH (512722104043)

in partial fulfillment of the requirements for the award of the degree of B.E. in Computer Science
Engineering at SRI BALAJI CHOCKALINGAM ENGINEERING COLLEGE, ARNI.
Virtual Assistant and Navigation for Visually
Impaired using Deep Neural Network and Image
Processing
ABSTRACT
White canes are a cheap and useful tool for detecting obstructions on the ground, but they
can miss obstructions above the ground, such as signs or low-hanging branches. A white cane
also requires considerable training to use correctly, and it may not warn the user of obstructions
early enough to prevent mishaps. Tactile ground surface indicators (TGSIs), guide dogs, ultrasound
systems, and infrared systems are examples of earlier known technologies, each with limitations.
Although audio feedback systems can be useful for spotting adjacent impediments, they may
not give enough detail about an obstacle's type or location to allow safe avoidance.
Additionally, tasks such as text reading and navigation place further demands on audio feedback.
This journal is based on a project called Virtual Assistant and Navigation for Visually
Impaired using Deep Neural Network and Image Processing, which examines how modern
technology can meet the requirements of the blind. The development of deep neural
networks and virtual assistants has created new opportunities for enhancing the quality of life
of those who have visual impairments. The main focus of this publication is the development
of a system that can offer virtual aid, navigation, and text reading for the blind. Smart glasses
with object detection can give the blind access to real-time environmental information: using
a camera and deep learning algorithms, the glasses can recognize items and people in the
user's environment and provide audio or tactile feedback.
The visually impaired can use this to move through busy spaces or avoid obstructions.
A pair of smart glasses can also serve as a reading aid: using OCR (Optical Character
Recognition) technology, the glasses can read text from books, signs, or other written items
and convert it to speech or braille. As a result, the visually impaired may be able to read and
access information with greater independence.
The Journal aims to bring together researchers, developers, and practitioners from a variety of
fields, such as computer science, engineering, psychology, and medicine. By exchanging
knowledge and expertise, we hope to accelerate progress in this crucial and exciting area of
research, and subsequently improve the lives of millions of visually impaired people worldwide.

II. Existing System

As this work focuses on obstacle detection techniques and virtual reading assistants, related
work in those domains is evaluated in this section.

Image processing is the analysis, modification, and alteration of digital photographs in order
to extract important information or enhance their visual appeal. It typically proceeds in stages:
acquisition of images; pre-processing; processing, in which techniques like filtering,
segmentation, feature extraction, object recognition, or compression may be used, along with
edge detection and thresholding; and analysis with post-processing. A minimal sketch of these
stages is given below.
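To make the pipeline concrete, here is a minimal Python/OpenCV sketch of the acquisition,
pre-processing, and processing stages. The input file name "scene.jpg" and the filter and
threshold parameters are assumptions for illustration, not values taken from the systems
surveyed here.

```python
import cv2

# Image acquisition: load a captured frame ("scene.jpg" is an assumed name).
img = cv2.imread("scene.jpg")

# Pre-processing: grayscale conversion and noise filtering.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Processing: segmentation by Otsu thresholding, plus edge detection.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(gray, 50, 150)

# Analysis / post-processing would then operate on `binary` and `edges`,
# e.g. feature extraction or object recognition.
```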
N. Chen et al. [1] present a mobile assistive reading system for visually impaired people that
uses deep learning techniques for text recognition and text-to-speech conversion, including
deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term
Memory (LSTM) networks for text recognition and speech synthesis, respectively. They
present the implementation of the system using the TensorFlow framework and evaluate its
performance using several metrics such as accuracy, recall, precision, and F1 score.

P. Sivakumar and A. Santhakumar [2] used a Raspberry Pi as the hardware platform and
implemented the image processing techniques using the Python programming language and
the OpenCV library. The system takes input from a camera module, and the captured image
is processed through various image processing techniques to extract the text information.
The image processing techniques used in the proposed system include 1) image preprocessing,
2) character segmentation, 3) character recognition, and 4) text-to-speech conversion.

Tesseract OCR (Optical Character Recognition) is an open-source engine used to extract text
from digital photographs. It analyses photos and finds patterns that match letters, numbers,
and other characters using machine learning methods. Tesseract's OCR engine functions in
phases. The image is first preprocessed to reduce noise, normalize text size, and boost
contrast. After that, character segmentation is used to separate out individual characters from
the image. The engine then uses feature extraction algorithms to examine each character
individually and extract pertinent features, such as the character's height, width, and shape.
The characters are then compared to a pre-existing lexicon of characters and words using
these attributes. Finally, the engine uses statistical models to assess the probability that
various character sequences could combine to form words, and prioritizes the most likely
string of characters that corresponds to the supplied image using language models.

A. Prasad and S. P. Singh [3] proposed a portable reading assistant for visually impaired
people. The system comprises a Raspberry Pi computer, a Pi camera, a speaker, and a tactile
button. The software used in this system includes OpenCV, Tesseract OCR, and the eSpeak
speech synthesizer. The Pi camera is used to take a picture of the text, which is then processed
by OpenCV to create a better image. The processed image is then given to the Tesseract OCR
engine to extract the text from it. The eSpeak speech synthesizer is then used to turn the
recognized text into speech, and the audio output is played through the speaker. The system
is controlled by a tactile button that the user presses to take a picture and initiate the OCR
operation. A simplified sketch of this capture-OCR-speech loop follows.
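The loop in [3] can be sketched as below. This is a simplified illustration rather than the
authors' code: the camera index, the direct function call standing in for the GPIO tactile
button, and the eSpeak invocation are assumptions.

```python
import subprocess

import cv2
import pytesseract  # Python wrapper around the Tesseract OCR engine

camera = cv2.VideoCapture(0)  # Pi camera assumed to appear as device 0

def read_aloud_once():
    """One button press: capture a frame, OCR it, and speak the result."""
    ok, frame = camera.read()
    if not ok:
        return
    # OpenCV clean-up before OCR: grayscale plus light denoising.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)
    # Tesseract extracts the text from the processed image.
    text = pytesseract.image_to_string(gray)
    if text.strip():
        # eSpeak turns the recognized text into audio on the speaker.
        subprocess.run(["espeak", text])

# In the real device this call would be wired to the tactile button.
read_aloud_once()
camera.release()
```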
Object detection, a computer vision task, involves finding and classifying things inside an
image or a video. Many computer vision applications, such as self-driving cars, security
systems, and image search engines, depend heavily on object detection. Object detection
methods often involve numerous phases of processing, including picture preprocessing,
feature extraction, object proposal development, object categorization, and object
localization. The region-based convolutional neural network (R-CNN) family, You Only
Look Once (YOLO), and the Single Shot Detector (SSD) are three well-known object
detection techniques.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi [4] present a unified, real-time object
detection system based on a deep convolutional neural network. The proposed algorithm,
named YOLO, divides an image into a grid and predicts bounding boxes and class
probabilities for each grid cell. The system is capable of detecting objects in real time with
high accuracy and efficiency. They trained the YOLO model on the PASCAL VOC 2012
dataset and tested it on the COCO dataset. The sketch below shows how such grid-cell
predictions can be decoded.
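As an illustration of YOLO's grid-based output, the following sketch runs a Darknet YOLO
model through OpenCV's DNN module and decodes each grid-cell row into a box and a class
score. The configuration and weight file names and the 0.5 confidence threshold are
assumptions; this is not the code used in [4].

```python
import cv2
import numpy as np

# Assumed Darknet files; any YOLO cfg/weights pair works the same way.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

frame = cv2.imread("scene.jpg")
h, w = frame.shape[:2]

# YOLO expects a square, normalized RGB input.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)

for output in net.forward(layer_names):
    for det in output:               # one row per grid-cell prediction
        scores = det[5:]             # per-class probabilities
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:         # assumed threshold
            # Boxes are predicted relative to the image; rescale to pixels.
            cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
            print(class_id, confidence,
                  (cx - bw / 2, cy - bh / 2, bw, bh))
```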
W. Liu et al. [5] presented a new object detection algorithm based on deep learning. The
proposed method, called SSD, uses a single convolutional neural network to directly predict
object bounding boxes and class probabilities in a single pass. This approach is designed to
be both highly accurate and efficient, allowing for real-time object detection on a wide range
of platforms. They used a computer with a GPU for training and testing their model, along
with standard deep learning libraries such as Caffe and TensorFlow.

Ren et al. [6] propose a faster region-based convolutional neural network (Faster R-CNN)
for real-time object detection. The authors introduce a Region Proposal Network (RPN) that
shares convolutional layers with the detection network, allowing for efficient end-to-end
object detection. The RPN generates region proposals and assigns objectness scores to each
proposal, which are then passed to the detection network for classification and bounding box
regression. Faster R-CNN achieves state-of-the-art performance on the PASCAL VOC and
MS COCO datasets. Their setup included a 12-core Intel Xeon CPU, an NVIDIA Titan X
GPU, the Caffe deep learning framework, and the PASCAL VOC and MS COCO datasets.
A comparable, hedged inference sketch is given below.
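This sketch uses torchvision's pretrained Faster R-CNN rather than the Caffe setup the
authors describe; the model choice, the test image name, and the score threshold are
assumptions for illustration.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Faster R-CNN (ResNet-50 FPN backbone) from torchvision.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = to_tensor(Image.open("scene.jpg").convert("RGB"))
with torch.no_grad():
    # Internally the RPN proposes regions; the detection head then
    # classifies each proposal and regresses its bounding box.
    predictions = model([image])[0]

for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.5:  # assumed confidence threshold
        print(int(label), float(score), box.tolist())
```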
Many contemporary systems, despite multiple tries, are unable to fully balance factors such
as accuracy and computational resources. For those who are blind or visually impaired, the
suggested reading virtual assistant system provides a more effective and user-friendly
solution. It also has the added advantage of object detection capabilities with increased
efficiency, improved user experience, and contextual awareness.

III. Proposed System

Our primary contribution to this project was the development of a straightforward,
transportable, hands-free ETA (electronic travel aid) prototype with text-to-speech
conversion features for basic, everyday indoor and outdoor use.

In this article, we suggest a cutting-edge visual aid solution for people who are absolutely
blind. The following are the distinctive qualities that determine the originality of the
suggested design:

FIGURE ( I )
FIGURE ( V )    TABLE ( II )

FIG.VI. OBJECT DETECTION OF PERSON AND BOTTLE.

GRAPH ( II )
GRAPH.II. COMPARISON OF VARIOUS MODELS WITH FRAMES PER SECOND (Y-AXIS: FRAMES
PER SECOND, 0 TO 160; X-AXIS: MODELS; REPORTED VALUES: 140, 22, 9, 7, 6 FPS).

FIGURE ( VII )
S = segment(I')             (14)
F = extract_features(S)     (15)
C = classify(F)             (16)

where C is the set of classified characters or words.

Post-processing: The output of the OCR engine is post-processed to correct errors and
improve the accuracy of the recognized text. The post-processing operation may include
spell-checking, grammar checking, and formatting, and can be represented by the following
equation:

T' = postprocess(T)         (17)

where T is the recognized text and T' is the corrected text.

Overall, OCR involves a combination of image processing, feature extraction, machine
learning, and post-processing techniques to recognize text in an image. The accuracy of the
OCR engine depends on the quality of the input image, the segmentation algorithm, the
feature extraction algorithm, and the classification algorithm used. A sketch of how these
stages compose is given below.
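The following minimal sketch shows how Eqs. (14)-(17) might compose in code. The
function bodies are simplified stand-ins (contour-based segmentation, toy geometric
features, a placeholder classifier), not the actual internals of an OCR engine.

```python
import cv2

def segment(binary_img):
    """Eq. (14): S = segment(I'), splitting a binarized image into
    per-character crops via external contours."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours),
                   key=lambda b: (b[1], b[0]))  # top-to-bottom, left-to-right
    return [binary_img[y:y + h, x:x + w] for x, y, w, h in boxes]

def extract_features(segments):
    """Eq. (15): F = extract_features(S), toy geometric features
    (height, width, aspect ratio) per segment."""
    return [(s.shape[0], s.shape[1], s.shape[1] / max(s.shape[0], 1))
            for s in segments]

def classify(features):
    """Eq. (16): C = classify(F), a placeholder; a real engine matches
    features against a lexicon of glyphs."""
    return ["?" for _ in features]

def postprocess(text):
    """Eq. (17): T' = postprocess(T), here just whitespace cleanup,
    standing in for spell-checking and formatting."""
    return " ".join(text.split())

# Composition on a binarized page image I':
#   T = "".join(classify(extract_features(segment(I_prime))))
#   T_corrected = postprocess(T)
```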
FIGURE ( VIII )
FIG.VIII. READING OF THE WARNING USING READING ASSISTANT.

FIG.X. READING OF PUBLIC SIGN BOARD USING READING ASSISTANT.

Here is the accuracy comparison of various text-recognition models used ahead of
text-to-speech:

TABLE ( III )

Model                              Accuracy
Tesseract 4.0                      85.50%
Kraken OCR                         98.30%
OCRopus                            92.90%
Tesseract 5.0 (Proposed System)    98.52%

TABLE.III. COMPARISON OF VARIOUS MODELS WITH THEIR ACCURACY.

GRAPH ( III )
GRAPH.III. ACCURACY OF THE MODELS IN TABLE III (Y-AXIS: ACCURACY, 75.00% TO
100.00%; X-AXIS: MODELS).
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD:
Single Shot MultiBox Detector," in Proceedings of the European Conference on Computer
Vision, 2016.

[6] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks," in Advances in Neural Information Processing
Systems, 2015.

[7] J. S. Rao, K. Choragudi, S. Bansod, S. C. Paidipalli V V, S. K. Singh and P. Pal, "AI, AR
Enabling on Embedded Systems for Agricultural Drones," 2022 International Conference on
Futuristic Technologies (INCOFT), Belgaum, India, 2022, pp. 1-4, doi:
10.1109/INCOFT55651.2022.10094383.

[8] R. H. Krishnan, B. A. Naik, G. G. Patil, P. Pal and S. K. Singh, "AI Based Autonomous
Room Cleaning Bot," 2022 International Conference on Futuristic Technologies (INCOFT),
Belgaum, India, 2022, pp. 1-4, doi: 10.1109/INCOFT55651.2022.10094492.

[14] A. Chaurasiya et al., "Realization of OpenCL based CNN Implementation on FPGA
using SDAccel Platform," 2022 International Conference on Futuristic Technologies
(INCOFT), Belgaum, India, 2022, pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094321.

[15] A. P. S. Shekhawat, A. Chaurasiya, P. Chaurasiya, P. K. Patel, P. Pal and S. K. Singh,
"Realization of Smart and Highly Efficient IoT-based Surveillance System using Facial
Recognition on FPGA," 2022 International Conference on Futuristic Technologies
(INCOFT), Belgaum, India, 2022, pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094500.

[16] P. Bhosle, P. Pal, V. Khobragade, S. K. Singh and P. Kenekar, "Smart Navigation
System Assistance for Visually Impaired People," 2022 International Conference on
Futuristic Technologies (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094458.

[17] R. B. Dushing, S. A. Jagtap, P. Kumar, G. G. Patil, P. Pal and S. K. Singh, "Swarm
Robotics for Ultra-Violet Sterilization Robot," 2022 International Conference on Futuristic
Technologies (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094477.