0% found this document useful (0 votes)
56 views12 pages

Virtual Assistant and Navigation For Visually Impa-1

New pdfs specially made for u darling

Uploaded by

whitedevil.x10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
56 views12 pages

Virtual Assistant and Navigation For Visually Impa-1

New pdfs specially made for u darling

Uploaded by

whitedevil.x10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

IOT002 Virtual Reality Navigation Assistant for the visually impaired

A Project Report Submitted in Partial Fulfillment of the Requirement for the

Degree of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE ENGINEERING

Sri Balaji Chockalingam Engineering College, Arni

(Affiliated to Anna University)

( by AICTE, Accredited by N.B.A, NewDelhi)


SRI BALAJI CHOCKALINGAM ENGINEERING COLLEGE, ARNI

(Affiliated to Anna University)

(Approved by AICTE, Accredited by N.B.A, New Delhi)

CERTIFICATE

This is to certify that the Project Report entitled “IOT002 VIRTUAL REALITY
NAVIGATION ASSISTANT FOR THE VISUALLY IMPAIRED” that is being submitted
by

K.ELAKIYAN (512722104010)

V.JAIGANESH (512722104018)

P.YUVASANKAR (512722104059)

L.B.SRIKANTH (512722104043)

In partial fulfillment of the requirement for the award of B.E in Computer Science
Engineering in the SRI BALAJI CHOCKALINGAM ENGINEERING COLLEGE, ARNI.
Virtual Assistant and Navigation for Visually
Impaired using Deep Neural Network and Image
Processing

ABSTRACT

The quick development of technology promotes the use of the


resources that are at hand to make simpler daily tasks and
improve the standard and quality of life for those who are
blind. This module suggests creating a system of virtual
assistant glasses to help the visually impaired navigate around
them. An obstacle detection module built into the device uses
computer vision to find obstacles and inform the user
through haptic feedback. The system also has a text
recognition module that can turn any text it identifies into
speech that the user can hear using built-in speakers in the
user's glasses. Users are now able to access to printed
materials like menus and signs in a way that was not
previously feasible. A sign board recognition module, which
converts text on signs into speech, is also part of the system.
The proposed approach has the potential to enhance the
liberty and standard of existence of blind individuals through
integrating these attributes in a wearable device.
I. Introduction

Millions of people globally suffer with a visual impairment, which is an


extremely serious issue. In 2020, the World Health Organization (WHO)
forecasts that there will be 36 million blind persons in the world. There were
also 217 million people who have moderate to severe visual impairment.
Blindness can have numerous causes, but the most common ones include
cataracts, untreated refractive errors, glaucoma, and macular degeneration
caused by age.
Through the years, a variety of methods and gadgets have been designed that
can help visually impaired people in navigating their environments, obtaining
information there, and observing and avoiding hazards in the environment.
Electronic aids for navigation are effective for some extinct and difficult-to-use
challenges, but they can give more precise information about obstacles than
other approaches. Environmental elements like weather and electromagnetic
interference can also have an impact on electronic mobility aids.

White canes are a cheap and useful tool for detecting obstructions on the ground, but they
could miss obstructions above the ground, such signs or low-hanging branches. A white cane
also needs a lot of training to use correctly, and it might not warn you of obstructions well
enough to prevent mishaps. Tactile ground-based indicators (TGSIs), Guide Dogs, Ultrasound
systems, and infrared systems are examples of earlier known technologies with limitations.
Although audio feedback systems can be useful for spotting adjacent impediments, they could
not give enough details about the obstacle's type or location to allow for a safe avoidance.
Additionally, text reading and navigation modify audio feedback.

This journal is based on a project called Virtual Assistant and Navigation for Visually
Impaired using Deep-Neural Network and Image Processing that examines how modern
technology and the requirements of the blind may coexist. The development of deep neural
networks and virtual assistants has created new opportunities for enhancing the quality of life
for those who have visual impairments. The development that can offer virtual aid, navigation,
and text reading for the blind is the main focus of this publication. The use of smart glasses for
object detection can give the blind access to real-time environmental information. The glasses
can recognize items and people in the user's environment and provide them audio or tactile
feedback using a camera and deep learning algorithms.
The sight handicapped can use this to move through busy spaces or avoid obstructions.

A pair of smart glasses could also be used for reading aid. The glasses can read text from
books, signs, or other written items and convert it to speech or braille using OCR
(Optical Character Recognition) technology. The visually impaired may be able to read and
access information with more freedom as a result.

The Journal aims to bring together researchers, developers, and practitioners from a variety of
fields, such as computer science, engineering, psychology, and medicine. By exchanging
knowledge and expertise, we hope to accelerate progress in this crucial and exciting area of
research, and
subsequently improve the lives of millions of visually output is played through the speaker. The system is controlled
impaired people worldwide. by a tactile button that the user presses to take a picture and
initiate the OCR operation.
II. Existing System Finding and classifying things inside an image or a video are
As this work focuses on obstacle detection techniques and the tasks involved in object detection, a computer vision task.
virtual reading assistants, related work in those domains is Many computer vision applications, such as self-driving cars,
evaluated in this section. security systems, and image search engines, depend heavily
on object detection. Numerous phases of processing are often
Analyzing, modifying, and altering digital photographs in involved in object detection methods, including picture
order to extract important information or enhance their visual preprocessing, feature extraction, object proposal
appeal is known as image processing. acquisition of images development, object categorization, and object localization.
Pre-processing the processing Techniques like filtering, The region-based convolutional neural network (R-CNN)
segmentation, feature extraction, object recognition, or family, You Only Look Once (YOLO), and Single Shot
compression may be used in this. Edge detection for Detector (SSD) are three well-known object detection
thresholding, etc. Analysis after Post-Processing. techniques.
N. Chen et al. [1] presents a mobile assistive reading system J. Redmon, S. Divvala, R. Girshick, [4] and A. Farhadi
for visually impaired people that uses deep learning presents a unified, real-time object detection system based on
techniques for text recognition and text-to-speech conversion a deep convolutional neural network. The proposed algorithm,
. including the use of deep learning models such as named YOLO, divides an image into a grid and predicts
Convolutional Neural Networks (CNNs) and Long Short- bounding boxes and class probabilities for each grid cell. The
Term Memory (LSTM) networks for text recognition and system is capable of detecting objects in real-time with high
speech synthesis, respectively. He presented the accuracy and efficiency. They trained the YOLO model on the
implementation of the system using the TensorFlow PASCAL VOC 2012 dataset and testing on the COCO dataset.
framework and evaluates the performance of the system using W. Liu et al.[5] presented a new object detection algorithm
several metrics such as accuracy, recall, precision, and F1 based on deep learning. The proposed method, called SSD,
score. P. Sivakumar and A. Santhakumar [2] used a Raspberry uses a single convolutional neural network to directly predict
Pi as the hardware platform and implemented the image object bounding boxes and class probabilities in a single pass.
processing techniques using Python programming language This approach is designed to be both highly accurate and
and OpenCV library. The system takes input from a camera efficient, allowing for real-time object detection on a wide
module, and the captured image is processed through various range of platforms . They used computer with a GPU for
image processing techniques to extract the text information training and testing their model. They also used standard deep
.Some of the image processing techniques used in the learning libraries such as Caffe and TensorFlow. Ren et al [6]
proposed system include 1) Image Preprocessing 2) Character proposes a faster region-based convolutional neural network
Segmentation 3) Character Recognition 4)Text-to-speech (Faster R-CNN) for real-time object detection. The authors
Conversion. introduce a Region Proposal Network (RPN) that shares
An open-source software package called Tesseract OCR convolutional layers with the detection network, allowing for
(Optical Character Recognition) engine is used to extract text efficient end-to-end object detection. The RPN generates
from digital photographs. It analyses and finds patterns in region proposals and assigns objectness scores to each
photos that match to letters, numbers, and other characters proposal, which are then passed to the detection network for
using machine learning methods. Tesseract's OCR engine classification and bounding box regression. The Faster R-
functions in phases. The image is first preprocessed to reduce CNN achieves state-of-the-art performance on the PASCAL
noise, normalize text size, and boost contrast. After that, VOC and MS COCO datasets . THEY USED including a 12-
character segmentation is used to separate out specific core Intel Xeon CPU, an NVIDIA Titan X GPU, Caffe deep
characters from the image. The Tesseract OCR engine then learning framework, and the PASCAL VOC and MS COCO
use feature extraction algorithms to examine each character datasets.
individually and extract pertinent features, such as the Many contemporary systems, despite multiple tries, are
character's height, width, and shape. The characters are then unable to fully combine elements like accuracy and
compared to a pre-existing lexicon of characters and words computational resources. For those who are blind or visually
using these attributes. Finally, the engine uses statistical
impaired, the suggested reading virtual assistant system
models to assess the probability that various character
sequences could combine to form words. It priorities the most provides a more effective and user-friendly solution. It also
likely string of characters that corresponds to the supplied has the added advantage of having object detection
image using language models. capabilities with increased efficiency, improved user
experience, and contextual awareness.
A. Prasad and S. P. Singh [3] proposed a portable reading
assistant for the visually impaired people. The system III. Proposed System
comprises of a Raspberry Pi computer, Pi camera, speaker,
and a tactile button. The software used in this system includes Our primary contribution to this project was the development
OpenCV, Tesseract OCR, and eSpeak speech synthesizer. of a straightforward, transportable, hands-free ETA prototype
The Pi camera is used to take a picture of the text, which is with text-to-speech conversion features for basic, everyday
then processed by OpenCV to create a better image. The indoor and outdoor use.
Tesseract OCR engine is then given the processed image to
extract the text from it. The eSpeak speech synthesizer is then
used to turn the recognized text into speech, and the audio
In this article, we suggest a cutting-edge visual aid solution
for people who are absolutely blind. The following are the FIGURE( I )
distinctive qualities that determine the originality of the
suggested design:-

• A reading assistant built into the hands-free,


wearable, low-power, and small design that can be
mounted on a pair of spectacles for indoor and
outdoor navigation.
• Low-end setup processing of complex algorithms.
• Accurate, real-time distance measuring using
cameras, which simplifies the design and decreases
the cost by lowering the number of sensors needed.

A Raspberry Pi board, a camera module, a specially designed


camera holder that was 3D printed, and smart glasses make
up the virtual assistant system. The Raspberry Pi board's
camera module is used to take pictures of printed text, which
are subsequently edited using image-processing software.
The image processing procedures utilized to extract the text
sections from the collected images include preprocessing,
binarization, noise removal, and segmentation. The text is
subsequently converted into digital format by an OCR
machine using the extracted text regions as input. The Fig.I. PROPOSED PROTOTYPE FLOW MODEL OF VIRTUAL
matching output is then produced by a text-to-speech or ASSISTANT GLASSES.
Braille conversion system using the digital text.
TABLE ( I )
To increase the precision and effectiveness of the virtual
assistant, we integrate object detection using deep neural Model Existing System Proposed System
networks to enable the detection of certain items, such a page
of a book. We employ a real-time object detection method Design Typically bulky Portable and
dubbed YOLO (You Only Look Once), which is a deep and stationary hands-free
neural network architecture. YOLO operates by dividing the Object Typically not Incorporates
image into a grid and calculating the likelihood that each grid Detection included object detection
cell will contain an object. YOLO is capable of real-time using deep-neural
performance and can recognize several items in a single networks to enable
image. To enable the detection of individual objects, we train the detection of
the YOLO network on a dataset of different printed materials, specific objects
such as books, magazines, and newspapers. Text Relies on OCR Utilizes image
Recognition technology, processing
An OCR system is used to convert the text from the extracted which can be techniques for
text regions into digital representation. Tesseract, an open- limited in improved accuracy
source OCR engine, is the foundation of the OCR technology accuracy and and efficiency
employed by the reading aid. Tesseract may be trained to efficiency
recognize bespoke fonts and languages in addition to a User Interface Often requires a Utilizes the glasses
variety of text sizes and typefaces. In order to increase physical for a hands-free
accuracy, the OCR system also uses post-processing methods interface such as and intuitive
including text normalization and spell checking. buttons or a interface
keyboard
Depending on the user's preferences, the OCR system Table.I. DIFFERENCE BETWEEN EXISTED SYSTEM AND
PROPOSED SYSTEM.
converts the digital text it produces into speech or another
format. The text-to-speech conversion tool reads the user's
text aloud using a synthesized voice. The Raspberry Pi board
Overall, compared to current reading assistant models, the
includes integrated text-to-speech features that are meant to
suggested reading assistant for the blind offers substantial
offer a seamless user experience.
enhancements in terms of portability, precision, and
effectiveness. Additionally, the proposed model is a more
understandable and practical solution for blind people due to
the integration of object identification utilizing deep neural
networks and the usage of smart glasses for a hands-free
interface.
• Image resizing: To enable quicker processing and demand
less memory, the acquired photographs are downsized to a
IV. BLOCK DIAGRAM fixed size.
• Image enhancement: To increase the contrast and
brightness of the text portions, the collected photographs are
FIGURE ( II ) enhanced.
• Noise removal: Artefacts and noise that could obstruct
OCR and object detection algorithms are removed from the
collected photos.
• Binarization: To enable segmentation and the extraction of
the text sections, the acquired images are transformed to
binary format.
• Usability: Reading aid glasses should be straightforward to
use, with clear controls and user-friendly software.

The system is characterized as employing a Raspberry Pi


camera to take a text image, then using a mix of the Tesseract
Fig.II. FLOW MODEL OF VIRTUAL ASSISTANT FOR OBJECT
OCR engine, OpenCV, and eSpeak speech synthesizer,
DETECTION turning the text into speech.
Here is a more thorough breakdown of each action:
FIGURE ( III )
Detecting the objects: This is the initial phase. The process
of locating and identifying items in a still or moving image is
known as object detection. It is an essential task in computer
vision with several applications in robotics, surveillance, and
other fields. Using deep learning models, like the YOLOv5
model, is one of the common methods for object detection.

A single convolutional neural network is used by the cutting-


edge real-time object detection system YOLOv5 to identify
objects in pictures and videos.
The initial step in using YOLOv5 for object identification is
loading the pre-trained model. The PyTorch library can be
used to download and load the model. The input image is
Fig.III. FLOW MODEL OF VIRTUAL ASSISTANT FOR OBJECT preprocessed after the model has been loaded by being
DETECTION
resized to a fixed size and having the pixel values normalized
to a range between 0 and 1.
FIGURE ( IV )
The input image is preprocessed before being fed into the
YOLOv5 model to produce the output. The result is a
collection of bounding boxes, each with a confidence score
and class title. These bounding boxes depict the things that
were found in the source image.
The bounding boxes, together with the matching class label
and confidence score, are drawn on the input image in order
to visualize the outcome. Filtering out the bounding boxes
with low confidence ratings is done using the threshold value.
Displaying the output image or saving it to disc is the last
step.

Image capture: The second step entails taking a picture of


the text using a Raspberry Pi camera. The Raspberry Pi board
Fig.IV. VARIOUS TYPES OF VIRTUAL ASSISTANTS FOR
VISUALLY IMPAIRED PEOPLE USING SMART GLASSES.
is attached to the camera, which may be instructed in Python
to capture still photos or record videos. We employ the image
produced by the object detection algorithm in this system.
V. METHODOLOGY Once an image has been acquired, it is processed using
OpenCV. For image processing, a popular open-source
The collected photos are preprocessed to increase the computer vision library is called OpenCV. Due to camera
precision of the algorithms before OCR and object detection. movement, bad lighting, or other circumstances, the obtained
The preprocessing methods comprise: image could be distorted. The image can be made better by
using OpenCV to reduce noise, boost contrast, and tweak
brightness and color. The OCR engine receives the processed
image next to perform text recognition. Convolutional Layers: The input image is passed through a
series of convolutional layers, which apply a set of filters to
Optical Character Recognition (OCR): The text in the the input image to extract features. Each convolutional layer
image will be recognized using the Tesseract OCR engine in is represented by the following equation:
the following step. Google created the open-source OCR
engine tesseract. It has a good accuracy rate and can 𝒀𝒊 = f ( 𝑾𝒊 * 𝒙𝒊 + 𝒃𝒊 ) (1)
recognize text from photos in many different languages.
where 𝑥𝑖 is the input tensor, Wi is the weight tensor for the
Optical Character Recognition (OCR): The text in the i-th convolutional layer, bi is the bias tensor for the i-th
image will be recognized using the Tesseract OCR engine in convolutional layer, and f is the activation function (e.g.,
the following step. Google created the open-source OCR ReLU).
engine tesseract. It has a good accuracy rate and can Down sampling: The feature maps produced by the
recognize text from photos in many different languages. The convolutional layers are down sampled using max pooling
engine operates by deciphering characters and sentences in layers to reduce the spatial dimensions of the feature maps
the image and analyzing it. while preserving the most important features. The down
sampling operation is represented by the following equation:
Conversion of text to speech: After the text has been
identified by the OCR engine, it is sent to the eSpeak speech 𝒚𝒊,𝒋,𝒌 = max(𝒙𝒊 , i=1,...,N, j*𝒔𝒕𝒓𝒊𝒅𝒆𝒉≤𝒊≤(𝒋+𝟏) *𝒔𝒕𝒓𝒊𝒅𝒆𝒉 ,
synthesizer for speech synthesis. A small, open-source k*𝒔𝒕𝒓𝒊𝒅𝒆𝒘≤𝒊≤(𝒋+𝟏) * 𝒔𝒕𝒓𝒊𝒅𝒆𝒘 (2)
speech synthesizer called eSpeak can generate speech in a
number of different languages. It can be used to produce where xi is the input tensor, yi,j,k is the output tensor, N is the
speech from inputted phonemes or from text. The technology
number of input channels, strideh and stridew are the
turns the recognized text into audio using eSpeak.
vertical and horizontal strides, respectively, and max is the
maximum pooling operation
Text input: The engine accepts text input from a number of
sources, including dictated or typed text, emails, and web
Concatenation: The feature maps produced by the down
information.
sampling layers are concatenated with the feature maps
produced by the corresponding convolutional layers to
Text analysis: The engine analyses the text using natural
produce a set of high-resolution feature maps that capture
language processing techniques to identify words and
both low-level and high-level features of the input image. The
phrases, as well as the proper intonation and stress patterns.
concatenation operation is represented by the following
equation:
Speech synthesis: The engine transforms the phonemes and
prosody into a digital audio waveform using a speech
𝒁𝒊 = concat( 𝒚𝒊 , 𝒙𝒊 ) (3)
synthesis method.
where 𝑦𝑖 is the output tensor of the down
Audio output: Playing the audio output through speakers is
sampling layer and 𝑥𝑖 is the output tensor of the
the last stage. The Raspberry Pi board's speaker receives the
corresponding convolutional layer.
audio output and plays it back. The recognized text is then
audible to the user as speech.
𝒚𝒊 = f ( 𝑾𝒊 * 𝒙𝒊 + 𝒃𝒊 ) (4)
The system is controlled by a tactile button that the user can 𝒕𝒙 = sigmoid(𝒚𝒊 , 𝒕𝒙 ) (5)
press to take a picture and launch the OCR procedure. When 𝒕𝒚 = sigmoid(𝒚𝒊 , 𝒕𝒚 ) (6)
the user hits the button, the system takes a picture, runs 𝒕𝒘 = exp(𝒚𝒊 , 𝒕𝒘 ) (7)
OpenCV on it, uses Tesseract OCR to recognize the text, 𝒕𝒉 = exp(𝒚𝒊 , 𝒕𝒉 ) (8)
turns the text into speech, and outputs the audio through the 𝒕𝒄 = sigmoid𝒚𝒊 , 𝒕𝒄 ) 9)
speaker.
where 𝑦𝑖 is the input tensor, 𝑊𝑖 is the weight tensor, 𝑏𝑖 is the
bias tensor, f is the activation function (e.g., ReLU), 𝑡𝑥 , 𝑡𝑦 ,
VI. STIMULATION RESULTS 𝑡𝑤 , 𝑡ℎ , and 𝑡𝑐 are the predicted coordinates and confidence
score for the detected object, sigmoid is the sigmoid
activation function, and exp is the exponential function.
The YOLOv5 model can be represented mathematically
using a series of equations that describe the various Non-Maximum Suppression: The output of the detection
components of the architecture. Here is a high-level overview layers is processed using non-maximum suppression (NMS)
of the YOLOv5 model equations: to remove duplicate detections and select the most confident
detections for each object class.
Input: The input to the YOLOv5 model is an image
represented as a tensor of size (C, H, W), where C is the Overall, the YOLOv5 model is a complex deep neural
number of channels (e.g., 3 for RGB images), H is the height network that uses a combination of convolutional
of the image, and W is the width of the image.
layers,down sampling layers, concatenation, and detection
layers to detect and localize objects in an input image.

FIGURE ( V ) TABLE ( II )

Model mAP@0.5 mAP@0.5:0.95 FPS


YOLOv5
(Proposed 0.509 0.301 140
System)
EfficientDet-D7 0.5 0.7 9
Faster R-CNN
0.402 0.624 7
ResNet-101
SSD ResNet-101 0.382 0.582 22
Retina Net
Fig.V. OBJECT DETECTION OF PERSON AND CELL PHONE. 0.38 0.567 6
ResNet-101
FIGURE ( VI ) TABLE.II. REPRESENTS THE MEAN AVERAGE PRECISION
INTERSECTION OVER UNION WITH FRAMES PER SECOND.

GRAPH ( II )

FPS
160 140
FRAMES PER SECOND

140
120
100
80
60
40 22
Fig.VI. OBJECT DETECTION OF PERSON AND BOTTLE. 9 7 6
20
0
FIGURE ( VII )

MODELS
GRAPH.II. COMPARISION OF VARIOUS MODELS WITH FRAMES
PER SECOND.

mAP@0.5 represents the mean Average Precision at an


Intersection over Union (IoU) threshold of 0.5, which is a
commonly used metric for object detection accuracy.
mAP@0.5:0.95 represents the mAP over a range of IoU
thresholds from 0.5 to 0.95.
Fig.VII. OBJECT DETECTION OF PERSON AND CLOCK.
The mathematical equation for an OCR model can be
Here, are some comparisons of various models for object expressed as follows:
detections : y = f(x;θ) (10)
GRAPH ( 1 )
where y represents the output text, x represents the input
image, and θ represents the parameters of the OCR model.
The function f represents the mapping between the input
image and the output text.
Optical Character Recognition (OCR) involves various
mathematical equations that describe the different
components of the OCR system.
Image Preprocessing: OCR starts with preprocessing the
input image to enhance the quality of the image and remove
noise. Some of the mathematical operations used in image
preprocessing include scaling, binarization, and noise
reduction. These operations can be represented by the
GRAPH.I. COMPARISION OF VARIOUS MODELS WITH THRESOLD following equations:
I' = scale(I) (11) FIGURE ( IX )
I' = binarize(I) (12)
I'= reduce_noise(I) (13)

where I is the input image and I' is the preprocessed image.


Segmentation: The preprocessed image is segmented into
individual characters or words. The segmentation operation
can be represented by the following equation:

S = segment(I') (14)

where S is the set of segmented characters or words.


Feature Extraction: Features are extracted from each
segmented character or word to represent its shape and FIG.IX. READING OF ROAD SIGN USING READING ASSISTANT .
texture. The feature extraction operation can be represented
by the following equation: FIGURE ( X )

F = extract_features(S) (15)

where F is the set of feature vectors representing each


segmented character or word.
Classification: The feature vectors are classified into the
corresponding characters or words using a machine learning
algorithm. The classification operation can be represented by
the following equation:

C = classify(F) (16)

where C is the set of classified characters or words. FIG.X. READING OF PUBLIC SIGN BOARD USING READING
Post-processing: The output of the OCR engine is post- ASSISTANT .
processed to correct errors and improve the accuracy of the
recognized text. The post-processing operation may include Here, is the accuracy comparision of various models for Text-
spell-checking, grammar checking, and formatting. The post- to-Speech :
processing operation can be represented by the following
equation: TABLE( III )
T' = postprocess(T) (17)
Model Accuracy
where T is the recognized text and T' is the corrected text. Tesseract 4.0 85.50%
Overall, OCR involves a combination of image processing,
feature extraction, machine learning, and post-processing Kraken OCR 98.30%
techniques to recognize text in an image. The accuracy of the OCRopus 92.90%
OCR engine depends on the quality of the input image, the
segmentation algorithm, the feature extraction algorithm, and Tesseract 5.0 (Proposed System) 98.52%
the classification algorithm used. TABLE.III. REPRESENTS THE COMPARISION OF VARIOUS
MODELS WITH THEIR ACCURACY .
FIGURE ( VIII )
GRAPH ( III )

Accuracy
100.00%
ACCURACY

95.00%
90.00%
85.00%
80.00%
75.00%
Tesseract Kraken OCR OCRopus Tesseract
4.0 5.0
FIG.VIII. READING OF THE WARNING USING READING MODELS
ASSISTANT .

GRAPH.III. THE ACCURACY OF VARIOUS MODELS


GRAPHICALLY.
VII. CONCLUSION
Based to the study's results, visually impaired people may
find the suggested virtual assistant and navigation system to
be a beneficial tool. The technology can increase the virtual
assistant's precision and effectiveness because it is capable of
precise distance measuring, object detection, and image
processing. The cost and complexity of the system are
decreased by using YOLO for object detection and transfer
learning for neural network training on a smaller dataset. The
deep neural networks and image processing techniques have
made it possible for the virtual assistant to identify items,
increasing the user's independence and mobility. Virtual
assistants and image processing techniques could be
integrated in a variety of industries, including healthcare and
transportation, thanks to technological advancements.

The Tesseract open-source OCR engine, which can recognize


text in a variety of fonts, sizes, and languages, is the
foundation of the OCR technology used in the reading aid.
Increased accessibility for people with visual impairments,
increased productivity when performing tasks that require
reading large amounts of text, and increased engagement
with digital content such as virtual assistants,
educational materials, and customer service systems are all
advantages of text-to-speech engines.

The system's performance is assessed using a variety of


datasets and metrics, and the findings demonstrate high
object detection, text extraction, and conversion efficiency.
Given its low cost and straightforward sensors, the suggested
system has the potential to be extensively adopted in the
consumer market and can considerably enhance the travelling
experience for people with visual impairments. For users
with varying degrees of vision impairment, the system's
auditory cues—such as beep noises and visual
enhancement techniques—further increase usability and
effectiveness.

Overall, the proposed virtual assistant and navigation system


is a promising approach to addressing the challenges faced
by visually impaired people in their daily lives and has
the potential to significantly enhance the quality of life
of visually impaired people by providing them with a
basic, portable, and hands-free ETA prototype for basic,
everyday indoor and outdoor use.
IX. REFERENCES [10] A. P. Sonawane, J. S. Pradhan, V. P. Waghmare, S.
Kesari, S. K. Singh and P. Pal, "Complete Data Transmission
using Li-Fi Technology with Visible Light Communication,"
[1] N. Chen, Y. Zhang, L. Wang, Y. Zhang, and Z. Huang,
"A mobile assistive reading system for visually impaired 2022 International Conference on Futuristic Technologies
people using deep learning," Multimedia Tools and (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
Applications, vol. 79, no. 9-10, pp. 6353- 6373, May 2020. 10.1109/INCOFT55651.2022.10094453.

[11] M. K. Mali et al., "Evaluation and Segregation of Fruit


[2] P. Sivakumar and A. Santhakumar, "Assistive device Quality using Machine and Deep Learning
for visually impaired people using image processing Techniques," 2022 International Conference on Futuristic
techniques," International Journal of Engineering and Technologies (INCOFT), Belgaum, India, 2022, pp. 1-8, doi:
Advanced Technology (IJEAT), vol. 9, no. 2, pp. 2382- 10.1109/INCOFT55651.2022.10094447.
2388, Dec. 2019.
[12] S. K. Singh and A. Kumar, "Modified design of STBC
Encoder for reducing Non-Linear distortions in OFDM
[3] A. Prasad and S. P. Singh, "Design of a portable Channel Estimation," 2022 Second International Conference
reading assistant for the visually impaired," International on Advances in Electrical, Computing, Communication and
Journal of Innovative Research in Computer and Sustainable Technologies (ICAECT), Bhilai, India, 2022, pp.
Communication Engineering, vol. 4, no. 6, pp. 8719-8724, 1-5, doi: 10.1109/ICAECT54875.2022.9807993.
Jun. 2016.
[13] K. Baidya, A. J. R. Dampella, V. V. Surya Charan
Paidipalli, S. Bansod, S. K. Singh and P. Pal, "Pesticides
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi,
Spraying Using Non-GPS-Based Autonomous Drone," 2022
"You only look once: Unified, real-time object detection,"
in Proceedings of the IEEE Conference on Computer International Conference on Futuristic Technologies
Vision and Pattern Recognition, 2016 (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094368.

[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, [14] A. Chaurasiya et al., "Realization of OpenCL based
C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox CNN Implementation on FPGA using SDAccel Platform,"
detector," in Proceedings of the European Conference on 2022 International Conference on Futuristic Technologies
Computer Vision, 2016. (INCOFT), Belgaum, India, 2022, pp. 1-5, doi:
10.1109/INCOFT55651.2022.10094321.

[6] Y. Ren, Y. Li, X. Zhang, and J. Sun, "Faster R-CNN: [15] A. P. S. Shekhawat, A. Chaurasiya, P. Chaurasiya, P. K.
Towards real-time object detection with region proposal Patel, P. Pal and S. K. Singh, "Realization of Smart and
networks," in Advances in Neural Information Processing Highly Efficient IoTbased Surveillance System using Facial
Systems, 201 Recognition on FPGA," 2022 International Conference on
Futuristic Technologies (INCOFT), Belgaum, India, 2022,
pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094500.
[7] J. S. Rao, K. Choragudi, S. Bansod, S. C. Paidipalli V V,
S. K. Singh and P. Pal, "AI, AR Enabling on Embedded [16] P. Bhosle, P. Pal, V. Khobragade, S. K. Singh and P.
systems for Agricultural Drones," 2022 International Kenekar, "Smart Navigation System Assistance for Visually
Conference on Futuristic Technologies (INCOFT), Belgaum, Impaired People," 2022 International Conference on
India, 2022, pp. 1-4, doi: Futuristic Technologies (INCOFT), Belgaum, India, 2022,
10.1109/INCOFT55651.2022.10094383. pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094458.

[8] R. H. Krishnan, B. A. Naik, G. G. Patil, P. Pal and S. K. [17] R. B. Dushing, S. A. Jagtap, P. Kumar, G. G. Patil, P.
Singh, "AI Based Autonomous Room Cleaning Bot," 2022 Pal and S. K. Singh, "Swarm Robotics for Ultra-Violet
International Conference on Futuristic Technologies Sterilization Robot," 2022 International Conference on
(INCOFT), Belgaum, India, 2022, pp. 1-4, doi: Futuristic Technologies (INCOFT), Belgaum, India, 2022,
10.1109/INCOFT55651.2022.10094492. pp. 1-5, doi: 10.1109/INCOFT55651.2022.10094477.

[9] T. P. Pardhe, Y. M. Jogdande, S. B. Landge, Y. Kumar,


S. K. Singh and P. Pal, "Alcohol Detection and Traffic Sign
Board Recognition for Vehicle Acceleration Using
CNN," 2022 4th International Conference on Circuits,
Control, Communication and Computing (I4C), Bangalore,
India, 2022, pp. 519-523, doi:
10.1109/I4C57141.2022.10057863.

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy