Voice Recognition Report
Submitted By :
Vivek Kumar Mishra
20scse1010603
CANDIDATE’S DECLARATION
I hereby certify that the work being presented in this project, entitled “AI BASED VIRTUAL ASSISTANT {GENESIS}”, is submitted in partial fulfillment of the requirements for the award of the B.Tech degree in the School of Computing Science and Engineering, Department of Computer Science and Engineering, Galgotias University, Greater Noida.
The matter presented in this project has not been submitted by me for the award of any other degree at this or any other institution.
Vivek Kumar Mishra
20scse1010603
This is to certify that the above statement made by the candidate is correct to the best of my knowledge.
Dr. Suman Devi
CERTIFICATE
The Final Project Viva-Voce examination of Vivek Kumar Mishra (20scse1010603) has been held
on and his/her work is recommended for the award of
B.Tech.
Date:
Place: Greater Noida
Table of Contents
Abstract
1. Introduction
1.1 Motivation behind this Project
1.2 Objectives
1.3 Purpose, Scope & Applicability
2. Requirement and Analysis
3. System Design
3.1 Activity Diagram
4. Literature Survey
5. Basic Workflow
6. Result
6.1 Methodology
6.2 Code
6.3 Output
7. Conclusion
8. References
Abstract
In recent years, virtual assistants have become increasingly popular due to their
ability to provide helpful information and assist users in various tasks. This project
aims to develop a virtual assistant that can interact with users, understand their
queries, and respond with appropriate information.
The virtual assistant is designed to be user-friendly and intuitive, making it easy for
users to access the information they need. It also learns from user interactions,
improving its responses over time. The assistant uses natural language processing
(NLP) techniques to understand user queries and respond with relevant
information. It performs numerous tasks, such as reporting the date and time,
searching Wikipedia, opening YouTube, Google, and Stack Overflow, and playing music.
The virtual assistant is developed using machine learning algorithms and cloud-based
technologies, ensuring scalability and reliability. The project also includes an
intuitive user interface, making it easy for users to interact with the virtual
assistant. Overall, this project aims to develop a powerful and user-friendly virtual
assistant that can assist users with various tasks and provide them with useful
information in an efficient and effective manner.
1. INTRODUCTION
In today’s era almost all tasks are digitized. With a smartphone in hand, it is
nothing less than having the world at your fingertips. These days we are not even
using our fingers: we simply speak the task and it is done. There exist systems
where we can say, “Text Dad, ‘I’ll be late today,’” and the text is sent. That is
the task of a virtual assistant. It also supports specialized tasks such as booking
a flight, or finding the cheapest book online from various e-commerce sites and then
providing an interface to place the order, helping automate search, discovery, and
online ordering operations.
Virtual assistants are software programs that help ease your day-to-day tasks,
such as showing weather reports, creating reminders, making shopping lists, etc. They
can take commands via text (online chatbots) or by voice. Voice-based intelligent
assistants need an invoking word or wake word to activate the listener, followed by
the command. For my project the wake word is GENESIS.
We already have many virtual assistants, such as Apple’s Siri, Amazon’s Alexa, and
Microsoft’s Cortana. This system is designed to be used efficiently on desktops.
Personal assistant software improves user productivity by managing the user’s routine
tasks and by providing information from online sources. GENESIS is effortless to use:
call the wake word ‘GENESIS’ followed by the command, and within seconds it is executed.
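As a minimal sketch of the wake-word mechanism described above, the helper below strips the wake word from already-recognized text. The function name and handling are illustrative assumptions, not the project's exact implementation.

```python
# Illustrative wake-word handling on recognized (text) speech.
WAKE_WORD = "genesis"

def extract_command(recognized_text):
    """Return the command that follows the wake word,
    or None if the wake word is absent."""
    text = recognized_text.lower().strip()
    if WAKE_WORD not in text:
        return None
    # Keep only what comes after the wake word.
    _, _, command = text.partition(WAKE_WORD)
    return command.strip() or None
```

For example, "GENESIS open YouTube" yields the command "open youtube", while input without the wake word is ignored.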
Virtual assistants are turning out to be smarter than ever. Allow your intelligent
assistant to make email work for you. Detect intent, pick out important information,
automate processes, and deliver personalized responses.
This project was started on the premise that there is a sufficient amount of openly
available data and information on the web that can be utilized to build a virtual
assistant capable of making intelligent decisions for routine user activities.
1.1 MOTIVATION BEHIND THIS PROJECT
Voice search is 3.7x faster than typing and takes less energy too. With a voice AI
assistant, your users are able to explain an issue in detail, which helps assess the
problem at hand better.
Unlike text, a voice AI bot understands emotions and intents, ensuring transparent
communication with the user.
Your customers prefer frictionless and smooth customer service, and voice AI makes
this possible. Interacting and getting support through a voice AI assistant is
hands-free: all your users need to do is speak to the assistant and get their queries
answered effortlessly.
Voice AI also lets your customers multitask. Since they do not need to hold a
device to contact customer support, users can focus on other tasks and save time.
Speed, precision, and convenience are the key aspects users look for in customer
support, and voice technology fulfils them.
3. Higher customer satisfaction
The ease and simplicity voice AI brings make it quite popular. On average, 80% of
people who shop using a voice AI assistant are satisfied with their experience.
A voice AI assistant can also give them speedy resolutions if they are stuck
somewhere during the process. Subsequently, this leads to lower cart abandonment
rates and improved customer satisfaction – which looks great to your untapped
market!
4. Two-way communication
More often than not, conventional communication with emails and call centre IVRs
leaves consumers impatient and dissatisfied. This is because there is no natural flow
or consistency in conversations between a business and the customer.
Voice AI technology is sharp. It keeps track of and records any important data
provided by customers on your platform during the interaction.
Your voice AI assistant can store and utilize this information and the previously
entered data to offer accurate and relevant suggestions. Where humans are prone to
forget things, voice technology-based chatbots have a record of minute details.
This lack of data loss also equips you with better consumer insights. Voice AI makes
it easier for you to assess, understand, and tailor your products and services to
different cohorts.
1.2 OBJECTIVES
Virtual assistants can save you a tremendous amount of time. We spend hours on online
research and then writing up the results in our own terms. GENESIS can do that
for you: provide a topic for research and continue with your tasks while GENESIS does
the research. Another difficult task is remembering test dates, birthdays, or
anniversaries; it comes as a surprise when you enter the class and realize there is a
class test today. Just tell GENESIS in advance about your tests and it reminds you
well ahead of time so you can prepare.
One of the main advantages of voice search is its rapidity. In fact, voice is
reputed to be about four times faster than a written search: whereas we can write
about 40 words per minute, we are capable of speaking around 150 in the same period
of time. In this respect, the ability of personal assistants to accurately recognize
spoken words is a prerequisite for them to be adopted by consumers.
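The speed advantage quoted above follows directly from the two rates, as a quick sanity check shows (the word-per-minute figures are the ones cited in the text):

```python
# Quick check of the quoted speed advantage of voice over typing.
typing_wpm = 40     # words per minute when writing
speaking_wpm = 150  # words per minute when speaking

ratio = speaking_wpm / typing_wpm
print(f"voice is about {ratio:.2f}x faster than typing")
```

The result, 3.75x, is consistent with the "3.7x faster" figure quoted earlier in this section.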
1.3 PURPOSE, SCOPE AND APPLICABILITY
Purpose
Scope
Voice assistants will continue to offer more individualized experiences as they get
better at differentiating between voices. However, it is not just developers who need
to address the complexity of developing for voice; brands also need to understand the
capabilities of each device and integration, and whether it makes sense for their
specific brand. They will also need to focus on maintaining a consistent user
experience in the coming years as complexity becomes more of a concern. This is
because voice assistants lack a visual interface: users simply cannot see or touch a
voice interface.
Applicability
The mass adoption of artificial intelligence in users’ everyday lives is also fueling
the shift towards voice. The growing number of IoT devices, such as smart thermostats
and speakers, is giving voice assistants more utility in a connected user’s life. Smart
speakers are the number one way we are seeing voice being used. Many industry
experts even predict that nearly every application will integrate voice technology
in some way within the next five years. The use of virtual assistants can also enhance
the IoT (Internet of Things) ecosystem. Twenty years from now, Microsoft and its
competitors will be offering personal digital assistants that deliver the services
of a full-time employee usually reserved for the rich and famous.
2. REQUIREMENT AND ANALYSIS
We already have multiple virtual assistants, but we hardly use them. There are a
number of people who have issues with voice recognition: these systems can understand
English phrases but fail to recognize our accent, since our way of pronunciation is
quite distinct. Also, they are easier to use on mobile devices than on desktop
systems. There is a need for a virtual assistant that can understand English in an
Indian accent and work on a desktop system.
When a virtual assistant is not able to answer questions accurately, it’s because it
lacks the proper context or doesn’t understand the intent of the question. Its ability
to answer questions relevantly only happens with rigorous optimization, involving
both humans and machine learning. Continuously ensuring solid quality-control
strategies will also help manage the risk of the virtual assistant learning undesired
behaviors. It also requires a large amount of information to be fed in for it to
work efficiently.
A virtual assistant should be able to model complex task dependencies and use these
models to recommend optimized plans to the user. It needs to be tested for finding
optimum paths when a task has multiple sub-tasks and each sub-task can have its own
sub-tasks. In such a case there can be multiple solution paths, and it should be able
to consider user preferences, other active tasks, and priorities in order to
recommend a particular plan.
2.2 REQUIREMENT SPECIFICATION
Personal assistant software is required to act as an interface to the digital world
by understanding user requests or commands and then translating them into actions or
recommendations based on the agent’s understanding of the world.
GENESIS focuses on relieving the user of entering text input, using voice as the
primary means of user input. The agent applies voice recognition algorithms to this
input and records it. It then uses this input to call one of the personal information
management applications, such as a task list or calendar, to record a new entry, or
to search for it on search engines like Google, Bing, or Yahoo.
The focus is on capturing the user input through voice, recognizing the input, and
then executing the tasks the agent understands. The software takes this input in
natural language, making it easier for the user to input what he or she desires to
be done.
Voice recognition software enables hands-free use of the applications, letting users
query or command the agent through a voice interface. This gives users access to the
agent while performing other tasks and thus enhances the value of the system itself.
GENESIS also has ubiquitous connectivity through a Wi-Fi or LAN connection, enabling
distributed applications that can leverage other APIs exposed on the web without
needing to store them locally.
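The dispatch described above, recording a personal-information-management entry versus falling back to a web search, can be sketched as a small keyword router. The keyword list, function name, and choice of Google as the default engine are assumptions for illustration, not the project's exact code.

```python
from urllib.parse import quote_plus

# Illustrative keyword list; a real agent would use intent classification.
PIM_KEYWORDS = ("remind", "task", "calendar", "note")

# Google's standard search endpoint; Bing and Yahoo work analogously.
SEARCH_URL = "https://www.google.com/search?q={}"

def route_command(command):
    """Route to the PIM layer if an action keyword matches,
    otherwise return a URL-encoded web-search fallback."""
    text = command.lower()
    if any(k in text for k in PIM_KEYWORDS):
        return ("pim", text)
    return ("search", SEARCH_URL.format(quote_plus(text)))
```

A command like "Remind me to call dad" routes to the PIM layer, while "weather in Delhi" becomes a search URL.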
• Providing information such as weather, facts from Wikipedia, etc.
• Setting an alarm or making to-do lists and shopping lists.
A feasibility study can help you determine whether or not you should proceed with
your project. It is essential to evaluate the cost and benefit of the proposed
system. The following types of feasibility are taken into consideration.
1. Technical feasibility: This includes finding out the technologies for the project,
both hardware and software. For a virtual assistant, the user must have a microphone
to convey their message and a speaker to listen when the system speaks. These are
very cheap nowadays and most people already possess them. Besides, the system needs
an internet connection: while using GENESIS, make sure you have a steady internet
connection. This is also not an issue in an era where almost every home or office
has Wi-Fi.
2. Economic feasibility: Here, we find the total cost and benefit of the proposed
system over the current system. For this project, the main cost is the documentation
cost. The user would also have to pay for a microphone and speakers; again, these are
cheap and readily available. As far as maintenance is concerned, GENESIS won’t cost
too much.
Hardware:
• Pentium-pro processor or later.
• RAM 512MB or more.
Software:
• Windows 7(32-bit) or above.
• Python 3.6 or later
• Chrome Driver
• Selenium Web Automation
• SQLite
3.1 ACTIVITY DIAGRAM
Initially, the system is in idle mode. As soon as it receives a wake-up call, it
begins execution.
The user class has two attributes: the command it sends as audio, and the response
it receives, which is also audio. It performs a function to listen to the user's
command, interpret it, and then reply or send back a response accordingly. The
Question class holds the command in string form, as produced by the interpreter. It
sends it to the general, about, or search function based on its identification.
The Task class also holds the interpreted command in string format. It has various
functions such as reminder, note, mimic, research, and reader.
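The classes described above can be sketched as a minimal skeleton. The class names follow the diagram; the keyword-based bodies are illustrative placeholders, not the project's actual logic.

```python
class Question:
    """Holds an interpreted command (string) and routes it to the
    general, about, or search handler, as described in the diagram."""
    def __init__(self, command: str):
        self.command = command

    def classify(self) -> str:
        text = self.command.lower()
        if text.startswith(("who", "what", "when", "where")):
            return "search"
        if "about" in text:
            return "about"
        return "general"

class Task:
    """Holds an interpreted command and identifies which task
    function (reminder, note, mimic, research, reader) applies."""
    def __init__(self, command: str):
        self.command = command

    def kind(self) -> str:
        for k in ("reminder", "note", "mimic", "research", "reader"):
            if k in self.command.lower():
                return k
        return "unknown"
```

For instance, "Who is Ada Lovelace?" classifies as a search, while "take a note of this" maps to the note task.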
3.3 USE CASE DIAGRAM
In this project there is only one user. The user issues a command to the system.
The system then interprets it and fetches the answer. The response is sent back to
the user.
3.4 SEQUENCE DIAGRAM
The above sequence diagram shows how an answer to a question asked by the user is
fetched from the internet. The audio query is interpreted and sent to the web
scraper. The web scraper searches and finds the answer, which is then sent back to
the speaker, which speaks the answer to the user.
Sequence diagram for Task Execution
The user sends a command to the virtual assistant in audio form. The command is
passed to the interpreter, which identifies what the user has asked and directs it
to the task executor. If the task is missing some information, the virtual assistant
asks the user about it. The received information is sent back to the task, and it is
accomplished. After execution, feedback is sent back to the user.
3.5 DATA FLOW DIAGRAM
DFD Level 1
DFD Level 2 (including the settings of the virtual assistant)
3.6 COMPONENT DIAGRAM
The main component here is the virtual assistant. It provides two specific services:
executing a task or answering a question.
4. LITERATURE SURVEY
In the paper “On the Track of Artificial Intelligence: Learning with Intelligent
Personal Assistants” by Nil Goksel et al., the potential use of intelligent personal
assistants (IPAs), which employ advanced computing technologies and Natural Language
Processing (NLP), for learning is examined. Basically, they reviewed the working of
IPAs within the scope of AI [4].
The application of voice assistants is taken to a higher level in the paper “Smart
Home Using Internet of Things” by Keerthana S et al., where they discuss how smart
assistants can lead to a smart home system using Wireless Fidelity (Wi-Fi) and the
Internet of Things. They used the CC3200 MCU, which has a built-in Wi-Fi module,
together with temperature sensors. The temperature sensed by the temperature sensor
is sent to the microcontroller unit (MCU) and then posted to a server, and using that
data the status of electronic equipment such as fans and lights is monitored and
controlled [5].
The application of voice assistants is discussed well in the paper “An Intelligent
Voice Assistant Using Android Platform” by Sutar Shekhar et al., where they stress
the fact that mobile users can perform their daily tasks using voice commands instead
of typing or using keys on mobiles. They also used a prediction technology that makes
recommendations based on user activity [6].
We also studied the systems developed by GTTS-EHU for the Query-by-Example Spoken
Term Detection (QbE-STD) and Spoken Term Detection (STD) tasks of the Albayzin 2018
Search on Speech Evaluation. For representing audio documents and spoken queries,
stacked bottleneck features (sBNF) are used as the frame-level acoustic
representation. Spoken queries are synthesized, the average of the sBNF
representations is taken, and the average query is then used for QbE-STD [8].
Basic Workflow
The figure below shows the workflow of the main method of the voice assistant. Speech
recognition is used to convert speech input to text. This text is then sent to the
processor, which determines the nature of the command and calls the appropriate
script for execution. But that is not the only difficulty: no matter how many hours
of training input, another factor plays a big role in whether the system understands
you. Ground noise can easily throw the speech recognizer off target, because it may
be unable to distinguish your voice from the bark of a dog or a helicopter flying
overhead.
The main purpose is to facilitate the users' daily lives by sensing the voice
and interpreting it into action.
Web browser
Used to perform web searches. The webbrowser module comes built in with Python.
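A small sketch of how the assistant can open sites such as YouTube or Stack Overflow with the webbrowser module. The site table and function names are illustrative assumptions; webbrowser.open is the standard-library call.

```python
import webbrowser

# Illustrative mapping from spoken site names to URLs.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
    "stackoverflow": "https://stackoverflow.com",
}

def site_url(name):
    """Look up the URL for a known site name (case-insensitive)."""
    return SITES.get(name.lower())

def open_site(name):
    """Open the site in the default browser; False if unknown."""
    url = site_url(name)
    return webbrowser.open(url) if url else False
```

So a recognized command like "open youtube" reduces to open_site("youtube"), which launches the default browser.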
OS
The OS module in Python provides functions for interacting with the operating system.
OS comes under Python’s standard utility modules. This module provides a portable way
of using operating-system-dependent functionality.
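A few os-module calls of the kind such an assistant relies on, for example locating a music folder before playing a file. The "music" folder name is an illustrative assumption.

```python
import os

cwd = os.getcwd()                        # current working directory
music_dir = os.path.join(cwd, "music")   # portable path building
has_music = os.path.isdir(music_dir)     # check before listing files
print(music_dir, has_music)
```

Using os.path.join instead of hard-coded separators keeps the same code working on both Windows and Unix-like systems.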
Pyaudio
PyAudio is a set of Python bindings for PortAudio, a cross-platform C library for
interfacing with audio drivers.
PyQt5
PyQt5 is a comprehensive set of Python bindings for Qt v5. It is implemented as
more than 35 extension modules and enables Python to be used as an alternative
application development language to C++ on all supported platforms, including iOS
and Android. PyQt5 may also be embedded in C++-based applications to allow users
of those applications to configure or enhance their functionality.
Python Backend
The Python backend gets the output from the speech recognition module and then
identifies whether the command or speech output is an API call or requires context
extraction. The result is then sent back to the Python backend to give the required
output to the user.
Text to speech module
Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS
Engine converts written text to a phonemic representation, then converts the
phonemic representation to waveforms that can be output as sound. TTS engines with
different languages, dialects, and specialized vocabularies are available through
third-party publishers.
Context Extraction
Context extraction (CE) is the task of automatically extracting structured
information from unstructured and/or semi-structured machine-readable documents. In
most cases, this activity concerns processing human language texts using natural
language processing (NLP). Recent activities in multimedia document processing, such
as automatic annotation and content extraction from images, audio, and video, can be
seen as forms of context extraction.
Textual output
It decodes the voice command, performs the operation, and then shows the voice
command as textual output in the terminal.
6.2 Code
import pyttsx3
import speech_recognition as sr
import datetime
import wikipedia
import webbrowser
import os

# Initialize the text-to-speech engine (SAPI5 is the Windows speech API).
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

def speak(audio):
    """Speak the given text aloud."""
    engine.say(audio)
    engine.runAndWait()

def wishMe():
    """Greet the user according to the current time of day."""
    hour = datetime.datetime.now().hour
    if 0 <= hour < 12:
        speak("Good Morning!")
    elif 12 <= hour < 18:
        speak("Good Afternoon!")
    else:
        speak("Good Evening!")

def takeCommand():
    """Listen on the microphone and return the recognized query as text."""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception:
        print("Say that again please...")
        return "None"
    return query

if __name__ == "__main__":
    wishMe()
    query = takeCommand().lower()
    if 'wikipedia' in query:
        speak('Searching Wikipedia...')
        query = query.replace("wikipedia", "")
        results = wikipedia.summary(query, sentences=2)
        speak("According to Wikipedia")
        print(results)
        speak(results)