B.E. ETCE Batch No. 8
Bachelor of Engineering
by
NAMAN MITTAL (37250018)
IMTEYAZ MALLICK (37250005)
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI – 600 119
APRIL-2021
DEPARTMENT OF ELECTRONICS AND TELECOMMUNICATION
ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Naman Mittal (37250018) and Imteyaz Mallick (37250005), who carried out the project entitled “DESKTOP VOICE ASSISTANT USING ARTIFICIAL INTELLIGENCE” under my supervision from November 2020 to April 2021.
Internal Guide
We, Naman Mittal (Reg. No. 37250018) and Imteyaz Mallick (Reg. No. 37250005), hereby declare that the Project Report entitled DESKTOP VOICE ASSISTANT USING ARTIFICIAL INTELLIGENCE, done by us under the guidance of MR. A. ARANGANATHAN, M.Tech., (Ph.D.), Dept. of ETCE at Sathyabama Institute of Science and Technology, is submitted in partial fulfillment of the requirements for the award of the Bachelor of Engineering degree in Electronics and Telecommunication Engineering.
1.
2.
ACKNOWLEDGEMENT
CONTENTS
CHAPTER NO. CHAPTER NAME PAGE NO.
1 INTRODUCTION 1
1.3 Objectives 4
1.4 Purpose 5
2 LITERATURE SURVEY 6
2.1 Scope 6
2.2 Applicability 7
5.3 Recapitulate 39
6 CONCLUSION 44
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
IoT      Internet of Things
GUI      Graphical User Interface
AI       Artificial Intelligence
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
In today’s era almost all tasks are digitalized. We have a smartphone in hand, and it is nothing less than having the world at our fingertips. These days we aren’t even using our fingers: we just speak the task and it is done. There exist systems where we can say “Text Dad, ‘I’ll be late today’” and the text is sent. That is the job of a virtual assistant. It also supports specialized tasks, such as booking a flight or finding the cheapest book across various e-commerce sites and then providing an interface to place the order, helping to automate search, discovery and online ordering.
Virtual assistants are software programs that help us ease our day-to-day tasks, such as showing weather reports, creating reminders, making shopping lists, etc. They can take commands via text (online chatbots) or by voice. Voice-based intelligent assistants need an invoking word, or wake word, to activate the listener, followed by the command. There are already many virtual assistants, such as Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana. For this project, the wake word chosen is JIA.
This system is designed to be used efficiently on desktops. Personal assistant software improves user productivity by managing the user’s routine tasks and by providing information from online sources. JIA is effortless to use: call the wake word ‘JIA’ followed by the command, and within seconds it gets executed.
Voice search has come to dominate text search. Web searches conducted via mobile devices have only just overtaken those carried out using a computer, and analysts were already predicting that 50% of searches would be made by voice by 2020. Virtual assistants are turning out to be smarter than ever. Allow your intelligent assistant to make email work for you: detect intent, pick out important information, automate processes, and deliver personalized responses.
This project was started on the premise that there is a sufficient amount of openly available data and information on the web that can be utilized to build a virtual assistant capable of making intelligent decisions for routine user activities.
1.2 Implementation Overview
We can use any IDE, such as Anaconda or Visual Studio. The first step is to create a function for the voice output that takes the text to be spoken as an argument. We use the Speech API for the voice: there are two default voices on the computer, i.e. male and female, and we can use either of them. We can check the voice function by giving it some text input, which will be converted into voice.
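A minimal sketch of this step, assuming the pyttsx3 package that is used later in this report:

import pyttsx3

engine = pyttsx3.init()                        # uses SAPI5 on Windows by default
voices = engine.getProperty('voices')          # the two default voices (male and female)

def speak(text):
    engine.setProperty('voice', voices[0].id)  # pick either of the two default voices
    engine.say(text)
    engine.runAndWait()

speak("Hello, this is a test of the voice function.")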
We can then create a new function for wishing the user, using an if-else statement to choose the greeting. E.g., if the time is between 12 and 18 hours, the system will say “Good Evening”.
Along with this, we can add a welcome message, e.g. “Welcome, what can I do for you?”. After that, we have to install the speech recognition module and import it.
Define a new function for taking commands from the user. Specify the recognizer class for speech recognition and the input source, such as the microphone, and set the pause_threshold. Set the language for voice recognition on the query; we can use the Google engine to convert the voice input to text.
We have to install and import some other packages, such as pyttsx3 and Wikipedia. Pyttsx3 helps us convert text to speech. If we ask for any information, the result is first obtained in textual format; we can convert it to voice very easily, as we have already defined a speak function in our code.
If we ask to open YouTube in the query, the assistant will go to the YouTube address automatically. For that, we have to import the webbrowser package. In the same way, we can add queries for many websites like Google, Instagram, Facebook, etc. The next task is to play songs. It is the same as before: add a query for “play songs”, and also add the location of the songs folder so that the assistant plays songs from that folder only.
So how do the queries work? Here we are using conditional statements: if the computer hears a voice command that contains the word YouTube, it will go to the YouTube page; if the voice command contains Google, it will go to the Google search page; and so on. We can add many more pages and commands to the desktop assistant, as sketched below.
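A minimal sketch of this dispatch logic; the hard-coded query string here stands in for the voice recogniser output:

import webbrowser

query = "open youtube please"   # in the assistant this string comes from the voice recogniser

if 'youtube' in query:
    webbrowser.open("https://www.youtube.com")
elif 'open google' in query:
    webbrowser.open("https://www.google.com")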
Have you ever wondered how cool it would be to have your own A.I. assistant? Imagine how much easier it would be to send emails without typing a single word, do Wikipedia searches without opening a web browser, and perform many other daily tasks like playing music with the help of a single voice command.
1.3 OBJECTIVES
The main objective of building personal assistant software (a virtual assistant) is to use semantic data sources available on the web, user-generated content, and knowledge from knowledge databases. The main purpose of an intelligent virtual assistant is to answer questions that users may have. This may be done in a business environment, for example on the business website, with a chat interface. On the mobile platform, the intelligent virtual assistant is available as a call-button operated service where a voice asks the user “What can I do for you?” and then responds to verbal input. Virtual assistants can save you a tremendous amount of time. We spend hours on online research and then making the report in terms we understand. JIA can do that for you: provide a topic for research and continue with your own tasks while JIA does the research. Another difficult task is to remember test dates, birthdays or anniversaries.
It comes as a surprise when you enter the class and realize there is a class test today. Just tell JIA about your tests in advance, and she reminds you well ahead of time so you can prepare for the test.
One of the main advantages of voice search is its rapidity. In fact, voice is reputed to be four times faster than a written search: whereas we can write about 40 words per minute, we are capable of speaking around 150 in the same period of time. In this respect, the ability of personal assistants to accurately recognize spoken words is a prerequisite for them to be adopted by consumers.
1.4 Purpose
CHAPTER 2
LITERATURE SURVEY
2.1 Scope
2.2 Applicability
The mass adoption of artificial intelligence in users’ everyday lives is also fueling the shift towards voice. The growing number of IoT devices, such as smart thermostats and speakers, is giving voice assistants more utility in a connected user’s life. Smart speakers are the number one way we are seeing voice being used. Many industry experts even predict that nearly every application will integrate voice technology in some way in the next five years.
The use of virtual assistants can also enhance IoT (Internet of Things) systems. Twenty years from now, Microsoft and its competitors will be offering personal digital assistants that provide the services of a full-time employee, something usually reserved for the rich and famous.
2.3 SURVEY OF TECHNOLOGY
2.3.1 Python
2.3.2 DBpedia
2.3.4 SpeechRecognition
This is a library for performing speech recognition, with support for several
engines and APIs, online and offline. It supports APIs like Google Cloud
Speech API, IBM Speech to Text, Microsoft Bing Voice Recognition etc.
2.3.6 SQLite
CHAPTER 3
PROBLEM DEFINITION
We already have multiple virtual assistants, but we hardly use them. There are a number of people who have issues with voice recognition. These systems can understand English phrases, but they fail to recognize them in our accent; our way of pronunciation is quite distinct from theirs. Also, they are easier to use on mobile devices than on desktop systems. There is a need for a virtual assistant that can understand English in an Indian accent and work on a desktop system.
3.2 REQUIREMENT SPECIFICATION
Personal assistant software is required to act as an interface to the digital world by understanding user requests or commands and then translating them into actions or recommendations based on the agent’s understanding of the world.
Jarvis focuses on relieving the user of entering text input, using voice as the primary means of input. The agent applies voice recognition algorithms to this input and records it. It then uses this input to call one of the personal information management applications, such as the task list or calendar, to record a new entry, or to search for it on search engines like Google, Bing or Yahoo.
The focus is on capturing the user’s input through voice, recognizing the input and then executing the task if the agent understands it. The software takes this input in natural language, which makes it easier for the user to express what he or she wants done.
Here, we compare the total cost and benefit of the proposed system against the current system. For this project, the main cost is the documentation cost. The user would also have to pay for a microphone and speakers; however, these are cheap and readily available. As far as maintenance is concerned, JIA won’t cost too much.
This shows the management and organizational structure of the project. This project is not built by a team; the management tasks are all carried out by a single person. That won’t create any management issues and will increase the feasibility of the project.
3.3.5 Cultural feasibility :-
CHAPTER 4
SYSTEM REQUIREMENT AND DESIGN
Hardware :-
• Pentium-pro processor or later.
• RAM 512MB or more.
Software :-
• Windows 7(32-bit) or above.
• Python 3.6 or later
• Chrome Driver
4.2.1 ER DIAGRAM
Fig 4.2.1 ER Diagram
The above diagram shows the entities and their relationships for the virtual assistant system. We have a user of the system who can have keys and values; these can be used to store any information about the user. Say, for the key “name” the value can be “Jim”. The user might like to keep some keys secure; for these he can enable a lock and set a password (voice clip).
A single user can ask multiple questions. Each question is given an ID so that it can be recognized, along with the query and its corresponding answer. A user can also have any number of tasks. These have their own unique ID and a status, i.e. their current state. A task also has a priority value and a category indicating whether it is a parent task or a child task of an older task.
Fig 4.2.2 Activity Diagram
Initially, the system is in idle mode. As soon as it receives a wake-up call, it begins execution. The received command is identified as either a question or a task to be performed, and the specific action is taken accordingly. After the question is answered or the task is performed, the system waits for another command. This loop continues until it receives the quit command, at which moment it goes back to sleep.
4.2.3 CLASS DIAGRAM
4.2.4 USE CASE DIAGRAM
Fig 4.2.5.1 Sequence diagram for Query-Response
The above sequence diagram shows how the answer to a question asked by the user is fetched from the internet. The audio query is interpreted and sent to the web scraper. The web scraper searches for and finds the answer, which is then sent back to the speaker, which speaks the answer to the user.
Fig 4.2.5.2 Sequence diagram for Task Execution
The user sends a command to the virtual assistant in audio form. The command is passed to the interpreter, which identifies what the user has asked and directs it to the task executer. If the task is missing some information, the virtual assistant asks the user for it. The received information is passed on to the task, and the task is accomplished. After execution, feedback is sent back to the user.
4.2.6 DATA FLOW DIAGRAM
Fig 4.2.6.2 DFD Level 1
DFD Level 2
4.2.7 COMPONENT DIAGRAM
Fig 4.2.8 Deployment Diagram
The user interacts with the SQLite database through an SQLite connection in the Python code. The knowledge database DBpedia must be accessed via an internet connection, which requires a LAN or WLAN / Ethernet network.
4.3 DATA DICTIONARY
User
Key Text
Value Text
Lock Boolean
Password Text
Question
Qid Integer PRIMARY KEY
Query Text
Answer Text
Task
Tid Integer PRIMARY KEY
Status Text (Active/Waiting/Stopped)
Level Text (Parent/Sub)
Priority Integer
Reminder
Rid Integer PRIMARY KEY
Tid Integer FOREIGN KEY
What Text
When Time
On Date
Notify before Time
Note
Nid Integer PRIMARY KEY
Tid Integer FOREIGN KEY
Data Text
Priority Integer
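As a sketch only, two of the tables above could be created in SQLite as follows; the table and column names are assumptions based on this data dictionary:

import sqlite3

conn = sqlite3.connect('assistant.db')
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS question (
                   qid     INTEGER PRIMARY KEY,
                   query   TEXT,
                   answer  TEXT)""")
cur.execute("""CREATE TABLE IF NOT EXISTS task (
                   tid      INTEGER PRIMARY KEY,
                   status   TEXT,
                   level    TEXT,
                   priority INTEGER)""")
conn.commit()
conn.close()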
CHAPTER 5
Test ID: T1
Test Objective: To make sure that the system’s response time is efficient.
Description:
Time is very critical in a voice-based system. As we are not typing inputs but speaking them, the system must also reply in a moment; the user must get an instant response to the query made.
Test Title:
T2
Description:
5.1.3 Test Case 3
Test ID: T3
Starting VS Code
We have used the VS Code IDE for this project. Feel free to use any other IDE you are comfortable with. I have started a new project and made a file called jarvis.py.
Visual Studio Code is available for Windows, Linux and macOS. Features include support for debugging and syntax highlighting.
Finally, after the installation completes, click on the Finish button and Visual Studio Code will open.
By default, VS Code installs under C:\users\{username}\AppData\Local\Programs\Microsoft VS Code. After the successful installation, let’s move to the next section to understand the various components of the user interface of the Visual Studio Code editor.
What are the essential components of VS Code? Visual Studio Code is a code editor at its core. Like many other code editors, VS Code adopts a standard user interface and layout: an explorer on the left, showing all of the files and folders you have access to, and an editor on the right, showing the content of the files you have opened. Below are a few of the most critical components of the VS Code editor:
Editor – The main area to edit your files. You can open as many editors as you like, side by side, vertically and horizontally.
SideBar – Contains different views, like the Explorer, to assist you while working on your project.
Status Bar – Contains information about the opened project and the files you edit.
Activity Bar – Located on the far left-hand side. It lets you switch between views and gives you additional context-specific indicators, like the number of outgoing changes when Git is enabled.
Panels – Displays different panels below the editor region for output or debug information, errors and warnings, or an integrated terminal. The panel can also be moved to the right for more vertical space.
VS Code opens in the same state it was last in every time you start it; it preserves folders, layout, and opened files.
Defining Speak Function
The first and foremost thing for an A.I. assistant is that it should be able to speak. To make our J.A.R.V.I.S. talk, we will make a function called speak(). This function will take audio as an argument and then pronounce it.

def speak(audio):
    pass  # For now; we will write the body later.
Now, the next thing we need is audio. We must supply audio so that we can
pronounce it using the speak() function we made. We are going to install a
module called pyttsx3.
What is pyttsx3?
Installation:
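pyttsx3 is a text-to-speech conversion library for Python that works offline; on Windows it uses the SAPI5 speech engine. It can be installed from a terminal with pip:

pip install pyttsx3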
Usage:
import pyttsx3

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)
What is sapi5?
What Is VoiceId?
We made a function called speak() at the start of this tutorial. Now, we will write the body of our speak() function to convert the given text to speech.

def speak(audio):
    engine.say(audio)
    engine.runAndWait()

We will create a main() function, and inside this main() function, we will call our speak() function.
Code:

if __name__ == "__main__":
    speak("Hello, I am JARVIS. Created by Naman and Imteyaz")
Whatever you write inside this speak() function will be converted into speech. Congratulations! With this, our J.A.R.V.I.S. has its own voice, and it is ready to speak.
Now, we will make a wishme() function that will make our J.A.R.V.I.S. wish or greet the user according to the time on the computer. To provide the current or live time to the A.I., we need to import a module called datetime. Import this module into your program with:

import datetime

def wishme():
    hour = int(datetime.datetime.now().hour)

Here, we have stored the current hour as an integer value in a variable named hour. Now, we will use this hour value inside an if-else block.
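A possible completion of the if-else block; the afternoon branch is added for illustration, while the code in section 5.4 uses only morning and evening:

def wishme():
    hour = int(datetime.datetime.now().hour)
    if hour < 12:
        speak("Good Morning!")
    elif hour < 18:
        speak("Good Afternoon!")
    else:
        speak("Good Evening!")
    speak("How may I help you?")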
The next most important thing for our A.I. assistant is that it should take
command with the help of the microphone of the user's system. So, now we
will make
a takeCommand() function. With the help of the takeCommand() function, our
A.I. assistant will return a string output by taking microphone input from the
user.
import speech_recognition as sr

def takeCommand():
    # It takes microphone input from the user and returns string output
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')  # Using Google for voice recognition
        print(f"User said: {query}\n")  # User query will be printed
    except Exception as e:
        # print(e)
        print("Say that again please...")  # Said in case of improper voice input
        return "None"  # The string "None" will be returned
    return query
Coding logic of Jarvis
After successfully installing the Wikipedia module, import it into the program
by writing an import statement.
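The Wikipedia-handling code itself is not reproduced in this copy of the report; the following is a minimal sketch, assuming the wikipedia package and the speak() function defined earlier:

import wikipedia

query = "wikipedia alan turing"          # in the assistant, query comes from takeCommand()

if 'wikipedia' in query:
    speak("Searching Wikipedia...")
    topic = query.replace("wikipedia", "")
    results = wikipedia.summary(topic, sentences=2)   # first two sentences of the article
    speak("According to Wikipedia")
    print(results)
    speak(results)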
Defining Task 2: To open the YouTube site in a web browser
Code:

elif 'open youtube' in query:
    webbrowser.open("youtube.com")

Here, we are using an elif condition to check whether youtube is in the user's query. Let's suppose the user gives the command "J.A.R.V.I.S., open YouTube." Then "open youtube" will be in the user's query, and the elif condition will be true.
To play music, we need to import a module called os. Import this module directly with an import statement.
In the code for this task, we first open our music directory and then list all the songs present in the directory with the os module's help. With the help of os.startfile, you can play any song of your choice; here we play the first song in the directory. However, you can also play a random song with the help of the random module, so that every time you command it to play music, J.A.R.V.I.S. plays a different song.
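Since the code referred to above is not reproduced here, the following is a sketch only; the music_dir path is an assumed example:

import os

query = "play music"                     # in the assistant, query comes from takeCommand()

if 'play music' in query:
    music_dir = 'D:\\Music'              # assumed location of the songs folder
    songs = os.listdir(music_dir)        # list every file in the folder
    print(songs)
    os.startfile(os.path.join(music_dir, songs[0]))   # play the first song (Windows only)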
Defining Task 5: To know the current time
In the code for this task, we use the datetime module, storing the current or live system time in a variable called strTime. After storing the time in strTime, we pass this variable as an argument to the speak function. Now, the time string will be converted into speech.
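A sketch of this task, assuming the speak() function defined earlier:

import datetime

query = "what is the time"               # in the assistant, query comes from takeCommand()

if 'the time' in query:
    strTime = datetime.datetime.now().strftime("%H:%M:%S")
    speak(f"The time is {strTime}")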
In the code for sending email, we are using the SMTP module, which we have already discussed above.
Note: Do not forget to enable the 'less secure apps' feature in your Gmail account. Otherwise, the sendEmail function will not work properly.
We are using a try and except block to handle any possible error while sending emails.
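The sendEmail code is not reproduced in this copy of the report; the following is a sketch using the standard smtplib module, with placeholder addresses and credentials:

import smtplib

def sendEmail(to, content):
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.login('youremail@gmail.com', 'your-app-password')   # placeholder credentials
    server.sendmail('youremail@gmail.com', to, content)
    server.close()

try:
    sendEmail("friend@example.com", "Hello from the assistant")
except Exception as e:
    print(e)
    speak("Sorry, I am not able to send this email")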
5.3 Recapitulate
First of all, we have created a wishme() function that gives the greeting
functionality according to our A.I system time.
After wishme() function, we have created a takeCommand() function, which
helps our A.I to take command from the user. This function is also
responsible for returning the user's query in a string format.
We developed the code logic for opening different websites like google,
Wetube, and stack overflow.
Developed code logic for opening VS Code or any other
Is it an A.I.?
Many people will argue that the virtual assistant we have created is not an A.I., but merely the output of a bunch of statements. But if we look at it at a fundamental level, the sole purpose of A.I. is to develop machines that can perform human tasks with the same effectiveness as humans, or even more effectively.
It is a fact that our virtual assistant is not a very good example of A.I., but it is an A.I.!
5.4 Code as inscribed
import pyttsx3
import speech_recognition as sr
import datetime

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
# print(voices[1].id)
engine.setProperty('voice', voices[1].id)

def speak(audio):
    engine.say(audio)  # engine will speak the given audio string
    engine.runAndWait()

def wishMe():
    hour = int(datetime.datetime.now().hour)
    if hour >= 0 and hour < 12:
        speak("Good Morning Mam, Welcome to Our Final Year Project!")
    else:
        speak("Good Evening Mam, Welcome to Our Final Year Project!")

def takeCommand():
    # It takes microphone input from the user and returns string output
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception as e:
        # print(e)  # if I want to see the error in my console, print it; otherwise keep it commented
        print("Say that again please...")
        return "None"
    return query
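The listing stops with takeCommand(); the following is a minimal main block that ties the functions together as a sketch, with command keywords assumed from the behaviour described in chapter 4:

import webbrowser
import wikipedia

if __name__ == "__main__":
    wishMe()
    while True:
        query = takeCommand().lower()
        if 'wikipedia' in query:
            speak(wikipedia.summary(query.replace("wikipedia", ""), sentences=2))
        elif 'open youtube' in query:
            webbrowser.open("https://www.youtube.com")
        elif 'quit' in query or 'exit' in query:
            speak("Goodbye!")
            break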
CHAPTER 6
CONCLUSION