FINAL MINI-PROJECT REPORT
NOVEMBER 2023
A MINI-PROJECT REPORT
ON
Artificial Voice Assistant
By
Arya Patil
Soham Vede
Yash Vichare
Saakshi Wagh
Internal Guide
Prof. B. N. Panchal
Rajiv Gandhi Institute of Technology
Juhu-Versova Link Road, Versova, Andheri (W), Mumbai-53
CERTIFICATE
Guide: Prof. B. N. Panchal
H.O.D.: Prof. Sunil P. Khachane
Principal
Dr. Sanjay Bokade
Project Report Approval for T. E.
Examiners:
1. --------------------------------------------
2. --------------------------------------------
Date:
Place:
Declaration
We wish to state that the work embodied in this project titled “Artificial Voice
Assistant” forms our own contribution to the work carried out under the guidance of
Prof. B. N. Panchal at Rajiv Gandhi Institute of Technology.
We declare that this written submission represents our ideas in our own words, and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented, fabricated, or falsified any idea, data, fact, or
source in our submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from sources which have
not been properly cited or from whom proper permission has not been taken when needed.
(Students’ Signatures)
Abstract
In this project, we explore the process of creating a voice assistant. We begin by discussing
the various components of a voice assistant, including speech recognition, natural language
processing, and text-to-speech synthesis. We then describe the architecture of our system,
which is based on a modular design that allows for easy customization and extension. We
also discuss the key algorithms and packages required, how they work, and why they are
used. Finally, we present the results of our experiments, which demonstrate the
effectiveness of our approach in terms of accuracy and speed. This project focuses on the
development of a Python-based Artificial Voice Assistant capable of understanding and
responding to natural language voice commands. The key objectives encompass technology
choice, algorithm enhancement, API integration, user interaction, testing and evaluation,
usability testing, and comprehensive documentation.
The literature survey provides valuable insights into the growth of voice-based assistants,
their applications, and relevant research. Notable works include "An Intelligent Voice
Assistant Using Android Platform" by Sutar Shekhar and colleagues, "Smart Home Using
Internet of Things" by Keerthana S and colleagues, "On the track of Artificial Intelligence:
Learning with Intelligent Personal Assistant" by Nil Goksel and colleagues, and "Speech
recognition using flat models" by Patrick Nguyen and colleagues.
The proposed concept involves the implementation of a personal voice assistant that utilizes a
Speech Recognition library to understand user commands and respond by voice through
Text-to-Speech functions. Underlying algorithms convert speech to text for processing user
instructions.
The workflow involves capturing user commands and triggering the virtual assistant based on
specific keywords, including "Wikipedia," "Google Chrome," "Time," and more. These
trigger words initiate corresponding actions by the voice assistant.
The architecture comprises four phases: capturing input through a microphone, recognizing
audio data and converting it into text using NLP, processing the data through Python scripts,
and presenting the output as text or speech using TTS.
The algorithm encompasses setting up the speech engine, defining functions for user
greetings, capturing and recognizing user speech, and executing commands based on trigger
words.
Contents

1 Introduction
  1.1 Introduction Description
  1.2 Organization of Report
2 Literature Review
  2.1 Survey of Existing Systems
  2.2 Limitations of Existing Systems / Research Gap
  2.3 Problem Statement and Objectives
3 Proposed System
  3.1 Algorithm
  3.2 Details of Hardware & Software
    3.2.1 Hardware Requirements
    3.2.2 Software Requirements
  3.3 Design Details
    3.3.1 System Flow / System Architecture
    3.3.2 Detailed Design
  3.4 Methodology / Procedures
4 Results & Discussions
  4.1 Results
  4.2 Discussion: Comparative Study / Analysis
5 Conclusion and Future Work
References
CHAPTER 1
Introduction
CHAPTER 2
Literature Review
may not be handled well. Dependency on wake words, limited language support,
intrusiveness, and incompatibility issues also persist. Accessibility concerns and
over-reliance on these systems raise questions about their broader societal impact. Data
privacy, update-related changes, and tone appropriateness add further complexity to the
voice assistant landscape.
2.3 Objectives
1. Voice Assistant Development - Create a functional Python-based Artificial Voice
Assistant capable of understanding and responding to natural language voice
commands.
2. Algorithm Enhancement - Implement and optimize algorithms to enhance the
efficiency and accuracy of voice recognition and natural language processing within
the assistant.
3. API Integration - Integrate relevant APIs to expand the voice assistant's capabilities,
enabling it to perform actions such as opening Google, YouTube, setting timers, and
alarms based on user commands.
4. Python Language - Utilize the Python programming language to build the core
functionality of the voice assistant, ensuring code modularity and maintainability.
5. User Interaction - Develop an intuitive user interface that facilitates user interaction
with the voice assistant on Android devices.
6. Testing and Evaluation - Conduct extensive testing to ensure the accuracy of voice
recognition, the effectiveness of API integration, and the overall performance of the
voice assistant.
7. Usability Testing - Gather user feedback through usability testing to identify areas for
improvement and refine the user experience.
8. Documentation - Maintain comprehensive documentation of the development
process, including algorithms, codebase, API integration, and user interface design.
2.4 Scope
The future scope of artificial voice assistants is exceptionally promising, with a trajectory
poised to revolutionize the way we interact with technology. As advancements in natural
language processing and artificial intelligence continue, these digital companions will evolve
to offer unparalleled user experiences. Their ability to comprehend and respond to natural
language will become even more seamless, enhancing their utility across various domains.
Moreover, the integration of voice assistants with emerging technologies like augmented
reality and virtual reality promises to usher in an era of immersive and interactive
interactions. With a focus on personalization, voice assistants will adapt to individual
preferences and needs, making them indispensable in daily life. In addition, they will exhibit
enhanced context awareness, ensuring more intuitive and contextually relevant responses.
Industries, ranging from healthcare to smart homes, will increasingly rely on voice assistants
for tasks, and as global support expands, these assistants will reach a more diverse audience.
While emphasizing privacy and security, voice assistants will continue to advance,
incorporating emotional intelligence, expanding their roles in education, commerce, and even
content creation. The future holds immense potential for these digital aides, and their
integration into various aspects of our lives is inevitable, driving us towards a more
connected and efficient future.
CHAPTER 3
Proposed System
3.1 Algorithm
Step 1: Import the necessary packages.
Step 2: Set up the speech engine using pyttsx3 with voice options (male or female).
Step 3: Define a function to greet the user based on the current time.
Step 4: Create a function to capture and recognize user speech using a microphone and
Google's speech recognition service.
Step 5: In the main function, listen for trigger words in the user input. If trigger words
are detected, execute the corresponding commands, such as opening websites, providing
information, or performing actions.
Step 6: End the program.
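The time-based greeting of Step 3 can be sketched in plain Python. This is a minimal illustration: the pyttsx3 speech-engine setup of Step 2 and the microphone capture of Step 4 are only indicated in comments, since they require audio hardware, and the greeting thresholds chosen here are assumptions.

```python
from datetime import datetime

def greeting_for(hour: int) -> str:
    """Step 3: choose a greeting from the hour on a 24-hour clock."""
    if hour < 12:
        return "Good morning!"
    if hour < 18:
        return "Good afternoon!"
    return "Good evening!"

def wish_user() -> str:
    # In the full assistant this string would be spoken through the
    # pyttsx3 engine configured in Step 2 (engine.say / engine.runAndWait),
    # rather than merely returned.
    return greeting_for(datetime.now().hour)

print(wish_user())
```

Keeping the hour-to-greeting mapping as a pure function makes it easy to test without a speech engine attached.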
3.2 Hardware and Software:
3.2.1 Software Details:
Libraries and Modules:
• speech_recognition: used for voice recognition.
• pyttsx3: used for text-to-speech conversion.
• datetime: used for working with date and time.
• wikipedia: used to fetch information from Wikipedia.
• webbrowser: used to open websites in the default browser.
• os: used for interacting with the operating system.
• time: used for adding delays.
• subprocess: used for executing system commands.
• wolframalpha: used for answering factual questions.
• requests: used for making HTTP requests to fetch weather data.
• PyQt5: used to create graphical user interfaces (GUIs) for desktop applications.

APIs:
• Wolfram Alpha API: used for answering factual questions.
• OpenWeatherMap API: used for fetching weather information.
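As an illustrative sketch of the OpenWeatherMap integration: the URL builder and the sample payload shape below follow the API's documented current-weather response, but the helper names are our own and the API key is a placeholder. The actual fetch would be a requests.get on the built URL.

```python
from urllib.parse import urlencode

# Hypothetical helper around the OpenWeatherMap current-weather endpoint.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_weather_url(city: str, api_key: str) -> str:
    """Build the request URL; units=metric asks for Celsius temperatures."""
    return BASE_URL + "?" + urlencode({"q": city, "appid": api_key, "units": "metric"})

def describe_weather(payload: dict) -> str:
    """Turn a current-weather JSON payload into a sentence the TTS engine can speak."""
    city = payload["name"]
    desc = payload["weather"][0]["description"]
    temp = payload["main"]["temp"]
    return f"The weather in {city} is {desc} with a temperature of {temp:.0f} degrees Celsius."

# Example payload in the shape returned by the API (values are made up):
sample = {"name": "Mumbai",
          "weather": [{"description": "haze"}],
          "main": {"temp": 31.0}}
print(describe_weather(sample))
```

Separating URL construction from response formatting keeps the network call isolated from the logic that can be unit-tested offline.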
3.3.1 System flow
The command given by the user is stored in the variable statement.
If any of the following trigger words is present in the user's statement, the virtual
assistant is invoked to execute the action associated with that trigger word.
Trigger words:
• Wikipedia
• Google Chrome, Gmail, and YouTube
• Time
• News
• Search
• Log off, Sign out
• Who are you, What can you do
• Who made you
• Stop, Goodbye
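The trigger-word lookup described above can be sketched as a dictionary scan. The phrase-to-action mapping is illustrative, and the action names are placeholders for the real handlers:

```python
# Map each trigger phrase to the name of the handler it should invoke.
TRIGGERS = {
    "wikipedia": "search_wikipedia",
    "google chrome": "open_chrome",
    "gmail": "open_gmail",
    "youtube": "open_youtube",
    "time": "tell_time",
    "news": "read_news",
    "search": "web_search",
    "log off": "log_off",
    "sign out": "log_off",
    "who are you": "describe_self",
    "who made you": "credit_authors",
    "goodbye": "shut_down",
}

def match_trigger(statement: str):
    """Return the action for the first trigger phrase found in the statement."""
    s = statement.lower()
    for phrase, action in TRIGGERS.items():
        if phrase in s:
            return action
    return None  # no trigger word: the assistant keeps listening
```

Because phrases are checked in insertion order, more specific triggers can be listed before shorter ones that they contain.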
3.3.2 Architecture
The system design consists of:
1. Taking the input as speech through a microphone.
2. Recognizing the audio data and converting it into text.
3. Comparing the input with predefined commands.
4. Giving the desired output.
The initial phase takes in the data as speech patterns from the microphone. In the second
phase the collected data is processed and transformed into textual data using NLP. In the
next step, the resulting text is manipulated through a Python script to produce the required
output. In the last phase, the produced output is presented either as text or converted from
text to speech using TTS.
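The four phases above can be wired together as small interchangeable functions, which is what makes the design modular. The stubs in this sketch stand in for the real microphone, recognizer, command logic, and TTS components:

```python
def run_pipeline(capture, recognize, process, respond):
    """Run one pass of the four-phase architecture."""
    audio = capture()          # Phase 1: speech input from the microphone
    text = recognize(audio)    # Phase 2: audio-to-text conversion (NLP)
    result = process(text)     # Phase 3: command handling in Python
    return respond(result)     # Phase 4: output as text or speech (TTS)

# Stubbed demonstration run; each lambda replaces a real component:
out = run_pipeline(
    capture=lambda: b"raw-audio-bytes",
    recognize=lambda audio: "what is the time",
    process=lambda text: "It is 10:30 AM",
    respond=lambda result: result,   # a real respond() would also call TTS
)
print(out)
```

Swapping any one phase (for example, a different recognizer) then requires no change to the other three.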
Features
The system shall be developed to offer the following features:
1) It listens continuously while idle and wakes into action when called with a particular
predetermined phrase.
2) It browses the web based on the user's spoken parameters and delivers the desired output
as audio while simultaneously printing it on the screen.
3.4 Methodology
In the development of our voice assistant, we have set clear objectives, focusing on its
fundamental purpose – to listen and open specific applications as per user commands. We
identified a range of target applications, encompassing web browsers, music players, and
various business tools, broadening the utility of the assistant. To enhance user experience, we
carefully designed the user interface, enabling seamless voice command input and providing
feedback for command confirmation. For accurate command recognition, we integrated a
voice recognition system through APIs. Our command recognition logic was meticulously
crafted to interpret distinct voice commands such as "Open Chrome" or "Start Spotify." We
ensured a user-friendly interaction by implementing a response generation mechanism that
confirms command execution and gracefully handles errors. This includes offering alternative
suggestions and seeking clarification for ambiguous commands, ensuring a smoother and
more effective user experience.
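The response-generation behaviour described above, confirming commands, suggesting alternatives for near-misses, and asking for clarification, can be sketched with the standard library's difflib. The command list and the similarity cutoff here are assumptions for illustration:

```python
import difflib

# Illustrative set of recognized commands; the real assistant would
# populate this from its registered handlers.
KNOWN_COMMANDS = ["open chrome", "start spotify", "open gmail", "open youtube"]

def interpret(command: str):
    """Return (status, detail): execute a known command, suggest a close
    match for a near-miss, or ask the user to clarify."""
    cmd = command.lower().strip()
    if cmd in KNOWN_COMMANDS:
        return ("execute", cmd)
    close = difflib.get_close_matches(cmd, KNOWN_COMMANDS, n=1, cutoff=0.6)
    if close:
        return ("confirm", close[0])   # e.g. "Did you mean 'open chrome'?"
    return ("clarify", None)           # ask the user to repeat the command
```

This keeps a misheard "open chrme" from silently failing: the assistant can read back the closest match and wait for confirmation.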
CHAPTER 4
Results & Discussions
The future of artificial voice assistants holds great promise, with numerous avenues for
advancement. Enhancing natural language understanding (NLU) is a critical focus. Current
voice assistants are impressive, but further improvements in NLU will make them more
accurate and context-aware, supporting complex conversations and nuanced responses.
Multimodal integration is another key aspect. Combining voice commands with other sensory
inputs like vision and touch can lead to more immersive and intuitive interactions. Privacy
and security concerns need to be addressed, with a focus on stronger encryption, on-device
processing, and user data control.
Personalization is essential, tailoring responses to individual users, while expanding language
support fosters inclusivity. Emotional intelligence, accessibility, and ethical considerations
should guide future development. Voice assistants must seamlessly integrate with a growing
number of smart devices and continually improve to ensure responsible and meaningful
benefits to society.
Discussion
In this project, a voice assistant was created with basic experimental features. The voice
assistant includes a GUI interface implemented using tkinter in Python. The GUI
implementation aims to keep the interface simple and is still at an early stage.
The voice assistant has the default features of a virtual voice assistant. At a basic level,
it is capable of performing simple tasks and opening required applications such as
YouTube, Gmail, and Google. However, there may be rare instances where the voice assistant
mishears a command or receives no input due to environmental disturbance.
It is important to note that the voice assistant's capabilities are limited and it is still in the
early stages of development. Further enhancements and improvements can be made to
expand its functionality and usability.
Overall, this project focuses on creating a voice assistant with basic experimental features and
a simple GUI interface using tkinter in Python. It serves as a starting point for further
development and refinement of the voice assistant's capabilities.
Note: The information provided above is based on the available feedback dated 13th October,
2023.
REFERENCES
Acknowledgement
This project bears the imprint of many people. We sincerely thank our project guide,
Prof. B. N. Panchal, for the guidance and encouragement in carrying out this project work.
Finally, we would like to thank all our colleagues and friends who helped us complete the
project work successfully.
1. Arya Patil
2. Soham Vede
3. Yash Vichare
4. Saakshi Wagh