Abstract: JARVIS is a virtual embedded voice assistant that uses gTTS and Python to build a personalized assistant with current technology. JARVIS integrates the functionality of AIML with Google's industry-leading text-to-speech platform, so the gTTS libraries provide the male and female voices of the assistant inspired by the Marvel character of the same name. The Python pyttsx engine is used alongside gTTS, which makes it possible to establish finely tuned dialogue between the assistant and its users. JARVIS helps end users in their daily activities: general human-like conversation, query search on Google, Bing, or Yahoo, video search, image retrieval, live weather reports, word meanings, and predicting and reminding users of scheduled events and tasks. Because AIML merges dynamically with platforms such as Python (pyttsx) and gTTS (Google Text to Speech), the resulting JARVIS structure is highly reusable and requires almost no maintenance.
1. Introduction

An AI voice assistant, also known as a virtual or digital assistant, is a device that uses voice recognition technology, natural language processing, and Artificial Intelligence (AI) to respond to people. The device aggregates user messages, breaks them down, evaluates them, and gives meaningful feedback in return; artificial intelligence is what makes real conversation possible. Virtual assistants understand natural language voice commands and perform tasks for users. These tasks, previously performed by a personal assistant or secretary, include dictation, reading text or email messages aloud, and scheduling appointments for end users. The AI assistant can also perform other activities, such as sending messages, answering phone calls, and getting directions. It also helps to read news and weather updates, open Google, YouTube, Stack Overflow, and similar sites, answer questions, scrape the web, play music, and so on. Although this definition emphasizes the digital form of a virtual assistant, the term virtual assistant (or virtual personal assistant) is also commonly used to describe contract employees who work from home and perform administrative tasks normally handled by executives, assistants, or secretaries. Digital assistants can also be compared with another form of consumer-facing AI programming known as smart advisers; smart adviser programs are topic-oriented, whereas virtual assistants are task-oriented.

AI voice assistants often perform simple tasks for end users, such as adding tasks to the calendar; providing information that could otherwise be searched for in an Internet browser; controlling and checking the status of smart devices in the home; sending emails; setting alarms; giving weather reports and the user's location; performing basic mathematical calculations; checking the news; starting music; and opening websites such as Stack Overflow, YouTube, and Facebook.

2. Related Work

2.1 Generalization

The pie chart below shows the analysis of virtual assistants in the context of education, as well as the purpose of this work, covering papers from 13 countries. The highest contribution was made by England with the most papers (3), followed by Russia and Switzerland (2 papers each). The remaining countries, namely Singapore, Pakistan, Canada, India, France, Bulgaria, Saudi Arabia, and Germany, are represented with 1 paper each. (Figure 2.1)
3. System Analysis

3.1 Training Model

When an input layer is specified, weights are assigned. These weights make it easy to see the importance of a particular variable: inputs with larger weights contribute more to the output than other inputs. The inputs are then multiplied by their respective weights and summed, and the result is passed on. When this output exceeds a certain threshold, the node fires and knowledge is propagated to the next layer in the network, so that the output of one node becomes the input of the next. This method of passing knowledge from one layer to the next defines the neural network as a feed-forward network.
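The weighted-sum-and-threshold behaviour described above can be sketched as follows. This is a minimal illustration in Python with NumPy; the layer sizes, weight values, and threshold are arbitrary assumptions for the example, not parameters of the paper's model.

```python
import numpy as np

def feed_forward(x, layers, threshold=0.0):
    """Propagate an input vector through a stack of (weights, bias) layers.

    Each node computes a weighted sum of the previous layer's outputs;
    only values that exceed the threshold are passed on (the node "fires"),
    so the output of one layer becomes the input of the next.
    """
    activation = np.asarray(x, dtype=float)
    for weights, bias in layers:
        z = weights @ activation + bias               # weighted sum of inputs
        activation = np.where(z > threshold, z, 0.0)  # threshold gate
    return activation

# Illustrative 3-2-1 network with arbitrary weights.
layers = [
    (np.array([[0.5, -0.2, 0.1],
               [0.3,  0.8, -0.5]]), np.array([0.1, -0.1])),
    (np.array([[1.0, -0.7]]), np.array([0.05])),
]
print(feed_forward([1.0, 2.0, 3.0], layers))
```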
3.2 Natural Language Processing

With the help of machine learning and deep learning modules, emotions were built into the model and the dataset to help the model in training.

Figure 3.2: Natural Language Processing

3.3 Speech Recognition System

The speech recognition system is the core of the voice application system. It is capable of understanding the voice input given by the user while operating the applications efficiently and generating voice feedback for the user. This system is an important component for users, acting as a gateway that lets them use their voice as an input. (Figure 3.3) In short, to clearly recognize the user's speech command and get a response from the system, the speech recognition system must cover the whole process by which the application converts the voice signal into text data and extracts the important meaning from the speech.
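A minimal sketch of such a speech pipeline is shown below. The paper names gTTS and pyttsx for speech output; the speech_recognition package, the recognize_google call, and the exact function names here are assumed library choices for illustration, not the authors' code.

```python
import speech_recognition as sr   # speech-to-text (assumed library choice)
import pyttsx3                    # offline text-to-speech (pyttsx successor)
from gtts import gTTS             # Google Text to Speech, as named in the paper

recognizer = sr.Recognizer()
engine = pyttsx3.init()

def listen() -> str:
    """Capture one voice command from the microphone and return it as text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # reduce background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)    # Google Web Speech API
    except sr.UnknownValueError:                     # speech was unintelligible
        return ""

def speak(text: str) -> None:
    """Generate spoken feedback for the user."""
    engine.say(text)
    engine.runAndWait()

def speak_gtts(text: str, path: str = "response.mp3") -> None:
    """Alternative output path using gTTS: synthesize speech to an MP3 file."""
    gTTS(text=text, lang="en").save(path)
```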
4.1 Flow Chart

Initially, the flow checks whether the Jarvis voice assistant is active or not: if it is active, it asks for user input; otherwise Jarvis is activated (switched on). The user then provides input in the form of speech or text. If the input is text, it goes directly to the action to be taken, that is, the skill to be executed; if the input is speech, the speech recognition feature converts it into text and then proceeds to the action. (Figure 4.1)

After that, if the skill to be executed is one that Jarvis supports, it gives a positive response to the user in the form of speech and then executes the commands for the operation, producing both console output and speech. On the other hand, if the skill to be executed is not supported or is inappropriate for Jarvis, it gives a negative response and executes no further commands, so no console output is produced. (Figure 4.1)
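The flow in Figure 4.1 can be summarised with the sketch below. The helpers listen() and speak() are those from the speech recognition sketch above, and execute_skill() with its small keyword set is a hypothetical dispatcher standing in for Jarvis's skills; the structure, not the exact names, is what the flow chart prescribes.

```python
def execute_skill(command: str) -> bool:
    """Hypothetical dispatcher: return True if a matching skill was executed."""
    known = {"time", "weather", "search"}             # illustrative skill keywords
    if any(word in command.lower() for word in known):
        print("Executing:", command)                  # console output
        speak("Done: " + command)                     # positive spoken response
        return True
    speak("Sorry, I cannot do that yet.")             # negative response, nothing executed
    return False

def run_assistant(active: bool = False, text_mode: bool = False) -> None:
    """Main loop for the Figure 4.1 flow."""
    if not active:
        active = True                                 # "make Jarvis active"
    while active:
        if text_mode:
            command = input("You: ")                  # text input goes straight to the skills
        else:
            command = listen()                        # speech is first converted to text
        if command:
            execute_skill(command)
```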
4.2 Sequence Diagram
The user sends a command to the voice assistant Jarvis, which forwards it to the interpreter (here, the speech recognition feature) and is then directed to perform the specific task. After processing in the task model, Jarvis executes the task and gives the response or feedback to the user. (Figure 4.2)
If, after processing in the task model, some information is missing, Jarvis asks for that information, takes the input again, gathers all the information, and follows the same process as detailed above. (Figure 4.2)
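One way to realise this re-prompting step is sketched below; the reminder skill, the extract_time() parser, and the spoken phrasing are hypothetical illustrations of the Figure 4.2 behaviour rather than the authors' implementation, and listen() and speak() come from the earlier sketch.

```python
import re

def extract_time(command: str):
    """Hypothetical parser: pull a simple 'at HH:MM' time out of the command."""
    match = re.search(r"\bat (\d{1,2}:\d{2})\b", command)
    return match.group(1) if match else None

def set_reminder(command: str, retries: int = 2) -> None:
    """If the time is missing, ask for it, merge the answer, and retry."""
    when = extract_time(command)
    if when is None and retries > 0:                  # missing information detected
        speak("When should I remind you?")
        follow_up = listen()                          # gather the missing detail
        return set_reminder(command + " at " + follow_up, retries - 1)
    if when is None:
        speak("Sorry, I could not understand the time.")
        return
    print("Reminder set for", when)                   # console output
    speak("Reminder set for " + when)                 # spoken feedback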
4.3 Use Case Diagram

Figure 4.3: Use Case Diagram

4.4 Activity Diagram

5. Experimental Results
On the user's speech command, the voice assistant displays a Google search of the query asked and also reads the answer aloud for the user. (Figure 5.1)
On the user's speech command, the voice assistant opens 'my location'. (Figure 5.3)
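A short sketch of how these two results could be produced is shown below, using the standard webbrowser module and the speak() helper from the earlier sketch; the URLs, the answer parameter, and the query handling are illustrative assumptions, not the authors' exact code.

```python
import webbrowser
from urllib.parse import quote_plus

def google_search_and_read(query: str, answer: str = "") -> None:
    """Open a Google search for the spoken query and read the answer aloud (Figure 5.1)."""
    webbrowser.open("https://www.google.com/search?q=" + quote_plus(query))
    speak(answer if answer else "Here is what I found for " + query)

def open_my_location() -> None:
    """Open a map centred on the user's current location (Figure 5.3)."""
    webbrowser.open("https://www.google.com/maps")
    speak("Here is your location")
```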