MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION
(MUMBAI)

A Project Report
on
Question Answering AI System Using SQUAD

Submitted by:
VEDANT ASHOK SAWANT
AKSHATA ARVIND BIDWE
SHUBHANGI KIRAN THORAT
ARAFAT TABREZ SHAIKH

Department of Computer Technology
K. K. WAGH POLYTECHNIC, NASHIK
Academic Year 2021-22
K. K. Wagh Education Society’s
K. K. WAGH POLYTECHNIC
Hirabai Haridas Vidyanagari, Amrutdham, Panchavati, Nashik-422003, Maharashtra
Certificate
This is to certify that the students of K. K. Wagh Polytechnic, Nashik have completed the
Project (Capstone Project Planning and Execution (CPE)) for their final year, titled Question
Answering AI System Using SQUAD, during the Academic Year 2021-22, in partial
fulfillment of the Diploma in Computer Technology. The project was completed by a group
of four members under the guidance of the Faculty Guide.
Date : 20/05/2022
Place : Nashik
Sponsor’s Certificate
IIC Participation Certificate/ Appreciation Certificate(s)
2. International Level SRM Hackathon [SRM-6.0]
- Category: Prototype Hackathon Competition
- Selected among the Top 10 Finalist Teams
3. National Level Project Presentation Competition [SHODH-2022]
- Category: National Level Project Competition
- Received the First Rank Award with a Cash Prize
ACKNOWLEDGEMENT
With a deep sense of gratitude we would like to thank all the people who have lit our path
with their kind guidance. We are very grateful to these intellectuals who did their best to
help us during our project work.

It is our proud privilege to express a deep sense of gratitude to Prof. P. T. Kadave,
Principal, K. K. Wagh Polytechnic, Nashik, for his comments and kind permission to
complete this project. We remain indebted to Prof. G. B. Katkade, Head of the Computer
Technology Department, for his timely suggestions and valuable guidance.

Special gratitude goes to our external guide, Ms. Poonam Kulkarni, Computer Expert at
Purushottam English School, Nashik, for her sponsorship permissions and direction in our
project selection and implementation. We are grateful and remain indebted to our internal
guide, Mrs. M. A. Shaikh, for her consistent instructions and guidance towards the
completion of the project.

With the help of various industry owners and lab technicians, it has been our endeavor
throughout our work to cover the entire project work.

We are also thankful to our parents, who provided their wholehearted support for the
successful completion of our project. Lastly, we thank all our friends and the people who
are directly or indirectly related to our project work.
Names of Students
1) Vedant Sawant class: TYCM-SS
2) Akshata Bidwe class: TYCM-SS
3) Shubhangi Thorat class: TYCM-SS
4) Arafat Shaikh class: TYCM-SS
Vision & Mission

Institute Vision: Strive to empower students with Quality Technical Education.

Department Mission:
M1: To provide quality education and facilities that help students achieve higher academic
and career growth.
M2: To impart education that meets the requirements of industry and society through
technological solutions.
M3: To develop technical and soft skills through co-curricular and extra-curricular activities
for improving personality.

PSO 1: Computer Software and Hardware Usage: Use state-of-the-art technologies for
operation and application of computer software and hardware.
Program Outcomes:
PO 1: Basic knowledge: Apply knowledge of basic mathematics, science and basic
engineering to solve broad-based Computer engineering problems.
PO 3: Experiments and practice: Plan and perform experiments and practices and use the
results to solve broad-based Computer engineering problems.
PO 5: The engineer and society: Assess societal, health, safety, legal and cultural issues and
the consequent responsibilities relevant to practice in the field of Computer engineering.
PO 8: Individual and team work: Function effectively as a leader and team member in
diverse/multidisciplinary teams.
PO 10: Life-long learning: Engage in independent and life-long learning activities in the
context of technological changes in the Computer engineering field and allied industry.
Abstract
In recent years, people increasingly need answers to questions drawn from huge amounts of
data at their fingertips. Artificial Intelligence Question Answering is about making a
computer program that can answer questions posed in natural language. This can be
achieved using SQuAD (Stanford Question Answering Dataset), which contains questions
asked by humans about a given comprehension passage.

The project aims at the creation of a system based on the BERT (Bidirectional Encoder
Representations from Transformers) algorithm, where the user inputs a question together
with a passage of text containing the answer; the span of text corresponding to the answer is
then highlighted and the user receives the most relevant answer. BERT is a computational
model that converts words into numbers. This process is crucial because machine learning
models take numbers (not words) as inputs, so an algorithm that converts words into
numbers allows you to train machine learning models on originally textual data. Unlike
previous models, BERT is a deeply bidirectional, unsupervised language representation,
pre-trained using only a plain text corpus.

Question answering is at the heart of natural language processing and is composed of two
parts: reading comprehension and answer selection. Earlier question answering systems
were based on statistical methods, with researchers generating sets of features from the text
input. Answer selection is a fundamental task in question answering, and a difficult one
because of the complicated semantic relations between questions and answers. Attention is
a mechanism that has revolutionized the deep learning community. These techniques are
widely used in search engines, personal assistant applications on smartphones, voice control
systems and many other applications. We conclude that the BERT model performed best
across the types of questions we evaluated.
Table of Contents

Sr. No.   Name of Topic
          Certificates
          Acknowledgement
          Abstract
          Table of Contents
1         Introduction
2.1       Analysis
3         Project Requirements
3.6       Advantages
3.7       Limitations
4         Project Design and Implementation
4.1       Block Diagram, DFD Diagram & UML Diagram
4.2       Module Analysis
4.3       User Interface Design
5         Results
6         Software Testing
7         Cost Estimation
8         Applications
9         Future Scope
10        Conclusion
11        References
Index of Figures

Fig. No.   Name of Figure
1.3.1      Proposed System Workflow
4.1.1      QA System Flow
4.1.2      System Context Diagram
4.2.1.1    Block Diagram
4.2.1.2    Use Case Diagram
4.2.2.1    DFD Level-0
4.2.2.2    DFD Level-1
4.2.2.3    DFD Level-2
4.2.3.1    Entity Relationship Diagram
4.2.4.1    Flowchart of the System
Chapter 1
INTRODUCTION
Artificial Intelligence Question Answering is about making a computer program that can
answer questions posed in natural language automatically. Question answering techniques
are widely used in search engines, personal assistant applications on smartphones, voice
control systems and many other applications. This project uses BERT (Bidirectional Encoder
Representations from Transformers), which is used by the renowned search engine Google
to understand users' search intentions and the contents indexed by the search engine. BERT
is a transformer-based machine learning technique for natural language processing
pre-training, created and published by Google in 2018.

In the past decade, several datasets for question answering tasks have been proposed. These
resources, while valuable towards the end goal of training question answering systems, all
exhibit a considerable trade-off between data quality and size. Some older datasets, such as
those of Brent et al. and Richardson et al., use human-curated questions and effectively
capture the nuances of natural language, but, due to the labor involved in generating the
questions, are often insufficient for training robust machine learning models for the task.
Conversely, datasets generated via automation, such as those of Hermann et al. and Hill et
al., lack the structure of authentic human language and, as a result, lose the ability to test for
the core skills involved in reading comprehension. Moreover, many of these older datasets
employ multiple-choice or single-word formats for the ground-truth answers, which
inherently limits the ability of models to learn linguistic structure when formulating answers.
BERT is a computational model that converts words into numbers. This process is crucial
because machine learning models take numbers (not words) as inputs, so an algorithm that
converts words into numbers allows you to train machine learning models on originally
textual data. Unlike previous models, BERT is a deeply bidirectional, unsupervised language
representation, pre-trained using only a plain text corpus.
The model predicts the answer in the form of two indices, a start index a_s and an end index
a_e, which represent word positions in the context paragraph.
An existing system addresses Amharic QA based on factoid questions, so the authors
designed an Amharic non-factoid QA system. The question types handled are definition,
biography and description, and lexical patterns are used for answer extraction. [6] Another
author describes a QA system that uses answer triggering to provide answers: the
answer-triggering component selects an answer from a given set of candidate answers, using
a cognitive approach to choose the most appropriate one. The WIQIKA dataset was used for
the experiment, achieving a precision of 48.89, a recall of 64.17 and an F1 of 55.49.
A previous paper [1] gives a survey of QA systems in which the answer is provided precisely
in natural language. The authors studied structured and unstructured datasets as well as
combinations of both. Stateless QA over areas such as RDF and linked data is surveyed; 21
systems are studied and 23 evaluations are analyzed. Another work presents a factoid QA
model for answering questions in natural language, named the Temporality-Enhanced
Knowledge Memory Network (TE-KMN); applied to the quiz-bowl trivia task, it obtained
74.46% accuracy. In another paper, the authors work on a Chinese QA system, applying both
Named Entity Recognition (NER) and Metric Cluster (MC) techniques; the proposed model
works well for factoid QA and achieves an MRR value of 0.6883. A further paper presents a
semantic-based QA system for the tourism domain, which can therefore be regarded as a
tourism QA system: first the question is detected, then the SPARQL query is optimized, and
the accuracy found is 80%.

Other work targets a web-scale, open-domain factoid QA system: the questions are divided
into five major categories and answered accordingly, achieving up to 62.11% accuracy with
Wikipedia data. Another paper [2] surveys accuracy evaluation for web-based QA systems;
the study covers three categories of QA systems, namely answer extraction, answer scoring
and answer aggregation, and the authors state that the survey helps in selecting an
appropriate QA system. Finally, a QA system was developed for the biomedical field, where
the complexity of answering is addressed with a multi-label classification method for
classifying the questions and answer types listed for this domain; the authors improved F1
by 2% and MRR by 3%.
In another work, the authors perform text classification using a capsule network; a CNN is
used for the classification task and Cap-Net is used within a multitask framework. Various
datasets were used to test the proposed approach.
The world has moved on since the days of these early pioneers, and today we use NLP
solutions without even realizing it. We live in the world Turing dreamt of, but are scarcely
aware of doing so. Certain turning points in this history changed the field forever and
focused the attention of thousands of researchers on a single path forward. In recent years,
such systems have been most available to private hi-tech companies: hardware and large
groups of researchers are more easily allocated to a particular task by Google, Facebook and
Amazon than by the average university, even in the United States. Word2vec, for example,
is a word embedding algorithm devised by Tomas Mikolov and others in 2013.

In the past, one had to read a complete comprehension passage to find a specific answer,
which was time consuming. Specific answers were not retained, hence the percentage of
correct answers was low. These limitations of existing systems can be overcome with the
BERT algorithm, which provides a linguistic answer to the user and can therefore be used in
such embedded systems.
BERT uses masked language modeling to keep the word in focus from "seeing itself", that
is, from having a fixed meaning independent of its context. BERT is then forced to identify
the masked word based on context alone. In BERT, words are defined by their surroundings,
not by a pre-fixed identity. This breakthrough was the result of Google research on
transformers: models that process words in relation to all the other words in a sentence,
rather than one by one in order. BERT models can therefore consider the full context of a
word by looking at the words that come before and after it, which is particularly useful for
understanding the intent behind search queries.

But it is not just advancements in software that make this possible: new hardware was
needed too. Some of the models that can be built with BERT are so complex that they push
the limits of traditional hardware, so Google used the latest Cloud TPUs for the first time to
serve search results and return more relevant information quickly.

Google also applied BERT to make Search better for people across the world. A powerful
characteristic of these systems is that they can take learnings from one language and apply
them to others. So models that learn from improvements in English (a language in which the
vast majority of web content exists) can be applied to other languages. This helps Google
return more relevant results in the many languages in which Search is offered. Hence, BERT
compares favorably with earlier QA algorithms.
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset
consisting of questions posed by crowd workers on a set of Wikipedia articles, where the
answer to every question is a segment of text, or span, from the corresponding reading
passage, or the question may be unanswerable. The task is span-based: given a context
paragraph and a question, the output is the span of text most likely to be the answer to the
question. Since we are interested in a sentence-level task, we converted SQuAD to a new
sentence-level dataset. For each original context paragraph, we divide the paragraph into
sentences. Then, we label each sentence based on whether or not the originally given span
answer lies within the sentence. If it does, the sentence is labelled 1, and all the other
sentences within the same paragraph are labelled 0.
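As a rough illustration of this sentence-level conversion (not code from the report; the naive
sentence splitting and the helper name below are assumptions), a minimal Python sketch
might look like this:

def to_sentence_labels(context, answer_start):
    """Split a SQuAD context into sentences and label the one containing the answer."""
    sentences, labels, offset = [], [], 0
    for sent in context.split(". "):          # naive sentence splitting, for illustration only
        start, end = offset, offset + len(sent)
        sentences.append(sent)
        labels.append(1 if start <= answer_start < end else 0)
        offset = end + 2                      # skip the removed ". " separator
    return list(zip(sentences, labels))

context = "BERT was published in 2018. It was developed by Google. It is deeply bidirectional."
answer_start = context.index("Google")        # character offset of the answer, as in SQuAD
print(to_sentence_labels(context, answer_start))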
Chapter 2
2.1 Analysis
The proposed prototype aims at the creation of a system using the BERT (Bidirectional
Encoder Representations from Transformers) algorithm, where the user inputs a question
together with a passage of text containing the answer; the span of text corresponding to the
answer is then highlighted and the user receives the most relevant answer. BERT is a
computational model that converts words into numbers. This process is crucial because
machine learning models take numbers (not words) as inputs, so an algorithm that converts
words into numbers allows you to train machine learning models on originally textual data.
Unlike previous models, BERT is a deeply bidirectional, unsupervised language
representation, pre-trained using only a plain text corpus.
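A hedged sketch of this behavior, using the Hugging Face transformers question-answering
pipeline with a public checkpoint (an assumption; it is not the project's own fine-tuned
model), might look like this:

from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("BERT (Bidirectional Encoder Representations from Transformers) is a "
           "transformer-based machine learning technique for natural language "
           "processing pre-training developed by Google in 2018.")
result = qa(question="Who developed BERT?", context=context)
print(result["answer"])                 # expected: "Google"
print(result["start"], result["end"])   # character span of the answer in the context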
2.2.2 Operational Feasibility
Web-based applications are now omnipresent, and users have basic knowledge of surfing the
internet and using web-based applications. No extra knowledge is required to use our
prototype; however, users need to know the basics of entering a specific question and
context. The prototype can be used by different organizations, and users can easily operate
the website with the help of a simple form and easy navigation. It is therefore user friendly
and operationally feasible.
Chapter 3
PROJECT REQUIREMENTS
We will fine-tune this algorithm on the SQuAD 2.0 dataset; BERT question answering
models were originally fine-tuned on SQuAD 1.0. The algorithm focuses on syntactic as
well as semantic relations and validation. Furthermore, the Transformer is bidirectional,
which increases semantic understanding and improves accuracy. The model will take the
context and answer the question in the best possible way from that context. An edge case
remains: the answer can be wrong if the question asked is out of context.
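For illustration only, the SQuAD 2.0 data could be inspected with the Hugging Face datasets
package (an assumed dependency that the report does not list); unanswerable questions carry
empty answer lists:

from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")          # train and validation splits
example = squad_v2["train"][0]
print(example["question"])
print(example["context"][:200])
print(example["answers"])                    # empty text/answer_start lists mark unanswerable questions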
3.3 System Requirements and Specification (SRS)
A Functional Requirement (FR) is a description of the service that the software must offer;
it describes a software system or one of its components.

1. PyTorch - PyTorch is an open source machine learning framework based on the Torch
library, used for applications such as computer vision and natural language processing, and
primarily developed by Facebook's AI Research lab. It is one of the widely used machine
learning libraries, others being TensorFlow and Keras.

3. BERT QnA - The BERT family of models uses the Transformer encoder architecture to
process each token of input text in the full context of all tokens before and after it, hence the
name: Bidirectional Encoder Representations from Transformers.

5. numpy as np - NumPy is usually imported under the np alias. In Python, an alias is an
alternate name for referring to the same thing; the NumPy package can then be referred to
as np instead of numpy.
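A minimal sketch showing these components working together is given below; the public
SQuAD-fine-tuned checkpoint name is an assumption, since the project itself loads its own
weights from a local model/ directory:

import numpy as np                    # NumPy under its conventional np alias
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

name = "bert-large-uncased-whole-word-masking-finetuned-squad"   # assumed public checkpoint
tokenizer = BertTokenizer.from_pretrained(name)
model = BertForQuestionAnswering.from_pretrained(name)

ids = tokenizer.encode("What is BERT?", "BERT is a bidirectional Transformer encoder.")
print(np.array(ids))                  # token ids for the question/context pair as a NumPy array
print(torch.tensor([ids]).shape)      # batch of one sequence, ready for the model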
3.4.1 Project Operations/Use:
Sr. No. | Operation            | Description | Purpose
2       | Language [Front End] | HTML, CSS   | To develop the system.
3       | Language [Back End]  | Python 3    | To develop the system.
3.6 Advantages of Project
1. Can be used for a variety of NLP tasks.
2. Organizations can use the AI QA system.
3. Accurate, and based on the latest neural network technology developed by Google.
4. Improvements in key areas of computational linguistics, including chatbots, question
answering, summarization and sentiment detection.
5. An easy route to using pre-trained models (transfer learning), with the capability to
fine-tune on data specific to the language context and problem you face.
Chapter 4
PROJECT DESIGN AND IMPLEMENTATION
For each original context paragraph, we divide the paragraph into sentences. Then, we label
each sentence based on whether or not the originally given span answer lies within the
sentence. If it does, the sentence is labelled 1, and all the other sentences within the same
paragraph are labelled 0. In this QA system the user asks a query, the relevant content is
retrieved from the documents, and the response to the query is returned in the form of an
answer, as shown in the figure. The question processing unit takes the input question and
context from the user to analyze and classify questions written in natural English. The
objective of this analysis is to find the question type, sentiment, meaning and concerns of the
question so as to avoid ambiguity in the answer. An outline of the methodology steps is
depicted in the figure below.
Figure: system workflow - Question/Dataset → Preprocessing → Word Embedding → Model
Implementation → Output/Answer
This QA system gives answers to whatever the user asks. It does so by extracting them from
a context taken from the SQuAD dataset, which features a diverse range of question and
answer types, and it automatically answers the user's query in a short and precise manner.
In this QA system the user needs to input a paragraph. A question is then asked by the user
through the question-asking interface. An index number is allocated to every word, and the
same number is allocated to the same word wherever it repeats. After this, if the question is
answerable, it is answered using BERT.
We use the BERT algorithm fine-tuned on the SQuAD 2.0 dataset; BERT question answering
models were originally fine-tuned on SQuAD 1.0. The algorithm focuses on syntactic as well
as semantic relations and validation. Furthermore, the Transformer is bidirectional, which
increases semantic understanding and improves accuracy. The model will take the context
and answer the question in the best possible way from that context. An edge case remains:
the answer can be wrong if the question asked is out of context.
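The report does not include training code; the following is a heavily simplified, hedged
sketch of a single fine-tuning step, in which the base checkpoint, learning rate and dummy
batch are assumptions and real SQuAD 2.0 preprocessing is omitted:

import torch
from transformers import AdamW, BertForQuestionAnswering

model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
optimizer = AdamW(model.parameters(), lr=3e-5)

batch = {                                      # dummy tensors standing in for preprocessed SQuAD 2.0 features
    "input_ids": torch.randint(0, 30522, (2, 64)),
    "attention_mask": torch.ones(2, 64, dtype=torch.long),
    "start_positions": torch.tensor([5, 0]),   # position 0 ([CLS]) marks an unanswerable question
    "end_positions": torch.tensor([9, 0]),
}
loss = model(**batch).loss                     # combined start/end cross-entropy loss
loss.backward()
optimizer.step()
print(float(loss))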
Figure: block diagram - the user asks a question, which is sent as a request to the solution
model at the backend
The diagram above is a block diagram of the whole question answering system. The QA
system gives answers by extracting them from a context taken from the SQuAD dataset,
which features a diverse range of question and answer types. At first the user is
authenticated. Then the user needs to input a paragraph, and a question is asked through the
question-asking interface. An index number is allocated to every word. The QA system
automatically finds the answer to the user's query in a short and precise manner. After this,
if the question is answerable, it is answered using BERT.
4.2.1 Use Case Diagram
4.2.2 DFD Diagram
DFD Level 0:
Figure: DFD Levels 0-1 - a new question is processed by keyword extraction and topic
modelling against the QA archive; BERT / Sentence-BERT transformer representations
combined with topic representations drive question similarity and ranking to produce ranked
questions
DFD Level 2:
Figure: DFD Level 2 - from the dataset, knowledge-base and vocabulary construction feed
question processing, classification and BERT analysis; sequence-to-sequence attention, word
frequency and knowledge vocabulary support answer identification, extraction, validation
and generation, with a question reformulation module producing the final answer
4.2.3 E-R Diagram
4.2.4 Flow Chart:-
4.3 Module Analysis
4.3.1 Module
1. Question Answering AI System Homepage
4.3.3 Algorithm
Step 1: Start. (It is assumed that the required packages are already installed.)
Step 2: Take input from the user (context).
Step 4: The model divides the text into different sentences, assigning an index to each.
(It is assumed that the dataset is pre-trained in the given format.)
Step 5: Furthermore, it ignores all the white space available in the context.
Step 6: Every word in the sentence is then tokenized using the BertTokenizer package.
Step 7: If the model finds the same word again, it assigns the same token to the repeated
word.
Step 8: After tokenization, the model verifies how relevant the question is.
Step 9: The model then compares similar words from the question and finds them in the
given context.
Step 10: As the model is pre-trained, it tries to give the most specific and relevant answer.
Step 12: At last it matches the index and similarity between the question and the context
(paragraph).
Step 13: Finally it displays the span of the most suitable answer from the given paragraph.
(The most closely matching answer is shown if the exact answer is not available in the
paragraph.)
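As a hedged illustration of Steps 6 and 7 (the bert-base-uncased checkpoint is an assumption;
the project uses its own tokenizer files), repeated words receive the same token id and rare
words are split into '##' sub-word pieces:

from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
tokens = tok.tokenize("The question answering model answers the question about tokenization.")
ids = tok.convert_tokens_to_ids(tokens)
for t, i in zip(tokens, ids):
    print(f"{t:15s} {i}")   # 'the' and 'question' repeat with identical ids; 'tokenization' splits into sub-words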
Chapter 5
Results
Image 5.2: Context, Question and its Displayed Solution
</div>
{% endblock %}
• base.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="{{url_for('static', filename='css/main.css')}}" />
{% block head %}{% endblock %}
</head>
<body>
{% block body %}{% endblock %}
</body>
</html>
• app.py
import os

import pyttsx3
from flask import Flask, render_template, request

# The question-answering helper (tokenizer + fine-tuned model) lives in model.py.
from model import BertQnA

# Optional text-to-speech greeting used during development.
engine = pyttsx3.init()
engine.say("Hi, I am text to speech")
engine.runAndWait()

app = Flask(__name__)


@app.route("/")
def index():
    return render_template("index.html", pred="Please ask a question!")


@app.route("/predict", methods=["POST"])
def predict():
    # The form is assumed to post "name", "question" and "context" fields;
    # only "name" and "question" appeared in the original listing.
    name = request.form["name"]
    question = request.form["question"]
    context = request.form["context"]
    answer = BertQnA(context, question)
    return render_template(
        "index.html", pred=f"{name} ...I think the answer is {answer} !?"
    )


if __name__ == "__main__":
    app.run(debug=True)
• model.py
import torch
from transformers import BertTokenizer, DistilBertForQuestionAnswering

# The fine-tuned question-answering checkpoint and its tokenizer are assumed to be
# saved in the local "model/" directory.
model = DistilBertForQuestionAnswering.from_pretrained("model/")
tokenizer = BertTokenizer.from_pretrained("model/")


def BertQnA(answer_text, question):
    # Apply the tokenizer to the input text, treating question and passage as a text pair.
    encoded_dict = tokenizer.encode_plus(
        text=question, text_pair=answer_text, add_special_tokens=True
    )
    input_ids = encoded_dict["input_ids"]

    # Forward pass (reconstructed): compute start/end logits and pick the most likely
    # start and end token positions for the answer span.
    with torch.no_grad():
        output = model(input_ids=torch.tensor([input_ids]))
    answer_start = int(torch.argmax(output["start_logits"]))
    answer_end = int(torch.argmax(output["end_logits"]))

    # Get the string versions of the input tokens.
    tokens = tokenizer.convert_ids_to_tokens(input_ids)

    # Start with the first answer token.
    answer = tokens[answer_start]

    # Select the remaining answer tokens and join them with whitespace.
    for i in range(answer_start + 1, answer_end + 1):
        # If it's a subword token, recombine it with the previous token.
        if tokens[i][0:2] == "##":
            answer += tokens[i][2:]
        # Otherwise, add a space and then the token.
        else:
            answer += " " + tokens[i]

    return answer
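A small usage sketch, not part of the original listing, showing how the helper above might be
called; it assumes the fine-tuned checkpoint and tokenizer files are present in the local
model/ directory:

from model import BertQnA

passage = ("SQuAD is a reading comprehension dataset consisting of questions "
           "posed by crowd workers on a set of Wikipedia articles.")
print(BertQnA(passage, "Who posed the questions in SQuAD?"))   # expected: "crowd workers"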
Chapter 6
SOFTWARE TESTING
6.2 GUI Testing:
Test Cases:

Test case ID | Test Case Objective | Steps | Expected Result | Actual Result | Status
TC_06 | To verify that the page loading symbol is displayed when it takes longer than the default time to load the results page. | Load the system in a browser. | The page loading symbol should be displayed when loading takes longer than the default time. | The page loading symbol is displayed when loading takes longer than the default time. | Pass
TC_07 | To verify whether the prototype works only in the English language. | Load the system in a browser. | The prototype should work only in the English language. | The prototype works only in the English language. | Pass
TC_08 | To check that the internal tokenization is not visible. | Load the system in a browser. | The internal tokenization should not be visible. | The internal tokenization is visible. | Fail
TC_10 | To check whether the context can be pasted from a source into the textbox. | 1. Open the prototype. 2. Open a new window to search for the new context. 3. Paste the searched context. | The context should be pasted from the source into the textbox. | The context is successfully pasted from the source into the textbox. | Pass
TC_11 | To verify whether a relevant answer is displayed when the specific answer is not available in the given paragraph. | 1. Open the prototype. 2. Open a new window to search for the new context. 3. Paste the searched context. 4. Click the Predict button. | A relevant answer should be displayed when the specific answer is not available in the given paragraph. | A relevant answer is displayed when the specific answer is not available in the given paragraph. | Pass
TC_12 | To check whether the appropriate answer is displayed when entering an appropriate context and question. | 1. Open the prototype. 2. Open a new window to search for the new context. 3. Paste the searched context. 4. Click the Predict button. | The appropriate answer should be displayed when entering an appropriate context and question. | The appropriate answer is displayed when entering an appropriate context and question. | Pass
6.3 Unit Testing:
Test Cases:

Test case ID | Test Case Objective | Steps | Expected Result | Actual Result | Status
TC_01 | To check whether the prototype exists. | 1. Open the Chrome browser and enter the prototype URL. | The website link should be available. | The website link is available. | Pass
TC_02 | To check whether the prototype loads or not. | Enter the prototype URL. | The prototype should load properly and the page should be displayed. | The prototype loads properly and the page is displayed. | Pass
TC_03 | To check whether the user is able to use the particular text boxes. | 1. Open the Chrome browser and enter the prototype's URL. | The user should be authenticated successfully. | The user is authenticated successfully. | Pass
TC_04 | To check whether the Context textbox works properly or not. | 1. Open the Chrome browser and enter the prototype's URL. 2. Click the Context textbox. | The Context textbox should work properly. | The Context textbox works properly. | Pass
TC_05 | To check whether the Question textbox works properly or not. | 1. Open the Chrome browser and enter the prototype's URL. 2. Click the Question textbox. | The Question textbox should work properly. | The Question textbox works properly. | Pass
TC_06 | To verify whether the Predict button works properly or not. | 1. Click the Predict button. | The Predict button should work properly. | The Predict button works properly. | Pass
TC_07 | To test the Predict button. | 1. Click the Predict button to get the answer. | The answer should be displayed. | The answer is properly displayed. | Pass
TC_08 | To verify whether the correct answer is displayed or not. | 1. Click the Predict button to get the correct answer. | The correct answer should be displayed properly. | The correct answer is displayed properly. | Pass
TC_09 | To check whether the answer is displayed in the particular format or not. | 1. Click the Predict button to get the correct answer. | The correct answer should be displayed in the particular format successfully. | The correct answer is displayed in the particular format successfully. | Pass
TC_10 | To check whether all the numeric values are formatted properly or not. | Enter the prototype URL. | The numeric values should be displayed properly. | The numeric values are displayed properly. | Pass
6.4 Stress Testing:
Test Cases:

Test case ID | Test Case Objective | Steps | Expected Result | Actual Result | Status
TC_01 | To verify that the project runs on all Windows versions. | Open the Chrome browser and enter the prototype URL. | The project should run on all the versions. | The project runs properly on all the versions. | Pass
TC_02 | To verify that the project runs on 4 GB RAM. | Open the Chrome browser and enter the prototype URL. | The project should run on 4 GB RAM. | The project runs properly on 4 GB RAM. | Pass
TC_04 | To check whether the prototype works properly when multiple tabs are open at the same time. | Open the prototype in multiple tabs. | The prototype should work properly. | The prototype works properly. | Pass
TC_05 | To check whether a particular device hangs at run time. | Open the Chrome browser and enter the prototype's URL. | The device should not lag. | The device lags. | Fail
TC_06 | To check whether the prototype shows an appropriate warning message when it is under load. | Load the system in a browser. | The prototype should show an appropriate warning message when it is under load. | The prototype does not show an appropriate warning message when it is under load. | Pass
TC_07 | To verify whether the user is able to download the extension for the browser. | Load the system in a browser. | The user should be able to download the extension for the browser. | The user is able to download the extension for the browser. | Pass
TC_08 | To verify whether a large number of users can use the prototype at the same time. | Load the system in a browser. | The prototype should work properly at that time. | The prototype works properly at that time. | Pass
TC_09 | To verify whether the prototype shows a warning message when it is under load condition. | Load the system in a browser. | The prototype should not show the warning message at that time. | The prototype does not show the warning message at that time. | Pass
TC_10 | To check whether the user is able to download the extension for the browser. | 1. Open the Chrome browser and enter the appropriate prototype URL. | The user should be able to download the extension for the browser. | The user is able to download the extension for the browser. | Pass
TC_11 | To check whether multiple hits on the solution button show an error message or load. | Load the system in a browser. | Multiple hits on the solution button should show an error message or load. | Multiple hits on the solution button show an error message or load. | Pass
Chapter 7
COST ESTIMATION
➢ Step 2: Multiply each number by a weight factor according to the complexity of the
parameter associated with that number. The complexity considered is average.

Sr. No. | Function Point      | Number | Weight Factor | Multiplication
1       | User Inputs         | 2      | 2             | 4
2       | User Outputs        | 1      | 1             | 1
3       | Internal Files      | 2      | 2             | 4
4       | External Interfaces | 1      | 1             | 1
➢ Step 3: Calculate the total UFP (Unadjusted Function Points) by adding the Multiplication
column in the above table.
UFP = 4 + 1 + 4 + 1 = 10
➢ Step 4: Calculate the TCF (Technical Complexity Factor) by giving each influence factor
a value between 0 and 5.
Factor No. | Factor               | Value
1          | Data Communication   | 4
3          | Performance Criteria | 4
7          | Online Updating      | 4
9          | Complex Computations | 4
10         | Reusability          | 3
11         | Ease of Installation | 3
12         | Ease of Operation    | 3
13         | Portability          | 4
14         | Maintainability      | 4
➢ Step 5: Sum the resulting numbers to obtain the DI (Degree of Influence) by adding the
Value column in the above table.
DI = 50
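Steps 6 to 9 are not reproduced above; assuming the standard function point adjustment
formula (an assumption about the exact method used), the adjusted count follows from the
UFP and DI values:

UFP = 10                   # unadjusted function points from Step 3
DI = 50                    # degree of influence from Step 5
CAF = 0.65 + 0.01 * DI     # complexity adjustment factor
FP = UFP * CAF             # adjusted function points
print(CAF, FP)             # 1.15 11.5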
➢ Step 10: Calculate the cost required to develop the product by multiplying the development
time by the average salary of the engineers.
Chapter 8
APPLICATIONS
1. BERT can be used for a variety of NLP tasks such as text classification or sentence
classification.
2. Many educational and other organizations can use the AI QA system.
3. It is the latest neural network technology developed by Google and is also very accurate.
4. Being accurate, it does not require a large dataset compared to other algorithms.
5. It brings improvements in key areas of computational linguistics, including chatbots,
question answering, summarization and sentiment detection.
Chapter 9
FUTURE SCOPE
In this QA system the user needs to input a paragraph; the answer to the question asked is
then retrieved using the BERT algorithm. This can be improved further.

The further plans are to reach better accuracy, increase the speed of performance, and
implement this model with deep learning algorithms to improve performance. It can include
linguistic features that provide users with a variety of languages in which to ask questions,
as well as a feature where the user can ask more than one question at a time. Future
enhancements include language translation and voice support. In the future the user could
input a PDF as context and ask any questions from it.
Chapter 10
CONCLUSION
The need for a QA system is to give answers to whatever the user asks. The QA system gives
answers by extracting them from a context taken from the SQuAD dataset, which features a
diverse range of question and answer types, and automatically answers the user's query in a
short and precise manner. In this QA system the user needs to input a paragraph; the answer
to the question asked is then retrieved using the BERT algorithm. We tested our model with
a variety of articles and a large number of different questions.

We used the BERT algorithm in our project; this is a technology that enables anyone to train
their own state-of-the-art question answering system. BERT uses a method of masked
language modeling to keep the word in focus from "seeing itself", that is, having a fixed
meaning independent of its context. We also used the SQuAD dataset, which is a reading
comprehension dataset consisting of questions posed by crowd workers on a set of Wikipedia
articles, where the answer to every question is a segment of text, or span, from the
corresponding reading passage, or the question may be unanswerable.
Chapter 11
REFERENCES
[1] T. Lai, T. Bui, and S. Li, "A review on deep learning techniques applied to answer
selection," in Proceedings of the 27th International Conference on Computational
Linguistics, 2018, pp. 2132-2144.
[2] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning
to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[3] P. Rajpurkar, et al., "SQuAD: 100,000+ questions for machine comprehension of text,"
arXiv preprint arXiv:1606.05250, 2016.
[4] Y. Sharma and S. Gupta, "Deep Learning Approaches for Question Answering System,"
Procedia Computer Science, vol. 132, pp. 785-794, 2018.
[5] M.-T. Luong, H. Pham, and C. D. Manning, "Effective Approaches to Attention-based
Neural Machine Translation," CoRR, vol. abs/1508.04025, http://arxiv.org/abs/1508.04025,
2015.
[6] Taeuk Kim, "Re-implementation of BiDAF in PyTorch,"
https://github.com/galsang/BiDAF-pytorch.
[7] T. P. Sahu, N. K. Nagwani, and S. Verma, "Selecting best answer: An empirical analysis
on community question answering sites," IEEE Access, vol. 4, pp. 4797-4808, 2016.
[8] D. Patel, et al., "Comparative Study of Machine Learning Models and BERT on SQuAD,"
arXiv preprint arXiv:2005.11313, 2020.
Published/Presented Paper/Project

Sr. No. | Title of Paper/Project | Level | Date of Publication | Venue | Award Won

Paper Presented
1 | Question Answering AI System Using SQUAD | National | 23/03/2022 | Gharda Foundation's Gharda Institute of Technology, Lavel | First Rank
2 | Question Answering AI System Using SQUAD | International | 25/02/2022 | Guru Gobind Singh Foundation, Nashik | Best Research Paper Award
2 | Question Answering AI System Using SQUAD | State | 28/04/2022 | Amrutvahini Polytechnic, Sangamner (0080) | Participation

Project Presented
5 | Question Answering AI System Using SQUAD | National | 23/03/2022 | Gharda Foundation of Technology, Lavel | First Rank
6 | Question Answering AI System Using SQUAD | International | 03/04/2022 | Institute of Science & Technology, Chennai | Selected in Top 10
7 | Question Answering AI System Using SQUAD | National | 26/03/2022 | JSPM's Rajarshi Shahu College of Engineering, Pune | Participation
8 | Question Answering AI System Using SQUAD | State | 28/04/2022 | Amrutvahini Polytechnic, Sangamner (0080) | Participation