
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION

(MUMBAI)

“Question Answering AI System using SQUAD”

A
Project Report

Submitted by
VEDANT ASHOK SAWANT
AKSHATA ARVIND BIDWE
SHUBHANGI KIRAN THORAT
ARAFAT TABREZ SHAIKH

In partial fulfillment for the award of the Diploma in Engineering

in the course Computer Technology at

Department of Computer Technology


K. K. WAGH POLYTECHNIC, NASHIK
Academic Year 2021-22

I
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION
(MUMBAI)

“Question Answering AI System using SQUAD”

A Project Report

Submitted by:

Sr. No. Name of Student Exam Seat No.


1) Vedant Ashok Sawant : 319875
2) Akshata Arvind Bidwe : 319842
3) Shubhangi Kiran Thorat : 319883
4) Arafat Tabrez Shaikh : 319876

In partial fulfillment for the award of the Diploma in Engineering

in the course Computer Technology at

Department of Computer Technology


K. K. WAGH POLYTECHNIC, NASHIK
Academic Year 2021-22

II
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION
MUMBAI

A
Project Report
on

“Question Answering AI System using SQUAD”

Submitted by:

Sr. No. Name of Student Exam Seat No.


1) Vedant Ashok Sawant : 319875
2) Akshata Arvind Bidwe : 319842
3) Shubhangi Kiran Thorat : 319883
4) Arafat Tabrez Shaikh : 319876

Under the Guidance of:


Name of Guide: Mrs. M. A. Shaikh
Designation: Lecturer in Computer Technology

I
Department of Computer Technology
K. K. WAGH POLYTECHNIC, NASHIK
Academic Year 2021-22
K. K. Wagh Education Society’s

K. K. WAGH POLYTECHNIC
Hirabai Haridas Vidyanagari, Amrutdham, Panchavati, Nashik-422003, Maharashtra

Certificate
This is to certify that:

Name of Student Class Enrolment No. Exam Seat No.

1. Vedant Ashok Sawant TYCM-SS 1911030117 319875


2. Akshata Arvind Bidwe TYCM-SS 1911030084 319842
3. Shubhangi Kiran Thorat TYCM-SS 1911030125 319883
4. Arafat Tabrez Shaikh TYCM-SS 1911030118 319876

from the institute K. K. Wagh Polytechnic, Nashik, have completed the Project (Capstone
Project Planning and Execution (CPE)) for their final year, titled “Question
Answering AI System Using SQUAD”, during the Academic Year 2021-22, in partial
fulfillment of the Diploma in Computer Technology. The project was completed by a group
of four students under the guidance of the Faculty Guide.

Date : 20/05/2022
Place : Nashik

Prof. M. A. Shaikh Prof. G. B. Katkade


Internal Faculty Guide HOD - Computer
Seal of
Institute

Prof. P.T Kadave


Principal – K. K. Wagh

II
Sponsor’s Certificate

III
IIC Participation Certificate/ Appreciation Certificate(s)

1. First International Conference on RTETM-2022


- Category : International Paper Competition
- Received Best Paper Award

IV
V
2. International Level SRM Hackathon [SRM-6.0]
- Category : Prototype Hackathon Competition
- Selected among Top 10 Finalist Teams

VI
VII
3. National Level Project Presentation Competition [SHODH-2022]
- Category : National Level Project Competition
- Received First Rank Prize Award with Exciting Cash Prize

VIII
IX
ACKNOWLEDGEMENT

With a deep sense of gratitude, we would like to thank all the people who have lit our path
with their kind guidance. We are very grateful to these intellectuals who did their best to
help us during our project work.
It is our proud privilege to express a deep sense of gratitude to Prof. P. T. Kadave,
Principal, K. K. Wagh Polytechnic, Nashik, for his comments and kind permission to
complete this project. We remain indebted to Prof. G. B. Katkade, Head of the Computer
Technology Department, for his timely suggestions and valuable guidance.

Our special gratitude goes to our external guide, Ms. Poonam Kulkarni, Computer
Expert at Purushottam English School, Nashik, for her sponsorship permissions and
direction in our project selection and implementation. We are grateful and remain
indebted to our Internal Guide, Mrs. M. A. Shaikh, for her consistent instructions and
guidance towards the completion of the project.

We are thankful to all the faculty members and technical staff of the Computer
Technology Department for their extensive, excellent and precious guidance in the completion
of this work. We thank all our class colleagues for their appreciable help with our project.

With the help of various industry owners and lab technicians, it has been our endeavor
throughout our work to cover the entire project work.

We are also thankful to our parents, who provided their wholehearted support for the
successful completion of our project. Lastly, we thank all our friends and the people who are
directly or indirectly related to our project work.

Names of Students
1) Vedant Sawant class: TYCM-SS
2) Akshata Bidwe class: TYCM-SS
3) Shubhangi Thorat class: TYCM-SS
4) Arafat Shaikh class: TYCM-SS

X
Vision & Mission
Institute Vision : - Strive to empower students with Quality Technical Education.

Institute Mission :- Committed to develop students as Competent and Socially Responsible Diploma Engineers by inculcating learning to learn skills, values and ethics, entrepreneurial attitude, safe and eco-friendly outlook and innovative thinking to fulfill aspirations of all the stakeholders and contribute in the development of Organization, Society and Nation.

Department Vision :- To impart quality technical education for development of technocrats.

Department Mission :-

M1- To provide quality in education and facilities for students to help them to achieve higher
academic career growths.
M2- To impart education to meet the requirements of the industry and society by technological
solutions.
M3- Develop technical & soft skill through co–curricular and extra-curricular activities for
improving personality.

Program Educational Objectives:-

PEO1: Provide socially responsible, environment friendly solutions to Computer engineering related broad-based problems adapting professional ethics.

PEO2: Adapt state-of-the-art Computer engineering broad-based technologies to work in multi-disciplinary work environments.

PEO3: Solve broad-based problems individually and as a team member communicating effectively in the world of work.

Program Specific Outcome:-(Version – 1.2)

PSO 1: Computer Software and Hardware Usage: Use state-of-the-art technologies for operation and application of computer software and hardware.

PSO 2: Computer Engineering Maintenance: Maintain computer engineering related software and hardware systems.

XI
Program Outcomes:-
PO 1: Basic knowledge: Apply knowledge of basic mathematics, sciences and basic engineering to solve the broad-based Computer engineering problem.

PO 2: Discipline knowledge: Apply Computer engineering discipline-specific knowledge to solve core computer engineering related problems.

PO 3: Experiments and practice: Plan to perform experiments and practices to use the results to solve broad-based Computer engineering problems.

PO 4: Engineering tools: Apply relevant Computer technologies and tools with an understanding of the limitations.

PO 5: The engineer and society: Assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to practice in the field of Computer engineering.

PO 6: Environment and sustainability: Apply Computer engineering solutions also for sustainable development practices in societal and environmental contexts and demonstrate the knowledge of and need for sustainable development.

PO 7: Ethics: Apply ethical principles for commitment to professional ethics, responsibilities and norms of the practice also in the field of Computer engineering.

PO 8: Individual and team work: Function effectively as a leader and team member in diverse/multidisciplinary teams.

PO 9: Communication: Communicate effectively in oral and written form.

PO 10: Life-long learning: Engage in independent and life-long learning activities in the context of technological changes in the Computer engineering field and allied industry.

XII
Abstract

In recent years, one needs answers to questions from huge volumes of data at one's fingertips. Artificial Intelligence Question Answering is about making a computer program that can answer questions posed in natural language. It can be achieved using SQuAD (Stanford Question Answering Dataset), which includes questions asked by humans about a given comprehension passage.

The project aims at the creation of a system using the BERT (Bidirectional Encoder Representations from Transformers) algorithm, where the user can input a question about a passage of text containing the answer; the span of text corresponding to the answer is then highlighted and the user gets the most relevant answer. BERT is a computational model that converts words into numbers. This process is crucial because machine learning models take numbers (not words) as inputs, so an algorithm that converts words into numbers allows you to train machine learning models on originally textual data. Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus.

Question answering is at the heart of natural language processing and is composed of two parts: Reading Comprehension and Answer Selection. Earlier question answering systems were based on statistical methods, where researchers generated sets of features from the text input. Answer Selection is a fundamental task in Question Answering, and also a tough one because of the complicated semantic relations between questions and answers. Attention is a mechanism that has revolutionized the deep learning community. These techniques are widely used in search engines, personal assistant applications on smartphones, voice control systems and many other applications. We concluded that the BERT model is superior in all aspects of answering various types of questions.

Keywords: BERT (Bidirectional Encoder Representations from Transformers), SQUAD (Stanford Question Answering Dataset)

XIII
Table of Contents
Sr.No. Name of topic Page no.
Certificates I

Acknowledgement VIII

Abstract X

Table of Contents XI

Figure Index XIII

1 Introduction 1

1.1 Literature Survey 2

1.2 Existing System 3

1.3 Proposed System 3

2 Analysis and Feasibility 6

2.1 Analysis 6

2.2 Feasibility Study 6

3 Project Requirement 8

3.1 About Proposed Project 8

3.2 Area of Implementation 8

3.3 System Requirement and Specification 9

3.4 Hardware Requirement 9

3.5 Software Requirements 10

3.6 Advantages 11

3.7 Limitations 11
4 Project Design and Implementation 12

XIV
4.1 Block Diagram, DFD Diagram & UML Diagram 15
4.2 Module Analysis 21
4.3 User Interface Design 12
5 Results 23

6 Software Testing 28

6.2 GUI Testing 29

6.3 Unit Testing 32

6.4 Stress Testing 34

7 Cost Estimation 36

7.1 COCOMO Model 36

8 Applications 40

9 Future Scope 41

10 Conclusion 42

11 References 43

XV
Index of Figures
Fig.No. Name of figure Page no.
1.3.1 Proposed System Workflow 4
4.1.1 Q A system Flow 13
4.1.2 System Context Diagram 14
4.2.1.1 Block Diagram 15
4.2.1.2 Use Case Diagram 16
4.2.2.1 DFD Level-0 17
4.2.2.2 DFD Level-1 17
4.2.2.3 DFD Level-2 18
4.2.3.1 Entity Relationship Diagram 19
4.2.3.1 Flowchart of the system 20

XVI
Chapter 1

INTRODUCTION

Artificial Intelligence Question Answering is about making a computer program that can
answer questions in natural language automatically. Question answering techniques are
widely used in search engines, personal assistant applications on smartphones, voice
control systems and many other applications. This project uses BERT (Bidirectional
Encoder Representations from Transformers), which is used by the renowned search engine
Google to understand users' search intentions and the content that is indexed by the search
engine. BERT is a transformer-based machine learning technique for natural language
processing pre-training that was created and published by Google in 2018.

In the past decade, several datasets for question answering tasks have been
proposed. These resources, while valuable towards the end-goal of training question-
answering systems, all experience a considerable tradeoff between data quality and size.
Some older datasets, such as those of Brent et al. and Richardson, et al. use human curated
questions and effectively capture the nuances of natural language, but – due to the labor
involved in generating the questions– are often insufficient for training robust machine
learning models for the task. Conversely, datasets generated via automation, such as those
of Hermann, et al. and Hill, et al. lack the structure of authentic human language and, as a
result, lose the ability to test for the core skills involved in reading comprehension.
Moreover, many of these older datasets employ multiple choice or single-word formats for
the ground-truth answers which inherently limits the ability of models to learn linguistic
structure when formulating answers. BERT is a computational model that converts words
into numbers. This process is crucial because machine learning models take in numbers
(not words) as inputs, so an algorithm that converts words into numbers allows you to train
machine learning models on your originally-textual data. Unlike previous models, BERT
is a deeply bidirectional, unsupervised language representation, pre-trained using only a
plain text corpus.

In response to the shortcomings of these previous question answering datasets,


Rajpurkar et al. released the Stanford Question Answering Dataset (SQuAD) in 2016.
Utilizing questions generated from 536 Wikipedia articles by a team of crowdworkers,
SQuAD consists of over 100,000 rows of data – far exceeding the size of similar datasets
– in the form of a question, an associated Wikipedia context paragraph containing the
answer to the question, and the answer. The ground-truth answer labels are represented in

1
the form of two indices, a start index a_s and an end index a_e, which represent words in the
context paragraph.
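The dataset itself stores each answer as the answer text plus the character offset at which it starts; word-level indices such as a_s and a_e can be derived from these. A minimal, made-up record, pictured here as a Python dictionary in the layout used by the Hugging Face datasets view of SQuAD (the passage and question below are illustrative, not taken from the dataset):

# A made-up SQuAD-style record: the answer is a span of the context,
# located by the character offset where it starts ("answer_start").
example = {
    "context": "BERT was created and published by Google in 2018.",
    "question": "Who published BERT?",
    "answers": {"text": ["Google"], "answer_start": [34]},
}

# The span can be recovered directly from the stored offsets:
start = example["answers"]["answer_start"][0]
text = example["answers"]["text"][0]
assert example["context"][start:start + len(text)] == "Google"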

1.1 Literature Survey


After analysing IEEE papers and their authors, the following observations were made. In the
first paper, the authors developed a QA system that answers questions concisely through natural
language processing; their proposed system worked better than the existing QA system.

The existing Amharic QA system is based on factoid questions, so the authors
designed an Amharic non-factoid QA system. The question types solved are definition,
biography and description types, and lexical patterns are used for the extraction of
answers. [6] Another author describes a QA system that uses answer triggering to
provide answers: the answer-triggering component selects an answer from a given set
of candidate answers, using a cognitive approach to choose the most appropriate one.
The WIQIKA dataset was used for the experiment, achieving a precision of 48.89, a
recall of 64.17 and an F1 of 55.49.

Another paper [1] gives a survey of QA systems in which the answer is provided
precisely in the form of natural language. The authors studied structured and unstructured
datasets and the combination of both. Stateless QA is surveyed in areas like RDF, linked
data, etc.; 21 systems are studied and 23 evaluations are analysed. In another work, the
authors present a factoid QA model for answering questions in natural language, named the
Temporality-Enhanced Knowledge Memory Network (TE-KMN); they apply it to the trivia
game known as quiz bowl and obtain 74.46% accuracy. In a further paper, the authors work
on a Chinese QA system, applying both Named Entity Recognition (NER) and Metric
Cluster (MC) techniques; the proposed model works well for factoid QA and achieves an
MRR value of 0.6883. Another paper presents a semantic-based QA system in which the
questions asked and answered concern the tourism area, so the model can also be called a
tourism QA system: first the question type is detected and then the SPARQL query is
optimized, with an accuracy of 80%.

Another group worked on the development of a web-scale, open-domain factoid QA
system. They divided the questions into five major categories and worked on answering
them, achieving an accuracy of up to 62.11% with Wikipedia data. A further paper [2]
surveys accuracy evaluation for web-based QA systems, studying three categories: answer
extraction, answer scoring and answer aggregation; the authors state that the survey will
help in selecting appropriate QA systems. Finally, a QA system was developed for the
biomedical field, where the complexity of answering is addressed by the authors' approach:
a multi-label classification method is used to classify the questions and the answer types
listed for this field, improving F1 by 2% and MRR by 3%.

In the last paper studied, the authors perform text classification using a capsule
network: a CNN is used for the classification itself, while Cap-Net is used for the
multitasking framework. Various datasets were used to test the proposed approach.

1.2 Existing System:


Natural Language Processing has been one of the most heavily researched areas in the field
of Artificial Intelligence. One of the founding fathers of artificial intelligence, Alan Turing,
suggested it as a possible application for the “learning machines” he imagined as early as
the late 1940s.

The world has moved on since the days of these early pioneers, and today we use
NLP solutions without even realizing it. We live in the world Turing dreamt of, but are
scarcely aware of doing so. Certain turning points in this history changed the field forever,
and focused the attention of thousands of researchers on a single path forward. In recent
years, such systems have mostly been available to private hi-tech companies: hardware and large
groups of researchers are more easily allocated to a particular task by Google, Facebook
and Amazon than by the average university, even in the United States. One such turning point
was word2vec, a word embedding algorithm devised by Tomas Mikolov and others in 2013.

In the past, one had to read a complete comprehension passage to find a specific answer, which
was time consuming. Specific answers were not retained, hence the percentage of correct
answers was low. These limitations of existing systems need to be overcome with the use of
the BERT algorithm, which provides a linguistic answer to the user and can hence be used
in such embedded systems.

1.3 Proposed System:


Google introduced and open-sourced a neural network-based technique for natural
language processing (NLP) pre-training called Bidirectional Encoder Representations from
Transformers, or BERT for short. This technology enables anyone to train their own
state-of-the-art question answering system. BERT uses a method of masked language
modeling to keep the word in focus from "seeing itself" - that is, having a fixed meaning

3
independent of its context. BERT is then forced to identify the masked word based on
context alone. In BERT, words are defined by their surroundings, not by a pre-fixed
identity. This breakthrough was the result of Google research on transformers: models that
process words in relation to all the other words in a sentence, rather than one-by-one in
order. BERT models can therefore consider the full context of a word by looking at the
words that come before and after it—particularly useful for understanding the intent behind
search queries.
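As a small illustration of this masked-language-modelling idea (a sketch only, using the public bert-base-uncased checkpoint rather than the project's own fine-tuned model), a pretrained BERT can be asked to fill in a hidden word from its two-sided context via the Hugging Face fill-mask pipeline:

from transformers import pipeline

# Load a publicly available pretrained BERT checkpoint for masked-word prediction.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the hidden word from both its left and right context.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Words such as "paris" are expected to rank highly.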

But it’s not just advancements in software that can make this possible: we needed
new hardware too. Some of the models we can build with BERT are so complex that they
push the limits of what we can do using traditional hardware, so for the first time we’re
using the latest Cloud TPUs to serve search results and get you more relevant information
quickly.

Google also applied BERT to make Search better for people across the world. A
powerful characteristic of these systems is that they can take learnings from one language
and apply them to others. So we can take models that learn from improvements in English
(a language where the vast majority of web content exists) and apply them to other
languages. This helps Google better return relevant results in the many languages that
Search is offered in. Henceforth, BERT is superior to all the QA algorithms.

Fig. 1.3.1: Proposed System Workflow

4
Stanford Question Answering Dataset (SQuAD) is a reading comprehension
dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles,
where the answer to every question is a segment of text, or span, from the corresponding
reading passage, or the question might be unanswerable. The dataset is span-based, meaning
that given a context paragraph and a question, the dataset outputs the span of text that is the
most likely to be the answer to the question. Since we are interested in a sentence-level task,
we converted SQuAD to a new sentence-level dataset. For each original context paragraph,
we divide the paragraph into sentences. Then, we label each sentence based on whether or
not the originally given span answer is within the sentence. If yes, then the sentence is
labelled as 1, and all the other sentences within the same paragraph are labelled as 0.
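A minimal sketch of this sentence-level conversion is shown below (illustrative only: the sentence splitting is a naive split on full stops, and the sample context is made up):

def to_sentence_labels(context, answer_text):
    """Label each sentence 1 if it contains the answer span, otherwise 0."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return [(sentence, 1 if answer_text in sentence else 0) for sentence in sentences]

context = ("BERT was published by Google in 2018. "
           "It is pre-trained on a plain text corpus.")
print(to_sentence_labels(context, "Google"))
# [('BERT was published by Google in 2018', 1),
#  ('It is pre-trained on a plain text corpus', 0)]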

5
Chapter 2

ANALYSIS AND FEASIBILITY

2.1 Analysis
The proposed prototype aims at creation of a system using BERT (Bidirectional Encoder
Representations from Transformers) algorithm where user can input a question from the
passage of text containing the answer, then span of text corresponding to the text will get
highlighted and user will get the most relevant answer. BERT is a computational model
that converts words into numbers. This process is crucial because machine learning models
take in numbers (not words) as inputs, so an algorithm that converts words into numbers
allows you to train machine learning models on your originally-textual data. Unlike
previous models, BERT is a deeply bidirectional, unsupervised language representation,
pre-trained using only a plain text corpus.

2.2 Feasibility Study


For any new system, the engineering process should begin with a feasibility study. The
input to the feasibility study is a description of the system and how it will be used
within an organization. The result of the feasibility study should be a report which
recommends whether it is worth carrying on with requirements engineering and the system
development process.

2.2.1 Technical Feasibility


The technical requirement of this system is a PC or smartphone with an active, uninterrupted
internet connection. As this is a prototype which can be embedded in other software, the
user who will be using the software must have a computer with an operating system. If we
consider the prototype as a whole, it will require large storage.

The prototype is implemented in Python, with HTML and CSS under the hood, which
makes it user friendly, error free, efficient, free of cost and platform independent. The
analysis of the answer is done in real time with the help of the BERT algorithm, which runs
as a highly reliable, high-speed Python script and enables users to get the most relevant
answer within milliseconds. As the project front-end is web-based, users are not restricted
to any OS platform to access the prototype.

6
2.2.2 Operational Feasibility
Web-based applications are now omnipresent and users have basic knowledge of
surfing the internet and using web-based applications. No extra knowledge is required to use
our prototype; however, the users need to know the basics of entering a specific question and
context. The prototype can be used by different organizations. Users can easily operate the
website with the help of a simple form and easy navigation, so it is user friendly and
operationally feasible.

2.2.3 Economic Feasibility


Economic analysis is the most frequently used method for evaluating the effectiveness of a
system. More commonly known as cost analysis, it is the procedure to determine the benefits
and savings that are expected from a system; here, labour expenses are reduced. Owing to the
proposed system, a considerable amount of money will be saved, since analysing an answer
from the context was earlier done manually and was time consuming. Moreover, the system
is developed on top of the BERT algorithm, thus eliminating the need for an initial investment
in hardware. Last but not least, all the platforms used, from the OS to the web server software,
are open source, making licensing costs zero. The cost of development of the system is very
moderate. Thus, the system is economically feasible. The initial cost of this prototype will
be around Rs. 15,000 per month for the entire prototype, and future development cost will
go towards extending the system.

7
Chapter 3

PROJECT REQUIREMENTS

3.1 About Proposed Project


This QA system gives answers to whatever the user asks. It answers by extracting the
answer from a context of the kind found in the SQuAD dataset, which features a diverse
range of question and answer types. The QA system automatically answers the question
queried by the user in a short and precise manner. In this QA system the user needs to input
a paragraph; a question is then asked by the user through the question-asking interface. An
index number is allocated to every word, and the same number is allocated to repeated
occurrences of the same word. After this, if the question is answerable, it is answered using
the BERT algorithm.

We will use this algorithm, training it on the SQuAD 2.0 dataset; BERT was originally
pretrained on SQuAD 1.0. The algorithm focuses on syntactic as well as semantic relations
and validation. Furthermore, the Transformer is bidirectional, which increases semantic
understanding and improves accuracy. The model will take the context and try to answer the
question in the best possible way from that context. An edge case remains: the answer can
be wrong if the question asked is out of context.

3.2 Area of Implementation


Organizations :
These systems can be embedded in software, e-commerce web portals, T & C (terms and
conditions) pages and many other places organizations make use of. They can be used in
e-commerce websites, where one has to find a specific answer in a big context of information
about some product, and also in chatbots, which reduce service friction and can improve the
brand experience for customers. For companies looking to improve their customer
experiences, the addition of chatbots to answer simple questions can improve satisfaction,
streamline the customer journey, and provide customer-centric support. Furthermore,
wherever you find big paragraphs, this QA system plays a big role.

8
3.3 System Requirements and Specification (SRS)

3.3.1 Project Functional and System Requirements

A Functional Requirement (FR) is a description of the service that the software must offer.
It describes a software system or its component.
1. Pytorch - PyTorch is an open source machine learning framework based on the Torch
library, used for applications such as computer vision and natural language processing,
primarily developed by Facebook's AI Research lab. It is one of the widely used Machine
learning libraries, others being TensorFlow and Keras.

2. Forge transformers - If you've worked on machine learning problems, you probably


know that transformers in Python can be used to clean, reduce, expand or generate
features. The fit method learns parameters from a training set and the transform method
applies transformations to unseen data.

3. BERT QNA - The BERT family of models uses the Transformer encoder
architecture to process each token of input text in the full context of all tokens before and
after, hence the name: Bidirectional Encoder Representations from Transformers.

4. BertTokenizer - BERT uses what is called a WordPiece tokenizer. It works by
splitting words either into their full forms (e.g., one word becomes one token) or into word
pieces, where one word can be broken into multiple tokens. An example of where this
can be useful is where we have multiple forms of words (a short illustrative sketch follows
this list).

5. numpy as np - NumPy is usually imported under the np alias. In Python, an alias is an
alternate name for referring to the same thing. The NumPy package can then be referred to
as np instead of numpy.
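To illustrate the WordPiece behaviour mentioned in point 4, the short sketch below tokenizes one common word and one rarer word; it uses the public bert-base-uncased vocabulary rather than the project's local model directory:

from transformers import BertTokenizer

# Public pretrained vocabulary (the project itself loads its tokenizer from "model/").
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A common word maps to a single token, while a rarer word is split into word pieces.
print(tokenizer.tokenize("question"))    # ['question']
print(tokenizer.tokenize("embeddings"))  # ['em', '##bed', '##ding', '##s']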

3.4 Hardware Requirements

3.4.1 Project Development:


Sr. No. Hardware Specification

1 Processor Intel core i3 or above

2 RAM 4GB or more

9
3.4.2 Project Operations/Use:
Sr. No.   Hardware    Specification             Operation
1         Processor   Intel Core i3 or above    To access the web portal.
2         RAM         4 GB or more              To handle the computations.

3.5 Software Requirements

3.5.1 Project Development


Sr. No.   Software                Description                              Operation
1         Operating System        Windows 7 (x64) or above                 To run the prototype.
2         Language [Front End]    HTML, CSS                                To develop the system.
3         Language [Back End]     Python (3)                               To develop the system.
4         Libraries/Packages      Transformers, BERT Tokenizer, PyTorch    To access prewritten code.
5         IDE                     Anaconda and Visual Studio Code          To edit the project files.

10
3.6 Advantages of Project
1. Used for a variety of NLP tasks.
2. Organizations can use AI QA system.
3. Accurate and Latest technology on Neural network developed by Google.
4. Improvements in key areas of computational linguistics, including chatbots, question
answering, summarization, and sentiment detection.
5. An easy route to using pre-trained models (transfer learning), with the capability to
fine-tune the model to the specific language context and problem you face.

3.7 Limitations and Constraints of Project


1. Requires the dataset in a particular format; in our project it is SQuAD. Requires a lot of
space.
2. BERT is a heavy model and requires a lot of storage. It requires the dataset in a particular
format only, for example the SQuAD dataset.
3. In this QA system the user is limited to typing questions beginning with how, who, what,
where, when and why.
4. The user can ask questions only in English.

11
Chapter 4

PROJECT DESIGN AND IMPLEMENTATION

4.1 Design Concept


Google introduced and open-sourced a neural network-based technique for natural
language processing (NLP) pre-training called Bidirectional Encoder Representations from
Transformers, or BERT for short. This technology enables anyone to train their own
state-of-the-art question answering system. BERT uses a method of masked language
modeling to keep the word in focus from "seeing itself" - that is, having a fixed meaning
independent of its context. BERT is then forced to identify the masked word based on
context alone. In BERT, words are defined by their surroundings, not by a pre-fixed
identity.

This breakthrough was the result of Google research on transformers: models that
process words in relation to all the other words in a sentence, rather than one-by-one in
order. BERT models can therefore consider the full context of a word by looking at the
words that come before and after it—particularly useful for understanding the intent behind
search queries.

But it’s not just advancements in software that can make this possible: we needed
new hardware too. Some of the models we can build with BERT are so complex that they
push the limits of what we can do using traditional hardware, so for the first time we’re
using the latest Cloud TPUs to serve search results and get you more relevant information
quickly.
Google also applied BERT to make Search better for people across the world. A powerful
characteristic of these systems is that they can take learnings from one language and apply
them to others. So we can take models that learn from improvements in English (a language
where the vast majority of web content exists) and apply them to other languages. This
helps Google better return relevant results in the many languages that Search is offered in.

Stanford Question Answering Dataset (SQuAD) is a reading comprehension


dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles,
where the answer to every question is a segment of text, or span, from the corresponding
reading passage, or the question might be unanswerable. The dataset is span-based,
meaning that given a context paragraph and a question, the dataset outputs the span of text
that is the most likely to be the answer to the question. Since we are interested in a sentence-
level task, we converted SQuAD to a new sentence-level dataset. For each original context

12
paragraph, we divide the paragraph into sentences. Then, we label each sentence based on
whether or not the originally given span answer is within the sentence. If yes, then the
sentence is labelled as 1, and all the other sentences within the same paragraph are labelled
as 0. In this QA system, the user asks a query, the relevant documents are retrieved using a
search engine, and the response to the query is returned in the form of an answer, as shown
in the figure.

[Figure: User → Query → Document Retrieval (Search Engine) → Answer Extraction (QA System) → Answer]

Fig. 4.1.1 : QA System Flow

The question processing unit takes the question and context from the user in order to analyse
and classify questions written in natural English. The objective of this analysis is to find out
the question type, sentiment, meaning and concerns of the question, so as to avoid ambiguity
in the answer. An outline of the methodology steps is depicted in the figure below.

13
[Figure: Question / Dataset → Preprocessing → Word Embedding → Model Implementation → Output → Answer]

Fig. 4.1.2 : System Context Diagram

This QA system gives answers to whatever the user asks. It answers by extracting the
answer from a context of the kind found in the SQuAD dataset, which features a diverse
range of question and answer types. The QA system automatically answers the question
queried by the user in a short and precise manner. In this QA system the user needs to input
a paragraph; a question is then asked by the user through the question-asking interface. An
index number is allocated to every word, and the same number is allocated to repeated
occurrences of the same word. After this, if the question is answerable, it is answered using
BERT.
We will use the BERT algorithm, training it on the SQuAD 2.0 dataset; BERT was
originally pretrained on SQuAD 1.0. The algorithm focuses on syntactic as well as semantic
relations and validation. Furthermore, the Transformer is bidirectional, which increases
semantic understanding and improves accuracy. The model will take the context and try to
answer the question in the best possible way from that context. An edge case remains: the
answer can be wrong if the question asked is out of context.

4.2 Block Diagram, DFD Diagram and UML Diagram

4.2.1 Block Diagram

[Figure: User Login → User Adds Context → Ask Question → Request to Solution Model at Backend]

Fig. 4.2.1.1 : Block Diagram

The diagram above is a block diagram of the whole question answering system. The
QA system answers by extracting the answer from a context of the kind found in the SQuAD
dataset, which features a diverse range of question and answer types. At first the user is
authenticated; the user then needs to input a paragraph, and a question is asked through the
question-asking interface. An index number is then allocated to every word. The QA system
automatically finds the answer to the user's query in a short and precise manner. After this,
if the question is answerable, it is answered using BERT.

15
4.2.1 Use Case Diagram

Fig. 4.2.1.2 : Use Case Diagram

The diagram displayed above is a graphical depiction of the user's possible interactions
with the system. It shows the various use cases and the different types of actors the system
has. In this Question Answering system the user is the main actor. The user is able to enter
a context, i.e. a paragraph, along with a question based on that context. On the other hand,
the model stores the context in the backend dataset, and the dataset is tokenized (a token is
assigned to every word). After tokenization, BERT finds the relevant span of text
corresponding to the question and the sentence in the context. The model then provides the
linguistic answer as a solution to the user.

16
4.2.2 DFD Diagram

DFD Level 0:

Fig. 4.2.2.1 : DFD Level 0 Diagram


DFD Level 1:

[Figure: New question → keyword extraction and topic modelling over the QA archive → BERT sentence-transformer sentence representation + topic representation → question similarity and ranking → ranked questions]
Fig. 4.2.2.2 : DFD Level 1 Diagram

17
DFD Level 2:

[Figure: Dataset (source text, question, answer) → knowledge base and vocabulary construction → question processing and analysis (word frequency, question classification, question reformulations) → BERT answer identification with sequence-to-sequence attention → answer extraction, validation and generation module → Answer]

Fig. 4.2.2.3 : DFD Level 2 Diagram

18
4.2.3 E-R Diagram

Fig. 4.2.3.1 : Entity Relationship Diagram

Above is the Entity Relationship diagram of the question answering system, which
displays the attributes of the different modules and is used for relating each module to the
others. As per the diagram, the Context module has two attributes, token_id and word_no,
which are used for tokenization. Since the user has to ask a question, the Context and
Question modules are co-related. Furthermore, the Question entity requests the backend,
which has BERT and NLP as attributes, for a solution. The backend searches through the
database, which has SQUAD and Libraries as attributes, and provides a specific solution to
the user as per the question request.

19
4.2.4 Flow Chart:-

Fig. 4.2.3.1 : Flowchart of the system

20
4.3 Module Analysis

4.3.1 Module
1. Question Answering AI System Homepage

4.3.2 Purpose of Module:


Input 1: Enter the Context
Purpose: This field is for the user to enter the context, i.e. a paragraph. The model will use
the same paragraph to find and display the answer span.
For E.g. Avul Pakir Jainulabdeen Abdul Kalam (15 October 1931 – 27 July 2015)
was an Indian aerospace scientist who served as the 11th president of India from
2002 to 2007. He was born and raised in Rameswaram, Tamil Nadu and studied
physics and aerospace engineering.
Input 2 : Ask a Question
Purpose: This field is for the user to enter or ask a question. The model will compare this
question against the context to find the solution.
For E.g. Where was Avul Pakir Jainulabdeen Abdul Kalam born?

Input 3 : Output to be displayed


Purpose: This field is for the user to see the output. The model is efficient enough to answer
within a couple of seconds.
For e.g. click on the submit button to display the answer.
Answer - Rameswaram

4.3.3 Algorithm
Step 1: Start
(Assuming the required packages are already installed)

Step 2: Take input from the user (the context).

Step 3: Take another input, i.e. the question, from the user.

Step 4: The model will now divide the text into different sentences, assigning an index to each.
(Assuming the dataset is pretrained in the given format)

Step 5: Furthermore, it will ignore all the white spaces in the context.

Step 6: Every word in the sentence will now be tokenized, which is done by the
BertTokenizer package.

21
Step 7: If the model finds the same word again, it will assign the same token to the
repeated word.

Step 8: After tokenization, the model will verify how relevant the question is.

Step 9: The model will now compare similar words from the question and find them in the
given context.

Step 10: As the model is pretrained, it will try to give the most specific and relevant answer.

Step 11: It will now compare the indexes (tokenized words).

Step 12: At last it will match the index and the similarity between the question and the
context (paragraph).

Step 13: Finally it will display the span of the suitable answer from the given paragraph.
(The most closely matching answer will be shown if the exact answer is not available in
the paragraph.)

22
Chapter 5

RESULTS

Image 5.1 : Example 1 - Context, Question & its Displayed Solution

Image 5.2 : Context, Question & its Displayed Solution

23
Image 5.3 : Context, Question & its Displayed Solution

5.2 Source Code


• index.html
{% extends 'base.html' %}
{% block head%}
{% endblock %}
{% block body %}
<div>
<center>
<p style="margin-bottom:2cm;"></p>
<h1>Bert , </h1>
<h1> to answer your Question </h1>
<h3></h3>
<p style="margin-bottom:2cm;"></p>
<form action="/predict" method="POST">
<div>
<p>Context (tweet)?</p>
<input type="text" name="question" id="question" placeholder="Enter
Context Here ">
<p>Your Question ?</p> <input type="text" name="name" id="name"
placeholder="Enter Question Here ">
</div>
<p style="margin-bottom:1cm;"></p>

<input type="submit" value="Predict">


</form>
<p style="margin-bottom:1cm;"></p>
<h2>{{pred}}</h2>
</center>

24
</div>
{% endblock %}

• base.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="{{url_for('static', filename='css/main.css')}}" />
{% block head%}{% endblock %}
</head>
<body>
{% block body %}{% endblock %}
</body>
</html>

• app.py
import torch
from flask import Flask, render_template, request
from transformers import BertTokenizer, DistilBertForQuestionAnswering
import pyttsx3

import model  # local module (model.py) that wraps the BERT question-answering logic

# Load the fine-tuned QA model and its tokenizer from the local "model/" directory.
# (model.py loads its own copies of these as well.)
qa_model = DistilBertForQuestionAnswering.from_pretrained("model/")
tokenizer = BertTokenizer.from_pretrained("model/")

# Text-to-speech greeting on startup.
engine = pyttsx3.init()
engine.say("Hi, I am text to speech")
engine.runAndWait()

app = Flask(__name__)


@app.route("/")
def index():
    return render_template("index.html", pred="Please ask a question!")


@app.route("/predict", methods=["POST"])
def predict():
    data = [request.form["question"]]  # context paragraph entered by the user
    name = [request.form["name"]]      # question entered by the user

    answer = model.BertQnA(name[0], data[0])

    return render_template(
        "index.html", pred=f"{name} ...I think the answer is {answer} !?"
    )


if __name__ == "__main__":
    app.run(debug=True)

• model.py
import torch
from transformers import BertTokenizer, DistilBertForQuestionAnswering

# Load the fine-tuned QA model and tokenizer from the local "model/" directory.
model = DistilBertForQuestionAnswering.from_pretrained("model/")
tokenizer = BertTokenizer.from_pretrained("model/")


def BertQnA(answer_text, question):
    # Apply the tokenizer to the input text, treating the two strings as a text pair.
    encoded_dict = tokenizer.encode_plus(
        text=question, text_pair=answer_text, add_special_tokens=True
    )
    input_ids = encoded_dict["input_ids"]

    # Report how long the input sequence is.
    print("Query has {:,} tokens.\n".format(len(input_ids)))

    # Segment ids distinguish the first text from the second in the pair.
    segment_ids = encoded_dict["token_type_ids"]

    # Evaluate the model and find the tokens with the highest `start` and `end` scores.
    output = model(torch.tensor([input_ids]))
    answer_start = torch.argmax(output["start_logits"])
    answer_end = torch.argmax(output["end_logits"])

    # Get the string versions of the input tokens.
    tokens = tokenizer.convert_ids_to_tokens(input_ids)

    # Start with the first token of the predicted answer span.
    answer = tokens[answer_start]

    # Select the remaining answer tokens and join them with whitespace.
    for i in range(answer_start + 1, answer_end + 1):
        # If it's a subword token, recombine it with the previous token.
        if tokens[i][0:2] == "##":
            answer += tokens[i][2:]
        # Otherwise, add a space and then the token.
        else:
            answer += " " + tokens[i]

    return answer
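A minimal usage sketch of the helper above (illustrative only; it assumes the fine-tuned weights and vocabulary are present in the local model/ directory, and the sample context and question are made up):

from model import BertQnA  # helper defined above

# Note the argument order used by app.py: the first argument carries the
# question string and the second carries the context paragraph.
context = ("Avul Pakir Jainulabdeen Abdul Kalam was born and raised in "
           "Rameswaram, Tamil Nadu and studied physics and aerospace engineering.")
question = "Where was Abdul Kalam born?"

print(BertQnA(question, context))
# Expected to print a span from the context, e.g. "rameswaram , tamil nadu"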

27
Chapter 6

SOFTWARE TESTING

6.1 Software Testing


“Testing is the process of executing a program with the intent of finding errors.” Software
testing is a process, or a series of processes, designed to make sure computer code does
what it was designed to do and that it does not do anything unintended. Software should be
predictable and consistent, offering no surprises to users.

The purpose of testing can be quality assurance, verification and validation, or reliability
estimation. Software testing assesses and evaluates the quality of the work performed at each
step of the software development process. The goal of software testing is to ensure that the
software performs as intended, and to improve software quality, reliability and
maintainability.

Objectives of Software Testing


The major objectives of software testing are as follows:
1. Finding defects which may get created by the programmer while developing
the software.
2. Gaining confidence and providing information about the level of quality.
3. To prevent defects.
4. To make sure that the end result meets the business and user requirements.
5. Gain the confidence of the customers by providing them a quality product.
6. To ensure that it satisfies the BRS that is Business Requirement
Specification and SRS that is System Requirement Specification.

28
6.2 GUI Testing :

Test Cases :

Test Case ID: TC_01
Objective: To verify that the Graphical User Interface is clear and understandable.
Steps: 1. Load the system in the browser. 2. Check the GUI.
Expected Result: The GUI should be clear and understandable.
Actual Result: The GUI is clear and understandable.
Status: Pass

Test Case ID: TC_02
Objective: To check whether all UI components work as expected.
Steps: 1. Load the prototype in the browser. 2. Check all UI components.
Expected Result: All UI components should work as expected.
Actual Result: All UI components work as expected.
Status: Pass

Test Case ID: TC_03
Objective: To verify whether all numeric values are formatted properly.
Steps: Load the system in the browser.
Expected Result: All numeric values should be formatted properly.
Actual Result: All numeric values are formatted properly.
Status: Pass

Test Case ID: TC_04
Objective: To verify that the scrollbar is enabled only when necessary.
Steps: Load the system in the browser.
Expected Result: The scrollbar should get enabled only when necessary.
Actual Result: The scrollbar is enabled only when necessary.
Status: Pass

Test Case ID: TC_05
Objective: To verify that the context text box is multi-lined.
Steps: 1. Open the prototype. 2. Check the context text box.
Expected Result: The context text box should be multi-lined.
Actual Result: The context text box is multi-lined.
Status: Pass

Test Case ID: TC_06
Objective: To verify that the page loading symbol is displayed when the results page takes longer than the default time to load.
Steps: Load the system in the browser.
Expected Result: The page loading symbol should be displayed when it takes longer than the default time to load the results page.
Actual Result: The page loading symbol is displayed when it takes longer than the default time to load the results page.
Status: Pass

Test Case ID: TC_07
Objective: To verify whether the prototype works only in the English language.
Steps: Load the system in the browser.
Expected Result: The prototype should work only in the English language.
Actual Result: The prototype works only in the English language.
Status: Pass

Test Case ID: TC_08
Objective: To check that the internal tokenization is not visible.
Steps: Load the system in the browser.
Expected Result: The internal tokenization should not be visible.
Actual Result: The internal tokenization is visible.
Status: Fail

Test Case ID: TC_09
Objective: To check whether the search criteria used for searching are displayed in the solution grid.
Steps: 1. Open the prototype. 2. Open a new window for searching the new context.
Expected Result: The search criteria used for searching should be displayed in the solution grid.
Actual Result: The search criteria used for searching are properly displayed in the solution grid.
Status: Pass

Test Case ID: TC_10
Objective: To check whether the context can be pasted from the source into the textbox.
Steps: 1. Open the prototype. 2. Open a new window for searching the new context. 3. Paste the context which is to be searched.
Expected Result: The context should be pasted from the source into the textbox.
Actual Result: The context is successfully pasted from the source into the textbox.
Status: Pass

Test Case ID: TC_11
Objective: To verify that a relevant answer gets displayed when the specific answer is not available in the given paragraph.
Steps: 1. Open the prototype. 2. Open a new window for searching the new context. 3. Paste the context which is to be searched. 4. Click the Predict button.
Expected Result: A relevant answer should be displayed when the specific answer is not available in the given paragraph.
Actual Result: A relevant answer is displayed when the specific answer is not available in the given paragraph.
Status: Pass

Test Case ID: TC_12
Objective: To check whether the appropriate answer gets displayed when entering an appropriate context and question.
Steps: 1. Open the prototype. 2. Open a new window for searching the new context. 3. Paste the context which is to be searched. 4. Click the Predict button.
Expected Result: The appropriate answer should get displayed when entering an appropriate context and question.
Actual Result: The appropriate answer gets displayed when entering an appropriate context and question.
Status: Pass

31
6.3 Unit Testing :
Test Cases :
Test Case ID: TC_01
Objective: To check whether the prototype exists.
Steps: 1. Open the Chrome browser and enter the prototype URL.
Expected Result: The website link should be available.
Actual Result: The website link is available.
Status: Pass

Test Case ID: TC_02
Objective: To check whether the prototype loads or not.
Steps: Enter the prototype URL.
Expected Result: The prototype should load properly and the page should be displayed.
Actual Result: The prototype loads properly and the page is displayed.
Status: Pass

Test Case ID: TC_03
Objective: To check whether the user is able to use the particular text boxes.
Steps: 1. Open the Chrome browser and enter the prototype's URL.
Expected Result: The user should be authenticated successfully.
Actual Result: The user is authenticated successfully.
Status: Pass

Test Case ID: TC_04
Objective: To check whether the Context textbox works properly or not.
Steps: 1. Open the Chrome browser and enter the prototype's URL. 2. Click the Context textbox.
Expected Result: The Context textbox should work properly.
Actual Result: The Context textbox works properly.
Status: Pass

Test Case ID: TC_05
Objective: To check whether the Question textbox works properly or not.
Steps: 1. Open the Chrome browser and enter the prototype's URL. 2. Click the Question textbox.
Expected Result: The Question textbox should work properly.
Actual Result: The Question textbox works properly.
Status: Pass

Test Case ID: TC_06
Objective: To verify whether the Predict button works properly or not.
Steps: 1. Click the Predict button.
Expected Result: The Predict button should work properly.
Actual Result: The Predict button works properly.
Status: Pass

Test Case ID: TC_07
Objective: To test the Predict button.
Steps: 1. Click on the Predict button to get the answer.
Expected Result: The answer should be displayed.
Actual Result: The answer is properly displayed.
Status: Pass

Test Case ID: TC_08
Objective: To verify whether the correct answer is displayed or not.
Steps: 1. Click on the Predict button to get the correct answer.
Expected Result: The correct answer should be displayed properly.
Actual Result: The correct answer is displayed properly.
Status: Pass

Test Case ID: TC_09
Objective: To check whether the answer is displayed in a particular format or not.
Steps: 1. Click on the Predict button to get the correct answer.
Expected Result: The correct answer should be displayed in a particular format successfully.
Actual Result: The correct answer is displayed in a particular format successfully.
Status: Pass

Test Case ID: TC_10
Objective: To check whether all the numeric values are formatted properly or not.
Steps: Enter the prototype URL.
Expected Result: The numeric values should be displayed properly.
Actual Result: The numeric values are properly displayed.
Status: Pass
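As a complement to the manual test cases above, the sketch below shows how a few of them could be automated with Python's unittest module and Flask's built-in test client. It is illustrative only: it assumes the app module from Chapter 5 is importable and that the model files are present.

import unittest

from app import app  # Flask application from Chapter 5


class PredictRouteTest(unittest.TestCase):
    def setUp(self):
        self.client = app.test_client()

    def test_homepage_loads(self):
        # The index page should load and prompt the user for a question.
        response = self.client.get("/")
        self.assertEqual(response.status_code, 200)

    def test_predict_returns_a_page(self):
        # Form field "question" carries the context and "name" the question,
        # matching the form in index.html.
        response = self.client.post("/predict", data={
            "question": "BERT was published by Google in 2018.",
            "name": "Who published BERT?",
        })
        self.assertEqual(response.status_code, 200)


if __name__ == "__main__":
    unittest.main()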

33
6.4 Stress Testing :
Test Cases :

Test Case ID: TC_01
Objective: To verify that the project runs on all Windows versions.
Steps: Open the Chrome browser and enter the prototype URL.
Expected Result: The project should run on all the versions.
Actual Result: The project runs properly on all the versions.
Status: Pass

Test Case ID: TC_02
Objective: To verify that the project runs on 4 GB RAM.
Steps: Open the Chrome browser and enter the prototype URL.
Expected Result: The project should run on 4 GB RAM.
Actual Result: The project runs properly on 4 GB RAM.
Status: Pass

Test Case ID: TC_03
Objective: To verify whether the project works on a 64-bit OS.
Steps: Open the Chrome browser and enter the prototype URL.
Expected Result: The project should work on a 64-bit OS.
Actual Result: The project works on a 64-bit OS.
Status: Pass

Test Case ID: TC_04
Objective: To check whether the prototype works properly when multiple tabs are open at the same time.
Steps: Open the prototype in multiple tabs.
Expected Result: The prototype should work properly.
Actual Result: The prototype works properly.
Status: Pass

Test Case ID: TC_05
Objective: To check whether a particular device hangs at run time.
Steps: Open the Chrome browser and enter the prototype's URL.
Expected Result: The device should not lag.
Actual Result: The device lags.
Status: Fail

Test Case ID: TC_06
Objective: To check that the prototype shows an appropriate warning message when it is under load.
Steps: Load the system in the browser.
Expected Result: The prototype should show an appropriate warning message when it is under load.
Actual Result: The prototype does not show an appropriate warning message when it is under load.
Status: Pass

Test Case ID: TC_07
Objective: To verify whether the user is able to download the extension for the browser.
Steps: Load the system in the browser.
Expected Result: The user should be able to download the extension for the browser.
Actual Result: The user is able to download the extension for the browser.
Status: Pass

Test Case ID: TC_08
Objective: To verify whether a large number of users can use the prototype at the same time.
Steps: Load the system in the browser.
Expected Result: The prototype should work properly at that time.
Actual Result: The prototype works properly at that time.
Status: Pass

Test Case ID: TC_09
Objective: To verify whether the prototype shows an appropriate warning message when it is under load condition.
Steps: Load the system in the browser.
Expected Result: The prototype should not show the warning message at that time.
Actual Result: The prototype does not show the warning message at that time.
Status: Pass

Test Case ID: TC_10
Objective: To check whether the user is able to download the extension for the browser.
Steps: 1. Open the Chrome browser and enter the appropriate prototype URL.
Expected Result: The user should be able to download the extension for the browser.
Actual Result: The user is able to download the extension for the browser.
Status: Pass

Test Case ID: TC_11
Objective: To check whether multiple hits on the solution button show an error message or load.
Steps: Load the system in the browser.
Expected Result: Multiple hits on the solution button should show an error message or load.
Actual Result: Multiple hits on the solution button show an error message or load.
Status: Pass

35
Chapter 7

COST ESTIMATION

Cost estimation is a well-formulated prediction of the probable manufacturing and development
cost of a specific project. A cost estimate is a powerful management tool for framing a budget.
It accounts for all the items from the various stages of cost estimation.
✓ Conceptual Estimation :
It is the process of determining the cost before project execution.
✓ Detailed Estimation :
It is the process of determining the cost by breaking down each stage of operation and finding
the cost of each component using a format.
7.1 COCOMO Model:
➢ Step 1: Measure the size in terms of the amount of functionality in a system. Function
points are computed by first calculating an unadjusted function point count (UFC).

Sr.no. Function points Numbers Description

1. User inputs            2   Context Text Box, Question Text Box
2. User Outputs           1   Predict Button
3. Internal Files         2   BERT Algorithm, Dataset
4. External interfaces    1   SQUAD Dataset

36
➢ Step 2: Multiply each number by a weight factor according to the complexity of the
parameter, associated with that number.
Complexity considered is average.
SR.NO Function point Numbers Weight Factor Multiplication

1 User inputs 2 2 4

2 User Outputs 1 1 1

3 Internal Files 2 2 4

4 External interfaces 1 1 1

➢ Step 3: Calculate the total UFP (Unadjusted function points) by adding the
multiplication column in above table
UFP = 4+1+4+1= 10

➢ Step 4: Calculate the total TCF (Technical Complexity Factor) by giving a value
between 0 and 5

SR.NO TECHNICAL COMPLEXITY FACTOR VALUE

1 Data communication 4

2 Distributed Data Processing 3

3 Performance criteria 4

4 Heavily Utilized Hardware 0

5 High Transaction Rates 4

6 Online Data Entry 5

7 Online Updating 4

8 End user efficiency 5

9 Complex Computations 4

37
10 Reusability 3

11 Ease of Installation 3

12 Ease of Operation 3

13 Portability 4

14 Maintainability 4

➢ Step 5: Sum the resulting numbers to obtain DI (degree of influence) by adding the
value column in above table
DI = 50

➢ Step 6: TCF (Technical Complexity Factor) by given formula


TCF = 0.65+0.01*DI
= 0.65+0.01*50
= 1.15

➢ Step 7: Calculate FP (Function Points) using the given formula


FP = UFP*TCF
= 10*1.15
= 11.5
➢ Step 8: To find KLOC (Lines of code) using language factor and FP
Language factor of python = 52
KLOC= Language factor * FP
= 52*11.5
= 5.98
➢ Step 9: To calculate the effort and nominal development time using given formula and
constants.
Effort = a1 * (KLOC)^a2 PM
Tdev = b1 * (Effort)^b2 Months
a1 = 2.0, a2 = 1.03, b1 = 2.4, b2 = 0.34
Effort = 2.0 * (5.98)^1.03
       = 12.6 PM
Tdev = 2.4 * (12.6)^0.34
     = 5.67 Months

38
➢ Step 10: Calculate the cost required to develop product by multiplying development
time and average salary of engineers.

Average salary is 30,000

Cost required to develop the product = 5.67* 30,000 =1,70,100


Hence the total cost required to develop the product is ₹1,70,100/-
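The arithmetic in Steps 6-10 can be reproduced with a short script; the constants and inputs below are the ones used above, and the printed values match the hand calculation up to rounding:

# Reproduces the COCOMO arithmetic from Steps 6-10 above.
DI = 50               # degree of influence (Step 5)
UFP = 10              # unadjusted function points (Step 3)
KLOC = 5.98           # size figure carried into Step 9
A1, A2 = 2.0, 1.03
B1, B2 = 2.4, 0.34
AVG_SALARY = 30_000   # average monthly salary in Rs. (Step 10)

tcf = 0.65 + 0.01 * DI        # technical complexity factor -> 1.15
fp = UFP * tcf                # function points -> 11.5
effort = A1 * KLOC ** A2      # ~12.6 person-months
tdev = B1 * effort ** B2      # ~5.7 months
cost = tdev * AVG_SALARY      # ~Rs. 1,70,000

print(f"TCF={tcf:.2f}  FP={fp:.1f}  Effort={effort:.1f} PM  "
      f"Tdev={tdev:.2f} months  Cost=Rs. {cost:,.0f}")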

39
Chapter 8

APPLICATIONS

1. BERT can be used for a variety of NLP tasks such as Text Classification or
Sentence Classification.
2. Many educational and other organizations can use AI QA system.
3. Latest neural network technology developed by Google, and it is also very
accurate.
4. Being accurate, it does not require a lot of data compared to other algorithms.
It brings improvements in key areas of computational linguistics, including chatbots,
question answering, summarization, and sentiment detection.

40
Chapter 9

FUTURE SCOPE

In this QA system the user needs to input a paragraph; the answer to the question asked
is then retrieved using the BERT algorithm. This can be improved further.

The further plans are to reach better accuracy, increase the speed of performance,
and implement this model with additional deep learning algorithms to improve performance.
It can include linguistic features that will allow users to ask questions in a variety of
languages. Other planned features include letting the user ask more than one question at a
time, as well as future enhancements such as language translation and voice support. In the
future the user will be able to input a PDF as the context and ask any questions from it.

41
Chapter 10

CONCLUSION

The purpose of the QA system is to give answers to whatever the user asks. It answers by
extracting the answer from a context of the kind found in the SQuAD dataset, which features
a diverse range of question and answer types. The QA system automatically answers the
question queried by the user in a short and precise manner. In this QA system the user needs
to input a paragraph; the answer to the question asked is then retrieved using the BERT
algorithm. We tested our model with a variety of articles and the system was tested with a
large number of different questions.

We used the BERT algorithm in our project; this is a technology that enables anyone
to train their own state-of-the-art question answering system. BERT uses a method of
masked language modeling to keep the word in focus from "seeing itself", that is, from
having a fixed meaning independent of its context. We also used the SQuAD dataset, which
is a reading comprehension dataset consisting of questions posed by crowd workers on a set
of Wikipedia articles, where the answer to every question is a segment of text, or span,
from the corresponding reading passage, or the question might be unanswerable.

42
Chapter 11

REFERENCES

1] T. Lai, T. Bui, and S. Li, “A review on deep learning techniques applied to answer
selection,” in Proceedings of the 27th International Conference on Computational
Linguistics, 2018, pp. 2132-2144.
2] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning
to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
3] P. Rajpurkar, et al., “SQuAD: 100,000+ questions for machine comprehension of text,”
arXiv preprint arXiv:1606.05250, 2016.
4] Y. Sharma and S. Gupta, “Deep Learning Approaches for Question Answering
System,” Procedia Computer Science, vol. 132, pp. 785-794, 2018.
5] M.-T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based
Neural Machine Translation,” CoRR, vol. abs/1508.04025, http://arxiv.org/abs/1508.04025, 2015.
6] T. Kim, “Re-implementation of BiDAF in PyTorch,” https://github.com/galsang/BiDAF-pytorch
7] T. P. Sahu, N. K. Nagwani, and S. Verma, “Selecting best answer: An empirical
analysis on community question answering sites,” IEEE Access, vol. 4, pp. 4797-4808, 2016.
8] D. Patel, et al., “Comparative Study of Machine Learning Models and BERT on
SQuAD,” arXiv preprint arXiv:2005.11313, 2020.

43
Published/Presented Paper/Project
Paper Presented

Sr. No. | Title of paper | Level | Date of publication | Venue | Award won
1. | Question Answering AI System Using SQUAD | National | 23/03/2022 | Gharda Foundation's Gharda Institute of Technology, Lavel | First Rank
2. | Question Answering AI System Using SQUAD | International | 25/02/2022 | Guru Gobind Singh Foundation, Nashik | Best Research Paper Award
3. | Question Answering AI System Using SQUAD | State | 15/03/2022 | Institute of Civil and Rural Engineering, Gargoti | First Rank
4. | Question Answering AI System Using SQUAD | State | 09/04/2022 | Pravin Patil College of Engineering, Mumbai | Participation
5. | Question Answering AI System Using SQUAD | State | 28/04/2022 | Amrutvahini Polytechnic, Sangamner (0080) | Participation

Project Presented

Sr. No. | Title of project | Level | Date of presentation | Venue | Award won
5. | Question Answering AI System Using SQUAD | National | 23/03/2022 | Gharda Foundation of Technology, Lavel | First Rank
6. | Question Answering AI System Using SQUAD | International | 03/04/2022 | Institute of Science & Technology, Chennai | Selected Top 10
7. | Question Answering AI System Using SQUAD | National | 26/03/2022 | JSPM's Rajarshi Shahu College of Engineering, Pune | Participation
8. | Question Answering AI System Using SQUAD | State | 28/04/2022 | Amrutvahini Polytechnic, Sangamner (0080) | Participation

44
