NLP Synopsis
The volume of information is increasing dramatically, and so is its value. Modern organizations
deal with terabytes of text, such as email, that often play a significant role in their day-to-day
operations. Even small and medium-sized organizations handle growing volumes of text that require
rapid access and meaningful analysis on a daily basis.
Identifying useful information in the available datasets is difficult and requires a dedicated
mechanism. One possible solution is a text classification and summarization tool. The text
categorizer automatically arranges a set of documents into predefined concepts (or categories),
and the summarizer gives a condensed and meaningful depiction of the input data such that the
output includes the most significant extracts of the source.
TABLE OF CONTENTS
Abstract
Table of Contents
1.0 Introduction
2.0 Problem Statement & Feasibility Study
3.0 Hardware and Software Requirements
    3.1 Hardware Requirements
    3.2 Software Requirements
4.0 Workload Matrix
5.0 Quality Parameters
Reference
CHAPTER 1- INTRODUCTION
With the massive growth of information on the Internet, finding relevant and significant
information with conventional retrieval techniques has become both challenging and time-consuming.
A simple keyword-based search on the Internet returns thousands of lengthy documents, overwhelming
the user. It is therefore essential to develop tools that can efficiently assist users in
identifying the desired documents.
Text classification and summarization are performed on the input documents. After the summaries
of all the classified documents have been obtained, sentiment analysis is performed on each
summary in order to identify whether its sentiment is positive or negative.
Text classification has always been a vital application because it is used to order documents in
support of data retrieval tasks. The text classification task can be defined as assigning a
category to each document based on the knowledge gained from the Knowledge Base (KB).
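The idea of assigning a category from a knowledge base can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing, which is only one possible technique and is not stated in this synopsis as the method used. The class name `NaiveBayesClassifier`, the helper `tokenize`, and the toy `KNOWLEDGE_BASE` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, labelled_docs):
        """labelled_docs: iterable of (text, category) pairs (the knowledge base)."""
        for text, category in labelled_docs:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the category with the highest log-posterior score."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                # Smoothed log likelihood of the token given the category.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical toy knowledge base; real data would come from the project's datasets.
KNOWLEDGE_BASE = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("shares fell on the stock market", "finance"),
]

classifier = NaiveBayesClassifier()
classifier.train(KNOWLEDGE_BASE)
print(classifier.classify("the goal decided the match"))
```

Working in log space avoids numeric underflow on long documents, and the smoothing term keeps a single unseen word from ruling out a category entirely.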
Text summarization is the process of generating a short, fluent, and most importantly accurate
summary of a longer text document (Brownlee, 2017a). The main idea behind automatic text
summarization is to find a short subset of the most essential information in the entire text and
present it in a human-readable format.
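Selecting a short subset of the most essential information can be sketched as extractive summarization. The example below is a minimal sketch that assumes a word-frequency scoring scheme, which this synopsis does not name as the chosen method. The function names `split_sentences` and `summarize`, the `max_sentences` parameter, and the tiny `STOP_WORDS` list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to", "was"}


def content_words(text):
    """Return the lowercase alphabetic tokens of text, without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(frequencies[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


document = (
    "Summaries save time. "
    "Good summaries keep key facts and summaries stay short. "
    "Weather was nice."
)
print(summarize(document, max_sentences=1))
```

This approach only extracts existing sentences. Generating new wording (abstractive summarization) would need a language model and is outside the scope of this sketch.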
Sentiment analysis helps to evaluate ideas, feelings and behaviour, and its results are used to
make decisions. The task is essentially to categorize the polarity of a given text, that is,
whether the sentiment expressed in a document is positive or negative. It not only helps the
general public, but also helps companies to evaluate thoroughly the behaviour and opinions of the
customers who use their products, thus supporting their decision-making process.
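Polarity categorization can be illustrated with a minimal lexicon-based sketch. It assumes that counting positive and negative words is an acceptable stand-in for the real technique, which this synopsis does not specify. The word lists, the function name `polarity`, and the extra "neutral" label for ties are hypothetical additions.

```python
import re

# Hypothetical miniature lexicons; a real system would use a full sentiment
# lexicon or a trained model.
POSITIVE_WORDS = {"good", "great", "excellent", "useful", "happy", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "useless", "unhappy", "hate"}


def polarity(text):
    """Label text 'positive' or 'negative' by counting lexicon hits.

    Returns 'neutral' when both counts are equal (an assumption of this sketch).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negative_hits = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


print(polarity("The summary is clear and very useful"))
```

A word-count approach ignores negation ("not good") and sarcasm, so it serves only as a baseline for comparison with a trained classifier.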
CHAPTER 2- PROBLEM STATEMENT & FEASIBILITY STUDY
Today, our world is inundated by the gathering and dissemination of huge amounts of data. With so
much data circulating in the digital space, there is a need to develop machine learning
algorithms that can automatically shorten verbose texts, classify them, and deliver accurate
summaries that fluently convey the intended information.
The aim is to create a coherent and fluent summary containing only the main points outlined in the
document. Natural Language Processing (NLP) techniques have proved critical for quickly and
accurately summarizing and classifying voluminous texts, a task that would be expensive and
time-consuming if done without machines.
The project is operationally feasible, since small, medium and large companies, as well as general
Internet users with basic knowledge of computers and the Internet, can use it effectively.
The text summarizer and classifier tool is based on a client-server architecture, in which the
clients are the users and the server is the machine where the datasets are stored.
CHAPTER 3- HARDWARE AND SOFTWARE REQUIREMENTS
Innovation 2
Creativity 2
Thoroughness 2
Knowledge Gained 3
Accuracy of Conclusions 2
Easy to use 3
Scalable 3
REFERENCE