Synopsis format
on
By:
Ojas Garg
2103213060
Radhika Mehrotra
2103213074
DEPARTMENT OF CSE-AIML
ABES ENGINEERING COLLEGE, GHAZIABAD
AFFILIATED TO
DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, U.P., LUCKNOW
(Formerly UPTU)
Student’s Declaration
We hereby declare that the work presented in this report, entitled “Opinion
Mining of Pandemic using Machine Learning”, is an authentic record of our own
work carried out under the supervision of Mr. Shyam Sharma, Assistant Professor,
CSE-AIML. The matter embodied in this report has not been submitted by us anywhere
else.
Date:
This is to certify that the above statement made by the candidate(s) is correct to the best
of my knowledge.
Date: CSE-AIML
Acknowledgement
We would like to convey our sincere thanks to Mr. Shyam Sharma for his
motivation, knowledge and support throughout the course of the project. His continuous
support helped us complete the project successfully, and the knowledge he shared was
very useful to us.
We would also like to give special thanks to the Department of CSE-AIML for its
continuous support and for the opportunity to carry out our mini project.
Table of Contents
Chapter 1: Introduction
References
List of Tables
List of Figures
ABSTRACT
originated in Wuhan, China in 2019, and it has spread to all parts of the
world. In India, the first case was found in early 2020. Soon afterwards, a
lockdown was imposed to control the situation. India has since become the second
most affected country by the virus. In this project, the sentiments of people
on social media platforms during the pandemic are determined, and we also
try to find which machine learning algorithm fits best for
analyzing these sentiments. About 1.5 lakh (150,000) tweets from Twitter have been
Chapter 1
Introduction
The first case of the novel coronavirus was reported in December 2019 in China. From
there it spread to different countries such as Italy, Spain, the USA and India. The World Health
Organization declared it a health emergency. Soon afterwards, countries started taking
measures to stop the spread of the virus. On March 25, 2020, a nationwide lockdown
was imposed in India as a safety measure. India has since become the second most affected country.
This project examines the opinions of people after the lockdown
was imposed across India and people were confined to their homes. Sentiment
analysis is an emerging area of natural language processing (NLP) that categorizes the
opinions and sentiments of people using different text mining techniques. It can be helpful in many
ways. For example, it helps a seller gather customer feedback on a product
from online sites; by analyzing that feedback, the seller can improve the quality
of the product.
A social media platform is a place where everyone can express themselves without
hesitation [6, 8]. Twitter is a popular social media platform on which people express
themselves in the form of tweets. These tweets are studied to find out the sentiments or
1. To analyze tweets from Twitter and divide the emotions of the people into three
categories (positive, negative or neutral) [3].
2. To study different machine learning algorithms for sentiment analysis and to
Chapter 2
Related Work
pandemic [9]
To measure and study the early changes in content and opinion about
COVID-19 [1].
In existing projects, words with positive or negative polarity are
obtained, whereas in our project we obtain the polarity of the overall data set.
best for sentiment analysis, but in our project we will determine that too.
Chapter 3
Project Objective
This project will analyze the emotions of people during the pandemic.
This project will analyze different machine learning algorithms and find the
Chapter 4
Proposed Methodology
Step 1: Identify the hashtags that were popular on Twitter in India during the pandemic. Tweets
under those hashtags are extracted from the Twitter API using the Tweepy library.
Step 2: Preprocess the dataset. This involves the following steps:
Removal of hashtags.
Removal of links, GIFs, emoji, images and special characters.
Removal of stop words.
Removal of non-English words.
Lemmatization.
Step 3: Analyze the polarity of the dataset.
Step 4: Feed the output of Step 3 to different machine learning algorithms and compare them to
find the algorithm with the best accuracy.
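As an illustration of Steps 2 and 3, the following is a minimal sketch that uses only the Python standard library. It is not the project's own code: the word lists `STOP_WORDS`, `POSITIVE` and `NEGATIVE` and the function names `preprocess` and `polarity_label` are hypothetical stand-ins chosen for this example, and the simple word-counting score only approximates what a sentiment library would compute. Removal of non-English words and lemmatization are omitted because they need external language resources.

```python
import re

# Hypothetical, deliberately tiny word lists used only for illustration.
# A real run would rely on the much larger resources of an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "at", "of", "to", "and", "this", "we"}
POSITIVE = {"good", "safe", "hope", "happy", "great"}
NEGATIVE = {"bad", "sad", "fear", "worst", "angry"}


def preprocess(tweet: str) -> list:
    """Apply part of the Step 2 clean-up and return the remaining tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", tweet)    # remove links
    text = re.sub(r"#\w+", " ", text)                       # remove hashtags
    text = text.encode("ascii", "ignore").decode("ascii")   # drop emoji and other non-ASCII symbols
    text = re.sub(r"[^A-Za-z\s]", " ", text)                # drop digits, punctuation, special characters
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]  # remove stop words


def polarity_label(tokens: list) -> str:
    """Step 3 stand-in: count positive and negative words and label the tweet."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


example = "Stay safe at home! #lockdown https://t.co/abc \U0001F637"
tokens = preprocess(example)
print(tokens, polarity_label(tokens))
```

The order of the substitutions matters: links and hashtags are removed before the generic special-character filter, otherwise their letters would survive as ordinary words.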
• Pre-processing of data to remove special characters, punctuation, stop words and images
• To use machine learning algorithms and find which fits best for performing sentiment analysis
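Finding the algorithm that "fits best" can be sketched as fitting several candidates on the same labelled tweets and ranking them by accuracy on held-out tweets. The example below is an assumption-laden illustration, not the project's implementation: the two candidates (a majority-class baseline and a small multinomial Naive Bayes) are stand-ins chosen for this sketch, and the toy `TRAIN` and `TEST` data are invented for the example.

```python
import math
from collections import Counter, defaultdict


class MajorityBaseline:
    """Always predicts the most frequent label seen during training."""

    def fit(self, docs, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, docs):
        return [self.label for _ in docs]


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        n_words = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (n_words + len(self.vocab)))
        return score

    def predict(self, docs):
        return [max(self.class_counts, key=lambda c: self._log_score(d, c)) for d in docs]


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    predictions = model.predict(docs)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def rank_models(models, train, test):
    """Fit each candidate on `train`; return (name, accuracy) pairs, best first."""
    train_docs, train_labels = zip(*train)
    test_docs, test_labels = zip(*test)
    scores = {
        name: accuracy(model.fit(train_docs, train_labels), test_docs, test_labels)
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: (tokens, polarity label) pairs.
TRAIN = [
    (["feel", "safe", "home"], "positive"),
    (["hope", "good", "news"], "positive"),
    (["fear", "virus", "spread"], "negative"),
    (["sad", "bad", "news"], "negative"),
    (["lockdown", "extended", "today"], "neutral"),
    (["cases", "reported", "today"], "neutral"),
    (["new", "rules", "announced"], "neutral"),
]
TEST = [
    (["good", "hope"], "positive"),
    (["bad", "fear"], "negative"),
    (["lockdown", "today"], "neutral"),
]

ranking = rank_models(
    {"majority_baseline": MajorityBaseline(), "naive_bayes": NaiveBayes()}, TRAIN, TEST
)
print(ranking)
```

Including a trivial baseline in the comparison is a deliberate choice: when one class dominates the data, an accuracy figure is only meaningful relative to always predicting that class.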
Python. The Python library NumPy is used for numerical computation and pandas is
used for data manipulation. The Natural Language Toolkit (NLTK) is used for
preprocessing the dataset. The TextBlob library is used for spelling checks and
Chapter 6
The result obtained from analyzing the tweets is shown in Fig. 3.
Fig. 3 shows that 46% of the total tweets are neutral, about 36.5% of the tweets are positive
Chapter 7
The project will give the overall polarity score of Tweets and will find which is the
From the analysis of the tweets, we observe that most people feel neutral
platforms such as Instagram, Facebook, etc., and also try to further classify the sentiments.
References
[1] Medford, R. J., Saleh, S. N., Sumarsono, A., Perl, T. M., & Lehmann, C. U. (2020). An
"Infodemic": Leveraging High-Volume Twitter Data to Understand Public Sentiment for the
COVID-19 Outbreak. medRxiv.
[2] Rajput, N. K., Grover, B. A., & Rathi, V. K. (2020). Word frequency and sentiment
analysis of twitter messages during Coronavirus pandemic. arXiv preprint
arXiv:2004.03925.
[3] Samuel, J., Ali, G. G., Rahman, M., Esawi, E., & Samuel, Y. (2020). Covid-19 public
sentiment insights and machine learning for tweets classification. Information, 11(6), 314.
[4] Kumar, A., Khan, S. U., & Kalra, A. (2020). COVID-19 pandemic: a sentiment
analysis. European Heart Journal.
[5] Ahuja, S., & Dubey, G. (2017, August). Clustering and sentiment analysis on Twitter
data. In 2017 2nd International Conference on Telecommunication and Networks
(TEL-NET) (pp. 1-5). IEEE.
[6] Suman, C., Saha, S., Bhattacharyya, P., & Chaudhari, R. S. (2020). Emoji Helps! A
Multi-modal Siamese Architecture for Tweet User Verification. Cognitive Computation, 1-16.
[7] Neethu, M. S., & Rajasree, R. (2013, July). Sentiment analysis in twitter using machine
learning techniques. In 2013 Fourth International Conference on Computing,
Communications and Networking Technologies (ICCCNT) (pp. 1-5). IEEE.
[8] Gupta, S., Singh, A., & Ranjan, J. (2020). Sentiment Analysis: Usage of Text and
Emoji for Expressing Sentiments. In Advances in Data and Information Sciences (pp.
477-486). Springer, Singapore.
[9] Rajput, N. K., Grover, B. A., & Rathi, V. K. (2020). Word frequency and sentiment
analysis of twitter messages during coronavirus pandemic. arXiv preprint
arXiv:2004.03925.
[10] Medford, R. J., Saleh, S. N., Sumarsono, A., Perl, T. M., & Lehmann, C. U. An
"Infodemic": Leveraging High-Volume Twitter Data to Understand Early Public
Sentiment for the COVID-19 Outbreak. In Open Forum Infectious Diseases.