Final BE Project Report
BY
CERTIFICATE
Submitted by
is a bonafide work carried out by students under the supervision of Prof. Shubhada
Bhalerao and it is submitted towards the partial fulfillment of the requirement of
bachelor of engineering (Computer Engineering) Project.
Dr. B P Patil
External Examiner Principal
on
is successfully completed by
at
Abstract
Depression has become a major crisis plaguing the world today. About 265 million
individuals of all ages suffer from depression worldwide. Of these, about 75%
remain untreated, with one million individuals taking their lives every year. Thus,
depression is amongst the leading causes of suicide, especially amongst adolescents.
Social media platforms are becoming an inseparable part of people’s daily lives.
They mirror users’ personal lives, as users share their happiness, joy, insecurities
and sorrow on social media. Researchers often use these platforms to spot the
causes of depression and to detect it. Early detection of depression could prove to
be an enormous step in improving the mental health of our society collectively.
Acknowledgments
It gives us great pleasure in presenting the final project report on ‘Depression De-
tection using Sentiment Analysis of Social Media Posts’.
We would like to take this opportunity to thank our project guide Prof. Shubhada
Bhalerao for giving us all the guidance needed. We are very grateful for her kind
support. Her valuable suggestions were extremely helpful.
We also extend our sincere gratitude to Dr. S.R Dhore, Head of Computer Engi-
neering Department, for creating a competitive environment and providing us with
all the essential facilities and encouragement at the department and institute level.
We would also like to acknowledge all our friends and classmates for their co-
operation. Lastly, we express our gratitude to our parents and other family mem-
bers, whose continuous encouragement, love and affection enabled us to complete
this piece of work successfully.
Aroop Kumar
Ashish
K Chaitanya
Saurabh Kulkarni
(B.E. Computer Engg.)
Abstract
Acknowledgment
List of Figures
1 INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.3 Scope Of The Project
1.4 Motivation of The project
1.5 Organization of report
2 LITERATURE SURVEY
2.1 Literature Survey
2.2 Possible Challenges
2.3 Inferences from Literature Survey
5 DETAILED DESIGN
5.1 Architectural Design
5.2 UML Diagrams
5.2.1 Class Diagram
5.2.2 Activity Diagram
5.2.3 Use Case Diagram
5.2.4 Sequence Diagram
5.2.5 Deployment Diagram
5.3 Data design
5.3.1 Internal software data structure
5.3.2 Global data structure
5.3.3 Temporary data structure
5.3.4 Database description
6 PROJECT PLANNING
6.1 Tasks Involved
6.2 Technical Risks
6.3 Budget/Time Constraints
7 CODING
7.1 Algorithms / Flowcharts
7.2 Software Used
7.2.1 Utility Packages / Applications
7.2.2 Model Development
7.2.3 Deployment
7.3 Hardware Specification
7.4 Programming Language
7.5 Platform
7.6 Coding Style Format
9 TESTING
9.1 Formal Technical Reviews
9.2 Test Plan
9.3 Test Cases & Results
12 CONCLUSION
13 References
ANNEXURE A - Plagiarism Report
INTRODUCTION
The aim of the project is to detect whether a person is showing signs of clinical
depression. The project will be developed using various machine learning models
trained on social media data (e.g. tweets). The model would then predict whether
a person is showing symptoms of depression and if yes, necessary actions will be
taken.
1.2 Objectives
• To build a machine learning model that would analyse twitter posts and predict
whether a person is showing signs of clinical depression.
• To improve the F1-score (primary) and accuracy (secondary) for the model by
cross validation and hyperparameter tuning.
sure that the data imbalance is minimal. The Naive Bayes model would use
this filtered data for making predictions. The project will specifically target
twitter posts. The reasons are stated as follows: -
• End user identification is a crucial step in scope definition. For the purpose of
our project, the following scenarios have been identified: -
1. The project could be deployed along with the already existing social media
platforms (esp. Twitter) whereby it can fetch user posts and analyse depres-
sion level in the individual. Here, the social media user acts as the end user.
2. The project can also be beneficial to psychologists who wish to study de-
pression and mood disorders especially in young adults. Hence, psychologists
can be taken to be the end user.
3. The project could be made open source, which could help other researchers
make the necessary strides in this field and improve the model further. Hence,
researchers make up the last set of end users.
LITERATURE SURVEY
According to the report, “Exploring opportunities to support mental health care using
social media: A survey of social media users with mental illness” [2], it was found
that millennials are more open to talk about their mental health issues on social me-
dia. Machine Learning (ML) has advanced significantly in recent years, allowing
for the solution of real-world problems and also the implementation of automated
systems.
In “Predicting future mental illness from social media: A big-data approach” [3],
the authors predicted future mental illness from an individual’s posts on Reddit,
by gathering posts from clinical subreddits and then classifying them by the
corresponding mental illness. After gathering the posts, clustering was applied
to them to find the markers of mental illnesses present in their
—Depression Detection using Sentiment Analysis of Social Media Posts—– 5
In “Utilizing Neural Networks and Linguistic Metadata for Early Detection of De-
pression Indications in Text Sequences” [6] the authors used machine learning mod-
els focused on messages on a social network to identify depression early. In particu-
lar, a classification based on user-level linguistic metadata is compared to a convolu-
tional neural network based on different word embeddings. In addition, the current
common ERDE score as a metric for early detection systems is discussed in depth,
as well as its drawbacks in the context of shared tasks. Finally, a broad corpus was
used to train a new word embedding.
In “Depression Detection by Analyzing Social Media Posts of User” [8] it has been
demonstrated that depression can lead an individual to severe mental illness, even
to the path of suicide, and how a machine learning approach can detect depression
in social media users. Micro-blogging social networking sites such as Twitter and
Facebook allow users to express their day-to-day thoughts and activities, which
reflect users’ behavioral attributes and personality traits. This paper proposed a
model that takes a username and analyzes the social media posts of the user to
determine the levels of vulnerability to depression. The authors reported an
accuracy of 74% and a precision of 100% for this model.
In “Facebook Social Media for Depression Detection in the Thai Community” [9]
the author provides a tool by which depression could be easily and early detected.
This would help people to be aware of their emotional states and seek help from pro-
fessional services. This study uses Natural Language Processing (NLP) techniques
to create a depression detection algorithm for the Thai language on Facebook, a so-
cial media platform where people share their thoughts, emotions, and life events.
Results from 35 Facebook users indicated that Facebook behaviors could predict de-
pression level.
stress scores which correlated well with negative sentiment expressed in the content.
major factors of depression among the age group of 15-29 which they found during
the course of the project are parental pressure, love, failures, bullying, body sham-
ing, inferiority complex, exam pressure, peer pressure, physical and sexual abuse
etc. Depression is a recurrent illness, and repeated episodes are common.
Finally, little is known about the prevention and identification of the disorder
at an early stage. Among future directions, the authors sought to understand
how social media behavior analysis can lead to the development of methods
for analyzing depression at scale.
SOFTWARE REQUIREMENT
SPECIFICATION
3.1 Introduction
• Purpose
The aim of the project is to detect whether a person is showing signs of clinical
depression. The project will be developed using various machine learning
models trained on twitter data. The model would then predict whether a person
is showing symptoms of depression and if yes, necessary actions will be taken.
The model developed will be trained on a comprehensive dataset containing a
mix of depressive and non-depressive tweets. The dataset would be prepared
from a mix of various open source repositories and data scraped through the
Twitter API. A hybrid model comprising XGBoost and Naive Bayes has
been identified for predictive modelling.
• Intended Audience
For the purpose of our project, the following end users have been identified: -
1. The project could be deployed along with the already existing social media
platforms (esp. Twitter) whereby it can fetch user posts and analyse depres-
sion level in the individual. Here, the social media user acts as the end user.
2. The project can also be beneficial to psychologists who wish to study
depression and mood disorders, especially in young adults. Hence, psychologists
can be taken to be the end user.
3. The project could be made open source, which could help other researchers
make the necessary strides in this field and improve the model further. Hence,
researchers make up the last set of end users.
• Scope
The project deals with detecting depression in Twitter posts using a machine
learning model which will be built as a combination of the XGBoost and Naïve
Bayes algorithms. The literature survey conducted in the initial phases pointed
to the fact that Naïve Bayes works best for textual data. The XGBoost model
will work as a filter which would make sure that the data imbalance is minimal.
The project would be limited to twitter data. The reasons for choosing twitter
for our project have been listed below: -
1. Twitter data is easy to handle.
2. Being text heavy, it is simple and easy to pre-process.
3. Quantitative and Qualitative availability.
4. Smaller memory storage size required compared to image and video data.
score exceeds a particular threshold value, the user will be classified as
depressive and assistance in the form of medical help notifications, positive feeds
etc. will be provided.
In addition to the above assumptions, the system would have some constraints,
some of which have been identified below: -
1. Some amount of Latency will always be there regardless of how fast the
servers are.
2. It is impossible to achieve 100% accuracy and some cases of false positives
will always be there.
3. For the development phase, the Heroku servers used would need some time
to start before they can be fully functional. Hence, the server may not be avail-
able at all times.
The dependencies for the successful development and deployment of the project
have been listed below: -
1. Software Dependencies: Windows OS, Anaconda, Jupyter, Python 3.7,
Standard ML Libraries, Pipenv, FastAPI, Uvicorn, Heroku CLI, An IDE (VS-
Code, Sublime Text etc.).
2. Hardware Dependencies: Intel i5/i7 processor, 4/8 GB RAM, Heroku Cloud
Server.
3. Other Dependencies: Sentiment140 Dataset, Twitter Developer Account,
curated word list of depressive keywords (available on GitHub [14]), Google
Colab GPU.
2.Hardware Interface: -
• Not Applicable.
3. Software Interface: -
• User Level Interface: Any OS (Windows Preferable), Web Browser.
4. Communication Interface: -
• Communication between components will be carried out using the HTTP protocol,
with data transferred in JSON format.
• System Features
1. The model must be able to correctly identify cases of clinical depression
from tweets with high accuracy and a decent F1-Score.
2. Latency should be minimal so as to increase efficiency of our system.
3. The model must have the ability to deal with highly imbalanced data and
should not be biased.
4. The model developed should facilitate easy integration with the Twitter API
so that it could be deployed in the real world.
5. The system must ensure that user data is protected and confidentiality is
maintained.
• Non-Functional Requirements
1. Performance Requirement
The response time of the model must be as little as possible. To achieve this
we will build our API with FastAPI, which is one of the fastest python frame-
works available.
2. Usability Requirement
The system must be easy to use and should be easy to integrate with existing
software. This will be achieved by hosting our API on the cloud, from where the
model could be plugged directly into any application.
3. Reliability Requirement
The system should be reliable and must produce accurate results. Also, the
model must be available at all times and should not break down in the event of
a failure (e.g., server failure). To achieve this, we will deploy on Heroku,
whose shared servers would prevent total failure.
4.2 XGBoost
XGBoost is an optimized Gradient Boosting machine learning library.
It was originally written in C++, but it has APIs in many other languages. The core
XGBoost algorithm is parallelizable: the work of building a single tree can itself
be split across threads.
A decision tree is made up of a set of binary questions, and the final predictions
are made at the leaves. XGBoost typically uses a decision tree as the base learner
and is an ensemble method in and of itself. The trees are built in stages until
a stopping criterion is reached.
XGBoost uses CART (Classification and Regression Trees) decision trees.
CART refers to trees in which each leaf contains a real-valued score, regardless of
whether they are used for classification or regression. If required, real-valued scores
can be converted to categories for classification.
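The idea of summing real-valued leaf scores and then converting the total into a category can be sketched in a few lines. This is a minimal illustration and not the XGBoost implementation: the one-split trees, their thresholds and leaf scores, and the two features are all hypothetical values chosen for the example.

```python
import math


def stump(feature_index, threshold, left_score, right_score):
    """Build a one-split tree: a binary question with a real-valued score at each leaf."""
    def predict(x):
        return left_score if x[feature_index] < threshold else right_score
    return predict


def ensemble_probability(trees, x):
    """Sum the leaf scores of all trees and squash the total with a sigmoid."""
    raw_score = sum(tree(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw_score))


# Hypothetical features: x[0] = number of negative words,
# x[1] = number of first-person pronouns in the tweet.
trees = [
    stump(0, 2.0, -0.8, 0.9),
    stump(1, 3.0, -0.3, 0.6),
]

probability = ensemble_probability(trees, [4, 5])
label = 1 if probability >= 0.5 else 0  # real-valued score converted to a category
```

In a real boosted model each new tree is fitted to correct the errors of the trees before it; here the trees are fixed by hand only to show how the scores combine.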
XGBoost makes use of advanced regularisation to improve model generalisation.
XGBoost outperforms standard Gradient Boosting in terms of efficiency. It trains
quickly and can be parallelized across clusters.
The objective function comprises the loss function as well as the regularization
function. Our goal is to minimize this function. This is done internally
using the Taylor approximation technique. And finally, we will have our prediction.
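For reference, the regularized objective and its Taylor approximation are usually written as below. This is the standard formulation from the XGBoost literature, given here as an assumption about the form the report relied on. The symbols are not taken from the report: l is the loss, f_k is the k-th tree, T is the number of leaves of a tree, w is its vector of leaf scores, and γ and λ are regularization constants.

```latex
% Objective at boosting round t: training loss plus regularization over all trees
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right)
                   + \sum_{k=1}^{t} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}

% Second-order Taylor approximation of the loss around the previous prediction
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
      + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^{2} \right] + \Omega(f_t) + \text{const},
\qquad
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}
```

Because the first term inside the bracket does not depend on the new tree f_t, only the gradient and Hessian terms and Ω(f_t) need to be minimized at each round.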
Let the probability of prediction for XGBoost be P(xg). This will be used further to
calculate the final result.
At the Naïve Bayes layer, Bayes' Theorem is used for prediction. The standard Bayes'
Theorem is represented by the formula below: -
\[ P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)} \]
Here,
P(y | X) is the posterior probability of class (y, target) given predictor (X, features).
P(y) is the prior probability of class.
P(X | y) is the likelihood which is the probability of predictor given class.
P(X) is the prior probability of predictor.
In Naïve Bayes we make the naïve assumption that all the features X = (x_1, ..., x_n)
are independent of one another given the class, hence we’ll have: -
\[ P(X \mid y) = \prod_{i=1}^{n} P(x_i \mid y) \]
P(X) is constant for a given input, and thus our earlier formula will reduce to: -
\[ P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \]
The goal of Naive Bayes is to choose the class y with the maximum probability.
Thus, our final optimization function will be: -
\[ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y) \]
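The rule of choosing the class with the maximum probability can be illustrated with a small word-count classifier. This is a sketch under stated assumptions rather than the project's code: it works in log space to avoid numerical underflow, it uses Laplace smoothing (the `alpha` parameter), which the report does not mention, and the four training tweets are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels, alpha=1.0):
    """Estimate log P(y) and log P(x_i | y) from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    total = len(labels)
    log_prior = {y: math.log(n / total) for y, n in class_counts.items()}
    log_likelihood = {}
    for y in class_counts:
        # Laplace smoothing (an assumption): add alpha to every word count.
        denominator = sum(word_counts[y].values()) + alpha * len(vocabulary)
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + alpha) / denominator)
            for w in vocabulary
        }
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the class maximizing log P(y) + sum of log P(x_i | y).

    Words never seen in training are skipped.
    """
    scores = {}
    for y in log_prior:
        scores[y] = log_prior[y] + sum(
            log_likelihood[y][w] for w in tokens if w in log_likelihood[y]
        )
    return max(scores, key=scores.get)


# Invented toy data, for illustration only.
documents = [
    ["feel", "hopeless", "tired"],
    ["so", "tired", "alone"],
    ["great", "day", "friends"],
    ["love", "this", "day"],
]
labels = ["depressive", "depressive", "other", "other"]

log_prior, log_likelihood = train_naive_bayes(documents, labels)
prediction = predict(["tired", "hopeless"], log_prior, log_likelihood)
```

Since P(X) is the same for every class, it is dropped from the score, exactly as in the reduced formula.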
Finally, we will take a weighted average of the prediction probabilities of both
algorithms and set a threshold value. The tweet will be classified as depressive
only if the final probability surpasses the threshold.
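The combination step can be sketched as follows. The report does not state the weights or the threshold, so the default values of `weight_xg` and `threshold` below are placeholders, and the function names are hypothetical.

```python
def combined_probability(p_xg, p_nb, weight_xg=0.5):
    """Weighted average of the XGBoost and Naive Bayes probabilities."""
    if not 0.0 <= weight_xg <= 1.0:
        raise ValueError("weight_xg must lie between 0 and 1")
    return weight_xg * p_xg + (1.0 - weight_xg) * p_nb


def is_depressive(p_xg, p_nb, weight_xg=0.5, threshold=0.5):
    """Flag the tweet only when the combined probability surpasses the threshold."""
    return combined_probability(p_xg, p_nb, weight_xg) > threshold


flagged = is_depressive(p_xg=0.9, p_nb=0.5)      # combined probability is about 0.7
not_flagged = is_depressive(p_xg=0.4, p_nb=0.5)  # combined probability is about 0.45
```

In practice the weight and the threshold would be tuned on validation data, with a higher threshold trading recall for fewer false positives.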
DETAILED DESIGN
The life cycle helps us identify the primary processes which need to be followed for
successful implementation of the project. For the purpose of our project, four stages
of technical work have been identified which have been shown in the block diagram
and flowchart below: -
Class diagram shows relationship and dependency between various classes in the
system. For our purpose, we’ll use pre-defined classes of the Twitter API. The class
diagram has been shown below.
A use case diagram at its simplest is a representation of a user’s interaction with the
system that shows the relationship between the user and the different use cases in
which the user is involved. The use case diagram for our system has been shown
below.
A UML deployment diagram shows the configuration of run-time processing nodes
and the components that live on them. Deployment diagrams are a
kind of structure diagram used in modeling the physical aspects of an object-oriented
system. They are often used to model the static deployment view of a system
(topology of the hardware). Deployment diagrams are important for visualizing,
specifying, and documenting embedded, client/server, and distributed systems and
also for managing executable systems through forward and reverse engineering. A
deployment diagram is just a special kind of class diagram, which focuses on a
system’s nodes. Graphically, a deployment diagram is a collection of vertices and
arcs.
The twitter API stores information in the form of various classes internally and trans-
mits this data in the form of JSON Objects.
Our API will extend the api/predict interface as the global structure accessible through
the twitter API. This will send prediction details in JSON format.
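A prediction exchange in JSON could look like the sketch below. Only the api/predict endpoint and the use of JSON come from the report; the field names (`text`, `tweet_id`, `probability`, `depressive`) and both helper functions are hypothetical, and the example uses only the standard library rather than the project's web framework.

```python
import json


def parse_prediction_request(body):
    """Decode a JSON request body and return the tweet text it carries."""
    data = json.loads(body)
    text = data.get("text") if isinstance(data, dict) else None
    if not isinstance(text, str) or not text.strip():
        raise ValueError("request must contain a non-empty 'text' field")
    return text


def build_prediction_response(tweet_id, probability, threshold=0.5):
    """Serialize one prediction as a JSON string."""
    payload = {
        "tweet_id": tweet_id,
        "probability": round(probability, 4),
        "depressive": probability > threshold,
    }
    return json.dumps(payload)


request_body = '{"text": "I feel so tired of everything"}'
tweet_text = parse_prediction_request(request_body)
response_body = build_prediction_response("123", 0.8123456)
```

Keeping the response flat, with one probability and one boolean flag, lets a client apply its own threshold if it needs a stricter or looser rule.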
Some temporary files may be created for storing user data, which would be deleted
once our prediction is done.
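One way to guarantee that such temporary files are removed, even when prediction fails, is a try/finally block around the work. This is a sketch of the idea, not the project's code: `predict_with_scratch_file` and `count_lines` are hypothetical helpers, and `count_lines` merely stands in for the real prediction step.

```python
import os
import tempfile


def count_lines(path):
    """Stand-in for the prediction step: count the tweets stored in the file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def predict_with_scratch_file(tweets, predict_fn):
    """Write tweets to a temporary file, run predict_fn on its path, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("\n".join(tweets))
        return predict_fn(path), path
    finally:
        # Runs whether predict_fn succeeded or raised, so user data never lingers.
        os.remove(path)


result, scratch_path = predict_with_scratch_file(
    ["first tweet", "second tweet"], count_lines
)
```

The path is returned here only so the example can show that the file is gone afterwards; a real service would not need to expose it.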
No external database will be used as such. However, the Twitter API will enable us
to access the twitter database server for tweets and other information.
PROJECT PLANNING
• Social media posts are highly imbalanced due to which machine learning mod-
els often develop a bias which in turn leads to erroneous results. This risk is
mitigated using Gradient Boosting.
• Time is limited and can be a factor which could lead to failure of the project.
But this risk could be mitigated by managing time properly using a well de-
fined timeline.
CODING
For developing the model, we need to follow the standard NLP procedures, which
comprise Data Collection, Data Cleaning, Tokenization, Model Selection with
Hyperparameter Tuning, Model Stacking and Evaluation. The flow of events has
been depicted below: -
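The data cleaning and tokenization steps can be sketched with the standard library alone. The specific rules below (lowercasing, removing URLs and @mentions, keeping hashtag words, dropping everything that is not a letter) are assumptions about a typical tweet pipeline and are not taken from the project's code.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, digits and punctuation."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = text.replace("#", " ")           # keep the hashtag word, drop the symbol
    text = NON_LETTER_PATTERN.sub(" ", text)
    return " ".join(text.split())            # collapse repeated whitespace


def tokenize(text):
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()


tokens = tokenize("Feeling SO tired today... @friend https://t.co/abc #nosleep")
```

This simple rule set also splits contractions such as "can't" into two tokens; a production pipeline would likely handle those, along with emoji and stop words, more carefully.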
• OS : Windows 10
• IDE : VSCode
7.2.3 Deployment
7.5 Platform
The model was developed on the Google Colab platform, which is a cloud based
IPython Kernel integrated with the standard ML libraries. For deployment, the
Heroku cloud deployment platform has been utilized.
For testing the model, we need to define the metrics upon which the models must
be evaluated. In our case the metrics defined are: Precision, Recall, F1-score (pri-
mary) and Accuracy (secondary). Evaluating the model using the traditional train-
test split method will not be of much use due to imbalanced distribution. Hence, we
use Stratified 5-Fold Cross Validation for obtaining our prediction results.
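The sketch below shows both ideas from this paragraph: folds that preserve the class ratio, and the four metrics. It is a simplified stand-in written with the standard library, not the project's evaluation code: the fold assignment is a plain round-robin without shuffling, and the 10/90 class split is an invented example.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the samples of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score and accuracy for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


# Invented imbalanced data: 10 depressive tweets (1) and 90 others (0).
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

# A model that always predicts the majority class looks accurate but is useless.
majority_baseline = classification_metrics(labels, [0] * len(labels))
```

The majority-class baseline reaches high accuracy while its F1-score is zero, which is why F1-score is treated as the primary metric on imbalanced data.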
The stacking model (MNB + XGB) was tested against standalone MNB and XGB
models to get a good understanding of the properties of our model. The results for
the 5-fold cross validation have been summarized below: -
The results show that, on both the F1-score and accuracy metrics, the proposed
stacking model outperforms the standalone MNB and XGB models it was tested against.
Our proposed stacking model gives an accuracy of 96% and an F1-score of 93%.
TESTING
• Unit Testing
• Integration Testing
• Alpha Testing
• Beta Testing
• Integration Testing : Once the model and the web interface were created, we
had to make sure that integration of these two components didn’t cause the
system to break. Hence, we used Selenium WebDriver to test for any potential
bug arising due to integration.
• Alpha Testing : This testing was done after deployment of the project to
Heroku. This test helped us to check the functionalities of the final product.
• Beta Testing : The final deployed product was tested by different members of
our team on their systems and the project was put under different situations to
assess its endurance.
The testing phase helped us identify bugs and rectify them. The testing results clearly
show that almost all bugs have been removed and the project works smoothly under
all conditions.
Our project uses Git and Heroku CLI for configuration management. Different ver-
sions of the project can be developed and changes be pushed to Heroku directly
using the Command Line Interface(CLI) provided by Heroku. During the develop-
ment phase we used Git for version control as it is openly available and easy to use.
After deployment, all configuration related issues can be tackled by Heroku itself.
This can range from version control, configuring add-ons, scaling of dyno formation,
analyzing usage etc.
Another very useful feature available on Heroku is the Project Dashboard, which
provides UI support for tasks like viewing app metrics, managing Heroku teams,
configuring deployment integrations etc.
Hence, Heroku comes with inbuilt features which take care of the configuration
phase.
Chapter 11
Software quality is one of the most significant factors determining the success of the
project. Software Quality Assurance Plan lays down the guidelines for ensuring that
at each step the software developed is up to the mark. The SQAP followed for our
project is as follows: -
• All modules need to be developed using proper naming conventions and other
specifications conforming to the Python PEP8 standard. This step ensures that
the code developed is consistent and neat.
• Code must be well documented, with docstrings and images wherever possible.
• Data collected through scraping via the Twitter API must be manually evaluated
to find bad data points and remove them. This step makes sure that we have
relevant data points in our data set and avoids the “Garbage In, Garbage Out”
phenomenon.
• No module must be put into production without adequate testing. This ensures
that bugs are detected and rectified early.
• During model development, the machine learning based models must be tested
and validated in each cycle to check for over-fitting and under-fitting of data.
• The web interface should be developed keeping in mind good design practices
for web apps. This would ensure that the web app is responsive, consistent
and easy to navigate.
• The project should be regularly updated even after deployment. This would make
sure that existing bugs are fixed and additional functionalities are added from
time to time.
CONCLUSION
To sum up, it has been well established that depression is one of the leading issues
faced by our society. Detecting depression early can play a major role in preventing
suicides and improving mental health of the society collectively. Social media has
been a revelation in sentiment analysis and can be used to effectively tackle depres-
sion.
Our project uses Twitter data to train a stacked ensemble model consisting of
Multinomial Naïve Bayes and XGBoost base models along with a Logistic Regression
meta-classifier. The results show that, on both the F1-score and accuracy metrics,
the proposed stacking model outperformed the standalone Multinomial Naïve Bayes
and XGBoost models. With an F1-score of 93% and an accuracy of 96%, our model
stands out as one of the best performers among all models developed to date.
This project could go a long way in integrating emotional AI with social media for
eradicating depression from our society.
Chapter 13
References
[3] R. Thorstad and P. Wolff, “Predicting future mental illness from social media:
A big-data approach,” Behav. Res. Methods, 2019.
[4] Mandar Deshpande and Vignesh Rao, “Depression detection using Emotional
Artificial Intelligence,” 2017.
[5] Cong, Z. Feng, F. Li, Y. Xiang, G. Rao and C. Tao, “X-A-BiLSTM: a Deep
Learning Approach for Depression Detection in Imbalanced Data,” 2018 IEEE
International Conference on Bioinformatics and Biomedicine (BIBM), Madrid,
Spain, 2018, pp. 1624-1627, doi: 10.1109/BIBM.2018.8621230.
[6] M. Trotzek, S. Koitka and C. Friedrich, “Utilizing Neural Networks and
Linguistic Metadata for Early Detection of Depression Indications in Text
Sequences,” IEEE Transactions on Knowledge and Data Engineering, vol. 32,
pp. 588-601, 2018, doi: 10.1109/TKDE.2018.2885515.
Detection System using Questionnaires and Twitter,” 2019 IEEE Students Con-
ference on Engineering and Systems (SCES), Allahabad, India, 2019, pp. 1-6,
doi: 10.1109/SCES46477.2019.8977211.
[14] https://github.com/halolimat/Social-media-Depression-Detector/blob/master/depression lexicon.json
[15] https://towardsdatascience.com/xgboost-mathematics-explained-58262530904a
[16] https://towardsdatascience.com/a-mathematical-explanation-of-naive-bayes-in-5-minutes-44adebcdb5f8
[17] https://www.kaggle.com/kazanova/sentiment140
[18] https://medium.com/topic/machine-learning
PLAGIARISM REPORT
Date: Saturday, June 05, 2021
Words: 1451 Plagiarized Words / Total 10110 Words
Sources found:
<1% https://www.psychologytoday.com/gb/basic
<1% https://ourworldindata.org/rise-of-socia
<1% https://www.researchgate.net/publication
<1% https://www.researchgate.net/publication
<1% https://issuu.com/iasir/docs/hass_issue_
<1% https://www.mmit.edu.in/index.php/facult
<1% https://sstc.ac.in/ssgi/4-Preliminary%20
<1% http://eprints.usm.my/23695/1/ADW_622_-_
<1% https://sites.google.com/site/hecpm2013/
<1% https://www.academia.edu/4066547/Researc
<1% https://www.easa.europa.eu/sites/default
<1% https://www.slideshare.net/SomnathLinKin
<1% https://www.slideshare.net/gajapandiyan/
<1% https://www.academia.edu/34291111/Intern
<1% https://www.sciencedirect.com/science/ar
<1% http://heath.cs.illinois.edu/scicomp/con
<1% https://www.researchgate.net/publication
<1% https://www.hhs.gov/ohrp/sachrp-committe
<1% https://www.researchgate.net/publication
<1% https://monkeylearn.com/text-analysis/
<1% https://research.cyber.ee/~janwil/publ/N
<1% https://www.researchgate.net/publication
<1% https://www.slideshare.net/Vivekreddy91/
<1% https://www.sciencedirect.com/science/ar
<1% https://www.ijcaonline.org/archives/volu
<1% https://www.projectmanager.com/blog/stat
<1% https://link.springer.com/article/10.100
<1% https://www.aclweb.org/anthology/W18-060
<1% https://www.mlq.ai/what-are-convolutiona
<1% https://pubmed.ncbi.nlm.nih.gov/29052947
<1% https://www.vogue.com/article/celebrity-
<1% https://link.springer.com/article/10.375
<1% https://www.researchgate.net/publication
<1% https://www.researchgate.net/publication
<1% https://www.researchgate.net/publication
<1% https://www.apnns.org/ICONIP2020/file/IC
<1% https://www.researchgate.net/publication
<1% http://export.arxiv.org/pdf/1804.07000v1
<1% https://www.researchgate.net/profile/Chu
<1% https://medium.datadriveninvestor.com/a-
<1% https://tbiomed.biomedcentral.com/articl
<1% https://www.sciencedirect.com/science/ar
1% https://eprints.usq.edu.au/38102/1/besc2
Text checked:
A Preliminary Project Report on Depression Detection using Sentiment Analysis of Social Media Posts
SUBMITTED TOWARDS THE PARTIAL FULFILLMENT OF THE REQUIREMENTS OF Bachelor of
Engineering (Computer Engineering) BY Aroop Kumar Roll No: 3423 Ashish Roll No: 3426 K Chaitanya Roll
No: 3448 Saurabh Kulkarni Roll No: 7435 Under The Guidance of Prof. Shubhada Bhalerao Department of
Computer Engineering Army Institute of Technology, Pune - 411015. SAVITRIBAI PHULE PUNE
Shubhada Bhalerao and it is submitted towards the partial fulfillment of the requirement of bachelor of
engineering (Computer Engineering) Project. Prof. Shubhada Bhalerao Dr. SR Dhore Internal Guide H.O.D
Dr. B P Patil External Examiner Principal Place : AIT, Pune Date : PROJECT APPROVAL SHEET A Project
Stage-I Report on (Depression Detection using Sentiment Analysis of Social Media Posts) is successfully
completed by Aroop Kumar (Roll No: 3423) Ashish (Roll No: 3426) K Chaitanya (Roll No: 3448) Saurabh
Kulkarni (Roll No: 7435) at Department Of Computer Engineering Army Institute of Technology, Pune-411
015. SAVITRIBAI PHULE PUNE UNIVERSITY 2020-21 Prof. Shubhada Bhalerao Dr. S.R
Dhore Project Guide HOD
Abstract Depression has become a huge mayhem plaguing the world today. About 265 million individuals of
all ages suffer from depression worldwide. Of these, about 75% remain untreated, with one million individuals
taking their lives every year. Thus, depression is amongst the leading causes of suicide esp. amongst
adolescents. Social media platforms are becoming an inseparable part of people’s daily lives.
They mirror the user’s personal life as users share their happiness, joy, insecurities and sorrow on social
media. These platforms are often utilized by researchers to spot the causes of depression and retract it.
Detection of early depression could prove to be an enormous step in improving mental health of our society
collectively. Thus, to address our problem, we propose a stacking-based ensemble machine learning model
which uses XGBoost and Multinomial Naive Bayes as the base-learners and Logistic Regression as the
meta-learner. The model is developed for twitter data and would flag tweets which are found to be depressive.
The stacked model produced a very high accuracy and F1-Score, which is superior to any other standalone
model proposed earlier. The project would help us employ emotional AI in twitter which would in turn lead to
lower suicide rates and improved mental health. Keywords: depression, mental health, social networking sites,
twitter, machine learning, emotional AI.
Acknowledgments It gives us great pleasure
in presenting the final project report on ‘Depression Detection using Sentiment Analysis of Social Media
We would like to take this opportunity to thank our project guide Prof. Shubhada Bhalerao for giving us all the
guidance needed. We are very grateful for her kind support. Her valuable suggestions were extremely helpful.
We also extend our sincere gratitude to Dr. S.R Dhore, Head of Computer Engineering Department, for
creating a competitive environment and providing us with all the essential facilities and encouragement at the
department and institute level. We would also like to acknowledge all our friends and classmates for their
co-operation.
Lastly, we express our gratitude to our parents and other family members, whose continuous
encouragement, love and affection enabled us to complete this piece of work successfully. Aroop Kumar
Ashish K Chaitanya Saurabh Kulkarni (B.E. Computer Engg.) Department of Computer Engineering, AIT,
Pune INDEX Abstract I Acknowledgment II List of Figures VI 1 INTRODUCTION 1 1.1 Problem
Challenges 8 2.3 Inferences from Literature Survey 8 3
10 3.2 Overall Description 12 3.3 System
Features and Requirements 14 4 ALGORITHM ANALYSIS AND MATHEMATICAL
MODELING 16 4.1 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.2 XGBoost . . . . . . . . . . . . . . . . . . .
<1% https://www.researchgate.net/publication
18 �Depression Detection using Sentiment Analysis of Social Media Posts�� IV 5 DETAILED DESIGN 21
<1% https://people.dmi.uns.ac.rs/~svc/papers
5.1 Architectural Design . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.2 UML Diagrams . . . . . . . . . . . . . . . . . . . . . . . . .
<1% https://www.researchgate.net/publication
. . . . . . . . . . . . . . . . . . . . . . . . 27 5.3.1 Internal software data structure . . . . . . . . . . . . . . . . 27 5.3.2 Global
32 8 RESULT & ANALYSIS 33 Department of Computer Engineering, AIT, Pune �Depression Detection <1% http://www.ijeast.com/papers/32-34,Tesma
using Sentiment Analysis of Social Media Posts�� V 9 TESTING 36 9.1 Formal Technical Reviews . . . . . . . . <1% https://www.tutorialspoint.com/biopython
. . . . . . . . . . . . . . 36 9.2 Test Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 9.3 Test Cases & Results . . . . .
<1% https://bolt.mph.ufl.edu/6050-6052/unit-
. . . . . . . . . . . . . . . . . . . . 37 10 CONFIGURATION MANAGEMENT PLAN 38 11 SOFTWARE QUALITY
ASSURANCE PLAN 39 12 CONCLUSION 41 13 References 42 Department of Computer Engineering, AIT, <1% https://codeburst.io/implement-a-product
Pune List of Figures 4.1 XGBoost Objective Function . . . . . . . . . . . . . . . . . . . . . 18 4.2 Bayes Theorem . . . . . . <1% http://boqf.consegnameloacasa.it/fastapi
<1% http://ijirt.org/master/publishedpaper/I
19 4.3 Independence of Features . . . . . . . . . . . . . . . . . . . . . . . 19 4.4 Proportionality Relation . . . . . . . . . . . . .
<1% https://www.transpower.co.nz/system-oper
file:///C:/Users/hp/AppData/Local/Temp/3F8XUI8B.htm 05-06-2021
Plagiarism Checking Result for your Document Page 4 of 19
<1% https://stackoverflow.com/questions/5735
. . . . . . . . . . . 19 4.5 Optimization Function . . . . . . . . . . . . . . . . . . . . . . . . 20 5.1 Machine Learning Life Cycle .
<1% https://www.softwaretestinghelp.com/what
UML : Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . 25 5.7 UML : Sequence Diagram . . . . . . . . . . . . . . . . . .
Chapter 1
INTRODUCTION

1.1 Problem

The aim of the project is to detect whether a person is showing signs of clinical depression. The project will be developed using various machine learning models trained on social media data (e.g. tweets). The model would then predict whether a person is showing symptoms of depression and, if so, necessary actions will be taken.

1.2 Objectives

• To build a machine learning model that analyses Twitter posts and predicts whether a person is showing signs of clinical depression.
• To collect ample data which could be used for future research.
• To improve the F1-score (primary) and accuracy (secondary) of the model through cross-validation and hyperparameter tuning.
• To successfully deploy the model.

1.3 Scope Of The Project

• The project deals with detecting depression in Twitter posts using a machine learning model which will be built as a combination of the XGBoost and Naïve Bayes algorithms. The literature survey conducted in the initial phases pointed to the fact that Naïve Bayes works best for textual data. The XGBoost layer will work as a filter which would make sure that the data imbalance is minimal. The Naive Bayes model would use this filtered data for making predictions. The project will specifically target Twitter posts, for the following reasons:
  1. Twitter data is easy to handle.
  2. Being text-heavy, it is simple and easy to pre-process.
  3. It is available in sufficient quantity and quality.
  4. It requires less storage than image and video data.

• End user identification is a crucial step in scope definition. For the purpose of our project, the following scenarios have been identified:
  1. The project could be deployed along with the already existing social media platforms (esp. Twitter), whereby it can fetch user posts and analyse the depression level of the individual. Here, the social media user acts as the end user.
  2. The project can also be beneficial to psychologists who wish to study depression and mood disorders, especially in young adults. Hence, psychologists can be taken to be the end user.
  3. The project could be made open source, which could help other researchers make the necessary strides in this field and improve the model further. Hence, researchers make up the last set of end users.

1.4 Motivation of The project

• The motivation for doing this project was primarily an interest in undertaking a challenging project in the domain of Machine Learning. The opportunity to learn about various Machine Learning algorithms and their role in preventing bias in imbalanced datasets was appealing. Depression is a major challenge plaguing our society, esp. millennials, and we are extremely motivated to find a solution for its early detection.

1.5 Organization of report

The report covers all the project work which has been done this year. It further covers topics such as the literature survey, Software Requirement Specification, algorithms used and mathematical model, project design and planning, coding, testing, SQA etc. The report will provide a comprehensive understanding of various aspects of the project and will serve as a useful documentation of our work.

Chapter 2
LITERATURE SURVEY

2.1 Literature Survey

This is a field where immense research is taking place.
Over the last few years, social media has been used to examine mental health by many researchers. In ‘Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic’ [1] the authors considered that social media platforms can reflect the users’ personal life on many levels. Their primary objective was to detect depression using the most effective deep neural architecture from two of the most popular deep learning approaches in the field of natural language processing: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

According to the report ‘Exploring opportunities to support mental health care using social media: A survey of social media users with mental illness’ [2], millennials are more open to talking about their mental health issues on social media. Machine Learning (ML) has advanced significantly in recent years, allowing for the solution of real-world problems and also the implementation of automated systems.

In ‘Predicting future mental illness from social media: A big-data approach’ [3], the author predicted future mental illness based on an individual’s posts on Reddit, by gathering posts from clinical subreddits and then classifying them according to the corresponding mental illness. After gathering the posts, clustering was applied to them to find the markers of mental illness present in their everyday spoken language.

In ‘Depression detection using Emotional Artificial Intelligence’ [4], Natural Language Processing was applied to Twitter feeds to conduct emotion analysis focusing on depression. Specific tweets were labelled as neutral or negative using a curated word list to detect depression.
For predictive modelling, a support vector machine (SVM) and a Naive Bayes classifier were used. The results showed that Naive Bayes gave better accuracy and F1-score than SVM.

In ‘X-A-BiLSTM: a Deep Learning Approach for Depression Detection in Imbalanced Data’ [5] the authors proposed a deep learning model (X-A-BiLSTM) for depression detection in imbalanced social media data. This approach focused on solving the ... The X-A-BiLSTM model comprised two components: the first, an XGBoost component, which permitted acquiring balanced data by means of an end-to-end scalable tree boosting system, and the second, a BiLSTM with an attention mechanism, which achieved good classification performance.

In ‘Utilizing Neural Networks and Linguistic Metadata for Early Detection of Depression Indications in Text Sequences’ [6] the authors used machine learning models focused on messages on a social network to identify depression early. In particular, a classification based on user-level linguistic metadata is compared to a convolutional neural network based on different word embeddings. In addition, the current common ERDE score as a metric for early detection systems is discussed in depth, as well as its drawbacks in the context of shared tasks. Finally, a broad corpus was used to train a new word embedding.

In ‘Detection of Mood Disorder Using Modulation Spectrum of Facial Action Unit Profiles’ [7] the authors constructed a database of facial expressions responding to emotional stimuli from patients with bipolar disorder (BD), unipolar depression (UD) and healthy controls. To detect mood disorder, the subject’s facial expressions in the CHIMEI database were used to generate the action unit (AU) profiles. The modulation spectrum (MS) characterizing the fluctuation of the AU profile sequence over a video segment was then used for mood disorder detection. The comparison of results showed that the proposed ANN-based method achieved the best performance.

In ‘Depression Detection by Analyzing Social Media Posts of User’ [8] it has been demonstrated that depression can lead an individual to severe mental illness, even to the path of suicide, and how a machine learning approach can detect depression in social media users. Micro-blogging social networking sites such as Twitter and Facebook allow users to express their day-to-day thoughts and activities, which reflect ... This paper proposed a model that takes a username and analyzes the social media posts of the user to determine the levels of vulnerability to depression. The authors evaluated the accuracy of this model to be 74% and its precision to be 100%.

In ‘Facebook Social Media for Depression Detection in the Thai Community’ [9] the author provides a tool by which depression could be detected easily and early. This would help people to be aware of their emotional states and seek help from professional services. This study uses Natural Language Processing (NLP) techniques to create a depression detection algorithm for the Thai language on Facebook, a social media platform where people share their thoughts, emotions, and life events. Results from 35 Facebook users indicated that Facebook behaviors could predict depression level.

In ‘Twitter Analysis for Depression on Social Networks based on Sentiment and Stress’ [10] the author says that detecting words that express negativity in a social media message is one step towards ... The authors applied a multistep approach which allowed them to identify potential users and then discover the words that expressed negativity by these users. Results showed that the sentiment of these words can be obtained and scored efficiently, as the computation on these datasets was narrowed to only these selected users. They also obtained stress scores which correlated well with negative sentiment expressed in the content.

In ‘A novel Co-training based approach for the classification of mental illnesses using Social media posts’ [11] the authors performed several experiments to classify posts and their associated comments related to four mental health issues: Anxiety, ADHD, Depression and Bipolar. They mined data from the Reddit platform, where community-related posts are published. The authors used an API to extract posts and associated comments and performed experiments using SVM, NB, and RF classifiers. The experimental results indicate that SVM, NB, and RF performed better with the co-training technique than when used individually, in terms of Precision, Recall, and F-measure.

In ‘Realizing a Stacking Generalization Model to Improve the Prediction Accuracy of Major Depressive Disorder in Adults’ [12] the authors developed a stacking generalization model for improving the accuracy in predicting MDD. In the first step, they implemented a KNN imputation preprocessing technique for handling the missing values in the data. In the next step, the authors used Random Forest-Based Backward Elimination (RF-BE), a wrapper-based feature selection method, to reduce the feature dimension, which reduces feature interactions and helps increase prediction accuracy. The initial number of features was 22, which RF-BE reduced to 12 for further processing.
The stacking generalisation is accomplished by combining three low-level learners, MLP, SVM, and RF, and then averaging them to create a meta-level learner (MLP). The classifiers are also implemented individually to compare the results. The accuracy of the individual classifiers MLP, SVM and RF is 96.38%, 95.06%, and 96.90%, respectively. The accuracy of the stacking generalization model is 98.16%.

In ‘A Machine Learning based Depression Analysis and Suicidal Ideation Detection System using Questionnaires and Twitter’ [13] the authors analyzed social media posts (especially Twitter), conducted a questionnaire and asked students and ... According to the research, the major factors of depression among the age group of 15-29, found during the course of the project, are parental pressure, love, failures, bullying, body shaming, inferiority complex, exam pressure, peer pressure, physical and sexual abuse etc. Depression being a recurrent type of illness, repeated episodes are common. Finally, little is known about the ... Among future directions, the authors sought to understand how social media behavior analysis can help in the development of methods for analyzing depression at scale.

2.2 Possible Challenges

Some possible risks / challenges associated with the project:
1. Machine Learning algorithms cannot guarantee human-level accuracy in the prediction of depression.
2. There is significant noise in the tweets collected before pre-processing, which leads to a lot of unnecessary data due to third-person and news references.
3. Also, social media posts are highly imbalanced, due to which machine learning models often develop a bias, which in turn leads to erroneous results. On the other hand, deep learning models require a huge amount of data to train, which is generally not available in the datasets used in earlier models.

2.3 Inferences from Literature Survey

1. The literature survey conducted earlier clearly shows that for limited datasets Machine Learning algorithms will be most effective.
2. Naïve Bayes works extremely well with textual data and gave better results compared to SVM. Hence, we would use this algorithm for our purpose.
3. We would use the Sentiment140 dataset for scraping depressive tweets and preparing our final dataset.
4. The Sentiment140 dataset contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 2 = neutral, 4 = positive) and they can be used to detect sentiment.
5. The dataset would be highly imbalanced and thus, we’ll use the XGBoost algorithm as a filter for avoiding bias.
6. Finally, model evaluation, cross-validation and hyperparameter tuning will be carried out to test the effectiveness of the model.

Chapter 3
SOFTWARE REQUIREMENT SPECIFICATION

3.1 Introduction

• Purpose
The aim of the project is to detect whether a person is showing signs of clinical depression. The project will be developed using various machine learning models trained on social media data (e.g. tweets). The model would then predict whether a person is showing symptoms of depression and, if so, necessary actions will be taken. The model developed will be trained on a comprehensive dataset containing a mix of depressive and non-depressive tweets. The dataset would be prepared from a mix of various open source repositories and data scraped through the Twitter API. A hybrid model comprising XGBoost and Naive ...

• Intended Audience
For the purpose of our project, the following end users have been identified:
1. The project could be deployed along with the already existing social media platforms (esp. Twitter), whereby it can fetch user posts and analyse the depression level of the individual. Here, the social media user acts as the end user.
2. The project can also be beneficial to psychologists who wish to study depression and mood disorders, especially in young adults. Hence, psychologists can be taken to be the end user.
3. The project could be made open source, which could help other researchers make the necessary strides in this field and improve the model further. Hence, researchers make up the last set of end users.

• Scope
The project deals with detecting depression in Twitter posts using a machine learning model which will be built as a combination of the XGBoost and Naïve Bayes algorithms. The literature survey conducted in the initial phases pointed to the fact that Naïve Bayes works best for textual data.
The XGBoost layer will work as a filter which would make sure that the data imbalance is minimal. The project would be limited to Twitter data. The reasons for choosing Twitter for our project are listed below:
1. Twitter data is easy to handle.
2. Being text-heavy, it is simple and easy to pre-process.
3. It is available in sufficient quantity and quality.
4. It requires less storage than image and video data.
The project would be deployed as an API developed using the FastAPI framework and Uvicorn.
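A minimal sketch of what such an API might look like is given below. The endpoint path `/predict`, the request field `text`, the keyword set and the placeholder `score_tweet` function are all hypothetical and merely stand in for the trained model; none of them are taken from the project code. The FastAPI wiring is wrapped in a guard so that the scoring logic can still be run where FastAPI is not installed.

```python
from typing import Dict

# Hypothetical stand-in for the trained model: a tiny keyword check.
# A real deployment would call the trained classifier here instead.
DEPRESSIVE_HINTS = {"hopeless", "worthless", "depressed"}


def score_tweet(text: str) -> Dict[str, object]:
    """Return a JSON-serialisable prediction for a single tweet."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    flagged = bool(words & DEPRESSIVE_HINTS)
    return {"text": text, "depressive": flagged}


try:
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="Depression detection API (sketch)")

    class Tweet(BaseModel):
        text: str

    @app.post("/predict")
    def predict(tweet: Tweet):
        # FastAPI parses the JSON request body into `tweet` and
        # serialises the returned dict back to JSON.
        return score_tweet(tweet.text)

except ImportError:
    # FastAPI is optional for this sketch; the scoring function above
    # can still be exercised directly.
    app = None
```

Assuming the file is saved as `main.py` and FastAPI and Uvicorn are installed, the service could be started locally with `uvicorn main:app`.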
The API will be hosted on Heroku, which provides free cloud-based servers. We would use the Twitter API to interact with Twitter databases for data retrieval.

• Definitions and Acronyms
1. API: It stands for Application Programming Interface. An API is a set of programming code that enables data transmission between one software product and another. It also contains the terms of this data exchange.
2. XGBoost: It ... XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
3. Naïve Bayes: Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is not a single algorithm but a family of algorithms which share a common principle, i.e., every pair of features being classified is independent of each other, given the class.
4. FastAPI: FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.6+ based on standard Python type hints. It is one of the fastest Python frameworks available.
5. Uvicorn: It is an ASGI server based on uvloop and httptools, with an emphasis on speed.
6. Heroku: Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.

3.2 Overall Description

• Product Perspective
The main motive of the project is not only to detect depression in tweets but also to ensure that the user experience is not hampered. For this reason, our system must work in the back-end as an independent module. Hence, the project would be built as an API which will host our hybrid ML model on the cloud, which will only be active when the user tweets and will be in ... The user will only interact with the Twitter interface, whereas the model would constantly monitor tweets via the Twitter API, which provides various features for developers who wish to work with Twitter. The project will also use a depression score metric for classifying a user as depressive. If the depression score exceeds a particular threshold value, the user will be classified as depressive and assistance in the form of medical help notifications, positive feeds etc. will be provided.
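The thresholding step described above can be sketched as follows. The text does not define how the depression score is computed, so this sketch assumes, purely for illustration, that it is the fraction of a user's recent tweets flagged as depressive by the model. The function names and the 0.5 threshold are hypothetical.

```python
from typing import Sequence


def depression_score(flags: Sequence[bool]) -> float:
    """Fraction of tweets flagged as depressive (assumed definition)."""
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


def is_depressive(flags: Sequence[bool], threshold: float = 0.5) -> bool:
    """Classify the user as depressive when the score exceeds the threshold."""
    return depression_score(flags) > threshold


# Per-tweet model outputs for one hypothetical user (True = depressive).
recent = [True, False, True, True]
print(depression_score(recent))  # 0.75
print(is_depressive(recent))     # True
```

Using a strict "exceeds" comparison mirrors the wording of the requirement: a user whose score sits exactly at the threshold is not flagged.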
• Constraints, Assumptions and Dependencies
For simplicity and to ensure computability, we would make the
following assumptions:
1. The model is highly accurate and would provide human-level depression predictions.
2. The servers will always remain active and would never crash.
3. The user will express his/her true emotions in his/her tweets and would not put up depressive tweets unless depressed.
In addition to the above assumptions, the system would have some constraints, some of which are identified below:
1. Some amount of latency will always be there regardless of how fast the servers are.
2. It is impossible to achieve 100% accuracy and some cases of false positives will always be there.
3. For the development phase, the Heroku servers used would need some time to start before they can be fully functional. Hence, the server may not be available at all times.
The dependencies for the successful development and deployment ...
1. Software Dependencies: Windows OS, Anaconda, Jupyter, Python 3.7, Standard ML Libraries, Pipenv, FastAPI, Uvicorn, Heroku CLI, an IDE (VS Code, Sublime Text etc.).
2. Hardware Dependencies: Intel i5/i7 processor, 4/8 GB RAM, Heroku Cloud Server.
3. Other Dependencies: Sentiment140 Dataset, Twitter Developer Account, curated word list of depressive keywords (available on GitHub [14]), Google ...

3.3 System Features and Requirements

• External Interface Requirements
1. User Interface:
   • Front-end: Twitter Interface.
   • Back-end: Python-based ML model (with FastAPI and Uvicorn) hosted on Heroku.
2. Hardware Interface:
   • Not Applicable.
3. Software Interface:
   • User Level Interface: Any OS (Windows preferable), Web Browser.
4. Communication Interface:
   • Communication between components will be carried out using the HTTP protocol, with data transferred in JSON format.

• System Features
1. The model must be able to correctly identify cases of clinical depression from tweets with high accuracy and a decent F1-score.
2. Latency should be minimal so as to increase the efficiency of our system.
3. The model must have the ability to deal with highly imbalanced data and should not be biased.
4. The model developed should facilitate easy integration with the Twitter API so that it could be deployed in the real world.
5. The system must ensure that user data is protected and confidentiality is maintained.

• Non-Functional Requirements
1. Performance Requirement
The response time of the model must be as little as possible. To achieve this we will build our API with FastAPI, which is one of the fastest Python frameworks available.
2. Usability Requirement
The system must be easy to use and should have the ability to be easily integrated with existing software. This will be achieved by hosting our API on the cloud, from where the model could be directly plugged into any application.
3. Reliability Requirement
The system should be reliable and must produce accurate results.
Also, the model must be available at all times and should not break down in the event of failure (e.g., server failure). To achieve this, we will use Heroku deployment with shared servers, which would prevent total failure.

Chapter 4
ALGORITHM ANALYSIS AND MATHEMATICAL MODELING

4.1 Naive Bayes

Naive Bayes methods are supervised learning algorithms based on Bayes’ theorem with the ‘naive’ assumption of conditional independence between any pair of features given the value of the class variable. It’s a classification method based on ... A Naive Bayes classifier, in simple terms, assumes that the presence of one feature in a class is unrelated to the presence of any other feature. Despite their oversimplified assumptions, Naive Bayes classifiers have performed admirably in a number of real-world applications, most notably document classification and spam filtering. To estimate the necessary parameters, they only need a small amount of training data. When compared to more advanced methods, Naive Bayes learners and classifiers can be extremely fast.
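To make the independence assumption concrete, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The toy tweets, labels and function names are invented for illustration; this is not the project's trained model, which would normally rely on a library implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_nb(samples: List[Tuple[str, str]]):
    """Estimate class counts and per-class word counts from (text, label) pairs."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(text: str, model) -> str:
    """Pick the class with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Each word contributes independently: the "naive" assumption.
            count = word_counts[label][word]
            logp += math.log((count + 1) / (n_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


toy_data = [
    ("i feel so hopeless and alone", "depressive"),
    ("nothing matters anymore", "depressive"),
    ("had a great day with friends", "non-depressive"),
    ("excited for the weekend trip", "non-depressive"),
]
model = train_nb(toy_data)
print(predict_nb("feel alone and hopeless", model))  # depressive
```

Because each word's likelihood is estimated separately per class, training reduces to counting, which is why such classifiers need little data and run quickly.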
Since the class conditional feature distributions are decoupled, each distribution can be calculated as a one-
dimensional distribution independently. This, in essence, results in the alleviation of problems created by the
file:///C:/Users/hp/AppData/Local/Temp/3F8XUI8B.htm 05-06-2021
Plagiarism Checking Result for your Document Page 12 of 19
curse of dimen- sionality. �Depression Detection using Sentiment Analysis of Social Media Posts�� 17 4.2
XGBoost XGBoost is a Gradient Boosting Machine Learning library that has been tailored. It was written in
The core XGBoost algorithm is parallelizable, which means that the work of building a single tree can be done in parallel. A decision tree is made up of a set of binary questions, and the final predictions are made at the leaves. XGBoost typically uses a tree as the base learner, and is itself an ensemble method. The trees are built in stages until a stopping criterion is reached. CART (Classification and Regression Trees) refers to trees in which each leaf contains a real-valued score, regardless of whether they are used for classification or regression. If required, the real-valued scores can be converted to categories for classification. XGBoost makes use of advanced regularisation to improve model generalisation. XGBoost outperforms conventional Gradient Boosting in terms of efficiency. It has a short learning curve and can be parallelized across clusters.
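The regularisation mentioned above enters through the objective that is minimized during training. As a sketch in the notation commonly used in the XGBoost documentation (the symbols l, Ω, f_k, T, w, γ and λ are not defined elsewhere in this report and are introduced here only for illustration):

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}
```

Here l is the training loss between the label y_i and the prediction \hat{y}_i, f_k is the k-th tree, T is the number of leaves in a tree and w is the vector of its leaf scores. The penalty Ω discourages overly complex trees, which is what improves generalisation.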

4.3 Proposed Algorithm
The algorithms described above are state-of-the-art algorithms which work very well. But we are dealing with highly imbalanced datasets, so our algorithm must be able to correct this imbalance and avoid bias. Our model will be developed as a combination of XGBoost and Naive Bayes. This hybrid model would be able to extract the most relevant information using the strengths of both algorithms. The XGBoost layer will act as a filter which would remove imbalance in the data and the Naive

4.4 Mathematical Modelling
Our algorithm will be implemented as a combination of two algorithms: XGBoost and Naive Bayes. Thus, the mathematics of the proposed system is based on these algorithms. Let our dataset comprise the following sets of features:
1. The set of independent features, X = {F1, F2, F3, ...}.
2. The dependent feature, y.
The data is first passed through the XGBoost layer.
The objective function for XGBoost is shown below:
Figure 4.1: XGBoost Objective Function
The objective function above comprises the loss function as well as the regularization function. Our aim is to minimize this function, which is done internally using a Taylor approximation. And finally, we will have
This will be used further to calculate the final result.
At the Naive Bayes layer, Bayes' theorem is used for prediction. The standard Bayes' theorem is represented by the formula below:
P(y|X) = P(X|y) * P(y) / P(X)
Figure 4.2: Bayes Theorem
Here, P(y|X) is the posterior probability of the class (y, target) given the predictor (X, features). P(y) is the prior probability of the class. P(X|y) is the likelihood, which is the probability of the predictor given the class. P(X) is the prior probability of the predictor.
In Naive Bayes we make the naive assumption that all the features are independent, hence we will have:
P(X|y) = P(F1|y) * P(F2|y) * ... * P(Fn|y)
Figure 4.3: Independence of Features
P(X) is constant and thus our earlier formula reduces to:
P(y|X) ∝ P(y) * P(F1|y) * P(F2|y) * ... * P(Fn|y)
Figure 4.4: Proportionality Relation
The goal of Naive Bayes is to choose the class y with the maximum probability. Thus, our final optimization function will be:
y = argmax over y of P(y) * P(F1|y) * P(F2|y) * ... * P(Fn|y)
Figure 4.5: Optimization Function
Let the Naive Bayes probability of prediction be P(nb). Finally, we take the weighted average of the prediction probabilities of both algorithms and set a threshold value. Only if the final probability surpasses the threshold is the tweet classified as depressive.
FinalProbability = (k1 * P(xg) + k2 * P(nb)) / (k1 + k2)    (4.1)
Here P(xg) denotes the XGBoost probability of prediction, and k1 and k2 are the weights given to the two algorithms.

Chapter 5
Architectural Design
The requirement analysis done earlier makes way for identifying and analyzing the
various processes involved in the development of the project. Any machine learning / deep learning project follows the data science life cycle shown below.
Figure 5.1: Machine Learning Life Cycle
The life cycle helps us identify the primary processes which need to be followed for successful implementation of the project.
For the purpose of our project, four stages of technical work have been identified, which are shown in the block diagram and flowchart below:
Figure 5.2: Architecture: Block Diagram
Figure 5.3: Architecture: Flowchart

5.2 UML Diagrams
5.2.1 Class Diagram
A class diagram shows the relationships and dependencies between the various classes in the system.
For our purpose, we will use pre-defined classes of the Twitter API. The class diagram is shown below.
Figure 5.4: UML: Class Diagram

5.2.2 Activity Diagram
An activity diagram shows the sequential representation of the various activities involved in the project. It portrays the control flow from a start point to a finish point, showing the various decision paths that exist while the activity is being executed. The activity diagram for our
Figure 5.5: UML: Activity Diagram

5.2.3 Use Case Diagram
A use case diagram, at its simplest, is a representation of a user's interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. The use case diagram for our system is shown below.
Figure 5.6: UML: Use Case Diagram

5.2.4 Sequence Diagram
A sequence diagram shows object interactions arranged in time sequence. It depicts the objects involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams show elements as they interact over time, organized according to object (horizontally) and time (vertically). Sequence diagrams are sometimes
Figure 5.7: UML: Sequence Diagram

5.2.5 Deployment Diagram
A UML deployment diagram shows the configuration of run-time processing nodes and the components that live on them. Deployment diagrams are a kind of structure diagram used in modeling the physical aspects of an object-oriented system. They are often used to model the static deployment view of a system (the topology of the hardware).
Deployment diagrams are important for visualizing, specifying, and documenting embedded, client/server, and distributed systems, and also for managing executable systems through forward and reverse engineering. A deployment diagram is just a special kind of class diagram which focuses on a system's nodes. Graphically, a deployment diagram is a collection of vertices and arcs.
Figure 5.8: UML: Deployment Diagram

5.3 Data design
5.3.1 Internal software data structure
The Twitter API stores information internally in the form of various classes and transmits this data in the form of JSON objects.
5.3.2 Global data structure
Our API will extend the api/predict interface as the global structure accessible through the Twitter API. This will send
5.3.3 Temporary data structure
Some temporary files may be created for storing user data, which would be deleted once our prediction is done.
5.3.4 Database description
No external database will be used as such. However, the Twitter API will enable us to access the Twitter database server for tweets and other information.
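As an illustration of the JSON exchange described in this section, the sketch below parses a tweet object and builds a request body for the prediction interface using Python's standard `json` module. The sample payload is made up; the field names `id`, `text`, `lang` and `user.screen_name` follow the Twitter API's tweet object, while the helper `build_predict_request` and the shape of the request body are assumptions, since the report does not specify them.

```python
import json

# Made-up example of a tweet object as it might arrive from the Twitter API.
raw_tweet = """
{
  "id": 1,
  "text": "Feeling low today, can't sleep again",
  "lang": "en",
  "user": {"screen_name": "example_user"}
}
"""


def build_predict_request(tweet_json):
    """Extract the fields needed for prediction from one tweet JSON string.

    The returned structure is a hypothetical request body for api/predict.
    """
    tweet = json.loads(tweet_json)
    return {
        "tweet_id": tweet["id"],
        "username": tweet["user"]["screen_name"],
        "text": tweet["text"].strip(),
    }


request_body = build_predict_request(raw_tweet)
request_json = json.dumps(request_body)  # serialized form that would be sent
```

Only the fields needed for prediction are copied, which keeps any temporary data small and easy to delete afterwards.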

Tasks Involved
• Data Collection: Sentiment140 dataset, scraping data using the Twitter API / Tweepy.
• Data Pre-processing: NLTK library, Python ML libraries.
• Model Selection: Naive Bayes, XGBoost etc.
• Model Evaluation: Precision, Recall, F1-score (primary metric), Accuracy (secondary metric).
• Model Improvement: Cross Validation, Hyperparameter tuning, data cleaning, Co-training etc.

6.2 Technical Risks
• Machine Learning algorithms cannot guarantee human-level accuracy in the prediction of depression. This risk can be mitigated by developing hybrid models.
• There is significant noise in the tweets collected before pre-processing, which would lead to a lot of unnecessary data due to third-person and news references. This risk is mitigated using natural language pre-processing libraries, which help in extracting the most useful words in textual data.
• Social media posts are highly imbalanced, due to which machine learning models often develop a bias, which in turn leads to erroneous results. This risk is mitigated using Gradient Boosting.
• We need a huge amount of data to train machine learning models, which is generally not available in existing repositories. Thus, we can scrape extra data using the Twitter API.

6.3 Budget/Time Constraints
• There is no significant budget constraint, as all the software and hardware being used for this project will be open source. However, in the future more robust and faster hardware may be required to deploy the
• Time is limited and could lead to failure of the project. This risk can be mitigated by managing time properly using a well-defined timeline. The workflow timeline is shown below:
Figure 6.1: Project Timeline

Chapter 7
CODING

7.1 Algorithms / Flowcharts
The proposed model will be developed as a stacking generalized combination of
This hybrid model would be able to extract relevant information using the strengths of both algorithms. Here, the XGBoost and Multinomial Naive Bayes models act as the base learners for our stack, while a Logistic Regression classifier acts as the meta-model. The architecture of the stacking model is shown below:
Figure 7.1: Proposed Model
For developing the model, we need to follow the standard NLP procedures, which comprise Data Collection, Data Cleaning, Tokenization, Model Selection with Hyperparameter Tuning,
The flow of events is depicted below:
Figure 7.2: Flowchart

7.2 Software Used
7.2.1 Utility Packages / Applications
• OS: Windows 10
• IDE: VSCode
• Package Manager (Python): pip
7.2.2 Model Development
• Dataset Storage: MS-Excel (csv format).
• Twitter API Scraping: tweepy & dotenv.
• Python ML Libraries: numpy, pandas, scikit-learn, matplotlib, seaborn, xgboost, plotly, wordcloud etc.
• IPython Kernel: Jupyter Notebook & Google Colab.
• Saving Models: pickle module
7.2.3 Deployment
• Front-end web development: HTML, CSS, JavaScript & Bootstrap 5
• API development: FastAPI & Uvicorn
• Cloud Deployment: Heroku

7.3 Hardware Specification
The project utilizes resources openly available on the cloud and hence has no specific hardware requirements. The project was developed on a system with an Intel i7 processor and 8 GB RAM.
7.4 Programming Language
The project was developed in Python 3.7. Python was chosen because it is a simple yet extremely powerful language with a huge repository of Machine Learning tools and libraries.
7.5 Platform
The model was developed on the Google Colab platform, which is a cloud-based IPython kernel integrated with the standard ML libraries. For deployment, the Heroku cloud deployment platform was used.
7.6 Coding Style Format
The project was developed using the PEP8 coding style format. PEP8 (Python Enhancement Proposal 8) defines useful naming conventions and other guidelines for programming in Python and is extremely helpful in writing

Chapter 8
RESULT & ANALYSIS
For testing the model, we need to define the metrics on which the models are evaluated. In our case the metrics are Precision, Recall, F1-score (primary) and Accuracy (secondary). Evaluating the model using the traditional train-test split method will not be of much use due to the imbalanced class distribution. Hence, we use Stratified 5-Fold
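To illustrate why stratification matters for imbalanced data, the following sketch, using only the standard library, assigns samples to folds so that each fold keeps roughly the class ratio of the whole data set, and computes precision, recall and F1-score for the positive class. The helper names `stratified_folds` and `precision_recall_f1`, the toy labels and the round-robin assignment are illustrative assumptions; the project itself may rely on library routines such as those in scikit-learn.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=5):
    """Assign sample indices to folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy imbalanced labels: 5 depressive (1) and 20 non-depressive (0) samples.
labels = [1] * 5 + [0] * 20
folds = stratified_folds(labels, n_folds=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]

# Toy predictions to exercise the metric function.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Each fold then serves once as the test set while the remaining four are used for training, so every fold contains examples of the minority class.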
The stacking model (MNB + XGB) was tested against standalone MNB and XGB models to understand the properties of our model. The results of the 5-fold cross validation are summarized below:
Figure 8.1: State-of-the-art models vs Stacked Model
Figure 8.2: F1-Score Comparison
Figure 8.3: Accuracy Comparison
The results show that on both the F1-score and Accuracy metrics, the proposed stacking model outperforms the standalone state-of-the-art models. Our proposed stacking model gives an accuracy of 96% and an F1-score of 93%.

Chapter 9
TESTING

9.1 Formal Technical Reviews
Formal Technical Reviews helped us assess how our model was performing at each stage of development. In Machine Learning, the logic is not explicitly coded by the programmer but is inferred by the model, so instead of performing traditional software tests, we need to make sure that the model performs consistently across data sets. So
For each fold, we assessed metrics such as precision, recall, F1-score and accuracy to obtain our final result.

9.2 Test Plan
In this phase, we analyzed our methodology and identified the key areas which could lead to bugs. Using this information, we concluded that the following types of tests were to be carried out for our project:
• Unit Testing
• Integration Testing
• Alpha Testing
• Beta Testing

Test Cases & Results
• Unit Testing: Here, our main aim was to make sure that each model works properly without overfitting or underfitting the data. We used cross validation along with hyperparameter tuning to make sure that all units (models) work consistently.
• Integration Testing: Once the model and the web interface were created, we had to make sure that integrating these two components did not cause the system to break. Hence, we used Selenium WebDriver to test for potential bugs arising from integration.
• Alpha Testing: This testing was done after deployment of the project to Heroku. It helped us check the functionality of the final product.
• Beta Testing: The final deployed product was tested by different members of our team on their own systems, and the project was put under different conditions to assess its robustness.
The testing phase helped us identify bugs and rectify them. The testing results show that almost all bugs have been removed and the project works smoothly under the conditions tested.

Project configuration management is the management of the configuration of all the project's key products and assets. This includes any end products that will be delivered to the customer, as well as all management products, such as the project management plan and the performance management baseline. Configuration management and project change management need to be implemented hand in hand.
Any change must be monitored and assessed to determine its impact on the project configuration. Thus, configuration management is an extremely important step in project development. Our project uses Git and the Heroku CLI for configuration management. Different versions of the project can be developed and changes pushed to Heroku directly using the Command Line Interface (CLI) provided by Heroku. During the development phase we used Git for version control, as it is openly available and easy to use.
After deployment, all configuration-related issues can be handled through Heroku itself. These range from version control and configuring add-ons to scaling the dyno formation and analyzing usage. Another very useful feature available on Heroku is the Project Dashboard, which provides UI support for tasks like viewing app metrics, managing Heroku teams and configuring deployment integrations. Hence, Heroku comes with inbuilt features

Chapter 11
SOFTWARE QUALITY ASSURANCE PLAN
Software quality is one of the most significant factors determining the success of a project. The Software Quality Assurance Plan (SQAP) lays down the guidelines for ensuring that the software developed is up to the mark at each step. The SQAP followed for our project is as follows:
• All modules need to be developed using proper naming conventions and other specifications conforming to the Python PEP8 standard. This ensures that the code is consistent and neat.
• Code must be well documented, with docstrings and images wherever possible.
• Data collected through scraping via the Twitter API must be manually evaluated to find bad data points and remove them. This makes sure that we have relevant data points in our data set and avoids the "Garbage In, Garbage Out" phenomenon.
• No module may be put into production without adequate testing. This ensures that bugs are detected and rectified early.
• During model development, the machine learning models must be tested and validated in each cycle to check for overfitting and underfitting.
• No confidential information should be exposed by the API to the outside world. This helps ensure that the privacy of users is not compromised.
• The web interface should be developed keeping in mind good design practices for web apps. This ensures that the web app is responsive, consistent and easy to navigate.
• The project should be regularly updated even after deployment. This would make sure that existing bugs are fixed and additional

Chapter 12
CONCLUSION
To sum up, it has been well established that depression is one of the leading issues faced by our society. Detecting depression early can play a major role in preventing suicides and improving the mental health of society collectively. Social media has been a revelation in sentiment analysis and can be used to effectively tackle depression. Our project uses Twitter data to train a stacked ensemble model consisting of Multinomial Naive Bayes and XGBoost base models along with a Logistic Regression meta-classifier.
The results show that on both the F1-score and Accuracy metrics, the proposed stacking model outperforms the standalone state-of-the-art models. With an F1-score of 93% and an accuracy of 96%, our model performed best among the models we evaluated. This project could go a long way in integrating emotional AI with social media for tackling depression in our society.