NLP in Stock Market Prediction: A Review

Preprint · May 2022


DOI: 10.13140/RG.2.2.17142.68160

Rodrigue Andrawos
Lebanese American University
rodrigue.andrawos@lau.edu

Computer Science and Mathematics Department, LAU

CSC688J course by Dr. Pauline Maouad

16 May 2022

Abstract
Stock market prediction is the act of trying to determine the future
value of a company stock or other financial instrument traded on an
exchange. The successful prediction of a stock’s future price could
yield significant profit. The use of text mining together with machine
learning algorithms has received more attention in recent years, with
textual content from the Internet used as input to predict price
changes in stocks and other financial markets. This review focuses on
how NLP can be used by traders, investors, and financial analysts to
get the most out of textual, numerical and sentiment analysis. Stock
movements are hard to predict, but the studies covered by this review
used textual and numerical data with different machine learning models
to predict the movements of certain stocks, achieving promising
results.

1 Introduction
Known as NLP in the tech industry, Natural Language Processing has
been around for nearly 50 years [1]. It is a form of Artificial Intelligence
denoting the capability of a computer program to understand natural human
language as it is spoken and written. NLP is used on a daily basis via our
smartphones, PCs, smart homes, and a variety of everyday devices in
the form of personal assistants like Apple’s Siri and Amazon’s Alexa, along
with other voice-controlled smart home gadgets. It is also widely used in
email filtering, speech recognition, chatbots, auto-correct, search engines,
language translation services, sentiment analysis, text processing and
classification, and countless other applications in different fields including
business intelligence, medical analysis, scientific research, and financial
applications.
NLP allows financial analysts to obtain relevant details through
information extraction and filtering for risk analysis and stock market
prediction.
Stock market prediction aims to determine the future movement of the
value of a stock traded on a financial exchange. The more accurately share
price movement is predicted, the more profit investors can make [4].
Because the stock market is volatile, price fluctuations driven by sentiment
and news reports are common. Traders draw upon a wide variety of publicly
available information to inform their market decisions [23]. Stock prices
are determined by supply and demand among investors, who are heavily
influenced by market psychology and the public mood [7]. News analytic
sources that can be used include mainstream media, print media, social
media, news feeds, blogs, investors’ advisory portals, experts’ opinions,
brokers’ updates, web-based information, companies’ internal news and
public announcements regarding policies and reforms [12]. The financial
sector is a significant driver of broader industry, and the increasing amount
of data in this field has given rise to a number of applications that can be
used to improve the field and achieve commercial objectives [11].
To build a sentiment index, Faisal Khalil and Gordon Pipa [12] collected
textual information relevant to selected stocks, parsed and analyzed it,
then aggregated, categorized and refined it with NLP before converting it
into an hourly sentiment index. Thien Hai Nguyen and Kiyoaki Shirai [15]
built a model to predict stock price movement using sentiments on social
media; their prediction model introduces a new feature that captures
topics and their sentiments simultaneously.
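To make the idea of an hourly sentiment index concrete, the following is a minimal sketch that assumes each news item has already been given a sentiment score in [-1, 1] and simply averages the scores within each clock hour. The function name, the scoring scale and the sample data are assumptions made for illustration; they are not the procedure described in [12].

```python
from collections import defaultdict
from datetime import datetime


def hourly_sentiment_index(scored_items):
    """Average per-item sentiment scores within each clock hour.

    scored_items: iterable of (timestamp, score) pairs, score assumed in [-1, 1].
    Returns a dict mapping each hour (minutes and seconds zeroed) to the mean score.
    """
    buckets = defaultdict(list)
    for timestamp, score in scored_items:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(score)
    return {hour: sum(scores) / len(scores) for hour, scores in sorted(buckets.items())}


# Hypothetical, hand-made scores for illustration only.
items = [
    (datetime(2021, 6, 1, 9, 5), 0.5),
    (datetime(2021, 6, 1, 9, 40), -0.25),
    (datetime(2021, 6, 1, 10, 15), 1.0),
]
index = hourly_sentiment_index(items)
```

A real index would probably also weight items by source reliability or relevance to the stock; the plain mean is used here only to keep the sketch short.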
The goal of this review is to look into the methods used and the results
achieved, along with the latest discoveries by researchers, in predicting
financial and stock markets using NLP and sentiment analysis. It also
examines how accurate NLP can be as a predictive feature in financial
forecasting, covering recent advances in stock market prediction using
data mining.

Figure 1: The experimental flow chart of a proposed stock price forecast
model [17]

2 Methodology and Market Function


Many papers on extracting information from texts and documents have
been published. Abdullah, Rahaman and Rahman [2] proposed a model that
shows users the set of stocks that suits their preferences based on their
input. Muxi Xu [22] developed a couple of different neural networks to
forecast the final market movement, using four major components or
modules.
Verma et al. [19] performed sentiment analysis of a selected stock and then
suggested whether to buy, sell or hold. They then calculated the maximum
risk involved in the investment using a threefold approach (market risk,
sector risk and stock risk), and finally calculated expected returns and
compared them with actual returns using Support Vector Regression (SVR),
which was also used to predict future prices.
Analyzing the stock market requires analysis of textual data related to
the market as well as analysis of numerical data. The textual data may
be news, articles, notices, announcements or sometimes rumors about some
stocks or the whole stock market. Numerical data, on the other hand,
represents the price, volume, turnover, number of outstanding shares, etc.
of the listed companies. Both types of information are equally important
for making any decision on the stock market [2].
Although few of the forecasting systems reported in the literature have
been shown to make a profit in the long run after transaction costs are
deducted, many meaningful hypotheses and significant observations have
been drawn from stock market data [21].
The value of stock prices is also influenced by a number of other factors,
such as political stability, GDP growth, inflation, liquidity and interest
rates [10].
While the prior consensus may have been that financial markets are un-
predictable, text mining has challenged this notion [11]. Text mining is a
process through which the user derives high-quality information from a given
piece of text and has seen a significant increase in demand over the last few
years [11].

3 Stock Prediction Models


In this section we explore the models, techniques and approaches used by
researchers for stock market prediction: what methods and what kind of
data are used and which preprocessing techniques are implemented, along
with the results achieved by the different models.

3.1 Deep learning using numerical and textual information [3]
A paper by Akita, Yoshihara, Matsubara and Uehara at Kobe University
[3] predicted the closing stock prices of 10 companies by regression from
textual and numerical information using long short-term memory (LSTM),
whose architecture allows it to memorize previous time steps. They used
multiple companies in order to learn the correlations between them. For
example, an event like “Nissan recalls...” might make Nissan’s stock price
decrease while making the stock price of Toyota (another company in the
same industry) increase at the same time [3]. For the experiments they used
the morning edition of the Nikkei newspaper published from 2001 to 2008,
with the news from 2001 to 2006 as training data, 2007 as validation data,
and 2008 as test data. They chose the 10 companies that appeared most
frequently in the news articles.
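The year-based partition described above can be sketched as follows. This is only an illustration of a chronological split (no shuffling across time, so the test year is never seen during training); the function name, the tuple layout and the placeholder headlines are assumptions and do not come from [3].

```python
from datetime import date


def chronological_split(articles, train_end=2006, valid_year=2007, test_year=2008):
    """Split (date, text) pairs by calendar year without shuffling across time."""
    train = [a for a in articles if a[0].year <= train_end]
    valid = [a for a in articles if a[0].year == valid_year]
    test = [a for a in articles if a[0].year == test_year]
    return train, valid, test


# Placeholder articles for illustration only.
articles = [
    (date(2003, 4, 1), "placeholder headline A"),
    (date(2007, 9, 12), "placeholder headline B"),
    (date(2008, 2, 20), "placeholder headline C"),
]
train, valid, test = chronological_split(articles)
```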

3.2 A Robust Predictive Model [14]

In their paper published in July 2021 at NSHM Knowledge Campus, Sidra
Mehtab and Jaydip Sen [14] presented several approaches to stock price and
movement prediction on a weekly forecast horizon using eight regression and
eight classification methods. These models are based on machine learning
and deep learning approaches and were built, fine-tuned and then tested
using daily historical data of NIFTY 50 from January 2, 2015 to June 28,
2019, after suitable pre-processing of the raw data. After designing and
testing the models, the authors further augmented the predictive framework
by bringing in public sentiment from social media in addition to the
historical stock prices, as the two inputs to a fuzzy neural network-based
SOFNN algorithm. This sentiment analysis-enhanced model was found to be
the best among all models in its ability to accurately forecast the stock
price movement of NIFTY 50. The study concluded that public sentiment in
social media serves as a very significant input in predictive model building
for stock price movement [14].

3.3 Analyzing firms’ 10-K and 10-Q reports to identify sentiment [9]
Processing text as data for statistical analysis can be a high-dimensional
problem. Since machine learning excels at handling high-dimensional data,
Alexander Fleiss and Han Cui [9] investigated three supervised machine
learning classifiers: logistic regression, random forest and XGBoost. They
trained the algorithms on the 10-K and 10-Q reports of 48 companies in the
S&P 500 from 2013 to 2017 and tested their models on the corresponding
reports from 2018 to 2019. The 10-K is an annual report filed by a publicly
traded company and includes its history, organizational structure,
financial statements, earnings per share, subsidiaries, executive
compensation, and any other relevant data. The 10-Q, on the other hand, is
a quarterly report that shows the company’s financial statements,
management discussion and analysis, disclosures, and internal controls.
They obtained the data by web scraping the SEC EDGAR website, distilled
the text using Beautiful Soup and RegEx, then extracted the management
sections using string slicing.
Lastly, after verifying the models’ accuracy on out-of-sample data, they
created investment portfolios and examined their performance over time,
with the following results: all three models achieve the same trading
performance metrics for the 3-day holding period; XGBoost performs best
for the 5-day and 30-day holding periods; logistic regression outperforms
XGBoost for the 60-day holding period, which may be a result of
overfitting. The study [9] concluded that machine learning models that
incorporate dictionary, word embedding, and contextual models show
promising results.
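The text-distillation step (strip the markup, then slice out the management section) can be sketched as below. The authors of [9] used Beautiful Soup; this sketch swaps in the standard-library `html.parser` so that it runs without extra packages. The heading patterns ("Item 7." to "Item 8.", as in a typical 10-K layout), the helper names and the sample HTML are all assumptions for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def slice_section(text, start_pattern, end_pattern):
    """Return the text between the first start heading and the next end heading."""
    start = re.search(start_pattern, text, flags=re.IGNORECASE)
    if start is None:
        return ""
    end = re.search(end_pattern, text[start.end():], flags=re.IGNORECASE)
    stop = start.end() + end.start() if end else len(text)
    return text[start.end():stop].strip()


# Tiny made-up filing fragment; real filings are far larger and messier.
sample = (
    "<html><body><h2>Item 7. Management's Discussion and Analysis</h2>"
    "<p>Revenue grew while costs declined.</p>"
    "<h2>Item 8. Financial Statements</h2><p>Tables follow.</p></body></html>"
)
text = html_to_text(sample)
mdna = slice_section(text, r"Item\s+7\.", r"Item\s+8\.")
```

In a real filing the same headings usually also appear in the table of contents, so a first-match slice like this one would need extra care (for example, taking the longest candidate section).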

3.4 Stock market forecasting using NLP and LSTM [13]

A paper published by IJERT [13] shows how its authors extracted information
from news and the latest trends to forecast the market. Their methodology
consisted of the following steps: data collection, data preprocessing,
splitting the dataset into training and test data, building an LSTM model,
and making predictions. They incorporated various text pre-processing
methods such as stop-word removal, normalization, lemmatization, stemming,
tokenization, bag-of-words (BOW) and TF-IDF, together with LSTM (Long
Short-Term Memory), a special kind of Recurrent Neural Network capable of
learning long-term dependencies in time series. After training, their model
[13] learns from previous closing stock prices and improves in accuracy.
They used an API to get AAPL stock data; the approach can also be used with
the Bombay Stock Exchange, the National Stock Exchange, or NASDAQ. The
result is a graphical representation forecasting the next 30 days based on
the past 100 days, which the authors report as having good accuracy and low
error rates.
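As a rough illustration of the text pre-processing named above, the sketch below performs tokenization, stop-word removal and TF-IDF weighting in plain Python. It assumes the common weighting tf * log(N / df) and a tiny hand-picked stop-word list; stemming and lemmatization are left out, and none of the names or sample headlines come from [13].

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "its"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up headlines for illustration only.
headlines = [
    "Apple shares rise on strong earnings",
    "Apple shares fall on weak guidance",
]
vectors = tf_idf(headlines)
```

Terms that occur in every document (here "apple" and "shares") receive a weight of zero, which is how TF-IDF downplays words that do not help to distinguish one headline from another.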

3.5 News impact on stock market trends using NLP and ML algorithms [10]
News is a common way for people to get updates about the latest happenings
around the world and hence form opinions about industries, companies and
stocks, which affects their trading decisions. The study in [10] focuses on
building software models that could analyze general news during trading
hours and predict the probable closing trend of the stock index for that
day. The authors used the top 25 articles from the Reddit World News
channel and tried to correlate their impact on the DJIA.

In the language processing approach, the MLP classifier provided the
highest testing accuracy, 85.70%, among the four models (logistic
regression on a unigram vocabulary; logistic regression, k-NN and an MLP
classifier on a bigram vocabulary). Using a model built on this algorithm,
stock trends can be predicted from real-time news articles with fair
accuracy and, based on the predicted trends for the day, short-term
traders can make informed investment decisions. In the sentiment analysis
approach, the MLP classifier provided the highest accuracy, 57.80%.
The authors of [10] also concluded that predicting the DJIA status using
only the vocabulary building approach or only the sentiment analysis
approach on each day’s top 25 Reddit World News items via classification
algorithms is not highly efficient. One reason for this low efficiency is
that the value of stock prices is also influenced by a number of other
factors, such as political stability, GDP growth, inflation, liquidity and
interest rates. An analysis based on just one factor might be helpful, but
it will not be very fruitful. A predictive model that takes all of the
above-mentioned factors into account would possibly achieve much higher
accuracy. Companies can use this model to analyze what value they lose
or gain from a particular news trend and what kinds of incidents trigger
investors into putting more money into the market.
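The unigram and bigram vocabularies mentioned in this subsection can be illustrated with a short sketch. It assumes whitespace tokenization and made-up headlines; the helper names are hypothetical and the classifiers used in [10] are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(headlines, n):
    """Count every n-gram across all headlines; the keys form the vocabulary."""
    counts = Counter()
    for headline in headlines:
        counts.update(ngrams(headline.lower().split(), n))
    return counts


# Made-up headlines for illustration only.
headlines = ["oil prices rise", "oil prices fall sharply"]
unigrams = build_vocabulary(headlines, 1)
bigrams = build_vocabulary(headlines, 2)
```

A bigram vocabulary keeps some word order ("prices rise" versus "prices fall"), which a unigram vocabulary discards. Each headline can then be turned into a count vector over the chosen vocabulary and fed to a classifier.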

3.6 LSTM-based sentiment analysis [6]

Ko and Chang [6] used news articles and PTT forum discussions for
fundamental analysis, and historical stock transaction information for
technical analysis. BERT is used to recognize the sentiment of the text,
and LSTM is applied to forecast the stock price from the historical
transaction information and the text sentiments. According to their
results, the proposed model improves the average root mean square error
(RMSE) by 12.05.
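For reference, RMSE is the square root of the mean squared difference between actual and predicted values, so lower is better. The sketch below uses made-up prices; it shows only the standard definition of the metric, not the evaluation code of [6].

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up closing prices and forecasts for illustration only.
closing = [100.0, 102.0, 101.0]
forecast = [101.0, 100.0, 101.0]
error = rmse(closing, forecast)
```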

4 Social Interactions
It is well known among traders and investors that one’s first enemy is
one’s own emotions. A person who lacks emotional intelligence cannot beat
the market. This section gives an overview of NLP applications implemented
for emotion detection and for getting a fair idea of the overall sentiment
among people.

A study by Tanya Nijhawan, Girija Attigeri and T. Ananthakrishna [16]
aimed to extend sentiment and emotion analysis to detect the stress of an
individual based on the posts and comments he or she shares on social
networking platforms. They leveraged large-scale datasets of tweets to
perform sentiment analysis with the aid of machine learning algorithms
and a deep learning model, BERT, for sentiment classification. They also
adopted Latent Dirichlet Allocation, an unsupervised machine learning
method for scanning a group of documents, recognizing the word and phrase
patterns within them, and gathering the word groups and similar
expressions that most precisely characterize a set of documents. This
helped to predict which topic is linked to the textual data. With the aid
of these models, the emotions of online users can be detected, and these
emotions can in turn be used to analyze stress or depression. In
conclusion, the ML models and the BERT model have a very good detection
rate.
Anwar et al. [5] discussed a methodology for determining the popularity,
opinion and sentiment of a product in different locations across male and
female users. The methodology is general and can be applied to tweets from
any country for any product, as long as a suitable number of tweets can be
obtained.
This can be further applied to detect stress and emotions among traders
and investors, or, using data from Twitter financial news pages and
trading channels, to detect the market sentiment at a given time (bullish
or bearish, fear or greed), along with much other useful information.
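A very simple way to picture bullish-versus-bearish detection is a lexicon count over a batch of posts, sketched below. The word lists, function names and sample posts are hypothetical; the studies cited in this section rely on trained models such as BERT rather than a hand-made lexicon.

```python
import re

# Hypothetical mini-lexicon; a real system would need a finance-specific dictionary.
BULLISH = {"rally", "surge", "beat", "upgrade", "buy"}
BEARISH = {"crash", "plunge", "miss", "downgrade", "sell"}


def score_post(text):
    """Return (number of bullish words) minus (number of bearish words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in BULLISH for t in tokens) - sum(t in BEARISH for t in tokens)


def market_mood(posts):
    """Label a batch of posts by the sign of their summed scores."""
    total = sum(score_post(p) for p in posts)
    if total > 0:
        return "bullish"
    if total < 0:
        return "bearish"
    return "neutral"


# Made-up posts for illustration only.
posts = ["Earnings beat, stocks rally", "Analysts downgrade the sector", "Time to buy?"]
mood = market_mood(posts)
```

Word counting of this kind ignores negation, sarcasm and context, which is one reason contextual models are preferred in practice.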

5 Limitations
The huge amount of data available is highly unstructured and has explicit
meanings in addition to implicit ones. The data needs to undergo proper pre-
processing before it can be used for analysis [11]. Although lexicon lists are
available for various domains, the financial sector has to have a specific dic-
tionary for such approaches, so as to assign proper weights to corresponding
aspects in the document [11].
Yancong Xie and Hongxun Jiang [20] forecasted the stock market using text
mining technology with a Support Vector Machine method. They concluded
that when the amount of news is low, the prediction result is not good
enough and a basic, simple linear regression algorithm can do a better job.
Several aspects also need to be verified and improved in order to get a
more robust SVM method: adding more text sources, designing a standard
sentiment evaluation system and finding more specialists to score the
sentiment dictionary, and making sure the stock to be predicted has enough
text documents and trading volume, while also expanding the dataset [20].

6 Future Scope
According to investopedia.com, fundamental analysts have a staid ap-
proach to analyzing stock performance. They look at a variety of factors that
they believe influence a stock’s performance. These include the industry as a
whole, the competition, a company’s management structure, its income and
revenue, as well as its growth potential. And one of the main drivers of the
stock market is the overall sentiment across the wide variety of investors.
That, along with many other challenges, makes it hard to predict stock
market movements precisely, but it has not stopped some NLP researchers and
techniques from achieving results with remarkable accuracy [10] [9] [3]. By
applying the right NLP technique to the right datasets, traders and
financial analysts can achieve milestones that could not be reached
otherwise.
Xing et al. [21] suggest that online, or real-time, algorithms modify the
key variables stored with the model each time a new batch of data comes in.
For this reason, online models have very good adaptability, which is
necessary for monitoring fast-changing markets.
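The idea of updating stored model variables with each new batch can be sketched with a small linear model trained by stochastic gradient descent. This is a generic illustration of online learning chosen for brevity; the class name, learning rate and toy data stream are assumptions, and [21] does not prescribe this particular algorithm.

```python
class OnlineLinearModel:
    """Linear model y = w.x + b whose parameters are updated batch by batch."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def partial_fit(self, batch):
        """Update the stored parameters in place from one batch of (x, y) pairs."""
        for x, y in batch:
            error = self.predict(x) - y
            self.weights = [w - self.learning_rate * error * xi
                            for w, xi in zip(self.weights, x)]
            self.bias -= self.learning_rate * error


# Toy stream: every incoming batch follows y = 2x (made-up data).
model = OnlineLinearModel(n_features=1, learning_rate=0.1)
stream = [[([1.0], 2.0), ([2.0], 4.0)]] * 1000
for batch in stream:
    model.partial_fit(batch)
```

Because only the current weights and bias are kept, the model never needs to revisit old data, which is what makes this style of learning suitable for a continuous feed of market data.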
With the help of reinforcement learning, the next milestone is to create a
bot and regulate it with a reward function for obtaining returns on the
stock market [18]. Another aspect of future work is to achieve real-time
stock market prediction for intraday trading by assessing real-time news
and real-time stock price movements [18].
Investment markets in the past two decades have been kind to those who
are experts in recognizing patterns in prices. The next two decades may
reward individuals who are experts in uncovering clues in previously hard-
to-discern places, including patterns in human speech, that massive data
availability and near-costless processing have opened up for investigation [8].

References
[1] https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP.
Accessed: May 5, 2022.

[2] Sheikh Abdullah, Mohammad Rahaman, and Mohammad Rahman.
Analysis of stock market using text mining and natural language
processing, 05 2013.

[3] Ryo Akita, Akira Yoshihara, Takashi Matsubara, and Kuniaki Uehara.
Deep learning for stock prediction using numerical and textual
information. In 2016 IEEE/ACIS 15th International Conference on Computer
and Information Science (ICIS), pages 1–6, 2016.

[4] Faten Subhi Alzazah and Xiaochun Cheng. Recent Advances in Stock
Market Prediction Using Text Mining: A Survey. In Robert M.X. Wu
and Marinela Mircea, editors, E-Business - Higher Education and
Intelligence Applications, Chapters. IntechOpen, March 2021.

[5] Akib Anwar, M Tahmid Ekram, Mohammad Samiul Islam, Faysal
Ahmed, and Mohammad Rahman. Localized Twitter opinion mining
using sentiment analysis. Decision Analytics, 2, 12 2015.

[6] C. Ko and H. Chang. LSTM-based sentiment analysis for stock price
forecast. PeerJ Computer Science, 7:e408, 2021.

[7] Jaebin (Jay) Chang. Natural language processing as a predictive feature
in financial forecasting. EAS499 Senior Thesis, 4 2020.

[8] Marina Druz, Ivan Petzev, Alexander F. Wagner, and Richard J.
Zeckhauser. When managers change their tone, analysts and investors change
their tune. In Financial Analysts Journal, 2020, 76(2): 47–69, HKS
Working Paper No. 16-004, Swiss Finance Institute Research Paper No.
15-02, 12 2019.

[9] Alexander Fleiss and Han Cui. Forecasting stock price changes using
natural language processing, 10 2021.

[10] Publishing India Group. Understanding the impact of news on stock
market trends using natural language processing and machine learning
algorithms. International Journal of Knowledge Based Computer Systems
6.2, pages 23–30, 2018.

[11] A. Gupta, V. Dengre, and H.A. Kheruwala. Comprehensive review of
text-mining applications in finance. 2020.

[12] Faisal Khalil and Gordon Pipa. Is deep-learning and natural language
processing transcending the financial forecasting? investigation through
lens of news analytic process, 06 2021.

[13] Dileep Kumar. Stock forecasting using natural language and recurrent
network. In 2020 3rd International Conference on Emerging Technologies
in Computer Engineering: Machine Learning and Internet of Things
(ICETCE), pages 1–5, 2020.

[14] Sidra Mehtab and Jaydip Sen. A robust predictive model for stock price
prediction using deep learning and natural language processing. SSRN
Electronic Journal, 01 2019.

[15] Thien Hai Nguyen and Kiyoaki Shirai. Topic modeling based sentiment
analysis on social media for stock market prediction. In Proceedings of
the 53rd Annual Meeting of the Association for Computational Linguistics
and the 7th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), pages 1354–1364, Beijing, China,
July 2015. Association for Computational Linguistics.

[16] Tanya Nijhawan, Girija Attigeri, and Ananthakrishna Thalengala.
Stress detection using natural language processing and machine learning
over social interactions. Journal of Big Data, 9, 12 2022.

[17] PeerJ. Image 1: The experimental flow chart of a proposed stock price
forecast model. [Online; accessed May 8, 2022].

[18] Priyank Sonkiya, Vikas Bajpai, and Anukriti Bansal. Stock price
prediction using BERT and GAN. ArXiv, abs/2107.09055, 2021.

[19] Nishant Verma, S G David Raj, Ackley J Lyimo, and Kakelli Anil
Kumar. Stock Market Prediction and Risk Analysis using NLP and
Machine Learning. International Journal of Engineering and Advanced
Technology (IJEAT), 9(5):813–815, June 2020.

[20] Yancong Xie and Hongxun Jiang. Stock market forecasting based on
text mining technology: A support vector machine method, 09 2019.

[21] Frank Z. Xing, Erik Cambria, and Roy E. Welsch. Natural language
based financial forecasting: A survey. Artif. Intell. Rev., 50(1):49–73,
jun 2018.

[22] Muxi Xu. NLP for stock market prediction with Reddit data. 2021.

[23] Jinjian Zhai, Nicholas Cohen, and Anand Atreya. Sentiment analysis
of news articles for financial signal prediction. CS224N Final Project, 4
2020.
