NLPin Stock Marketpredictionby Rodrigue Andrawos
NLPin Stock Marketpredictionby Rodrigue Andrawos
net/publication/360726066
CITATIONS READS
0 4,312
1 author:
Rodrigue Andrawos
Lebanese American University
2 PUBLICATIONS 1 CITATION
SEE PROFILE
All content following this page was uploaded by Rodrigue Andrawos on 20 May 2022.
16 May 2022
Abstract
Stock market prediction is the act of trying to determine the fu-
ture value of a company stock or other financial instrument traded on
an exchange. The successful prediction of a stock’s future price could
yield significant profit. The use of Text Mining together with Ma-
chine Learning algorithms received more attention in the last years,
with the use of textual content from Internet as input to predict price
changes in Stocks and other financial markets. This review focuses on
how can NLP be used by traders, investors, and financial analysts to
get the most out of textual, numerical and sentiment analysis. Stock
movements are hard to predict, but the researches covered by this
review used textual and numerical data with different machine learn-
ing models to predict certain stocks’ movements, achieving promising
results.
1 Introduction
Known as NLP in the Tech industry, Natural Language Processing has
been around for nearly 50 years now [1]. It is a form of Artificial Intelligence
denoting the capability of a computer program to understand natural human
language; as it is spoken and written. NLP is used on a daily basis via our
1
smartphones, PCs, smart homes, and a variety of everyday instruments in
the form of personal assistants like Apple’s Siri and Amazon’s Alexa along
with other voice controlled smart home gadgets. It is also widely used in
email filtering, speech recognition, chat-bots, auto-correct, search engines,
language translation services, sentiment analysis, text processing and classi-
fication, and countless other applications in different fields including business
intelligence, medical analysis, scientific research, and financial applications.
NLP allows economical analysts to obtain relevant details through infor-
mation extraction and filtering for risk analysis and stock market prediction.
Stock market prediction aims to determine the future movement of the
stock value of a financial exchange. The accurate prediction of share price
movement will lead to more profit investors can make [4]. Due to the stock
market volatility, price fluctuations based on sentiment and news reports are
common. Traders draw upon a wide variety of publicly-available informa-
tion to inform their market decisions [23]. Stock prices are determined by
the supply and demand of investors who are heavily influenced by market
psychology and the public mood [7]. Some of many news analytic sources
which can be used include mainstream media, print media, social media, news
feeds, blogs, investors’ advisory portals, experts’ opinions, brokers updates,
web-based information, company’ internal news and public announcements
regarding policies and reforms [12]. The financial sector is a significant driver
of broader industry, and the increasing amount of data in this field has given
rise to a number of applications that can be used to improve the field and
achieve commercial objectives [11].
To build sentiment index, Faisal Khalil and Gordon Pipa [12] parsed
and analyzed text analytic information, collected textual information that
is relevant to selected stocks, aggregated, categorized, refined it with NLP
and eventually converted scientifically into hourly sentiment index. Thien
Hai Nguyen and Kiyoaki Shirai [15] built a model to predict stock price
movement using sentiments on social media, a new feature which captures
topics and their sentiments simultaneously is introduced in the prediction
model.
The goal of this review is to look into the methods used and the achieved
results along with the latest discoveries by researchers to predict financial
and stock markets using NLP and sentiment analysis. Furthermore, how
accurate can NLP be as a predictive feature in financial forecasting covering
recent advances in stock market prediction using data mining.
2
Figure 1: The experimental flow chart of a proposed stock price forecast
model [17]
3
represents price, volume, turnover, number of outstanding shares etc. of the
enlisted companies. However, both type of information is equally important
for making any decision on stock market [2].
Despite the fact that few of the forecasting systems reported in the lit-
erature have been shown to make a profit in the long run with transaction
cost deducted, many meaningful hypotheses and significant observations have
been drawn from stock market data [21].
The value of stock prices is influenced by a number of other factors as
well, like Political Stability, Growth of GDP, Inflation, Liquidity and different
interest rate and other factors. [10]
While the prior consensus may have been that financial markets are un-
predictable, text mining has challenged this notion [11]. Text mining is a
process through which the user derives high-quality information from a given
piece of text and has seen a significant increase in demand over the last few
years [11].
4
the news articles.
5
string slicing. Lastly, after verifying the models’ accuracy on out-of-sample
data, they created investment portfolios and examined their performance
over time, and achieved the following results: All three models achieve the
same trading performance metrics for the 3-day holding period. XGBoost
performs best for the 5-day and 30-day holding period. Logistic regression
outperforms XGBoost for the 60-day holding period, which may be as a result
of overfitting. This study [9] concluded that using machine learning models
that incorporate dictionary, word embedding, and contextual models shows
promising results.
6
In the Language Processing Approach, the MLP classifier provided the
highest testing accuracy of 85.70% among the four (Logistic regression on
unigram vocabulary; Logistic regression, k-NN and MLP classifier on bi-
gram vocabulary). Using a model built on this algorithm, the prediction
for stock trends based on real-time news articles can be done with a fair
accuracy and accordingly, based on predicted trends for the day, short-term
traders can make intelligent investment decisions. In the Sentiment Analysis
Approach, the MLP Classifier provided the highest accuracy of 57.80%.
[10] also concluded that the DJIA status prediction solely using simply
the vocabulary building approach, or the sentiment analysis approach on
the everyday top 25 Reddit World News items via classification algorithms
is not highly efficient. A reason for that low efficiency is that the value of
stock prices is influenced by a number of other factors as well, like Political
Stability, Growth of GDP, Inflation, Liquidity and different interest rate. A
complete analysis on basis of just one factor might indeed be helpful, but
won’t be very fruitful. A predictive model that would take into account
all the above-mentioned factors would possibly serve with a much higher
accuracy. Companies can use this model to analyze what value they lose
or gain from a particular news trend and what kind of incidents triggers
investors into investing more money into the market.
Ko and Chang [6] used the news articles and PTT forum discussions as
the fundamental analysis, and the stock historical transaction information as
technical analysis. BERT is used to recognize the sentiments of text, LSTM
is applied to forecast the stock price with stock historical transaction infor-
mation and text sentiments. The average root mean square error (RMSE)
has 12.05 accuracy improvement, according to their proposed model results.
4 Social Interactions
It is a well known thing between traders and investors that the first enemy
is one’s emotions. Any person who can’t be emotionally intelligent can’t beat
the market. This section is an overview of NLP applications implemented for
emotion detection, and having a fair idea of the overall sentiment between
people.
7
A research by Tanya Nijhawan, Girija Attigeri and T. Ananthakrishna
[16] aimed to extend sentiment and emotion analysis for detecting the stress
of an individual based on the posts and comments shared by him/ her on so-
cial networking platforms. They leveraged large-scale datasets with tweets to
accomplish sentiment analysis with the aid of machine learning algorithms
and a deep learning model, BERT for sentiment classification. They also
adopted Latent Dirichlet Allocation which is an unsupervised machine learn-
ing method for scanning a group of documents, recognizing the word and
phrase patterns within them, and gathering word groups and alike expres-
sions that most precisely illustrate a set of documents. This helped to predict
which topic is linked to the textual data. Detection of the emotions of users
online can be achieved with the aid of these models. Further, these emotions
can be used to analyze stress or depression. In conclusion, the ML models
and a BERT model have a very good detection rate.
[5] discussed a methodology by which it is possible to determine the pop-
ularity, opinion and sentiment of a product in different locations across male
and female users, and the methodology defined is much generalized and can
be applied to tweets from any country for any product as long as a suitable
number of tweets can be obtained.
This can be further implemented to detect stress and emotions across
traders and investors, or by using data from twitter financial news pages,
trading channels, to detect the market sentiment at the time; bullish or
bearish, fear or greed, and lots of other useful info can be extracted.
5 Limitations
The huge amount of data available is highly unstructured and has explicit
meanings in addition to implicit ones. The data needs to undergo proper pre-
processing before it can be used for analysis [11]. Although lexicon lists are
available for various domains, the financial sector has to have a specific dic-
tionary for such approaches, so as to assign proper weights to corresponding
aspects in the document [11].
Yancong and Hongxun [20] forecasted the stock market using the text
mining technology: A Support Vector Machine Method, concluding that
when the news amount is low, the predicting result is not good enough and
that some basic and simple linear regression algorithm can do a better job.
There are also several aspects needing to make sure and improve in order
8
to get a more robust SVM method: adding more text sources, designing a
standard sentiment evaluation system and finding more specialists to score
the sentiment dictionary, making sure the stock to be predicted has enough
text documents and trading volume while also expanding the dataset [20].
6 Future Scope
According to investopedia.com, fundamental analysts have a staid ap-
proach to analyzing stock performance. They look at a variety of factors that
they believe influence a stock’s performance. These include the industry as a
whole, the competition, a company’s management structure, its income and
revenue, as well as its growth potential. And one of the main drivers of the
stock market is the overall sentiment across the wide variety of investors.
That along with lots of other challenges, makes it hard to predict stock
market movements precisely, but it doesn’t stop some NLP researchers and
techniques from achieving results with a remarkable accuracy [10] [9] [3]. By
implementing the right NLP technique on the right datasets, traders and
financial analysts can achieve milestones that can’t be done else way.
[21] suggests that online, or real-time algorithms will modify the key
variables stored with the model each time a new batch of data comes in. For
this reason, online models have very good adaptability, which is necessary
for monitoring fast-changing markets.
With the help of reinforcement learning the next milestone is to create a
bot and regulate it with a reward function for getting returns on the stock
market [18]. Further another aspect of future work is to achieve a real time
stock market prediction for intraday trading with the help of assessing real
time news and real time stock price movements [18].
Investment markets in the past two decades have been kind to those who
are experts in recognizing patterns in prices. The next two decades may
reward individuals who are experts in uncovering clues in previously hard-
to-discern places, including patterns in human speech, that massive data
availability and near-costless processing have opened up for investigation [8].
9
References
[1] https://www.techtarget.com/searchenterpriseai/definition/natural-
language-processing-NLP. Accessed: May 5, 2022.
[3] Ryo Akita, Akira Yoshihara, Takashi Matsubara, and Kuniaki Uehara.
Deep learning for stock prediction using numerical and textual informa-
tion. In 2016 IEEE/ACIS 15th International Conference on Computer
and Information Science (ICIS), pages 1–6, 2016.
[4] Faten Subhi Alzazah and Xiaochun Cheng. Recent Advances in Stock
Market Prediction Using Text Mining: A Survey. In Robert M.X. Wu
and Marinela Mircea, editors, E-Business - Higher Education and Intel-
ligence Applications, Chapters. IntechOpen, March 2021.
[8] Marina Druz, Ivan Petzev, Alexander F. Wagner, and Richard J. Zeck-
hauser. When managers change their tone, analysts and investors change
their tune. In Financial Analysts Journal, 2020, 76( 2): 47–69, HKS
Working Paper No. 16-004, Swiss Finance Institute Research Paper No.
15-02, 12 2019.
[9] Alexander Fleiss and Han Cui. Forecasting stock price changes using
natural language processing, 10 2021.
10
algorithms. International Journal of Knowledge Based Computer Sys-
tems 6.2, page 23–30, 2018.
[12] Faisal Khalil and Gordon Pipa. Is deep-learning and natural language
processing transcending the financial forecasting? investigation through
lens of news analytic process, 06 2021.
[13] Dileep Kumar. Stock forecasting using natural language and recurrent
network. In 2020 3rd International Conference on Emerging Technolo-
gies in Computer Engineering: Machine Learning and Internet of Things
(ICETCE), pages 1–5, 2020.
[14] Sidra Mehtab and Jaydip Sen. A robust predictive model for stock price
prediction using deep learning and natural language processing. SSRN
Electronic Journal, 01 2019.
[15] Thien Hai Nguyen and Kiyoaki Shirai. Topic modeling based sentiment
analysis on social media for stock market prediction. In Proceedings of
the 53rd Annual Meeting of the Association for Computational Linguis-
tics and the 7th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), pages 1354–1364, Beijing, China,
July 2015. Association for Computational Linguistics.
[17] PeerJ. Image 1: The experimental flow chart of a proposed stock price
forecast model. [Online; accessed May 8, 2022].
[18] Priyank Sonkiya, Vikas Bajpai, and Anukriti Bansal. Stock price pre-
diction using bert and gan. ArXiv, abs/2107.09055, 2021.
[19] Nishant Verma, S G David Raj, Ackley J Lyimo, and Kakelli Anil
Kumar. Stock Market Prediction and Risk Analysis using NLP and
Machine Learning. International Journal of Engineering and Advanced
Technology (IJEAT), 9(5):813–815, June 2020.
11
[20] Yancong Xie and Hongxun Jiang. Stock market forecasting based on
text mining technology: A support vector machine method, 09 2019.
[21] Frank Z. Xing, Erik Cambria, and Roy E. Welsch. Natural language
based financial forecasting: A survey. Artif. Intell. Rev., 50(1):49–73,
jun 2018.
[22] Muxi Xu. Nlp for stock market prediction with reddit data. 2021.
[23] Jinjian Zhai, Nicholas Cohen, and Anand Atreya. Sentiment analysis
of news articles for financial signal prediction. CS224N Final Project, 4
2020.
12