
Stock Market Prediction using Long Short-Term Memory

Stylianos Gavriel
University of Twente
P.O. Box 26, 7523SB Enschede
The Netherlands
s.gavriel@student.utwente.nl

ABSTRACT
Strategies of the stock market are widely complex and rely on an enormous amount of data. Hence, predicting stock prices has always been a challenge for many researchers and investors. Much research has been done, and many machine learning techniques have been developed to solve complex computational problems and improve predictive capabilities without being explicitly programmed. This research explores the capabilities of Long Short-Term Memory, a type of Recurrent Neural Network, in the prediction of future stock prices. Long Short-Term Memory variations with single and multiple feature models are created to predict the value of the S&P 500 based on the earnings per share and the price to earnings ratio.

Keywords
Long Short-Term Memory, Market Prediction, Recurrent Neural Networks, Root Mean Square Error.

1. INTRODUCTION
The stock market can be seen as a public marketplace where shares and other financial instruments are sold and bought every day. Each share represents a portion of a company's ownership, and the S&P 500 constitutes shares of the five hundred most important United States companies [19].

Since the appearance of markets, investors have explored ways to acquire more knowledge of the companies listed in the market and have tried to keep up with the enormous amount of news in the world. With the increase in market size and the speed at which trades are executed, investors became less capable of relying on personal experience to identify market patterns. As technology progressed, investors and researchers developed many techniques and models to solve the problems that arise. Examples of those techniques are statistical models [3, 26], machine learning methods [18], artificial neural networks [31] and many more. The first generated trade procedures used historical data, can be traced back to the early 1990s, and focused on achieving positive returns with minimal risk [10]. In the 2000s, major advances in deep learning and reinforcement learning allowed for the creation of many hybrid algorithms using core principles for the prediction of stock value [10]. However, models around the stock market crash of 2008, often referred to as the depression of 2008, demonstrated limitations in their ability to forecast during periods of rapidly changing prices [25].

Furthermore, most studies are conducted using a single time scale feature of the stock market index; it is therefore reasonable to study multiple time scale features to determine whether they yield a more accurate model. It is important to note that markets are affected by many elements such as politics, industrial development, market news, social media and economic environments. One reason for the lack of predictability is that the appropriate variables to model are unknown and hard to acquire.

1.1 Research Question
This research attempts to answer the following questions:
RQ: To what extent can one find a more accurate Long Short-Term Memory (LSTM) based method for stock market prediction?
In order to answer this question, an analysis of the input data selection and prediction methods will be made. Consequently, the following two questions will be addressed.

• RQ1: Can the prediction performance increase by selecting a different combination of variables?
• RQ2: Can the prediction performance increase by using other LSTM based features?

1.2 Main Contribution
This research analyses the capabilities of Recurrent Neural Networks using LSTM to predict future stock prices. A popular data-set from finance.yahoo.com will be compared with an alternative data-set from multpl.com. Variations of LSTM based models will be trained and evaluated. There are two main contributions of this research:

• A deep understanding of the S&P 500 databases.
• Customized deep learning methods based on LSTM, both for single and multiple features, aiming to obtain a more accurate prediction model.

The remainder of this paper is organised as follows: Section 2 provides background on the stock market and the recurrent neural networks used in this research; Section 3 discusses related work; Section 4 explains the approach of the research; Section 5 analyses the data used; Section 6 presents the experiments with the LSTM variants; finally, Section 7 presents a discussion and future work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
34th Twente Student Conference on IT, Jun. 29th, 2021, Enschede, The Netherlands.
Copyright 2021, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.
2. BACKGROUND
In order to perform research in the field of predicting stock prices, it is important to understand the features found in the market and the machine learning techniques that will be used. This section first gives an indication of the quantitative data about the S&P 500 and then elaborates on recurrent neural networks.

2.1 Stock Market
The S&P 500 is a stock market index which measures the performance of the five hundred largest companies in the United States; such companies include Apple Inc., Microsoft, Amazon.com and many more. A share is characterized by a price which is available on the S&P 500 index [1]. Stock markets usually open during weekdays at nine-thirty a.m. and close at four p.m. Eastern Time. Many data-sets used for the prediction of prices include features such as the Open, Close, High, Low, Adjusted Closing price and Volume [17]. High and Low refer to the maximum and minimum prices of a given stock during a day, respectively. Adjusted Closing refers to the closing price adjusted for any corporate actions, which differs from the raw closing price. Finally, Volume characterises the number of stocks sold and bought each day. Earnings per share (EPS) is an important measure which indicates how profitable a company is [14]. The Price to Earnings ratio (P/E) refers to the ratio of the current stock price to the EPS [13].

2.2 Recurrent Neural Networks
Recurrent Neural Networks (RNN) are a class of neural networks specifically designed to handle sequential data. There are two types of RNNs, discrete-time RNNs and continuous-time RNNs [35]. They are designed with a cyclic connection architecture, which allows them to update their current state given the previous states and the current input data. RNNs usually consist of standard recurrent cells and are specialized in processing a sequence of values χ1..χn, such as time-series data, where n is the total number of values and χ are the features. Scaling to inputs with large dimensions and processing inputs of variable size is feasible to a large extent, and most RNNs are capable of processing sequences of variable length. However, RNNs lack the ability to learn long-term dependencies, as illustrated in research conducted by Bengio et al. [8]. Therefore, in order to handle these long-term dependencies, Hochreiter and Schmidhuber proposed a solution in 1997 called Long Short-Term Memory (LSTM) [16].

3. RELATED WORK
Since the evolution of artificial intelligence, many have attempted to combine deep learning and machine learning using core principles. Artificial intelligence methods include convolutional neural networks, multi-layer perceptrons, naive Bayes networks, back propagation networks, recurrent neural networks, single-layer LSTMs, support vector machines and many more [12]. A study in 2018 by Siami-Namini and Namin [30] compared LSTM with ARIMA [4], a model used for analysing time series data. This study focused on implementing and applying financial data on an LSTM, which was superior to the ARIMA model. Further, a study by Khaled A. Althelaya et al. in 2018 [5] evaluated the performance of bidirectional and stacked LSTMs for stock market prediction. The performance of the tuned models was also compared with a shallow and a unidirectional LSTM. The study concluded that the bidirectional and stacked LSTMs had better performance for short term prices as opposed to the long term prediction results. Further, the results showed that the deep architectures outperformed their shallow counterparts. Another example is a study by Roondiwala et al. [27], which attempted to create a model based on LSTM for an accurate read of future market indices. The researchers further analysed various hyperparameters of their model, mainly the number of epochs used with various variable combinations of the market. They concluded that using multiple variables (High/Low/Open/Close) resulted in the least errors. Later on, in 2020, Hao and Gao [34] proposed a hybrid model based on LSTM and multiple time scale feature learning and compared it to other existing models. The study also compared models based on single time scale feature learning. Furthermore, a design was made to combine the output representations from three LSTM based architectures. A study by David G. McMillan [24] attempts to understand which variables proxy for changes in the expected future cash flow. It was concluded that forecasting combinations outperform single-feature models.

Looking into the combinations of features, the hyperparameters and different LSTM variations would allow us to better understand LSTMs and expand on the research of this topic in general. From the related studies we expect that deep architectures will outperform their shallow counterparts and that multiple feature combinations will generally perform better than single features.

4. METHODOLOGY
This section introduces the major steps that will be considered in this research project: (i) data acquisition, (ii) data preprocessing, (iii) details about the RNN-based models, and (iv) the evaluation metrics.
(i) Data used for this research will be extracted from multpl.com [2] and finance.yahoo.com [1].
(ii) Data will be normalized using the Python library sklearn. Feature scaling is a method to normalize the range of independent feature variables. The data is then split into a training and a testing set.
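A minimal sketch of step (ii) is given below, assuming a pandas DataFrame loaded from the finance.yahoo.com export with a "Close" column; the file name and the 80/20 split fraction are illustrative choices, not values taken from this paper.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def scale_and_split(df: pd.DataFrame, feature_cols, train_fraction=0.8):
    """Scale the selected features to [0, 1] and split chronologically."""
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(df[feature_cols].values)
    split = int(len(scaled) * train_fraction)   # no shuffling: time-series data
    return scaled[:split], scaled[split:], scaler

# Example usage (hypothetical file name):
# df = pd.read_csv("sp500_yahoo_2010_2020.csv")
# train, test, scaler = scale_and_split(df, ["Close"])

Because this is time-series data the split is chronological rather than shuffled; fitting the scaler on the training portion only would further avoid leaking information from the test period into the scaling parameters.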
(iii) The data will be used to train the single time scale feature models over many iterations to predict each variable independently. As in many traditional prediction models, the Closing Price will be used as the feature to train the control model. Another model will then be trained using the price from multpl.com as the feature. After evaluating these methods, multiple time scale feature models are created and trained. First, a control model will be trained using the best possible features of the standard data-set, which will be selected by running tests for each combination of features and comparing their losses, with the label set to the Close price. The stock price can be calculated with equation (2) below using EPS and P/E, creating a new feature Calculated Price. Multiple feature models will then be trained using the EPS, P/E and Calculated Price as features and the Price as the label, and further compared with the control model. Finally, a comparison of the traditional models with the proposed multiple feature models will be made. A standard dropout LSTM model will be optimized through experimentation with hyperparameters and compared with other variants of LSTM.
(iv) Evaluation of the methods will be made using the root mean squared error (1), and visuals will be created depicting the predicted and real values, where N is the total number of values, Yi is the predicted price value and Ŷi is the real price.

RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{Y}_i - Y_i\right)^2}    (1)
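As a minimal sketch of the evaluation metric in equation (1), assuming NumPy arrays of real and predicted (scaled) prices; the function name is illustrative:

import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error as in equation (1)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

Note that a loss reported on prices scaled to [0, 1] corresponds to a much larger deviation once the scaling is inverted back to dollars, as discussed in Section 6.4.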

Figure 1 depicts the sequence of the process that is followed for each model.

Figure 1. Process for each model.

4.1 Long Short-Term Memory
As mentioned in Section 2.2, Long Short-Term Memory (LSTM) was introduced by Hochreiter and Schmidhuber in 1997 [16] to cope with the problem of long-term dependencies. LSTM consists of an RNN-like architecture that has been shown to outperform traditional RNNs on numerous tasks [16]. LSTM networks work extremely well with sequence data and long-term dependencies due to their powerful learning capabilities and memory mechanisms. By introducing gates, they are able to improve memory capacity and control the memory cell. One gate is dedicated to reading out the entries from the cell, the output gate. Another gate is needed to decide when data should be read into the cell; this is called the input gate. Finally, a forget gate resets the content of the cell. This design makes it possible to decide when to remember and when to ignore inputs in the hidden state. A sigmoid activation function computes the values of the three gates from the input of the current time step and the hidden state of the previous time step, so these values lie in the range (0, 1). The hidden state values are then calculated with a gated version of the tangent (tanh) activation function of the memory cell, which takes values in the range (-1, 1) [37].
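To make the gate computations described above concrete, the standard LSTM update at time step t can be written as follows (this is the common formulation found in e.g. [37], not equations reproduced from this paper; x_t is the input, h_{t-1} the previous hidden state, c_{t-1} the previous cell state, and W, U, b learned parameters):

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)            (input gate)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)            (forget gate)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)            (output gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)     (candidate memory)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t      (memory cell update)
h_t = o_t \odot \tanh(c_t)                           (hidden state)

The gates i_t, f_t, o_t take values in (0, 1) through the sigmoid σ, and the hidden state is a gated version of tanh(c_t) in (-1, 1), matching the description above.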
4.2 Stacked Long Short-Term Memory
Stacked LSTMs are now a stable technique for challenging sequential problems, introduced by Graves et al. in their 2013 paper on speech recognition [15]. Existing studies [22] have shown that LSTM architectures with several hidden layers can build up a higher level of representation of sequential data, therefore working more effectively and with higher accuracy. The architecture comprises multiple stacked LSTM layers, where the output of one hidden layer is fed directly into the input of the subsequent hidden layer. Instead of the traditional multi-layer LSTM architecture, where a single layer provides a single output value to the next layer, a stacked LSTM provides a sequence of values.

4.3 Bidirectional Long Short-Term Memory
A bidirectional LSTM (BiLSTM), invented in 1997 by Schuster and Paliwal [29], is capable of being trained with the sequence of data both forwards and backwards in two separate recurrent networks which are connected to the same output layer [29, 6]. The idea is to split the state of the neurons of a network into a part that is responsible for the forward states, starting from time frame t=1, and a part for the backward direction, starting from t=T.
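To make Sections 4.2 and 4.3 concrete, the sketch below shows how a stacked and a bidirectional LSTM could be declared in Keras, which this research uses for training; the layer sizes and window length are placeholders, not the exact configurations of the models evaluated in Section 6.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

window, n_features = 60, 1

# Stacked LSTM: intermediate layers return the full sequence, so the next
# LSTM layer receives a sequence of values rather than a single vector.
stacked = Sequential([
    LSTM(50, return_sequences=True, input_shape=(window, n_features)),
    LSTM(50),
    Dense(1),
])

# Bidirectional LSTM: one forward and one backward LSTM over the same input,
# connected to the same output layer.
bidirectional = Sequential([
    Bidirectional(LSTM(50), input_shape=(window, n_features)),
    Dense(1),
])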
5. DATA
This section describes the original data collected, the processing steps, and the feature selection.

The first data-set is collected from finance.yahoo.com [1]. Yahoo is one of the best resources for stock research because it is freely available and provides stock data from around the world. Yahoo provides approximately 1,822,800 records of the S&P 500 index from 1927 to 2020. For the purposes of this research, ten years of data are used, from 2010 to 2020, with approximately 19,600 records in total. The second data-set is collected from multpl.com [2]. This website provides S&P 500 data not only for the price index but also for the price to earnings ratio, earnings per share and dividend yield, to name a few. There are approximately 5,400 records of monthly data, from April 1st, 1871 to January 28, 2021. Data of the last 120 years is used, with approximately 4,350 records. Needless to say, calculating the price gives values very similar to the real price. Formula (2) can be used to introduce another feature to the data-set. The graph in Figure 2 depicts the real price of the S&P 500 from multpl.com and the calculated price.

EPS × P/E = Stock Price    (2)

Figure 2. Real Price and Calculated Price (PriceCal) using formula (2) as an introduced feature.
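A small sketch of how the Calculated Price feature of formula (2) could be added to the multpl.com data, assuming a pandas DataFrame with "EPS" and "PE" columns (the column names are illustrative, the actual export may label them differently):

import pandas as pd

def add_calculated_price(df: pd.DataFrame) -> pd.DataFrame:
    """Add the Calculated Price feature of formula (2): EPS * P/E."""
    df = df.copy()
    df["CalculatedPrice"] = df["EPS"] * df["PE"]
    return df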
5.1 Datasets Basic Statistics
In order to understand this data, numpy from Python was used to calculate the mean and standard deviation of each feature. Table 1 shows these statistics, and Figure 3 shows box plots for the data collected from finance.yahoo.com and from multpl.com, with the extra calculated price from formula (2). The data in the box plots, and for the rest of the research, is scaled between zero and one. Figure 3 allows us to understand the distribution and skewness of the numerical data by displaying the data quartiles and averages. We can hence observe that Open, High, Low, Close and Adjusted Close follow a very similar trend, with the means being almost identical. Moreover, Volume has a huge number of outliers that differ significantly from the other observations; overall, the Volume data has huge variation. The same can be said for PE, EPS, Price and the price calculated with formula (2). Machine learning methods are generally sensitive to the distribution and range of values. Therefore, outliers may mislead and spoil the training process, resulting in higher losses and longer training times. A paper by Zhang and Luo in 2015 concluded that outliers play a huge role in the performance of Extreme Learning Machines (ELM) [38].

Figure 3. Box plots for Open, High, Low, Close, Adjusted Close and Volume.

Table 1. Data-set statistics, in terms of mean (µ) and standard deviation (σ). The first six rows are data from finance.yahoo.com, while the remaining rows are data from multpl.com (the Calculated Price is derived with formula (2)).

Features | Mean | Standard Deviation
Open | 2570.66 | 432.82
High | 2583.49 | 435.41
Low | 2556.43 | 430.15
Close | 2570.89 | 432.77
Adj. Close | 2570.89 | 432.77
Volume [×10^7] | 383.20 | 95.51
PE | 16.13 | 9.25
EPS | 38.29 | 28.42
Price | 379.47 | 676.48
Calculated Price | 678.29 | 704.70

In order to make Open, High, Low and Close clearer, Figure 4 is included. It is observed that Open and Close fluctuate between High and Low, and the overall data follows the same trend, hence the high correlation levels observed. The Calculated Price can be used as an extra feature for prediction purposes.

Figure 4. Ten days of S&P 500: Open, High, Low and Close.

While Open, High, Low, Close and Adjusted Close are almost identical, they present some very minor differences, which should in theory have no large effect on the selection process of the model. The Adjusted Close price is identical to the feature Close, therefore for the purposes of this research Adjusted Close will not be used. Figure 5 shows the correlation between features in heat maps. It is noted that Volume has the least correlation with the other features. Further, Close and Adjusted Close have 100% correlation, supporting the previous statement of being identical. Price is highly dependent on EPS, but surprisingly less on the P/E ratio. The Calculated Price, as expected, has high correlation with the real Price, and should allow for overall good results when used.

Figure 5. Correlation coefficients for the data-set of finance.yahoo.com.

Statistics give us more insight into the data and are generally considered an indispensable part of machine learning. Understanding the data and its characteristics is important in order to finally come to a conclusion about the results found in the subsequent sections. In the next section I execute some experiments in order to select the best features that can be applied to the LSTMs.

6. EXPERIMENTS AND RESULTS
In this section a model is constructed as a basis for testing features, their combinations and model parameters.

6.1 LSTM Model Details
LSTMs in general are capable of coping with historical data, hence they are really good candidates for stock prediction. LSTMs can learn the order dependence between items in a sequence and are known for their good performance on sequence data-sets. For the purpose of selecting the best combination of features, a dropout based LSTM model (DrLSTM) with four hidden LSTM layers and 50 units per hidden layer is trained and evaluated. Each hidden LSTM layer has a subsequent dropout layer, and finally a dense layer is used to connect all the neurons, followed by the last dropout. Dropout is a technique which selects neurons that will be ignored during training; this means that their contribution to the activation of downstream neurons is temporarily removed. The structure of the DrLSTM is found in Figure 7 of the appendix. The DrLSTM is trained with windows of 60 previous days predicting the next day. Table 2 shows the windows of days, where X are the input arrays of 60 days of data, y are the predicted prices per day (the outcomes of the model for each array X), and n is the total number of days in the data-set.

Table 2. Sliding window inputs (X) and their outcomes (y), with n the total number of days.
X1 = days 1–60, y1 = day 61
X2 = days 2–61, y2 = day 62
X3 = days 3–62, y3 = day 63
... continuing until day n.
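The following sketch illustrates the sliding window construction of Table 2 and a DrLSTM-style architecture (four LSTM layers with 50 units, each followed by dropout, and a final dense layer), assuming Keras and a series of scaled closing prices. The dropout rate of 0.2 matches the default used in Section 6.3, but the code is an approximation of Figure 7 rather than the exact implementation.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def make_windows(series: np.ndarray, window: int = 60):
    """Build (X, y) pairs: 60 previous days as input, the next day as target.
    `series` is a 1-D array of scaled closing prices."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])
        y.append(series[i])
    X = np.array(X).reshape(-1, window, 1)   # (samples, time steps, features)
    return X, np.array(y)

def build_drlstm(window: int = 60, units: int = 50, dropout: float = 0.2):
    """Approximation of the DrLSTM: 4 LSTM layers, each followed by dropout."""
    model = Sequential()
    model.add(LSTM(units, return_sequences=True, input_shape=(window, 1)))
    model.add(Dropout(dropout))
    model.add(LSTM(units, return_sequences=True))
    model.add(Dropout(dropout))
    model.add(LSTM(units, return_sequences=True))
    model.add(Dropout(dropout))
    model.add(LSTM(units))
    model.add(Dropout(dropout))
    model.add(Dense(1))
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model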
6.2 Feature Selection
In order to perform the feature selection step, I have done a grid search over all feature combinations. There are d!/(r!(d−r)!) possible combinations for each data-set, where d is the total number of features available and r is the number of selections made. For r values between two and five, with a total of five features, there are 27 total combinations for the data-set from finance.yahoo.com, including the single feature Close price. For the data-set from multpl.com there are a total of 12 combinations, taking into account the Calculated Price as a feature. Results of the trained DrLSTM model for each combination of the data-sets are found in Table 4 and Table 5 of the appendix. Based on these results, we can conclude that a single feature DrLSTM is capable of performing surprisingly better than multiple feature combinations. For the selection of two features, {High, Volume} and {EPS, Price} had the best results throughout the combinations. For a selection of three features, {High, Low, Close} as well as {PE, Price, Calculated Price} had low loss values. For a selection of four features, {High, Low, Close, Volume} performed well, while the rest of the combinations from multpl.com did not perform equally well or better than the other combinations. Since {Close} had good results during feature selection, it is used to run the rest of the tests to optimize the DrLSTM.
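A compact sketch of the combination grid search described above, assuming a caller-supplied training routine (a placeholder standing in for the DrLSTM pipeline built from the earlier sketches):

from itertools import combinations

features = ["Open", "High", "Low", "Close", "Volume"]

def grid_search(df, train_and_evaluate):
    """Enumerate the single feature Close plus every subset of 2..5 features
    and collect the RMSE returned by the supplied training routine."""
    subsets = [("Close",)]
    for r in range(2, len(features) + 1):
        subsets.extend(combinations(features, r))
    # train_and_evaluate(df, feature_list) is a placeholder for training the
    # DrLSTM on the given features and returning its test RMSE.
    return {subset: train_and_evaluate(df, list(subset)) for subset in subsets}

The enumeration yields 1 + C(5,2) + C(5,3) + C(5,4) + C(5,5) = 27 subsets, matching the count given above for the finance.yahoo.com data-set.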
6.3 LSTM Model Hyperparameters
In this section I attempt to select hyperparameter values with respect to the model's loss.

Model parameters, such as neuron weights, are the fitted parameters of a model that are estimated and learned from the training set. Hyperparameters, on the other hand, are adjustable and must be tuned in order to obtain a model with optimal performance. Therefore, I run experiments to determine the optimal number of nodes, dropout probability and optimizer for an adequate performance of the DrLSTM. Figure 6 shows the results for the number of nodes per layer with a static dropout probability of 0.2 for the DrLSTM model introduced in the previous section. Figure 6 also depicts the results for a dropout probability range of 0.05 to 0.3 with 50 nodes per layer. From Figure 6, we can conclude that adding more nodes leads to better results in some cases. However, since the time required to train the DrLSTM with 150 nodes exceeds by far that for 50 nodes, and the results show an insignificant difference, we proceed with a selection of 50 nodes per layer, totaling 200 throughout. As for the dropout probability, we can observe that a decreased number of ignored nodes can potentially lead to better results. With this in mind we expect that a stacked LSTM (StLSTM) architecture, where the dropout layers are skipped, can lead to better results.

Figure 6. Losses for the number of nodes, the dropout probability and the optimizers used, respectively.

I will further investigate some types of optimizers which can contribute to the DrLSTM's optimization process. Optimizers are algorithms used to change parameters of neural networks, such as the weights and learning rate, in order to reduce losses [28]. Keras from Python is used to create and train the DrLSTM, where the optimizer is one of the two parameters required for compiling a model. Therefore, analysing the performance of optimizers in this scenario could potentially prove worthwhile. In order to run these tests the same DrLSTM is used as the basis of the comparison. Figure 6 shows the results for the Adam [20], RMSprop [32], SGD [33], Adadelta [36] and Adamax [20] optimizers. In conclusion, there is a remarkable difference between Adam and the rest of the optimizers, hence for the rest of this research the Adam optimizer is used. Adam was first introduced in 2014 by Kingma and Ba [20]. It is an adaptive learning rate optimization algorithm that generally performs well in a vast array of problems.
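A sketch of the optimizer comparison, reusing the build_drlstm and rmse helpers from the earlier sketches; the number of epochs and the batch size are illustrative placeholders, not the settings used for the reported results.

optimizers = ["adam", "rmsprop", "sgd", "adadelta", "adamax"]

def compare_optimizers(X_train, y_train, X_test, y_test, epochs=25, batch_size=32):
    losses = {}
    for name in optimizers:
        model = build_drlstm()
        # Recompile with the optimizer under test; the loss stays mean squared error.
        model.compile(optimizer=name, loss="mean_squared_error")
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
        losses[name] = rmse(y_test, model.predict(X_test).ravel())
    return losses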

6.4 Model Variants
Now that we have established the features and parameters, we can proceed to testing different variants of the LSTM model. This comparison allows us to find the best LSTM variant among the models analysed in this research paper.

We start with the DrLSTM model introduced in Section 6.1, we then proceed with a StLSTM and a shallow LSTM model (ShLSTM) consisting of one LSTM hidden layer with 200 nodes, and finally a bidirectional LSTM (BiLSTM) model consisting of the same number of nodes. Architectures of the model variants are included in Figure 7 of the appendix. The tests are completed using the best feature sets, i.e. those which provided the least losses for each number of features. The optimizer used for testing is set to Adam. Table 3 depicts the losses from the tests that have been performed, and Figure 8 of the appendix shows the graphs plotted using pyplot of Python for the best result of each model.

Table 3. RMSE losses for the LSTM four layered model with dropouts (DrLSTM), stacked LSTM (StLSTM), shallow LSTM (ShLSTM) and bidirectional LSTM (BiLSTM).

Features/Models | DrLSTM | StLSTM | ShLSTM | BiLSTM
Close | 0.0346 | 0.0247 | 0.0230 | 0.0224
High, Vol. | 0.0408 | 0.0275 | 0.0238 | 0.0233
High, Low, Vol. | 0.0356 | 0.0297 | 0.0231 | 0.0219
High, Low, Close, Vol. | 0.0389 | 0.0574 | 0.0233 | 0.0252
Price | 0.0552 | 0.0454 | 0.0346 | 0.0712
EPS, Price | 0.0411 | 0.0682 | 0.0535 | 0.0651
PE, Price, Calc.Price | 0.0507 | 0.0818 | 0.0374 | 0.1197

From Table 3 it can be seen that the DrLSTM had the worst performance among the models. A stacked LSTM with a loss of 0.0247 has proven to perform better than the model with dropouts; this is mainly caused by the absence of dropout layers. Surprisingly, the ShLSTM appears to be slightly better than the previous models. This result seems out of order, since many studies have shown that deep recurrent networks usually outperform their shallow counterparts [23]. Finally, the best performing model was the BiLSTM, which had a loss of 0.0219 with the use of multiple features. To relate these losses to real prices: at the average closing price of 2570.89, a model with a loss of 0.0346 has a deviation of 124.73 dollars, and the best performing model, with a loss of 0.0219, has a deviation of 56.18 dollars. In Section 7, I discuss the possible reasons for the behaviours observed.

7. DISCUSSION AND FUTURE WORK
While experimenting with the DrLSTM, we observed that dropouts introduce a bottleneck in the adjustment of the model's parameters. In many machine learning processes it is useful to know how certain the output of a model is. For example, a prediction is more likely to be closer to the actual price when an input is very similar to elements of the training set [21]. The outputs of a dropout layer are randomly ignored, therefore having the effect of reducing the capacity of a network during training. Requiring more nodes in the context of dropout could potentially remove this bottleneck. In Figure 6 we observed that increasing the number of nodes gives more positive outcomes. The StLSTM supports this argument, since its dropout layers are absent, hence the better performance. The ShLSTM was the second best performing model, contrary to what I was expecting. A likely reason is that the 200 nodes used to train the ShLSTM in one layer were a much better fit for the data used, compared to the 50 nodes per layer of its deep counterparts. A paper by Andrew R. Barron in 1993 [7] gives more insight into the size of a single-layer neural network needed to approximate various tasks. Furthermore, the BiLSTM performed the best throughout the experiments and could potentially be used for long term transactions in the stock market; however, it leaves much room for improvement. Since the BiLSTM passes over the data-set twice, it makes certain trends more visible, adding more weight to certain neurons and extending data usage [11]. The LSTM architecture is mainly used for long term dependencies, so it is generally good to have more and more contextual information.

In this research the default optimizer parameters were used and seemed to perform generally well. However, running more experiments while adjusting Adam's parameters accordingly could provide improvements. More improvement can also be achieved by looking into the depth (number of layers) and width (number of nodes) of each variant. The window span used to create the input data for the models could be tested for values of more or less than 60 days. Finding a more suitable window could also fix the lag observed between the predicted price and the real values. A study by Salah Bouktif et al. [9] tried solving the lag arising from time features with a selection of an appropriate lag length using a genetic algorithm (GA). A deviation of 56.18 dollars for short term transactions could seem high, since a stock index in general requires more than a couple of days to deviate significantly enough to minimize trading losses; therefore even the BiLSTM leaves much room for improvement.

8. CONCLUSION
This research paper attempted to forecast the S&P 500 index using multiple LSTM variants while performing several experiments for optimization purposes. I trained the models with a popular data-set from finance.yahoo.com and a data-set from multpl.com. This paper has shown that a single feature selection performed better in some instances, while multiple features proved advantageous for the BiLSTM. The testing results confirm that the LSTM variants are capable of tracing the evolution of the closing price for long term transactions, leaving much room for improvement for daily transactions. This study gave insight into two different data-sets and analysed the results of different variants of LSTM, which should allow researchers and investors to use and expand upon them in the future. Although only one of the many machine learning techniques has been used in this research, there are many more methods, which can be broken down into two categories (statistical techniques and artificial intelligence).

9. ACKNOWLEDGMENT
I would like to thank Dr. Elena Mocanu for helping me during the execution of this research. I also appreciate the time she spent directing me to the right sources.

10. REFERENCES
[1] SNP, Dec 10, 2020: S&P 500 (^GSPC), retrieved from: https://yhoo.it/3ikW3DM.
[2] multpl, https://www.multpl.com/s-p-500-pe-ratio.
[3] A. Adebiyi, A. Adewumi, and C. Ayo. Comparison of ARIMA and artificial neural networks models for stock price prediction. J. Appl. Math., 1–7, 2014.
[4] R. Adhikari and R. K. Agrawal. An introductory study on time series modeling and forecasting, 2013.
[5] K. A. Althelaya, E. M. El-Alfy, and S. Mohammed. Evaluation of bidirectional LSTM for short- and long-term stock market prediction. In 2018 9th International Conference on Information and Communication Systems (ICICS), pages 151–156, 2018.
[6] P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, p. 15, 1999.
[7] A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993.
[8] Y. Bengio, P. Frasconi, and P. Simard. The problem of learning long-term dependencies in recurrent networks. In IEEE International Conference on Neural Networks, pages 1183–1188 vol. 3, 1993.
[9] S. Bouktif, A. Fiaz, A. Ouni, and M. Serhani. Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies, 11(7):1636, Jun 2018.
[10] S. Chakraborty. Capturing financial markets to apply deep reinforcement learning, 2019.
[11] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, and I. Androutsopoulos. Neural contract element extraction revisited. In Workshop on Document Intelligence at NeurIPS 2019, 2019.
[12] G. Ding and L. Qin. Study on the prediction of stock price based on the associated network model of LSTM. International Journal of Machine Learning and Cybernetics, 11, 06 2020.
[13] J. Fernando. Price-to-Earnings Ratio – P/E Ratio, https://www.investopedia.com/terms/p/price-earningsratio.asp. Nov 13, 2020.
[14] J. Fernando. Earnings Per Share – EPS Definition, https://www.investopedia.com/terms/e/eps.asp. Nov 17, 2020.
[15] A. Graves, N. Jaitly, and A. r. Mohamed. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pp. 273–278, 2013.
[16] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, Nov. 1997.
[17] J. Jagwani, M. Gupta, H. Sachdeva, and A. Singhal. Stock price forecasting using data from yahoo finance and analysing seasonal and nonseasonal trend. pages 462–467, 06 2018.
[18] Y. Kara, M. Acar, and Baykan. Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange. Expert Syst. Appl., 38, 5311–5319, 2011.
[19] W. Kenton. S&P 500 Index – Standard & Poor's 500 Index, https://bit.ly/2LWBUYO, Dec 22, 2020.
[20] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2017.
[21] A. Labach, H. Salehinejad, and S. Valaee. Survey of dropout methods for deep neural networks. 25 October 2019.
[22] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[23] Y. Levine, O. Sharir, and A. Shashua. Benefits of depth for long-term memory of recurrent networks. 15 February 2018.
[24] D. G. McMillan. Which variables predict and forecast stock market returns? 2016.
[25] T. Moyaert and M. Petitjean. The performance of popular stochastic volatility option pricing models during the subprime crisis. Applied Financial Economics, 21(14), 2011.
[26] P.-F. Pai and C.-S. Lin. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33, 497–505, 2005.
[27] M. Roondiwala, H. Patel, and S. Varma. Predicting stock prices using LSTM. International Journal of Science and Research (IJSR), 6, 04 2017.
[28] S. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
[29] M. Schuster and K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45, pp. 2673–2681, 1997.
[30] S. Siami-Namini and A. S. Namin. Forecasting economics and financial time series: ARIMA vs. LSTM, 2018.
[31] T. J. Strader, J. J. Rozycki, T. H. Root, and Y. J. Huang. Machine learning stock market prediction studies: Review and research directions. Journal of International Technology and Information Management, 28(3), 2020.
[32] T. Tieleman and G. Hinton. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
[33] J. Yang and G. Yang. Modified convolutional neural network based on dropout and the stochastic gradient descent optimizer. Algorithms, 11(3), 2018.
[34] Y. Hao and Q. Gao. Predicting the trend of stock market index using the hybrid neural network based on multiple time scale feature learning. Applied Sciences, 10(11), 2020.
[35] Y. Yu, X. Si, C. Hu, and J. Zhang. A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation, 31(7):1235–1270, 2019.
[36] M. D. Zeiler. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701, 2012.
[37] A. Zhang, Z. Lipton, M. Li, and A. Smola. Dive into Deep Learning. 2020.
[38] K. Zhang and M. Luo. Outlier-robust extreme learning machine for regression problems. Volume 151, part 3, pages 1519–1527, March 5 2015.
APPENDIX
A. EXPERIMENTS AND RESULTS
A.1 Feature combinations

Table 4. RMSE losses from feature combinations (to predict Close values).

Features | RMSE Loss
Close | 0.0346
Open - High | 0.0446
Open - Low | 0.0573
Open - Close | 0.0616
Open - Volume | 0.0578
High - Low | 0.0513
High - Close | 0.0505
High - Volume | 0.0408
Low - Close | 0.0675
Low - Volume | 0.0412
Close - Volume | 0.0479
Open - High - Low | 0.0497
Open - High - Close | 0.0666
Open - High - Volume | 0.0506
Open - Low - Close | 0.0755
Open - Low - Volume | 0.0623
Open - Close - Volume | 0.0607
High - Low - Close | 0.0367
High - Low - Volume | 0.0356
High - Close - Volume | 0.0711
Low - Close - Volume | 0.0683
Open - High - Low - Close | 0.0432
Open - High - Low - Volume | 0.0455
Open - High - Close - Volume | 0.0697
Open - Low - Close - Volume | 0.0588
High - Low - Close - Volume | 0.0389
Open - High - Low - Close - Volume | 0.0548

Table 4 shows the results from the combinations of features collected from finance.yahoo.com. Even though many machine learning models achieve better results with a selection of multiple features, in this research a single feature was capable of performing better.

Table 5. RMSE losses from feature combinations (to predict Price).

Features | RMSE Loss
Price | 0.0552
EPS - PE | 0.5294
EPS - Price | 0.0411
EPS - Calculated Price | 0.3440
PE - Price | 0.1212
PE - Calculated Price | 0.5054
Price - Calculated Price | 0.0935
EPS - PE - Price | 0.0968
EPS - PE - Calculated Price | 0.5305
EPS - Price - Calculated Price | 0.0916
PE - Price - Calculated Price | 0.0507
EPS - PE - Price - Calculated Price | 0.0953

Table 5 shows the results from the combinations of features collected from multpl.com. In contrast to Table 4, a combination of two features performed the best.

A.2 LSTM Model Parameters

Table 6. RMSE for the number of nodes per layer (dropout probability 0.2).

Nodes no. | RMSE
25 | 0.0639
50 | 0.0346
75 | 0.0376
100 | 0.0647
125 | 0.0356
150 | 0.0329

Table 6 shows the results of the tests performed to optimize the DrLSTM with respect to the number of nodes per layer. It was observed that 150 nodes performed the best; however, the time required to train the DrLSTM was significantly higher than with 50 nodes. I expand on this observation in the discussion in Section 7.

Table 7. RMSE for the dropout probability (50 nodes per layer).

Dropout Probability | RMSE
0.05 | 0.0274
0.1 | 0.0314
0.15 | 0.0554
0.20 | 0.0346
0.25 | 0.0705
0.30 | 0.0805

Table 7 shows the results of the tests performed to optimize the DrLSTM model with respect to the dropout probability of the layers. More information about the structure of the DrLSTM model can be found in Figure 7. It is observed that decreasing the dropout probability gives better results. Therefore, the dropout layers create a barrier to the DrLSTM's process of adjusting parameters during training.

Table 8. RMSE for the optimizers used (50 nodes per layer, 0.2 dropout).

Optimizers | RMSE
Adam | 0.0346
RMSprop | 0.0847
SGD | 0.0905
Adadelta | 0.1198
Adamax | 0.0528

Table 8 shows the results of the tests performed to optimize the DrLSTM model with respect to the optimizer the DrLSTM uses to adjust the parameters of the model during training. It was observed that the Adam optimizer performs the best for the purposes of this research. In Section 7, I discuss the possibility of adjusting the parameters of the optimizer; for simplicity, this research used the default parameters.
A.3 Model Variants

Figure 7. Model variant architectures: (a) the dropout LSTM model (DrLSTM), which consists of 4 LSTM layers with a dropout layer each; (b) the stacked LSTM (StLSTM), which consists of the same four LSTM layers excluding the dropout layers; (c) the bidirectional LSTM (BiLSTM), which consists of a single forward and backward layer; (d) the shallow LSTM (ShLSTM), which has a single 200-node LSTM layer.

Figure 8. Best results for each model, depicting the actual price and the predicted price: (a) dropout LSTM, (b) stacked LSTM, (c) bidirectional LSTM, (d) shallow LSTM. In graph (a) it is noticeable that the predicted values (in orange) and the real price (in blue) deviate and show a noticeable lag, which is touched upon in Section 7. This lag is most noticeable in (a) but can be found in the rest of the graphs as well. In graph (c) it is noticeable that the bidirectional LSTM had good performance, hence the darker colour of the line.
