Analysis of Machine Learning Methods for Predicting Stock Prices
1 Introduction
Predicting the future price of stock indices has been shown to be an extremely challenging endeavor, largely due to the noisy and non-stationary characteristics of their time series [4]. Several approaches have been investigated for forecasting stock indices, e.g. [3], [13]. Statistical approaches such as the linear Autoregressive Integrated Moving Average (ARIMA) model were used in forecasting the monthly stock price of the S&P 500 [2]. Deep learning architectures such as the LSTM implemented in [1] and Convolutional Neural Networks (CNN) as used in [11] are some of the non-linear models that have shown promise in this domain. These research initiatives provide evidence that using more sophisticated models can deliver better results.
Motivation. Stock market indices such as the Standard and Poor's 500 (S&P 500) have been shown, through the Granger-causality test, to have predictive power as a leading indicator of the economy [6]. Accurate forecasts of stock market indices will therefore help economic policy makers reach more informed conclusions about the right economic policies for desired economic outcomes. Institutional investors will also benefit from accurate forecasts of a stock market
index, as these predictions will help inform the portfolio optimization process
across different financial asset classes.
Contribution. This project attempts to forecast the univariate time series of four major stock indices, namely, the S&P 500, Dow Jones Industrial Average (DJIA), Euro Stoxx 50 (Stoxx50E) and the National Association of Securities Dealers Automated Quotation (NASDAQ) exchange. We applied three deep learning algorithms, namely, LSTM, CNN and CNN-LSTM, to the daily closing prices for each index. In addition to investigating and understanding the efficacy of neural network architectures in time series forecasting, our research attempts to examine the merits of using de-noising techniques such as wavelet transforms and Kalman filters on these financial time series. Novel ensemble approaches composed of these de-noising techniques and the neural network models were developed in this paper in a bid to improve the accuracy of the time series forecasts over our baseline model.
Paper Structure. The remainder of this paper is structured as follows: in section 2, we provide an overview of related research; in section 3, we examine the theoretical concepts and models that underpin our implementation; in section 4, we present our methodology for detecting the best model configuration for day-ahead forecasts of stock indices; in section 5, we present our evaluation and discuss our findings; and finally, in section 6, we conclude the paper.
2 Related Research
In [1], the authors applied Deep Neural Networks to predict one-month-ahead stock returns of stocks in the MSCI Japan Index. Their approach used 25 fundamental analysis factors for each stock in the cross-section of the Japanese stock market. The experimental results, which were evaluated using Rank Correlation, Directional Accuracy and Mean Squared Error (MSE), showed that Deep Neural Networks outperformed other models, including Support Vector Regression (SVR), Random Forest (RF) and Shallow Neural Network models, with a 30% average uplift using the rank correlation metric and a 2.6% reduction in MSE.
The efficacy of Deep Belief Networks (DBN) in predicting the S&P 500 was investigated in [9], using Technical Analysis Indicators as features and a 2-Dimensional Principal Component Analysis (PCA) model. Three models were formulated and evaluated using the RMSE metric in order to properly evaluate the usefulness of the Technical Analysis Indicators. The first model is composed of a Back Propagation Neural Network (BPNN) and the basic features in the raw dataset, while the second model is composed of a DBN, basic features and extracted Technical Indicator features. The final model adds the complexity of a 2-Dimensional PCA to the previous model. The experimental results indicate that Technical Indicators coupled with PCA can help improve the predictive power of Deep Learning algorithms. The final model had a 43.5% reduction in RMSE in comparison to the first model and a 16.91% reduction in RMSE when compared to the second model.
3 Background Models
In this section, we provide a brief overview of the models used in our evaluation
and the 2 denoising techniques used in an attempt to improve model perfor-
mance.
Recurrent Neural Networks (RNN) maintain an internal loop that allows for information persistence. The output of an RNN is used in conjunction with the current element in the input tensor to compute the next element in the output sequence [10]. In simpler RNN models, the memory unit or state of the RNN is often equivalent to the previous output, while more complex models maintain different values for the state and the previous element in the output sequence.
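For illustration, the state update of a basic RNN of this kind can be written as follows; this is the standard Elman formulation, given here as a sketch rather than taken from the original text, with $W_h$, $W_x$ and $b$ denoting learned weights and bias:

$$h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$$

Here $h_t$ serves both as the output at time $t$ and as the state passed to the next step, matching the simpler RNN models described above.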
LSTMs [14] address the long-term dependency problem of RNNs by introducing three gate structures, namely the forget gate F, input gate I, and output gate O shown in Fig. 1. The forget gate function in equation 2 takes as input the previous state $h_{t-1}$ and the current input vector $X_t$ and passes these inputs into a sigmoid function, which returns a value between 0 and 1 that represents the amount of information to flow through the gate.
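For reference, a standard statement of these gates, consistent with the description above (with $\sigma$ the sigmoid function and $W$, $b$ learned weights and biases; the exact parameterization in the original equation 2 may differ), is:

$$F_t = \sigma(W_F\,[h_{t-1}, X_t] + b_F)$$
$$I_t = \sigma(W_I\,[h_{t-1}, X_t] + b_I)$$
$$O_t = \sigma(W_O\,[h_{t-1}, X_t] + b_O)$$

Each gate output lies in (0, 1) and scales how much information flows through the corresponding path of the cell.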
Wavelet transforms represent a signal in terms of scaled and translated versions of a basis wavelet. A signal $f(t)$ can be recovered from its continuous wavelet transform coefficients $W_f(a,b)$, where $a$ is the scale parameter and $b$ the translation parameter, as shown in equation 8.

$$f(t) = \frac{1}{C_\Psi}\int_{0}^{\infty}\int_{-\infty}^{\infty} W_f(a,b)\,\Psi\!\left(\frac{t-b}{a}\right)\frac{da}{a^{2}}\,db \quad (8)$$
The father ($\varphi$) and mother ($\Psi$) wavelets have the properties [7] shown in equation 9.

$$\int \varphi(t)\,dt = 1 \qquad \int \Psi(t)\,dt = 0 \quad (9)$$
When a given signal is decomposed into approximation and detail coefficients using a Discrete Wavelet Transform (DWT) at level $j$, the father and mother wavelets can be represented as in equations 10 and 11.

$$\varphi_{j,k}(t) = 2^{-j/2}\,\varphi(2^{-j}t - k) \quad (10)$$

$$\Psi_{j,k}(t) = 2^{-j/2}\,\Psi(2^{-j}t - k) \quad (11)$$
Here, we applied the Haar wavelet to de-noise our input financial time series. The Haar wavelet is computationally more efficient than other mother wavelets and has been shown to be capable of improving results in this domain [4].
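As a sketch of how such de-noising can be implemented (anticipating Step 1 of the WT-CNN procedure in section 4), the following uses the PyWavelets library; the decomposition level and the universal-threshold rule are our assumptions, as the paper does not specify them:

```python
import numpy as np
import pywt

def haar_denoise(series, level=2):
    """De-noise a 1-D price series with a Haar DWT and soft thresholding.

    The universal threshold sigma * sqrt(2 * log(n)) is an assumption;
    the paper only states that Haar with soft thresholding was used.
    """
    coeffs = pywt.wavedec(series, 'haar', level=level)
    # Estimate the noise scale from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(series)))
    # Keep the approximation coefficients, soft-threshold the details.
    denoised = [coeffs[0]] + [
        pywt.threshold(c, threshold, mode='soft') for c in coeffs[1:]
    ]
    return pywt.waverec(denoised, 'haar')[: len(series)]
```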
Kalman Filters estimate the state of a system given measurements with expected errors. Their efficiency in making time series forecasts makes them widely applied in time series analysis and real-time applications [21]. A linear Gaussian model for the state and observation of a measured process is shown in equations 12 and 13, where $x_t$ is the real value at a given time $t$ for the measured system and $y_t$ is the measured value at $t$.

$$x_t = F x_{t-1} + B u_t + w_t \quad (12)$$

$$y_t = A x_t + v_t \quad (13)$$
In order to determine the real state of the system at a given time $t$, there are three functional components. The first component, $F x_{t-1}$, shows the functional relationship between the value of the previous state $x_{t-1}$ and the current state $x_t$. The second component, $B u_t$, is an external force term [21]. The third component, $w_t$, is a stochastic term which captures dynamics not present in the previous state. The measured value, $y_t$, is determined by applying a function to the real value of the current state, $A x_t$, and adding white Gaussian noise $v_t$. The Kalman filter forecasts the future value using equation 14, where $K_t$ is the Kalman gain.
The Kalman Filter recursively iterates between the prediction and filtering phases [8], with the prediction phase described by equations 15 and 16, and the filtering phase by equations 17 and 18, where $P_t$ is the estimate of the state covariance, $R$ is the measurement error variance, and $Q$ is a tunable hyper-parameter for improving the performance of the model.
$$\hat{x}_t^{-} = F \hat{x}_{t-1} + B u_t \quad (15)$$

$$P_t^{-} = F P_{t-1} F^{T} + Q \quad (16)$$

$$\hat{x}_t = \hat{x}_t^{-} + K_t \,(y_t - A \hat{x}_t^{-}) \quad (17)$$

$$P_t = (I - K_t A)\, P_t^{-} \quad (18)$$
The Kalman gain, $K_t$, attempts to determine the relative importance of the measured error of the estimate when compared to the error of the real value. The computation of the Kalman gain [21], in the standard form consistent with equations 15 to 18, is shown in equation 19.

$$K_t = P_t^{-} A^{T} \left(A P_t^{-} A^{T} + R\right)^{-1} \quad (19)$$
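The following sketch illustrates one predict/filter iteration of the filter described by equations 15 to 19; the scalar simplification (F, B, A, Q, R as floats) and the default parameter values are our assumptions for clarity, not details taken from the paper:

```python
def kalman_step(x_prev, P_prev, y, u=0.0, F=1.0, B=0.0, A=1.0, Q=1e-5, R=1e-2):
    """One predict/filter iteration of a scalar Kalman filter (eqs. 15-19).

    The scalar parameterization and default values are illustrative
    assumptions; the paper treats Q as a tunable hyper-parameter.
    """
    # Prediction phase (equations 15 and 16).
    x_pred = F * x_prev + B * u
    P_pred = F * P_prev * F + Q
    # Kalman gain (equation 19).
    K = P_pred * A / (A * P_pred * A + R)
    # Filtering phase (equations 17 and 18).
    x_new = x_pred + K * (y - A * x_pred)
    P_new = (1 - K * A) * P_pred
    return x_new, P_new

# Usage sketch: filter a noisy price series observation by observation.
# x, P = prices[0], 1.0
# for y in prices[1:]:
#     x, P = kalman_step(x, P, y)
```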
4 Methodology
In this research, with the exception of our baseline ARIMA model, we relax the
stationarity condition for the financial time series in our neural network models
as in [15,24]. The implementation logic for our NN models was adapted from the
approach presented in [5].
Data Preprocessing. The dataset used in this research was obtained from Yahoo! Finance, covering 01-01-2004 to 31-12-2019, for the following stock indices: NASDAQ 100 (NDX), S&P 500, Euro Stoxx 50 and the Dow Jones Industrial Average. The daily closing price series for each index was transformed into 1-D tensors composed of 100 successive daily values, mapping each independent variable $\hat{x} = x_{t+1}, x_{t+2}, \ldots, x_n$ to the corresponding dependent variable $\hat{y} = x_{n+1}$. The values of the independent variable were standardized using Min-Max normalization to lie within the range (0,1).
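A minimal sketch of this windowing and scaling step is shown below, assuming NumPy and scikit-learn's MinMaxScaler; the function and variable names are ours, not from the paper:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def make_windows(prices, window=100):
    """Map 100 successive closing prices to the next day's price."""
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(np.asarray(prices).reshape(-1, 1)).ravel()
    X, y = [], []
    for i in range(len(scaled) - window):
        X.append(scaled[i : i + window])   # independent variable: 100 values
        y.append(scaled[i + window])       # dependent variable: next value
    return np.array(X), np.array(y), scaler
```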
Baseline Model. ARIMA was selected as the baseline model, with its result serving as a benchmark to measure the improvement in forecast accuracy from other, more sophisticated models. The steps are as follows (a sketch follows the list):
– Step 1: The dataset is split into a training set and a test set using a 70:30
ratio;
– Step 2: The hyper-parameters, p,d,q are grid searched on the training set to
produce the optimal model with the lowest Mean Squared Error;
– Step 3: Predictions from the ARIMA model are compared with the values in the test set and evaluated using the MSE.
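A sketch of this grid search using statsmodels is given below; the (p, d, q) search ranges and the validation split within the training set are illustrative assumptions, as the paper does not report them:

```python
from itertools import product
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

def grid_search_arima(train, p_values=(0, 1, 2), d_values=(0, 1), q_values=(0, 1, 2)):
    """Grid-search (p, d, q) on the training set, scoring each fit by the
    MSE of its forecasts over a held-out tail of the training data."""
    split = int(len(train) * 0.9)          # validation slice (assumption)
    fit_part, val_part = train[:split], train[split:]
    best_order, best_mse = None, float('inf')
    for order in product(p_values, d_values, q_values):
        try:
            fit = ARIMA(fit_part, order=order).fit()
            preds = fit.forecast(steps=len(val_part))
            mse = mean_squared_error(val_part, preds)
        except Exception:
            continue                        # skip non-convergent configurations
        if mse < best_mse:
            best_order, best_mse = order, mse
    return best_order
```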
Wavelet Transform - Convolutional Neural Network (WT-CNN). The specification of this sequential CNN architecture is as follows: Layer 1 is a 1-D convolutional layer with 3 filters and a kernel size of 3, with the Rectified Linear Unit (ReLU) as the activation function. Layer 2 is also a 1-D convolutional layer and has identical hyper-parameters to the preceding layer. Layer 3 is a 1-D maximum pooling layer, followed by a 4th layer which flattens the two-dimensional tensor into a vector fed into a fully connected layer, not unlike the data model approach taken in [12]. The loss function, which the model optimizes using the Adam Optimizer, is the Mean Squared Error (MSE). The following steps were followed for the implementation of the WT-CNN model (a model sketch follows these steps):
– Step 1: The time series is de-noised by Haar Wavelet with soft thresholding.
– Step 2: The de-noised time series is scaled to the range (0,1). This allows the CNN to converge faster.
– Step 3: The dataset is divided into a training and test set using a 70:30 split.
– Step 4: Both the training and test sets are converted to a supervised learning problem, with the 100 past time steps representing the independent variable used to predict the next value in the sequence.
– Step 5: The training and test input tensors are reshaped to have the following dimensions: [samples, timesteps, features].
– Step 6: The input tensors are fed into the CNN and the network is trained
over 100 epochs.
– Step 7: The predictions generated from the CNN are re-scaled to their original range and then compared to the test set to generate values for the evaluation metrics.
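The following Keras sketch mirrors the layer specification above (two 1-D convolutions with 3 filters and kernel size 3, max pooling, flatten, dense head); the pool size of 2 and the width of the fully connected layer are assumptions where the text is silent:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_wt_cnn(timesteps=100, features=1):
    """CNN over [samples, timesteps, features] tensors, as in Steps 5-6."""
    model = Sequential([
        Conv1D(3, kernel_size=3, activation='relu',
               input_shape=(timesteps, features)),   # Layer 1
        Conv1D(3, kernel_size=3, activation='relu'), # Layer 2: same hyper-parameters
        MaxPooling1D(pool_size=2),                   # Layer 3 (pool size assumed)
        Flatten(),                                   # Layer 4
        Dense(50, activation='relu'),                # fully connected layer (width assumed)
        Dense(1),                                    # next-day closing price
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# model = build_wt_cnn()
# model.fit(X_train, y_train, epochs=100, verbose=0)   # trained over 100 epochs (Step 6)
```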
WT-CNN-Long Short Term Memory (WT-CNN-LSTM). The specification of this sequential CNN-LSTM architecture is as follows: Layer 1 is a 1-D convolutional layer with 64 filters and a kernel, with the Rectified Linear Unit as the activation function. Layer 2 is a 1-D maximum pooling layer with a pooling size of 2, followed by a layer which flattens the two-dimensional tensor into a vector fed into an LSTM layer with 50 neurons and a ReLU activation function. The LSTM layer outputs a tensor to a fully connected layer. Once again, the MSE loss function is optimized using the Adam Optimizer. The steps followed in the implementation of the WT-CNN-LSTM model mirror those of the WT-CNN model, with the major difference being the change in dimension of the input tensors from [samples, timesteps] to [samples, subsequences, timesteps, features].
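A sketch of this architecture using Keras TimeDistributed wrappers is shown below; the kernel size of 3 and the subsequence split (4 subsequences of 25 steps) are our assumptions, since the text specifies 64 filters but not these details:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Flatten, Dense,
                                     LSTM, TimeDistributed)

def build_wt_cnn_lstm(subsequences=4, timesteps=25, features=1):
    """CNN-LSTM over [samples, subsequences, timesteps, features] tensors."""
    model = Sequential([
        # Layer 1: 1-D convolution applied to each subsequence (kernel size assumed).
        TimeDistributed(Conv1D(64, kernel_size=3, activation='relu'),
                        input_shape=(subsequences, timesteps, features)),
        TimeDistributed(MaxPooling1D(pool_size=2)),  # Layer 2
        TimeDistributed(Flatten()),                  # flatten each subsequence
        LSTM(50, activation='relu'),                 # LSTM layer with 50 neurons
        Dense(1),                                    # fully connected output
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# X_train would be reshaped from [samples, 100] to [samples, 4, 25, 1] first.
```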
WT-LSTM. The specification of this sequential LSTM architecture is as follows: the first three layers are LSTM layers with 50 neurons each and the final layer is a fully connected layer. The implementation logic of the WT-LSTM follows that of the previous models, with the major difference being the dimension of the input tensors: [samples, timesteps].
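A sketch of the stacked architecture is given below; note that in Keras the first two LSTM layers must return full sequences so they can feed the next LSTM layer, a detail we infer rather than one stated in the paper:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_wt_lstm(timesteps=100, features=1):
    """Three stacked 50-unit LSTM layers followed by a dense output layer."""
    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(timesteps, features)),
        LSTM(50, return_sequences=True),
        LSTM(50),          # last LSTM returns only the final state
        Dense(1),          # fully connected layer: next-day closing price
    ])
    model.compile(optimizer='adam', loss='mse')
    return model
```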
Kalman Filter-LSTM (KF-LSTM). In this model, the input time series
is first passed into the Kalman filter as a 1-D tensor. The output of the de-noising
process returns a 2-D tensor that is reshaped into a 1-D shape before it is fed
into an LSTM network.
5 Evaluation
5.1 Results
Table 3. Neural Network Model Performance after Denoising using Wavelet Transform
Using Table 3, we find patterns similar to those obtained in Table 2 with respect to LSTMs outperforming other models for stock indices such as the S&P 500 and DJIA. For the best performing neural network models on stock indices such as the STOXX50E and S&P 500, the wavelet transform delivered results similar to those of the ARIMA model. The worst model performance came from the CNN-LSTM model, with the exception of the STOXX50E dataset.
Table 4. Neural Network Model Performance after Denoising using Kalman Filters
In many tasks involving time series analysis, CNNs generally outperform LSTMs, but this was not the case for the evaluated results of the time series forecasts presented in Table 4. Our assumption is that this is due to the two-dimensional architecture of the network, which allows it to capture both spatial and temporal information inherent in the time series.
We found the use of wavelets to remove noise in our univariate time series to be inconclusive. On one hand, predictions using the Dow Jones index showed a 20% average reduction in forecast error when de-noised using wavelet transforms. However, predictions using the NDX and Stoxx50E indices showed no substantial reduction in forecast error. We interpret this to mean that Haar wavelet decomposition may not be suitable for all types of financial time series, as suggested by [17]. Indeed, each financial time series may require a unique mother wavelet for its decomposition. With the exception of models that used Kalman filtering, no model could consistently beat the (baseline) ARIMA model in forecasting for each index. These results show that increasing model complexity does not guarantee improved performance. Many studies show that statistical
6 Conclusions
In this paper, we applied three deep learning algorithms, LSTM, CNN, and CNN-
LSTM to forecast the univariate time series of four stock indices; S&P 500, Dow
Jones Industrial Average, Euro Stoxx 50 and the Nasdaq Exchange. Initially,
we attempted to forecast the future prices of the stock indices using Neural
Networks; we then investigated the efficacy of Discrete Wavelet Transforms,
particularly the Haar Mother Wavelet to de-noise the input financial time series;
and finally, we investigated the use of Kalman Filters and discovered better
performance when compared to the wavelet transform approach. Results were
evaluated using the RMSE and MAE metrics. While our evaluation does provide
support for ARIMA, for forecasting using time series, we believe that our results
using some de-noising techniques suggest that other approaches may outperform
ARIMA, given the appropriate experimental configurations.
Our current work is focused on the exploration of other input features to
enhance the performance of the neural network models: configuring the type of
mother wavelet applied, decomposition level and window size may well deliver
improved performance. We are also seeking to investigate the optimal configu-
rations of DWT for more frequent observations of these time series.
References
1. M. Abe and H. Nakayama, "Deep Learning for Forecasting Stock Returns in the Cross-Section", arXiv, 2018. [Accessed 14 August 2020].
2. A. Ariyo, A. Adewumi and C. Ayo, "Stock Price Prediction Using the ARIMA Model", 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014. Available: 10.1109/uksim.2014.67 [Accessed 1 August 2020].
3. K. Bailey, M. Roantree, M. Crane and A. McCarren, "Data Mining in Agri Warehouses Using MODWT Wavelet Analysis", 23rd Intl. Conf. on Information and Software Technologies, pp. 241-253, Springer, 2017.
4. W. Bao, J. Yue and Y. Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory", PLOS ONE, vol. 12, no. 7, p. e0180944, 2017.
5. J. Brownlee, Deep Learning for Time Series Forecasting, 1st edn., 2020.
6. B. Comincioli, "The Stock Market As A Leading Indicator: An Application Of Granger Causality", University Avenue Undergraduate Journal of Economics, vol. 1, no. 1, 1996. Available: https://digitalcommons.iwu.edu/uauje/vol1/iss1/1.