Abstract Recently, the Deep Neural Network (DNN) architecture with a deep learning approach has become one of the most robust techniques for time-series forecasting. Although DNNs provide fair forecasting results for time-series prediction, they still face various challenges, because most time-series data, especially financial time-series data, are multidimensional, dynamic, and nonlinear. Hence, to address these challenges, we propose a new deep learning model, the Stacked Long Short-Term Memory (S-LSTM) model, to forecast multivariate time-series data. The proposed S-LSTM model is constructed by stacking multiple Long Short-Term Memory (LSTM) units. In this research work, we use six different data normalization techniques to normalize the dataset as the preprocessing step of the deep learning methods. To evaluate and analyze the performance of the proposed S-LSTM model, we use multivariate financial time-series data, namely stock market data, collected from two stock exchanges: the Bombay Stock Exchange (BSE) and the New York Stock Exchange (NYSE). The experimental results show that the prediction performance of the S-LSTM model can be improved by an appropriate choice of data normalization technique. The results also show that the prediction accuracy of the S-LSTM model is higher than that of other well-known methods.
S. Bhanja
Government General Degree College, Singur 712409, Hooghly, India
e-mail: samitbhanja@gmail.com
A. Das (B)
Aliah University, New Town, Kolkata 700160, India
e-mail: adas@aliah.ac.in
1 Introduction
2 Literature Review
In the last few years, many sincere efforts have been made to predict the stock market successfully. These efforts are broadly classified into two categories, viz., the statistical approach and the soft-computing approach. Support Vector Machine (SVM) [1] and Autoregressive Integrated Moving Average (ARIMA) [2] are two well-known statistical methods for time-series forecasting. These statistical models can handle nonlinear time-series data and exhibit a high degree of success for the prediction of univariate time-series data.
Artificial Neural Network (ANN)-based models are the most popular soft-computing approach for time-series prediction. ANN-based models can perform a large variety of computational tasks faster than traditional approaches [3]. The Multilayer Perceptron (MLP) neural network and the Back Propagation (BP) neural network [4, 5] are popular ANN models, and they have been applied successfully to various problems, viz., classification problems, time-series forecasting problems, etc. However, these ANN models are not suitable for large volumes of highly dynamic, nonlinear, and complex data.
Nowadays, Deep Neural Networks (DNNs) [6–9] exhibit great success in a wide range of application areas, including multivariate time-series forecasting. The basic difference between shallow neural networks and deep neural networks is that shallow networks have only one hidden layer, whereas deep neural networks have many hidden layers. These multiple hidden layers allow DNNs to extract complex features from large volumes of highly dynamic and nonlinear time-series data.
In recent times, Recurrent Neural Networks (RNNs) [10, 11] have become one of the most popular DNN architectures for time-series classification and forecasting problems. In an RNN, the output of one time stamp is considered as the input of the next time stamp, and this time-stamp concept makes it most suitable for processing time-series data. However, RNNs suffer from the vanishing gradient and exploding gradient problems, and because of these problems they cannot represent the long-term dependencies of historical time-series data. The Long Short-Term Memory (LSTM) network [12, 13] is a specialized RNN that overcomes these shortfalls of the traditional RNN.
3 Basic Concepts
3.1 Deep Neural Network

When a neural network has two or more hidden layers, it becomes a Deep Neural Network (DNN). The most common neural networks, viz., Multilayer Perceptron (MLP) or feedforward neural networks with two or more hidden layers, are representatives of DNN models, and DNN models are the basis of any deep learning algorithm. The multiple hidden layers allow DNN models to capture complex features from large volumes of data, and also to process nonlinear and highly dynamic information. In recent times, a number of DNN models have been proposed; among them, the Recurrent Neural Network (RNN) is one of the most popular for processing time-series data.
3.2 Recurrent Neural Network

The RNN [10] is one of the most powerful DNN models for processing sequential data. It was first developed in 1986. Since it performs the same set of operations on every element of the sequential data, it is called a recurrent neural network. In theory, it can process very long sequences of time-series data, but in practice it can look only a limited number of steps behind. Figure 1 represents the typical architecture of an RNN and its expanded form.
In an RNN, the following equations are used for the computation:

$$h_t = f(U x_t + W h_{t-1}) \tag{1}$$

$$O_t = \mathrm{softmax}(V h_t) \tag{2}$$

where $h_t$ and $x_t$ are, respectively, the hidden state and the input at time stamp $t$, $O_t$ is the output at time stamp $t$, and $f$ is a nonlinear function, viz., tanh or ReLU.
The basic difference between traditional DNNs and the RNN is that the RNN uses the same set of parameters ($U$, $V$, $W$ as above) for all the steps. This parameter sharing drastically reduces the total number of parameters that the model has to learn.
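As an illustration, the following is a minimal NumPy sketch of Eqs. (1) and (2); the layer sizes and random weights are illustrative assumptions, not values from this work.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev, U, W, V):
    """One recurrent step, Eqs. (1)-(2): h_t = f(U x_t + W h_{t-1}), O_t = softmax(V h_t)."""
    h_t = np.tanh(U @ x_t + W @ h_prev)   # Eq. (1), with f = tanh
    return h_t, softmax(V @ h_t)          # Eq. (2)

rng = np.random.default_rng(0)
k, n_hidden, n_out = 4, 8, 3              # illustrative sizes, not from the paper
U = rng.normal(size=(n_hidden, k))        # input-to-hidden weights
W = rng.normal(size=(n_hidden, n_hidden)) # hidden-to-hidden weights
V = rng.normal(size=(n_out, n_hidden))    # hidden-to-output weights

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(10, k)):      # a toy sequence of length 10
    h, O = rnn_step(x_t, h, U, W, V)      # U, W, V are shared across all time stamps
```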
3.3 Multivariate Time-Series Data

If a series of data is collected over fixed time intervals, then that dataset is called time-series data. If every data point of a time-series dataset is not a single value but a set of values, then that type of time-series data is called multivariate time-series data. There are numerous application areas where multivariate time-series data are present, viz., weather, pollution, sales, stocks, etc., and these data can be analyzed for forecasting purposes [14, 15]. The general format of the time-series data is as follows:

$$X = \{x(1), x(2), \ldots, x(t)\} \tag{3}$$

where $x(t)$ is the current value and $x(1)$ is the oldest value. If $X$ is multivariate time-series data, then every data point $x(i)$ is a vector of a fixed length $k$, i.e., $x(i) = \{x_{i,1}, x_{i,2}, \ldots, x_{i,k}\}$.
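To make the use of such data for supervised forecasting concrete, the sketch below splits a multivariate series into input windows and one-step-ahead targets; the window length `lag` and the toy data are hypothetical choices, not values stated in the paper.

```python
import numpy as np

def make_windows(X, lag):
    """Split a multivariate series X of shape (t, k) into supervised pairs:
    `lag` consecutive data points as input, the next data point as target."""
    inputs, targets = [], []
    for i in range(len(X) - lag):
        inputs.append(X[i:i + lag])   # x(i), ..., x(i + lag - 1)
        targets.append(X[i + lag])    # x(i + lag)
    return np.array(inputs), np.array(targets)

X = np.random.rand(100, 4)            # toy series: 100 time stamps, k = 4 attributes
windows, targets = make_windows(X, lag=5)
print(windows.shape, targets.shape)   # (95, 5, 4) (95, 4)
```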
3.4 Data Normalization

The efficiency of any DNN model depends heavily on the normalization method [16]. The main objective of data normalization is to generate quality data for the DNN model. Nonlinear time-series data, especially stock market data, fluctuate over a large scale, so data normalization is essential to scale the data down to a smaller range and thereby accelerate the learning process of the DNN models. Although a number of data normalization techniques are available, in all of them each input value $a$ of each attribute $A$ of the multivariate time-series data is converted to $a_{norm}$ in the range $[low, high]$. Some of the well-known data normalization techniques are described below.
3.4.1 Min-Max Normalization

Here, the data are scaled down to a range of [0, 1] or [−1, 1]. The formula for this method is as follows:

$$a_{norm} = \frac{(high - low)(a - \min_A)}{\max_A - \min_A} + low \tag{4}$$

where $\min_A$ and $\max_A$ are, respectively, the smallest and the largest values of the attribute $A$.
3.4.2 Decimal Scaling Normalization

In this method, all the values of each attribute are converted to pure fractions by moving the decimal point of each value, and this decimal-point movement is based on the maximum value of the attribute:

$$a_{norm} = \frac{a}{10^d} \tag{5}$$

where $d$ is the number of digits in the integer part of the largest number of the attribute $A$.
3.4.3 Z-Score Normalization

In this normalization method, all the values of each attribute $A$ are rescaled to have a mean of 0, expressed in units of the standard deviation of that attribute. The formula is as follows:

$$a_{norm} = \frac{a - \mu(A)}{\delta(A)} \tag{6}$$

where $\mu(A)$ and $\delta(A)$ are, respectively, the mean value and the standard deviation of the attribute $A$.
3.4.4 Median Normalization

In this method, all the values of each attribute $A$ are normalized by the following formula:

$$a_{norm} = \frac{a}{\mathrm{median}(A)} \tag{7}$$
3.4.5 Sigmoid Normalization

In this technique, the sigmoid function is used to normalize all the values of each attribute $A$. The formula is as follows:

$$a_{norm} = \frac{1}{1 + e^{-a}} \tag{8}$$
3.4.6 Tanh Estimator

This method was developed by Hampel. Here, the data normalization is done by the following formula:

$$a_{norm} = 0.5\left[\tanh\!\left(\frac{0.01\,(a - \mu)}{\delta}\right) + 1\right] \tag{9}$$

where $\mu$ is the mean value of the attribute $A$ and $\delta$ is the standard deviation of the attribute $A$.
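The six techniques above (Eqs. (4)–(9)) can be sketched per attribute as follows; the sample closing-price values are illustrative only, not data from the paper.

```python
import numpy as np

def min_max(a, low=0.0, high=1.0):        # Eq. (4)
    return (high - low) * (a - a.min()) / (a.max() - a.min()) + low

def decimal_scaling(a):                   # Eq. (5)
    d = len(str(int(np.abs(a).max())))    # digits in the integer part of the largest value
    return a / 10 ** d

def z_score(a):                           # Eq. (6)
    return (a - a.mean()) / a.std()

def median_norm(a):                       # Eq. (7)
    return a / np.median(a)

def sigmoid_norm(a):                      # Eq. (8)
    return 1.0 / (1.0 + np.exp(-a))

def tanh_estimator(a):                    # Eq. (9)
    return 0.5 * (np.tanh(0.01 * (a - a.mean()) / a.std()) + 1.0)

a = np.array([35000.0, 35400.0, 34800.0, 36100.0])  # toy closing prices of one attribute
for f in (min_max, decimal_scaling, z_score, median_norm, sigmoid_norm, tanh_estimator):
    print(f.__name__, f(a))               # each attribute is normalized independently
```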
4 Proposed S-LSTM Model

In this section, we describe the overall architecture of our proposed DNN model, the Stacked Long Short-Term Memory (S-LSTM) model. Figure 2 shows the detailed architecture of the proposed S-LSTM model. The basic building block of the S-LSTM model is the LSTM unit. The main reason for selecting the LSTM unit over the RNN is that the RNN suffers from the vanishing gradient and exploding gradient problems, and due to these problems it is not capable of learning features from long sequences of historical time-series data. The LSTM unit, on the contrary, has a gated structure, and due to this gated structure it can extract features from long sequences of historical data. The key part of the LSTM unit is its memory cell (cell state), which is regulated by three gates, viz., the input gate, the forget gate, and the output gate. The basic gated structure of the LSTM unit is shown in Fig. 3 [9].
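For concreteness, a minimal sketch of such a stacked-LSTM architecture in Keras is given below. This is an assumption-laden equivalent, not the authors' implementation (the experiments in this work were run in MATLAB, as noted in the next section): the number of stacked layers (two), the number of hidden units (64), and the window dimensions are hypothetical.

```python
from tensorflow import keras

lag, k = 5, 4  # window length and feature count: assumed values, not from the paper

# Stacking LSTM layers: every intermediate LSTM returns its full hidden-state
# sequence (return_sequences=True) so the next LSTM can consume it; the final
# LSTM returns only its last hidden state, which feeds the output regressor.
model = keras.Sequential([
    keras.layers.Input(shape=(lag, k)),
    keras.layers.LSTM(64, return_sequences=True),  # lower LSTM layer
    keras.layers.LSTM(64),                         # upper LSTM layer
    keras.layers.Dense(1),                         # predicted closing price
])
model.compile(optimizer="adam", loss="mse")

# model.fit(windows, targets, epochs=50, batch_size=32)  # hypothetical training call
```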
5 Experimental Setup

In this research work, all the experiments were carried out in MATLAB R2016b with the Neural Network Toolbox.
As the performance metrics, we use the Mean Absolute Error (MAE) and the Mean Squared Error (MSE). The formulas for calculating these errors are as follows:

$$\mathrm{MAE} = \frac{1}{k}\sum_{i=1}^{k} |o_i - p_i| \tag{10}$$

$$\mathrm{MSE} = \frac{1}{k}\sum_{i=1}^{k} (o_i - p_i)^2 \tag{11}$$

where $k$ is the number of observations, and $o_i$ and $p_i$ are, respectively, the actual value and the predicted value.
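A minimal sketch of these two metrics, with toy values standing in for the actual and predicted prices:

```python
import numpy as np

def mae(o, p):                      # Eq. (10)
    return np.mean(np.abs(o - p))

def mse(o, p):                      # Eq. (11)
    return np.mean((o - p) ** 2)

o = np.array([1.00, 0.95, 1.10])    # toy actual values
p = np.array([0.98, 0.97, 1.05])    # toy predicted values
print(mae(o, p), mse(o, p))
```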
6 Results and Discussion

In Tables 1 and 2, we present the different prediction errors (MSE and MAE) of the proposed model for each data normalization method on the BSE and NYSE data, respectively. Figures 4 and 5 graphically show the forecasted closing prices of the BSE and NYSE for each data normalization technique. In Table 3, we compare our proposed S-LSTM model with the other popular models with respect to their prediction errors (MSE and MAE).

From Tables 1 and 2, we can observe that the prediction errors vary with the normalization method, and that the tanh estimator produces lower prediction errors for the prediction of both the BSE and NYSE indices compared to the other normalization methods. Figures 4 and 5 also show that the tanh estimator data normalization method produces better forecasting results. It is quite clear from Table 3 that our proposed S-LSTM model exhibits the smallest forecasting errors (MSE and MAE) compared to the other well-known models.
7 Conclusion
In this work, we have proposed a deep neural network model, S-LSTM, for forecasting multivariate time-series data. Moreover, we have also tried to find the most suitable data normalization method for deep neural network models. As a case study, we used the BSE and NYSE historical time-series data for multivariate time-series forecasting. From Tables 1 and 2, and also from Figs. 4 and 5, we can conclude that the tanh estimator is the best data normalization method for deep neural network models. From all these observations, we can draw the conclusion that our proposed deep neural network model S-LSTM outperforms all the other well-known models for forecasting the BSE and NYSE data.

In the future, we want to analyze our proposed model for forecasting other multivariate time-series data, such as weather, pollution, etc.
References
1. Meesad, P., Rasel, R.I.: Predicting stock market price using support vector regression. In: 2013
International Conference on Informatics, Electronics and Vision (ICIEV), pp. 1–6. IEEE (2013)
2. Rodriguez, G.: Time series forecasting in turning processes using ARIMA model. Intell. Distrib. Comput. XII 798, 157 (2018)
3. Sulaiman, J., Wahab, S.H.: Heavy rainfall forecasting model using artificial neural network for
flood prone area. In: IT Convergence and Security 2017, pp. 68–76. Springer (2018)
4. Werbos, P.J., et al.: Backpropagation through time: what it does and how to do it. Proc. IEEE
78(10), 1550–1560 (1990)
5. Lee, T.S., Chen, N.J.: Investigating the information content of non-cash-trading index futures
using neural networks. Expert Syst. Appl. 22(3), 225–234 (2002)
6. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117
(2015)
7. Du, S., Li, T., Horng, S.J.: Time series forecasting using sequence-to-sequence deep learning
framework. In: 2018 9th International Symposium on Parallel Architectures, Algorithms and
Programming (PAAP), pp. 171–176. IEEE (2018)
8. Cirstea, R.G., Micu, D.V., Muresan, G.M., Guo, C., Yang, B.: Correlated time series fore-
casting using multi-task deep neural networks. In: Proceedings of the 27th ACM International
Conference on Information and Knowledge Management, pp. 1527–1530. ACM (2018)
9. Bhanja, S., Das, A.: Deep learning-based integrated stacked model for the stock market prediction. Int. J. Eng. Adv. Technol. 9(1), 5167–5174 (2019)
10. Williams, R.J., Zipser, D.: A learning algorithm for continually running fully recurrent neural
networks. Neural Comput. 1(2), 270–280 (1989)
11. Shih, S.Y., Sun, F.K., Lee, H.Y.: Temporal pattern attention for multivariate time series fore-
casting. Mach. Learn. 108(8–9), 1421–1441 (2019)
12. Bengio, Y., Simard, P., Frasconi, P., et al.: Learning long-term dependencies with gradient
descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
13. Sagheer, A., Kotb, M.: Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 323, 203–213 (2019)
14. Hsu, C.M.: Forecasting stock/futures prices by using neural networks with feature selection.
In: 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Con-
ference, vol. 1, pp. 1–7. IEEE (2011)
15. Tang, Q., Gu, D.: Day-ahead electricity prices forecasting using artificial neural networks. In:
2009 International Conference on Artificial Intelligence and Computational Intelligence, vol. 2,
pp. 511–514. IEEE (2009)
16. Nayak, S., Misra, B., Behera, H.: Impact of data normalization on stock index forecasting. Int.
J. Comput. Inform. Syst. Ind. Manag. Appl. 6(2014), 257–269 (2014)
17. Yahoo! finance (June 2019). https://in.finance.yahoo.com/quote/%5EBSESN/history?p=
%5EBSESN
18. Yahoo! finance (June 2019). https://finance.yahoo.com/quote/%5ENYA/history/