Research Article
Complexity
Volume 2020, Article ID 8285149, 16 pages
https://doi.org/10.1155/2020/8285149
Modeling Traders’ Behavior with Deep Learning and Machine
Learning Methods: Evidence from BIST 100 Index
1 Sabanci School of Management, Sabanci University, Istanbul, Turkey
2 Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey
3 Computer Engineering Department, Istanbul Medipol University, Istanbul, Turkey
Received 23 December 2019; Revised 15 April 2020; Accepted 27 May 2020; Published 29 June 2020
Copyright © 2020 Afan Hasan et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Although the vast majority of fundamental analysts believe that technical analysts’ estimates and the technical indicators used in
these analyses are unresponsive, recent research has revealed that both professionals and individual traders use technical
indicators. Correctly estimating the direction of the financial market is very challenging, primarily due to the nonlinear
nature of financial time series. Deep learning and machine learning methods, on the other hand, have achieved very successful
results in many areas where human beings are challenged. In this study, technical indicators were integrated into deep learning
and machine learning methods, and the behavior of traders was modeled in order to increase the accuracy of forecasting the
direction of the financial market. A set of technical indicators has been examined, based on their application in technical
analysis, as input features to predict the upcoming (one-period-ahead) direction of the Istanbul Stock Exchange (BIST 100)
national index. To predict the direction of the index, Deep Neural Network (DNN), Support Vector Machine (SVM), Random
Forest (RF), and Logistic Regression (LR) classification techniques are used. The performance of these models is evaluated on the
basis of various performance metrics such as confusion matrix, compound return, and max drawdown.
computational intelligence such as machine learning and data mining have also been used to analyze financial information.

One of the main objectives of machine learning methods is to find hidden patterns in the data by using automatic or semiautomatic methods. Useful patterns allow us to make meaningful estimates on new data [9]. Machine learning techniques applied in real-world domains such as time series analysis [10], communication [11], Internet traffic analysis [12], medical imaging [13], astronomy [14], document analysis [15], and biology [16] have demonstrated impressive performance in solving classification problems. While the vast majority of previous financial engineering research focuses on complex computational models such as Neural Networks [17–20] and Support Vector Machines [21, 22], there is also research based on new deep learning models that yield better results in nonfinancial applications [23, 24].

Deep learning is one of the machine learning methods that use past data to train models and make predictions from new data. Recent developments in deep learning have allowed computers to recognize and tag images, recognize and translate speech, be very successful in games that require skill, and even perform better than human beings [25]. In these applications, the goal is usually to train a computer to perform tasks that humans can also do well. Deep learning methods allow such tasks to be performed without human participation. This is valuable when a task is unlikely to be completed by human effort within a limited period of time, or when superhuman performance brings a large benefit, as in the case of medical diagnoses [26].

Current state-of-the-art practices of deep learning differ from market direction forecasting problems in many aspects. One of the most striking is that market forecasting is not a problem that people can already solve well. Unlike recognizing objects in a picture or understanding text in an image, people do not have the innate ability to choose a stock that will perform well in some future period. However, deep learning techniques may be useful for such selection problems because they can, in principle, learn any function mapping data to a return value. At least in theory, a deep learner can find a return value for a relationship among data, no matter how complex and nonlinear it is. This is far from both the simple linear factor models of traditional financial economics and the relatively coarse statistical arbitrage methods and other quantitative asset management techniques [27].

In this study, we investigate the benefits of Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classifiers in making decisions on market direction. In particular, we show whether these classification approaches can make trading consistent and profitable over a long period of time. The main contribution of the study is the development of a deep learning model that takes OHLC prices and transaction costs into consideration, together with a comparison of the classification performance of the developed model with the most commonly used machine learning methods for estimating the direction of a stock market index. The success of deep learning and machine learning methods may differ according to the inefficiencies of the markets [28]. This study investigates the case of Stock Exchange Istanbul and emerging markets. Another contribution of this study is the use of threshold values to control transaction costs in financial estimates. In some studies, transaction costs are not covered, although estimates seem profitable. It is known that when transaction costs are included, profitability may disappear [23]. To avoid this problem, the threshold level is dynamically adjusted according to the standard deviation of the profit distribution, and optimal values are selected to reduce the number of transactions in order to increase the return (profit on an investment) per transaction. Accordingly, the aim is to create profitable operations in the long run with the right combination of parameter values and proper selection of the training set size.

The paper is organized as follows. Section 2 presents related work and similar studies that apply deep learning and machine learning to decisions on market direction. Section 3 briefly describes the methodology, the general experimental setup, the datasets, the attribute selection for feeding the models, and the specific parameter settings, in order to provide comprehensive information about the deep learning and other machine learning algorithms used in the experiments and their use in future work. The results of the analysis of each of the trading scenarios are presented in Section 4. Finally, Section 5 concludes the study by providing the obtained results and future considerations.

2. Related Work

Researchers have intensified their studies on the direction of movements of various financial instruments using time series and machine learning methodologies. Both academic researchers and practitioners have developed financial trading strategies to make forecasts about future movements of the stock market index and transform the predictions into profits. This section summarizes research on stock prediction, covering methods that use technical indicators as features, traditional machine learning algorithms, studies done for the Istanbul Stock Exchange (ISE), and current methods that use deep learning algorithms in finance.

The majority of the studies on stock market prediction with machine learning algorithms use technical indicators as part of the training dataset. Neural Networks (NN) [17, 18] and Support Vector Machines (SVM) are among the most widely used machine learning methods. There are also studies that use classification methods such as Decision Trees (DTs) [29], Random Forests (RFs) [30], Logistic Regression (LR) [31], and Naive-Bayes (NB) [32]. Patel et al. [33] focused on predicting future values of Indian stock market indices using Support Vector Regression (SVR), Artificial Neural Network (ANN), and Random Forest (RF). The best overall prediction performance is achieved by the SVR-ANN hybrid model. In Khaidem’s research [34], accuracy in the range of 85–95% has been achieved for long-term prediction on stocks such as AAPL, MSFT, and Samsung by building a predictive model with a Random Forest classifier.
Buy, hold, or sell decision prediction is performed on the Stock Exchange of Thailand (SET) by Boonpeng and Jeatrakul [35], comparing the performance of the traditional neural network with One vs. All (OAA) and One vs. One (OAO) neural networks (NN). With an average accuracy of 72.50%, OAA-NN performed better than the OAO-NN and traditional NN models.

In order to improve the profitability and stability of trading that includes seasonality events, Booth et al. [36] introduced an automated trading system based on performance-weighted ensembles of random forests. Tests were done on a large sample of stocks from the DAX, and they found that recency-weighted ensembles of random forests produce superior results. The research in [37] investigated methods for predicting the direction of movement of stocks and stock price indices for Indian stock markets by comparing four machine learning prediction models: Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest, and Naive-Bayes. It was found that Random Forest outperforms the other three prediction models on overall performance. Likewise, a hybridized framework of Support Vector Machine (SVM) with a K-Nearest Neighbor approach for the prediction of Indian stock market indices is proposed by Nayak et al. [38]. This paper investigates how to combine several techniques for predicting future stock values over horizons of 1 day, 1 week, and 1 month. It is pointed out that the proposed hybridized model can be used where there is a need for scaling high-dimensional data and better prediction capability.

Kara et al. [39] developed two efficient models based on two classification techniques, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), and compared their performances in predicting the direction of movement in the daily Istanbul Stock Exchange (ISE) National 100 Index. Ten technical indicators were selected as inputs of the proposed models. It was found that the ANN model performed significantly better than the SVM model. In Pekkaya’s study [40], the results of Linear Regression and NN models were compared in predicting the YTL/USD exchange rate using macrovariables as input data. It is shown that NN gives better results. In [41], an optimal subset of indicators is selected with an ensemble feature selection approach in order to increase the performance of predicting the next day’s stock price direction. A real dataset is obtained from the Istanbul Stock Exchange (ISE), and the subset is composed using technical and macroeconomic indicators. The results of this study show that the reduced dataset improves the estimation of the next day’s direction. The effectiveness of using technical indicators, such as the simple moving average of the closing price and momentum, in the Turkish stock market has been evaluated in Göçken’s study [42]. Hybrid Artificial Neural Network (ANN) models based on Harmony Search (HS) and Genetic Algorithm (GA) are used in order to select the most relevant technical indicators for capturing the relationship between the technical indicators and the stock market. This study found that the HS-based ANN model performs better in stock market forecasting.

Prediction of the stock movement direction with Convolutional Neural Networks (CNN), one of the DNN methods most commonly used for analysing visual imagery [43], was first applied to predicting the intraday direction of ISE 100 stocks by Gunduz et al. [44]. The feature set is composed of different indicators. Closing price, temporal information, and trading data of classifiers are labeled by using hourly closing prices. The proposed classifier with seven layers outperforms both Logistic Regression and a CNN that uses randomly ordered features. Chong et al. [24] proposed a deep feature learning-based stock market prediction model as a case study using stock returns from the KOSPI market, the major stock market in South Korea. A time period of five minutes is used in order to evaluate the deep learning network’s performance on market prediction at high frequencies. The aim is to provide a comprehensive and objective assessment of both the advantages and drawbacks of deep learning algorithms for stock market analysis and prediction. The proposed model has been tested with covariance-based market structure analysis, and it is found that the proposed model improves covariance estimation effectively. From the experimental results, practical and potentially useful directions are suggested for further investigation into how to use deep learning networks.

A simple method has been proposed to leverage financial news to predict stock movements by using the popular word embedding representation and deep learning techniques [45]. The authors used a DNN composed of 4 hidden layers with 1024 hidden nodes in each layer to predict a stock’s price movement based on a variety of features. By adding features derived from financial news, they managed to decrease the error rate significantly.

3. Methodology

Our objective in this study is to use the best features and machine learning methods in order to model traders’ behavior so that we can predict market direction. Big traders, including investment banks, hedge funds, and brokerage firms, build their own proprietary trading software for stock trading. The methods used by these firms are kept confidential as trade secrets, which makes their comparison impossible. In our exploration of the best methods and strategies, we decided to use a rich set of features and deep learning methods in addition to traditional machine learning algorithms because of their success in many areas. As a deep learning framework, we use TensorFlow, a powerful open-source software library built by the Google Brain team to serve many different artificial intelligence tasks [46]. Our dataset is organized as a TensorFlow data structure for holding features, labels, and other parameters.

Figure 1 illustrates the steps performed to predict market direction by using the TensorFlow framework. It starts with a preprocessing step that extracts features and performs normalization. While reading the dataset, a set of features and labels is defined. If there are string variables, they are encoded. After this step, the dataset is divided into two parts as training and testing datasets. Time series k-fold
cross-validation method is used for evaluation. In this study, k-fold is set to ten. Financial time series data is split into two parts as shown in Figure 2. In each cross-validation step, the training data grows and includes all data prior to the testing data, whereas the size of the testing data stays the same. In the last cycle of cross-validation, the training dataset is nine times the size of the test dataset. After a model is fitted on the training dataset at each step, it is tested on the testing dataset, and precision, accuracy, cumulative return, maximum drawdown, and return on investment are calculated. Here, the ultimate goal is to achieve the highest precision, accuracy, and cumulative return and the lowest maximum drawdown. Hence, optimal parameters are obtained based on the trade-off between accuracy and cumulative return.

3.1. Classification Methods. In this study, four types of data mining algorithms were used to compare the financial forecasting capabilities of the models. This section gives a brief description of the classification approaches which we have used.

3.1.1. DNN Classifier. A Multilayer Perceptron (MLP) is composed of one input layer, one or more hidden layers, and one output layer. Every layer except the output layer includes a bias neuron and is fully connected to the next layer. When an ANN has two or more hidden layers, it is called a Deep Neural Network (DNN) [47].

For creating fully connected neural network layers, TensorFlow’s built-in helper functions are used. The DNN classifier is created by calling tf.estimator.DNNClassifier from the TensorFlow Python API [46]. This estimator builds a feedforward multilayer neural network that is trained with a set of labeled data in order to perform classification on similar, unlabeled data. We used ReLU as the activation function, and the regularization and normalization hyperparameters were also optimized.

The flexibility of neural networks is also one of their main drawbacks, since there are many hyperparameters to tweak. Beyond the choice of network topology, even in a simple MLP the number of layers, the neurons per layer, the type of activation function used in each layer, the weight initialization logic, and many other parameters can be modified. Therefore, in choosing the best combination of hyperparameters for the DNN model, both grid search and randomized search are used. Since GridSearchCV evaluates all combinations, it can take a long time to find the best hyperparameters. For that reason, the hyperparameter adjustment process for the DNN is carried out in two steps. First, RandomizedSearchCV is used to narrow the range for each hyperparameter. Then GridSearchCV is run using a grid based on the best values provided by RandomizedSearchCV.

Fitting parameters that are used for RandomizedSearchCV are as follows: “n-neurons”: [64, 128, 256, 512, 1024, 2048]; “n-hidden layers”: [3, 4, 5, 6, 7, 8]; “batch size”: [10, 50, 100, 200]; “learning rate”: [0.01, 0.02, 0.05, 0.1]; “activation”: [tf.nn.relu, tf.nn.elu, leaky-relu (alpha = 0.01),

Figure 1: Steps involved in implementing the TensorFlow use case. (Flowchart: Start; preprocessing of dataset; read the dataset; define features and labels; encode the dependent variable; divide the dataset into two parts for training and testing; TensorFlow data structure for holding features, labels, etc.; implement the model; train the model; reduce MSE (actual output − desired output); repeat the process to decrease the loss; make prediction on the test data; End.)

Figure 2: Cross-validation time series split. (x-axis: sample index, 0 to 100; y-axis: CV iteration, 0 to 8; legend: testing set, training set.)
parameters. Parameters that are placed on the grid are number of trees in the forest (n-estimators): [100, 200, 300, 400]; maximum depth of the tree (max-depth): [50, 60, 70, 80, 90]; minimum number of samples required to split an internal node (min-samples-split): [8, 10, 12]; minimum number of samples required to be at a leaf node (min-samples-leaf): [3, 4, 5]; and the number of features to consider when looking for the best split (max-features): [2, 3]. After the grid search is fitted to the data, the best parameters are obtained. The best parameters used in this research are n-estimators = 200; max-depth = 60; min-samples-split = 12; min-samples-leaf = 5; and max-features = 3.

3.1.4. Support Vector Machines. SVM is one of the most widely used binary classifiers for assigning new, unseen objects to a particular category by training a model. The main idea of SVM is to establish a decision boundary (hyperplane) that maximizes the correct separation of rising and falling samples [50]. A hyperplane of n-dimensional feature vectors $x = x_1, \ldots, x_n$ can be defined as in equation (4), where the sum of the elements will be greater than 0 on one side and less than 0 on the other:

$$\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n = \beta_0 + \sum_{i=1}^{n} \beta_i X_i = 0. \qquad (4)$$

The class of each point $x_i$ can be denoted by $y_i \in \{1, -1\}$ where $y = \beta_0 + \sum_{i=1}^{n} \beta_i X_i$. By maximizing the distance between the boundary and any point, we can get an optimal hyperplane. The best data splitting boundary is called the maximum margin hyperplane. The data points closest to the hyperplane are known as support vectors, and only these points are relevant to hyperplane selection. The resulting linear classifier, the Support Vector Classifier (SVC), cannot be applied to nonlinear functions. To solve this issue, SVM applies a more general kernel function as in equation (5); training is a quadratic programming (QP) optimization problem with linear constraints and can be solved by using a standard QP solver:

$$f(x) = \operatorname{sgn}\left(\beta_0 + \sum_{i=1}^{n} \alpha_i y_i K(x, x_i)\right). \qquad (5)$$

SVM is implemented through the Scikit-learn Python library of Pedregosa et al., using the LinearSVC package [48]. LinearSVC implements the “one-vs-the-rest” multiclass strategy; since we have only two classes, only one model is trained.

In order to improve the performance of SVM, we focused on tuning three major hyperparameters. Kernel, regularisation (C), and gamma are the most important parameters that affect performance. These parameters are placed on the grid in order to be used by GridSearchCV for grid search. The model is evaluated for each combination of algorithm parameters specified in the grid. The hyperparameters used are as follows: C: [0.1, 1, 10, 100], gamma: [1, 0.1, 0.01, 0.001], and kernel: [rbf, poly, sigmoid]. After fitting GridSearchCV on the training data, the best estimators are acquired. The best hyperparameters used in this study are C = 10, gamma = 0.1, and kernel = rbf.

3.2. Dataset. In this study, nine years of BIST 100 index data ranging from January 2008 to December 2016 is obtained from Borsa Istanbul Datastore [51]. Although the BIST 100 data in the last few years are published with a time period of one second, the time period in the data we have obtained is ten seconds. Open-high-low-close (OHLC) prices were used to convert the dataset from ten seconds to different time periods. The conversion process is shown in Figure 3. Since the dataset is converted from a shorter time period to longer time periods, it can be inferred that there is no missing data in the converted time periods.

For example, in the process of converting to an hourly dataset, the price at the beginning of the hour is taken as the open price, the maximum and minimum values in that hour are used as the high and low prices, and the last price value of the hour is used as the close price. In the same way, all the volumes in the hour were aggregated to obtain the total volume of that hour. We used open, high, low, and close prices and volume of index data at two-hour, hourly, and 30 min periods. The bihourly, hourly, and 30 min datasets are composed of 9157, 18314, and 33673 rows, respectively. An example of the hourly dataset is shown in Table 1. For each cross-validation k-fold value, the in-sample period is used for training and the out-of-sample period is used for evaluating forecasting performance.

When publications about stock predictions are reviewed, it is observed that technical indicators used in technical analysis are generally utilized to generate feature sets of prediction models [52]. Technical indicators are mathematical calculation methods used to analyze the prices of financial instruments. After some specific calculations on time series data, most of the indicators help investors to forecast future price movement trends. Some indicators, on the other hand, try to show whether a trend will continue or not. Indicators are calculated for a specific moment and period to inform investors.

There are literally hundreds of technical indicators that can be used for forecasting. Some of these indicators extract similar information and produce similar signals. Selecting the right set of indicators is important so that a diverse set of measures can be used as features in the formation of prediction models. The names and descriptions of the selected technical indicators used in the study are given in Table 2. Similar abbreviations have been used for the definition of indicators in the studies of Kumar et al. [53] and Gündüz et al. [54]. We use the same naming conventions in this study.

After the selection of the technical indicators, we have to determine the time periods and the required OHLC price data to be used in the calculation of these indicators. For example, the SMA, EMA, ROCP, and MOM indicators were calculated using the closing price of the BIST 100 index over the 3, 5, 10, 15, and 30 previous values of the time series at two-hour, hourly, and 30 min interval periods. The WILLR, CCI, UO, and ATR indicators were found using the daily maximum, minimum, and closing prices of the BIST 100 index. These values are calculated using 4 time periods for WILLR, and one time period for CCI, UO, and ATR. With the calculation of different indicators for different time periods, we obtained
97 features for each period of the BIST 100 index. After the features are composed, min-max normalization is applied to each feature as in equation (6) where $x = x_1, \ldots, x_n$

Figure 3: Conversion of ten seconds time period BIST 100 data to hourly OHLC. (x-axis: time, 10:00 to 10:55 in 5-minute steps.)

Table 1: BIST 100 hourly data structure.
Date        Open      High      Low       Close     Volume
2008010209  55160.20  55171.04  54889.51  54951.62  180514
2008010210  54891.50  55281.66  54821.49  54854.45  132182
2008010211  54853.80  54951.23  54481.52  54638.57  76451
2008010212  54527.59  54527.59  54527.59  54527.59  25427
2008010214  54741.95  54939.83  54584.49  54618.13  136968

Table 2: Selected technical indicators.
Name              Description
OPP               Opening price of period
HPP               Highest price of period
LPP               Lowest price of period
CPP               Closing price of period
ROC (x)           Rate of change of closing price
ROCP (x)          Percentage rate of change of closing price
%K                Stochastic oscillator
%D                Moving average of %K for x period
BR                Bias ratio
MA (x)            Moving average of x periods
EMA (x)           Exponential moving average of x periods
TEMA (x)          Triple exponential moving average of x periods
MOM (x)           Momentum
MACD (x, y)       Moving average convergence divergence of x periods
PPO (x, y)        Percentage price oscillator
CCI (x)           Commodity channel index
WILLR (x)         Williams %R
RSI (x)           Relative strength index
ULTOSC (x, y, z)  Ultimate oscillator
RDP (x)           Relative difference in percentage
ATR (x)           Average true range
MEDPRICE (x)      Median price
MIDPRICE (x)      Medium price
SignalLine (x, y) Signal/trigger line
HHPP (x)          Highest closing price of last x periods
LLPP (x)          Lowest closing price of last x periods

minutes, one hour, and two hours, respectively. Also, $p_i$ denotes the closing price of the i-th trading period and $p_{i+1}$ denotes the closing price of the next trading period, as used in the following equation:

$$r_{i+1} = \frac{p_{i+1} - p_i}{p_i}. \qquad (7)$$

The class label for the i-th period, i.e., $y_i^{R}$ for Rise and $y_i^{F}$ for Fall, is set based on the following equations:

$$y_i^{R} = \begin{cases} 1, & \text{if } r_{i+1} > r_i + \theta, \\ 0, & \text{otherwise,} \end{cases} \qquad (8)$$

$$y_i^{F} = \begin{cases} 1, & \text{if } r_{i+1} < r_i - \theta, \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$

In the class labeling equations, the threshold value θ is used to account for transaction costs and to define targeted returns. Because of the transaction costs and the risk of a stock exchange, investors are not willing to make too many transactions; at a minimum, each trade is expected to cover its transaction costs. In order to filter out transactions where the return is less than the transaction cost, and also to evaluate the success of the system according to prediction performance and compound return, different threshold values are used. They were obtained by multiplying the standard deviation of returns by predetermined values. The predetermined values start from 0 and increase by 0.1 until they reach 0.5. In this way, six different threshold values were obtained.

3.3. Performance Measures and Implementation of Prediction Model. Predicting whether the market moves up or down is equally important, since traders can make a profit from both sides. Therefore, predicting index rise and index fall is modeled separately. In the first model, the system is trained to predict whether there will be a rise or not, and in the second model, the system is trained to predict whether there will be a fall or not. In order to overcome the transaction costs problem, we have used a dynamic threshold variable which helps us to eliminate small returns that are less than transaction costs. Evaluation metrics are needed to measure and compare the predictive ability of classifiers. To evaluate the performance and robustness of the proposed models, we have used performance metrics that are derived from the confusion matrix, like accuracy, precision, and recall. To evaluate the model’s performance from a financial return perspective, we have used compound return
and return of investment metrics. Additionally, max Table 3: Confusion matrix of upward movement.
drawdown measurement is used for evaluating the model’s Actual/predicted Rise Not rise
risk of investment.
Rise TP FN
Not rise FP TN
3.3.1. Confusion Matrix. In machine learning algorithms,
classifiers’ performance evaluation is mainly done by the
confusion matrix. The number of true and false estimates is Table 4: Confusion matrix of downward movement.
summarized by the counting of values separated by each Actual/predicted Fall Not fall
class. It provides a simple way to visualize the performance Fall TP FN
and robustness of an algorithm. Not fall FP TN
Since we aim to estimate gains that cover transaction
costs and focus on eliminating small returns, we use the
threshold structure as shown in equations (8) and (9). For 2 × precision × recall
the evaluation of upward movement predictions, the con- F1 − score � . (13)
fusion matrix is shown in Table 3. For evaluating downward movement predictions, the confusion matrix is shown in Table 4. For upward movement, the positive observation is Rise and the negative observation is Not Rise. Similarly, for downward movement, the positive observation is Fall and the negative observation is Not Fall.

The performance and robustness of the proposed models are assessed based on these four values of the confusion matrix. Accuracy, precision, recall, and F-score are among the important measures calculated from these values.

The accuracy percentage is given in equation (10). Since accuracy only measures the share of correct predictions and our dataset is unbalanced, evaluating a model by accuracy alone is not sufficient:

$$\text{accuracy}\% = \frac{TP + TN}{TP + TN + FP + FN} \times 100. \tag{10}$$

In the trading model, a false positive (FP) means that there is actually no opportunity for profit, but the model indicates that you need to enter into a trade (buy or sell). In this case, you will lose money, which is the worst possible situation. Thus, choosing a model with minimum FP is crucial. This can be achieved by maximizing precision. The precision percentage is calculated as shown in the following equation:

$$\text{precision}\% = \frac{TP}{TP + FP} \times 100. \tag{11}$$

On the other hand, a false negative (FN) means that although there is an opportunity to make money from a trade, the model does not indicate it. In this case, the opportunity to make money is missed, but this is not perceived as a major problem, as there is no expectation in trading to predict every movement of the market. Maximizing recall corresponds to minimizing FN. The recall percentage is calculated as shown in the following equation:

$$\text{recall}\% = \frac{TP}{TP + FN} \times 100. \tag{12}$$

Lastly, the F-score provides insight into the relation between precision and recall. Since precision and recall are prioritized equally, the F1-score is used as the F-score. The following equation provides the definition of the F1-score:

$$\text{F1-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}. \tag{13}$$

3.3.2. Compound Return. Calculating the rate of return of our predictions correctly is one of the main concerns, since we assume that the entire investment is put into each trade, without setting aside profits or compensating for losses. The compound return is one of the measurement tools best suited to this purpose. Expressed as a percentage, the compound return indicates the outcome of a series of profits or losses on the initial investment over a period of time, in a continuous manner.

When evaluating the performance of an investment's return over a time period, the average return is known to be a less suitable measurement tool than the compound return. This is because, when the average return is used, the returns are treated as independent of each other and the effect of each return cannot be carried over to the next step, so the success of the model cannot be clearly determined. For the average return calculation, discrete returns can be used. Discrete returns are calculated as shown in equation (14), where $P_t$ represents the price at time $t$ and $P_{t+1}$ represents the price at time $t + 1$:

$$PE_d(t, t + 1) = \frac{P_{t+1}}{P_t} - 1. \tag{14}$$

When calculating the average return, discrete returns are summed and divided by the number of periods. The aggregated multiperiod return obtained this way will only be correct if period returns are additive. Since discrete returns are multiplicative, the average is not appropriate in this case. Thus, the correct aggregated performance is calculated using the compound return formula shown in the following equation [55]:

$$PE_d(0, T) = \prod_{t=0}^{T-1} \left[ 1 + PE_d(t, t + 1) \right] - 1. \tag{15}$$

At the beginning of each period, the trained model decides whether to enter a trade. If it enters a trade, the trade is closed at the end of the period. For each trade, the discrete return is calculated. For example, when trading on the hourly time period, the model decides at the beginning of the hour whether or not to open an order, and at the end of the hour it closes the open order.
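As a minimal sketch, the measures in equations (10) to (15) can be computed in a few lines of Python. The function names and any sample counts or returns used with them are hypothetical and are not taken from the study; the F1 formula used is the standard harmonic mean of precision and recall.

```python
from math import prod


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 as percentages (equations (10)-(13))."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when the model never predicts positive
    # or when there are no positive observations.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def discrete_return(p_t, p_next):
    """Discrete return between two consecutive prices (equation (14))."""
    return p_next / p_t - 1.0


def average_return(discrete_returns):
    """Arithmetic mean of per-period discrete returns."""
    return sum(discrete_returns) / len(discrete_returns)


def compound_return(discrete_returns):
    """Compound return over all periods (equation (15))."""
    return prod(1.0 + r for r in discrete_returns) - 1.0
```

With per-period returns of +10% and -10%, `average_return` gives 0 while `compound_return` gives -1%, which illustrates why the average misstates the multiperiod outcome when all capital is reinvested in each trade.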
3.3.3. Maximum Drawdown. The main concerns of an investment are capital protection and consistent estimations. Since maximum drawdown (MDD) is one of the most important measures of risk for a trading strategy, it plays a crucial role in evaluating the performance of the prediction model [56]. The MDD value is calculated as shown in the following equation:

$$\text{MDD} = \frac{P - L}{P}, \tag{16}$$

where $P$ represents the peak profit before the largest loss and $L$ represents the lowest value of loss before a new profit peak is established.

MDD expresses the difference between the highest capital level and the lowest capital level, where the highest capital level must occur before the lowest capital level. The maximum drawdown duration is the longest time it takes for the forecasting model to recover the capital loss [57]. The MDD structure is illustrated in Figure 4. In this study, drawdowns are measured in percentage terms.

[Figure 4: MDD structure. Cumulative return curve annotated with a drawdown, the maximum drawdown, and the maximum drawdown duration.]

4. Experimental Results

Supply and demand help to determine the price of each security, or the willingness of participants (investors and traders) to trade. Buyers offer a maximum amount they would like to pay, which is usually lower than the amount demanded by the sellers. In order for a trade to take place, either the buyer increases the price or the seller reduces the price. Accordingly, the price increases if a purchase occurs and decreases if a sale is made. This shows that the decisions of investors have a direct effect on the price.

As mentioned before, we know that traders use technical analysis methods in decision making. The main idea of this study is that if the market's direction of movement is shaped by traders' transactions [2], and if the majority of traders use technical analysis methods in the decision-making process [8], then by training deep learning and classic machine learning methods on technical analysis indicators to estimate the market direction, we are actually modeling traders' behavior.

To strengthen this idea, we first had to choose the best timeframes. We started by examining high-frequency trading (HFT) studies [58]. In these studies, the processing time ranged from milliseconds to seconds, and we observed that market makers frequently use these strategies. As we do not have an appropriate infrastructure, we decided that these methods are not applicable to us, because the transactions made within these periods would not cover the costs.

In addition, we investigated studies in which deep learning and machine learning methods have been successfully applied. These studies attempt to predict over a wider time frame, such as weekly, monthly, or annual estimates [30]. We did not find it appropriate to use these time intervals because the sample size decreased dramatically in these studies. We think that predicting for such long horizons would be too risky.

After a long process of literature review, we decided to focus on intraday intervals rather than daily, weekly, or monthly time periods. Our decision is based on the fact that the vast majority of recent studies focus on intraday trading. Also, the cumulative returns reported for intraday trading are higher than those for larger timeframes. These facts are the main reasons for focusing on an intraday investigation. In addition, being less risky is another important factor that led us to examine intraday market direction prediction.

In order to compare the performance of classification techniques on the prepared dataset, four different machine learning methods were used in three different time periods, and bidirectional (buy/sell) operations were tested on six different threshold values, resulting in a total of 144 aspects (4 methods × 3 time periods × 2 directions × 6 thresholds). To avoid the problem of overfitting, which may arise while designing a supervised classification model for predicting the direction of the index, k-fold cross-validation with k set to ten is applied to each aspect. For the strategies applied according to the deep learning and machine learning methods, there are 48 different results in each period (4 methods × 2 directions × 6 thresholds). The threshold was tested at six values, from 0 up to 0.5 in incremental steps of 0.1.

The detailed evaluation of the BIST 100 index direction forecast performance concerning rise and fall is listed in Tables 5 and 6, respectively. We compare the predictive performance of Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) on the out-of-sample test set in terms of confusion matrix values, accuracy (acc. %), precision (pre. %), recall (rec. %), and the F1-score (f1 %). We also added maximum drawdown (mdd. %) and compound return (cmp.) to the performance evaluation metrics.

It may be misleading to use only traditional machine learning performance measures to evaluate the estimates of a trading model. For trading applications, higher accuracy in estimates does not always mean higher profits. Any trading strategy will ultimately lose money, even if it appears to be profitable on paper, if the returns are not high enough to exceed the transaction costs associated with commissions, spreads, and slippage over a series of consecutive transactions. In particular, parameters such as threshold value, average return per transaction, maximum drawdown, and cumulative return represent more appropriate measures for such a study [23].

Our main target was to investigate whether it is possible to predict the BIST 100 index consistently using deep learning and machine learning classification approaches. To support decisions on financial markets, results are compared with the "buy and hold" strategy. Since the average return of the "buy and hold" strategy on the BIST 100 index over the test period is 15%, it can be seen from the results in Tables 5 and 6 that the compound returns of the DNN and the other methods outperform the buy and hold strategy.

Tables 5 and 6 show that the outcomes averaged over 10-fold cross-validation seem to demonstrate an inverse correlation between accuracy and compound return. For instance, in Table 5, as the threshold value ranges from 0 to 0.5 for the DNN, precision and compound return (cmp.) decrease from 60 to 48 and from 3.34 to 1.12, whereas accuracy and average return (ret.) per trade increase from 58 to 77 and from 0.15 to 0.33, respectively. Additionally, when the time period is increased from 30 min to 2 hours, compound return decreases. The reason for the increase in accuracy and the decrease in compound return is that, by increasing the threshold value, we aim to minimize risk and to maximize return per trade. Results indicate that we are reaching our goal of minimizing the number of trades and increasing return per trade. By targeting larger returns, we reduce the number of transactions and eliminate smaller returns, which results in a reduction of compound return. The number of correct predictions, however, increases, and accuracy increases likewise. Recall decreases as we limit the number of trades.

Similar results can be seen in Table 6, where the fall direction is predicted. As expected, DNN performs better in smaller time periods, where there are more records. As the number of instances decreases, DNN performance decreases. On the other hand, the performance of Random Forest and SVM increases. By using the threshold and time period structure, we enable investors to weigh the potential reward against the risk to decide if the pain is worth the potential gain.

The results obtained for predicting price rise are reported in Table 5. All models were compared according to the test results corresponding to each threshold value. From Table 5, it can be concluded that the highest compound return with minimum drawdown and maximum precision can be achieved with the DNN model.

In order to minimize risk, investors aim to diversify their portfolio. To build a diversified portfolio without purchasing many individual stocks, they invest in index funds instead. Even so, when the financial system suffers from erratic behavior and high volatility, this is reflected as a loss to investors. Figures 5 and 6 compare the BIST 100 index return with our DNN models' return over different time periods, when the index performs well and when it incurs a loss.

2009 is one of the most profitable periods of the BIST 100 index in the dataset. In Figure 5, the DNN model's results are compared with the BIST 100 index return of this period. Furthermore, in Figure 6, the results of the DNN model are compared with the BIST 100 index, which suffered a loss in 2016. As can be seen from both results, even investing in the index to achieve a diversified portfolio can lead to losses, while a more stable investment instrument can be obtained by investing according to our proposed deep learning and machine learning models. Regardless of whether the index performs well or poorly, risks can be minimized while profit increases simultaneously with the proposed model.

In predicting the direction of the BIST 100 index with deep learning and machine learning methods, we noticed that the gains of true-positive trades are much higher than the losses of false-positive trades. The accuracy and precision of our test results are close to 60%; even when accuracy and precision are close to the 50% level, the system will be profitable, since the gains from true-positive trades are greater than the losses from false-positive trades.

In most of the studies where deep learning and machine learning are applied, average return per trade is not evaluated [35, 44, 53]. There are not many studies that include cumulative return in their results [23]; however, we could not find any study where the average return per trade is compared.

As can be seen from Tables 5 and 6, the return percentage (ret. %) row is positive and increases as the threshold value and time period increase. In Table 5, when the threshold value is 0 and the time period is 30 minutes, the return percentage is 0.15, and it increases to 0.74 when the threshold value is 0.5 and the time period is 2 hours. Similarly, in Table 6, when the threshold value is 0 and the time period is 30 minutes, the return percentage is 0.19, and it increases to 0.58 when the threshold value is 0.5 and the time period is 2 hours.
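The observation that a system can remain profitable with precision near 50%, provided winning trades gain more than losing trades lose, can be illustrated with a short expectancy calculation. This is only a sketch under simplifying assumptions that are not part of the study: every entered trade is either a true positive earning a fixed average gain or a false positive losing a fixed average loss, and transaction costs are ignored. The function names and all input values are hypothetical.

```python
def expected_return_per_trade(precision, avg_gain, avg_loss):
    """Expected return of one entered trade.

    precision : fraction of entered trades that are true positives (0..1)
    avg_gain  : average gain of a true-positive trade (positive number)
    avg_loss  : average loss of a false-positive trade (positive magnitude)
    """
    return precision * avg_gain - (1.0 - precision) * avg_loss


def break_even_precision(avg_gain, avg_loss):
    """Precision at which the expected return per trade is exactly zero."""
    return avg_loss / (avg_gain + avg_loss)
```

For example, with a hypothetical average gain of 0.6 and average loss of 0.4 (in percent per trade), the break-even precision is 0.4, so a precision of 0.5 still yields a positive expected return per trade.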
According to the results, we observe that DNN has a higher average return per transaction. The profit from the right decisions is greater than the loss from the wrong decisions, resulting in a higher compound return. Creating more profit from right decisions than the losses incurred from wrong decisions is the main objective of money management. As Druckenmiller, who was a manager at Soros' Quantum Fund, says, “I’ve learned many things from George Soros, but perhaps the most significant is that it’s not whether you’re right or wrong that’s important, but how much money you make when you’re right and how much you lose when you’re wrong” [59]. From our results, we can infer that, by applying deep learning and machine learning methods to predicting the BIST 100 direction, the profits of being right will be greater than the losses of being wrong.

From the results, we can see that, in smaller time periods, compound return is higher and max drawdown is lower. Therefore, by using smaller time periods, we can achieve lower risk and higher profit. We used three different time intervals to compare estimation performance and compound return over different time periods. The selection of the time period can be optimized by trying different values, but time period optimization is not the main focus of this study. From our results, we can infer that, by optimizing the time period selection, compound return can be increased and max drawdown can be decreased.
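Since compound return and max drawdown are compared throughout this discussion, the following minimal sketch shows how the maximum drawdown of equation (16) can be computed from a sequence of per-trade discrete returns. The function names and the sample capital values are hypothetical, the capital curve is assumed to stay positive, and the running-peak loop is one common way to evaluate the peak-to-trough definition rather than necessarily the procedure used in the study.

```python
def equity_curve(discrete_returns, initial=1.0):
    """Capital level after each trade when all capital is reinvested."""
    curve = [initial]
    for r in discrete_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline as a fraction of the peak, (P - L) / P.

    The peak must occur before the trough, so the running maximum is
    tracked while scanning the curve from left to right.
    """
    peak = curve[0]
    mdd = 0.0
    for value in curve:
        peak = max(peak, value)
        mdd = max(mdd, (peak - value) / peak)
    return mdd
```

For a hypothetical capital curve of 100, 120, 90, 130, 104, the largest decline is from the peak of 120 to the trough of 90, giving a maximum drawdown of 0.25 (25%); the later decline from 130 to 104 is only 20%.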
One of the most important implications of our results is that deep learning and machine learning methods produce successful results when used in predicting market direction. Our results indicate why large funds and experts are involved in using and studying deep learning and machine learning methods to predict financial markets [60].

4.1. Implications from Experiment Results. To summarize the obtained results, although traditional machine learning techniques are still the preferred mainstream methods in predictive analysis, recent research shows that these methods do not capture the properties of complex, nonlinear problems as well as deep learning methods. Accordingly, these experiments show that a deep learning algorithm, indirectly, has the capacity to produce an appropriate representation of information.

Even though, according to the confusion matrix, the DNN model performs notably better than the other models in terms of compound return, max drawdown, and precision, we aimed to identify whether any observed difference is statistically significant. To compare the statistical significance of the models, the McNemar test is used, which captures the errors made by both models [61]. The null hypothesis is that the classifiers have a similar proportion of errors on the test set. In the McNemar test, the p value is below the given threshold (0.05) only for the DNN-RF comparison. We can reject the null hypothesis since the p value is 0.048 and infer
[Plot: compound return versus date, 2009/04 to 2009/10. Series: BIST 100 return; DNN 30M th. 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5.]
Figure 5: Comparison of one of the most profitable periods of BIST 100 index with the return of 30 min DNN.
[Plot: compound return versus date, 2016/04 to 2016/09. Series: BIST 100 return; DNN 30M th. 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5.]
Figure 6: Comparison of one of the worst periods of BIST 100 index with the return of 30 min DNN.
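The McNemar test used to compare pairs of classifiers can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses the chi-square approximation with continuity correction (one degree of freedom), whereas the study does not state which variant was applied, and the function name and inputs are hypothetical. When the number of discordant predictions is small, an exact binomial version is usually preferred.

```python
from math import erfc, sqrt


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar test on the disagreements of two classifiers.

    Returns (statistic, p_value) using the continuity-corrected
    chi-square approximation with one degree of freedom.
    """
    # b: classifier A correct, classifier B wrong
    # c: classifier A wrong, classifier B correct
    b = sum(1 for y, a, bb in zip(y_true, pred_a, pred_b) if a == y and bb != y)
    c = sum(1 for y, a, bb in zip(y_true, pred_a, pred_b) if a != y and bb == y)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value
```

The null hypothesis of equal error proportions is rejected when the returned p value falls below the chosen significance level, such as the 0.05 threshold mentioned in the text.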
relatively to a neanderthal genome,” in Lecture Notes in Computer Science, vol. 10255, pp. 235–242, Springer, Iberian Conference on Pattern Recognition and Image Analysis, Springer, 2017.
[17] L. Di Persio and O. Honchar, “Artificial neural networks architectures for stock price prediction: comparisons and applications,” International Journal of Circuits, Systems and Signal Processing, vol. 10, pp. 403–413, 2016.
[18] C. S. Vui, G. K. Soon, C. K. On, R. Alfred, and P. Anthony, “A review of stock market prediction with artificial neural network (ANN),” in Proceedings of the IEEE International Conference on Control System, Computing and Engineering (ICCSCE), pp. 477–482, IEEE, Mindeb, Malaysia, December 2013.
[19] Z. H. Khan, T. S. Alin, and A. Hussain, “Price prediction of share market using artificial neural network (ANN),” International Journal of Computer Applications, vol. 22, no. 2, pp. 42–47, 2011.
[20] Y. Zhang and L. Wu, “Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network,” Expert Systems with Applications, vol. 36, no. 5, pp. 8849–8854, 2009.
[21] M. Henrique, V. A. Sobreiro, and H. Kimura, “Stock price prediction using support vector regression on daily and up to the minute prices,” The Journal of Finance and Data Science, vol. 4, no. 3, pp. 183–201, 2018.
[22] N. Sapankevych and R. Sankar, “Time series prediction using support vector machines: a survey,” IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 24–38, 2009.
[23] E. A. Gerlein, M. McGinnity, A. Belatreche, S. Coleman, and M. McGinnity, “Evaluating machine learning classification for financial trading: an empirical approach,” Expert Systems with Applications, vol. 54, pp. 193–207, 2016.
[24] E. Chong, C. Han, and F. C. Park, “Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies,” Expert Systems with Applications, vol. 83, pp. 187–205, 2017.
[25] T. Revell, AI Trained on 3500 Years of Games Finally Beats Humans at Dota 2, New Scientist, London, UK, 2018, https://www.newscientist.com/article/2172612-ai-trained-on-3500-years-of-games-finally-beats-humans-at-dota-2/.
[26] J. B. Heaton, N. G. Polson, and J. Witte, “Deep learning for finance: deep portfolios, applied stochastic models in business and industry,” SSRN Electrical Journal, vol. 33, no. 1, pp. 3–12, 2016.
[27] A. Hasan, O. Kalıpsız, and S. Akyokuş, “Predicting financial market in big data: deep learning,” in Proceedings of the International Conference on Computer Science and Engineering, Antalya, Turkey, October 2017.
[28] R. Cervelló-Royo, F. Guijarro, and K. Michniuk, “Stock market trading rule based on pattern recognition and technical analysis: forecasting the DJIA index with intraday data,” Expert Systems with Applications, vol. 42, no. 14, pp. 5963–5975, 2015.
[29] E. Bastı, C. Kuzey, and D. Delen, “Analyzing initial public offerings’ short-term performance using decision trees and SVMs,” Decision Support Systems, vol. 73, pp. 15–27, 2015.
[30] M. Ballings, D. Van den Poel, N. Hespeels, and R. Gryp, “Evaluating multiple classifiers for stock price direction prediction,” Expert Systems with Applications, vol. 42, no. 20, pp. 7046–7056, 2015.
[31] A. Dutta, G. Bandopadhyay, and S. Sengupta, “Prediction of stock performance in Indian stock market using logistic regression,” International Journal of Business Information Systems, vol. 7, no. 1, 2015.
[32] A. Shihavuddin, M. N. Ambia, M. M. Nazmul Arefin, M. Hossain, and A. Anwar, “Prediction of stock price analyzing the online financial news using Naive Bayes classifier and local economic trends,” in Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), pp. V4-22–V4-26, Chengdu, China, August 2010.
[33] J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock market index using fusion of machine learning techniques,” Expert Systems with Applications, vol. 42, no. 4, pp. 2162–2172, 2015.
[34] L. Khaidem, S. Saha, and S. Dey, “Predicting the direction of stock market prices using random forest,” 2016, https://arxiv.org/abs/1605.00003.
[35] S. Boonpeng and P. Jeatrakul, “Decision support system for investing in stock market by using OAA-neural network,” in Proceedings of the 8th International Conference on Advanced Computational Intelligence, Chiang Mai, Thailand, February 2016.
[36] A. Booth, E. Gerding, and F. McGroarty, “Automated trading with performance weighted random forests and seasonality,” Expert Systems with Applications, vol. 41, no. 8, pp. 3651–3661, 2014.
[37] J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques,” Expert Systems with Applications, vol. 42, no. 1, pp. 259–268, 2015.
[38] R. K. Nayak, D. Mishra, and A. K. Rath, “A Naïve SVM-KNN based stock market trend reversal analysis for Indian benchmark indices,” Applied Soft Computing, vol. 35, pp. 670–680, 2015.
[39] Y. Kara, M. Acar Boyacioglu, and Ö. K. Baykan, “Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul stock exchange,” Expert Systems with Applications, vol. 38, no. 5, pp. 5311–5319, 2011.
[40] M. Pekkaya and C. Hamzaçebi, “An application on forecasting exchange rate by using neural network (Yapay sinir ağları ile döviz kuru tahmini üzerine bir uygulama),” in Proceedings of the 27th YA/EM National Congress, pp. 973–978, Izmir, Turkey, November 2007.
[41] A. Ç. Pehlivanlı, B. Aşıkgil, and G. Gülay, “Indicator selection with committee decision of filter methods for stock market price trend in ISE,” Applied Soft Computing, vol. 49, pp. 792–800, 2016.
[42] M. Göçken, M. Özçalıcı, A. Boru, and A. T. Dosdoğru, “Integrating metaheuristics and artificial neural networks for improved stock price prediction,” Expert Systems with Applications, vol. 44, pp. 320–331, 2016.
[43] M. Masakazu, K. Mori, Y. Mitari, and Y. Kaneda, “Subject independent facial expression recognition with robust face detection using a convolutional neural network,” Neural Networks, vol. 16, no. 5, pp. 555–559, 2003.
[44] H. Gunduz, Y. Yaslan, and Z. Cataltepe, “Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations,” Knowledge-Based Systems, vol. 137, pp. 138–148, 2017.
[45] Y. Peng and H. Jiang, “Leverage financial news to predict stock price movements using word embeddings and deep neural networks,” 2015, https://arxiv.org/abs/1506.07220.