
Back Testing For Algorithmic Trading Strategies

This document discusses back-testing algorithmic trading strategies to determine which strategies performed best in the past. It analyzes three alternative strategies using historical Tesla stock price data, including when the price rises above a 10-day exponential moving average (EMA10), when the EMA10 rises above the EMA30, and when the MACD rises above the MACD signal. The EMA10 strategy produced the highest returns, capturing 59% of gains with a 189.67% profit over 251 days of back-testing.

Uploaded by

andy paul
Copyright
© All Rights Reserved

Back-Testing for Algorithmic Trading Strategies
How do we choose a strategy? A good start is to test alternative strategies
retrospectively, to learn which of them worked best in the past and produced
more accurate signals. This is called back-testing.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yahoofinancials as yf
from yahoo_fin.stock_info import *
import requests_html
import requests
import ftplib
import ta as ta
import io

from tscv import GapWalkForward


from sklearn.tree import DecisionTreeClassifier,plot_tree
from sklearn.model_selection import GridSearchCV,cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.tree import export_graphviz

from six import StringIO

from IPython.display import Image


import pydotplus
from yellowbrick.classifier import ClassificationReport,ConfusionMatrix,ROCAUC
from yellowbrick.model_selection import FeatureImportances
import graphviz

Let's get the historical time series data of the stock by specifying the start and end dates

In [2]:
history = yf.YahooFinancials('TSLA').get_historical_price_data('2021-01-01', '2021-1
df = pd.DataFrame(history['TSLA']['prices'])
df.head()

Out[2]:
         date        high         low        open       close     volume    adjclose  formatted_da
0  1609770600  248.163330  239.063339  239.820007  243.256668  145914600  243.256668  2021-01-
1  1609857000  246.946671  239.733337  241.220001  245.036667   96735600  245.036667  2021-01-
2  1609943400  258.000000  249.699997  252.830002  251.993332  134100000  251.993332  2021-01-
3  1610029800  272.329987  258.399994  259.209991  272.013336  154496700  272.013336  2021-01-
4  1610116200  294.829987  279.463318  285.333344  293.339996  225166500  293.339996  2021-01-

In [3]:
df.drop('date', axis=1, inplace=True)

In [4]:
df.index = pd.to_datetime(df['formatted_date'])
df.drop('formatted_date', axis=1, inplace=True)

Let's rename the adjclose column to 'price'.

In [5]:
df.rename(columns={'adjclose': 'price'}, inplace=True)

Let's store the daily price change in 'return' (daily return) and the daily percentage
change in the price in 'return_pct'.

In [6]:
df["return"] = df["price"].diff()
df["return_pct"] = df["price"].pct_change()
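
To make the difference between the two columns concrete, here is a minimal sketch on a made-up three-day price series (the numbers are illustrative, not TSLA data). `diff()` gives the change in dollars and `pct_change()` the change relative to the previous day. The first row is NaN in both columns because there is no previous day.

```python
import pandas as pd

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
toy = pd.DataFrame({"price": [100.0, 110.0, 99.0]})

toy["return"] = toy["price"].diff()            # dollar change: NaN, 10.0, -11.0
toy["return_pct"] = toy["price"].pct_change()  # relative change: NaN, 0.10, -0.10

print(toy)
```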

Time Series Graph of Price and Return

In [7]:
f, axarr = plt.subplots(2,sharex=False,figsize=(12,7))
f.suptitle('TESLA Price and Return', fontsize=20)
axarr[0].plot(df['price'], color='blue')
axarr[0].grid(True)
axarr[1].plot(df['return'], color='red')
axarr[1].grid(True)
f.legend(['Price', 'Return'], loc='upper left')
plt.show()
Three Alternative Strategies
1) Price > EMA10 : If the price goes above the 10-day exponential moving average,
it is considered a buy signal.

2) EMA10 > EMA30 : When the 10-day exponential moving average rises above the
30-day exponential moving average, it can be considered as a buy signal.

3) MACD > MACDS : The MACD indicator is the difference between the 12-day and
26-day exponential moving averages. The 9-day exponential moving average
of the MACD is called the MACD Signal (MACDS). If the MACD goes above the
MACDS, it is considered a buy signal.
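
The three definitions above can be sketched with plain pandas. This is only an illustration under stated assumptions: the helper `ema` is hypothetical, it uses the usual recursive EMA with smoothing factor 2 / (window + 1), and the prices are made up. The `ta` functions used in this notebook may differ in details such as how the first values are seeded or filled.

```python
import pandas as pd

def ema(series: pd.Series, window: int) -> pd.Series:
    # Recursive EMA with smoothing factor 2 / (window + 1) (assumed definition).
    return series.ewm(span=window, adjust=False).mean()

# Hypothetical closing prices (not TSLA data).
price = pd.Series([100.0, 104.0, 101.0, 105.0, 102.0, 106.0])

ema10 = ema(price, 10)
ema30 = ema(price, 30)
macd = ema(price, 12) - ema(price, 26)   # fast EMA minus slow EMA
macd_signal = ema(macd, 9)               # 9-day EMA of the MACD line

buy_s1 = (price > ema10).astype(int)         # 1) Price > EMA10
buy_s2 = (ema10 > ema30).astype(int)         # 2) EMA10 > EMA30
buy_s3 = (macd > macd_signal).astype(int)    # 3) MACD > MACDS

print(pd.DataFrame({"price": price, "EMA10": ema10, "EMA30": ema30,
                    "MACD": macd, "MACDS": macd_signal}))
```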

1) Price > EMA10 (s1)

In [8]:
df["EMA10"] = ta.trend.ema_indicator(df["price"],10,fillna=True)

Let's create Buy-Sell Signals

In [9]:
df["buy_s1"] = np.where(df["price"] > df["EMA10"], 1, 0)
df["sell_s1"] = np.where(df["price"] < df["EMA10"], 1, 0)
df["buy_s1_ind"] = np.where((df["buy_s1"] > df["buy_s1"].shift(1)),1, 0)
df["sell_s1_ind"] = np.where((df["sell_s1"] > df["sell_s1"].shift(1)),1, 0)
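
The cell above builds two kinds of columns: `buy_s1` is a state (1 on every day the price is above the EMA10), while `buy_s1_ind` marks only the day the state switches on. A small sketch on a hypothetical state column shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical state column: 1 on days when the price is above the EMA.
toy = pd.DataFrame({"buy": [0, 1, 1, 0, 1]})

# 1 only on the first day of each run of 1s, because the comparison with
# the previous day's value is true only when the state goes from 0 to 1.
toy["buy_ind"] = np.where(toy["buy"] > toy["buy"].shift(1), 1, 0)

print(toy["buy_ind"].tolist())  # [0, 1, 0, 0, 1]
```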

In [10]:
df["date"] = df.index
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.scatter(df.loc[df["buy_s1_ind"] == 1].index,
df["price"][df["buy_s1_ind"] == 1], color='green', marker='^', s=100, la

plt.scatter(df.loc[df["sell_s1_ind"] == 1].index,
df["price"][df["sell_s1_ind"] == 1], color='red', marker='v', s=100, lab

plt.xlabel('Trade Days')
plt.legend(loc='best')
plt.title('TESLA Strategy One(Price>EMA10) Buy-Sell Signals')
plt.show()

Under Strategy One, the profit of a trader who starts with $1000 over the
relevant period is computed below.

Assuming transaction costs take 5% of each daily return, we keep 95% of the
percentage return on every day with a buy signal.

In [11]:
df["value_s1"] = 1000*(1+(np.where(df["buy_s1"]==1,
0.95*df["return_pct"],0)).cumsum())
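
Two properties of this formula are worth seeing on a small example: the 0.95 factor scales losses as well as gains, and `cumsum()` adds daily returns rather than compounding them. The sketch below compares the additive value with a compounded one on hypothetical returns. The compounded variant is an alternative shown for comparison, not what the notebook computes.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns and in-market flags (not TSLA data).
ret = pd.Series([0.10, -0.05, 0.20])
in_market = pd.Series([1, 1, 1])

kept = np.where(in_market == 1, 0.95 * ret, 0.0)  # same 95% scaling as above

value_additive = 1000 * (1 + kept.cumsum())       # what the cell above computes
value_compound = 1000 * np.cumprod(1 + kept)      # reinvesting each day's result

print(value_additive[-1], value_compound[-1])
```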

Back-Test Report (s1)


In [12]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",100*round(df["return_pct"].min(),2),"%")
print("Highest Daily Return ",100*round(df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",100*round(df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",100*df["return_pct"].sum().round(2),"%")

print("************* MODEL PERFORMANCE *************")

print("Return Captured by the Model ",100*sum(np.where((df["buy_s1"]==1),df["return_


print("Loss Maintained by the Model ",100*sum(np.where((df["sell_s1"]==1),df["return
print("**************************************************")

************* Descriptive Statistics *************


Period 251 days
Highest Daily Loss -12.0 %
Highest Daily Return 20.0 %
Standard Deviation of Return 3.0 %
Total Potential Return 336.0 %
Total Potential Loss -283.0 %
Net Return 53.0 %
************* MODEL PERFORMANCE *************
Return Captured by the Model 200.0 %
Loss Maintained by the Model -147.0 %
**************************************************

In [13]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"][0].round(2),
"at the beginning of the period, became $",df["price"][-1].round(2),"with %",
(100*(df["price"][-1]-df["price"][0])/df["price"][0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s1"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s1"][-1].round(2),
"on the first",len(df),"days.")

************* REPORT *************


The end-of-period price of the stock, which was $ 243.26 at the beginning of the
period, became $ 356.78 with % 46.67 change The model captured % 59.0 of the total
positive return.The investment of $1000 at the beginning of the period became $ 2896.76
on the first 251 days.

The percentage gains on the days when the stock rose sum to 336%, and the percentage
losses on the days when it fell sum to 283%. The Price > EMA10 strategy captured 59%
of the potential positive gain: the returns on the days with a buy signal sum to 200%.
Someone who bought the stock at the beginning of the period and sold it at the end
made a profit of 46.67%, whereas someone who traded $1000 according to the signals
ended with $2896.76, a profit of roughly 190%.
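
The figures in this paragraph can be re-derived from the numbers printed in the report. The sketch below uses only those printed (already rounded) values, so its results are approximate; the variable names are illustrative.

```python
# Figures copied from the back-test report above.
start_price, end_price = 243.26, 356.78
potential_gain_pct = 336.0   # sum of all positive daily returns
captured_pct = 200.0         # sum of daily returns on buy-signal days
final_value = 2896.76        # value of the $1000 investment

buy_and_hold_pct = 100 * (end_price - start_price) / start_price
capture_share_pct = 100 * captured_pct / potential_gain_pct
strategy_profit_pct = 100 * (final_value - 1000) / 1000

print(round(buy_and_hold_pct, 2), round(capture_share_pct, 1),
      round(strategy_profit_pct, 2))
```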

In [14]:
df["date"] = df.index
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy One(Price>EMA10)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
axarr[0].scatter(df.loc[df["buy_s1_ind"] == 1].index,
df.loc[df["buy_s1_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s1_ind"] == 1].index,
df.loc[df["sell_s1_ind"] == 1,"price"].values, color='red', marker='
axarr[0].legend(loc='best')
axarr[1].plot(df["value_s1"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()

The algorithmic payoff (about 190%) exceeds the buy-and-hold payoff (46.67%). Why?
Because the trader repeatedly earned additional income by selling high and buying low,
and the cumulative sum of these gains is greater than the gain from simply holding the
stock. Let's back-test the other two strategies in the same way.
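
One thing that may be worth checking in a back-test like this is timing. The signal for a day is computed from that day's closing price, and it is applied to the return of the same day. The sketch below, on hypothetical zig-zag prices, compares that with acting on the signal one day later. The one-day lag is an assumption about how the signals would be traded, not something the notebook states.

```python
import numpy as np
import pandas as pd

# Hypothetical zig-zag prices (not TSLA data), chosen to make the effect visible.
price = pd.Series([100.0, 104.0, 101.0, 105.0, 102.0, 106.0])
ret = price.pct_change()
ema = price.ewm(span=3, adjust=False).mean()   # short EMA for a short series

signal = (price > ema).astype(int)             # known only after today's close
signal_lagged = signal.shift(1).fillna(0)      # act on it the following day

same_day = np.where(signal == 1, ret, 0.0).sum()
next_day = np.where(signal_lagged == 1, ret, 0.0).sum()
print(round(same_day, 4), round(next_day, 4))
```

On this toy series the same-day sum is positive and the lagged sum is negative, so comparing the two on real data can show how much of a back-tested gain depends on the timing assumption.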

2) EMA10 > EMA30 (s2)

In [15]:
df["EMA30"] = ta.trend.ema_indicator(df["price"],30,fillna=True)

In [16]:
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
plt.legend(loc='best')
plt.title('TESLA Strategy Two EMA10>EMA30')
plt.show()
In [17]:
df["buy_s2"] = np.where((df["EMA10"] > df["EMA30"]), 1, 0)
df["sell_s2"] = np.where((df["EMA10"] < df["EMA30"]), 1, 0)
df["buy_s2_ind"] = np.where((df["buy_s2"] > df["buy_s2"].shift(1)),1, 0)
df["sell_s2_ind"] = np.where((df["sell_s2"] > df["sell_s2"].shift(1)),1, 0)

In [18]:
df["date"] = df.index

fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
plt.scatter(df.loc[df["buy_s2_ind"] == 1].index,
df["price"][df["buy_s2_ind"] == 1], color='green', marker='^', s=100, la
plt.scatter(df.loc[df["sell_s2_ind"] == 1].index,
df["price"][df["sell_s2_ind"] == 1], color='red', marker='v', s=100, lab

plt.xlabel('Trade Days')
plt.legend(loc='best')
plt.title('TESLA Strategy Two EMA10>EMA30 Buy-Sell Signals')
plt.show()
Trading Gain according to Strategy Two

In [19]:
df["value_s2"] = 1000*(1+(np.where(df["buy_s2"]==1,
0.95*df["return_pct"],0)).cumsum())

Back-Test Report (s2)


In [20]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",100*round(df["return_pct"].min(),2),"%")
print("Highest Daily Return ",100*round(df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",100*round(df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",100*df["return_pct"].sum().round(2),"%")

print("************* MODEL PERFORMANCE *************")

print("Return Captured by the Model ",100*sum(np.where((df["buy_s2"]==1),df["return_


print("Loss Maintained by the Model ",100*sum(np.where((df["sell_s2"]==1),df["return
print("**************************************************")

************* Descriptive Statistics *************


Period 251 days
Highest Daily Loss -12.0 %
Highest Daily Return 20.0 %
Standard Deviation of Return 3.0 %
Total Potential Return 336.0 %
Total Potential Loss -283.0 %
Net Return 53.0 %
************* MODEL PERFORMANCE *************
Return Captured by the Model 61.0 %
Loss Maintained by the Model -8.0 %
**************************************************

In [21]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"][0].round(2),
"at the beginning of the period, became $",df["price"][-1].round(2),"with %",
(100*(df["price"][-1]-df["price"][0])/df["price"][0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s2"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s2"][-1].round(2),
"on the first",len(df),"days.")

************* REPORT *************


The end-of-period price of the stock, which was $ 243.26 at the beginning of the
period, became $ 356.78 with % 46.67 change The model captured % 18.0 of the total
positive return.The investment of $1000 at the beginning of the period became $ 1581.24
on the first 251 days.

In [22]:
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy Two(EMA10>EMA30)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
axarr[0].plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
axarr[0].scatter(df.loc[df["buy_s2_ind"] == 1].index,
df.loc[df["buy_s2_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s2_ind"] == 1].index,
df.loc[df["sell_s2_ind"] == 1,"price"].values, color='red', marker='

axarr[0].legend(loc='best')
axarr[1].plot(df["value_s2"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
3) MACD > MACDS (s3)

In [23]:
df["MACD"] = ta.trend.macd(df["price"],fillna=True,window_fast=12,window_slow=26)
df["MACD_signal"] = ta.trend.macd_signal(df["price"],
window_fast=12,window_slow=26,window_sign=9,
fillna=True)

In [24]:
df["buy_s3"] = np.where((df["MACD"] > df["MACD_signal"]), 1, 0)
df["sell_s3"] = np.where((df["MACD"] < df["MACD_signal"]), 1, 0)
df["buy_s3_ind"] = np.where((df["buy_s3"] > df["buy_s3"].shift(1)),1, 0)
df["sell_s3_ind"] = np.where((df["sell_s3"] > df["sell_s3"].shift(1)),1, 0)

In [25]:
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy Three(MACD > MACDS)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].legend(loc='best')
axarr[0].grid(True)
axarr[1].plot(df["MACD"],label="MACD",color='red',linestyle='--')
axarr[1].plot(df["MACD_signal"],label="MACD Signal",color='green',linestyle='--')

axarr[1].legend(loc='best')
plt.xlabel('Trade Days')
plt.show()
Trading Gain according to Strategy Three

In [26]:
df["value_s3"] = 1000*(1+(np.where(df["buy_s3"]==1,
0.95*df["return_pct"],0)).cumsum())

Back-Test Report (s3)


In [27]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",100*round(df["return_pct"].min(),2),"%")
print("Highest Daily Return ",100*round(df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",100*round(df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",100*df["return_pct"].sum().round(2),"%")

print("************* MODEL PERFORMANCE *************")

print("Return Captured by the Model ",100*sum(np.where((df["buy_s3"]==1),df["return_


print("Loss Maintained by the Model ",100*sum(np.where((df["sell_s3"]==1),df["return
print("**************************************************")
************* Descriptive Statistics *************
Period 251 days
Highest Daily Loss -12.0 %
Highest Daily Return 20.0 %
Standard Deviation of Return 3.0 %
Total Potential Return 336.0 %
Total Potential Loss -283.0 %
Net Return 53.0 %
************* MODEL PERFORMANCE *************
Return Captured by the Model 72.0 %
Loss Maintained by the Model -19.0 %
**************************************************

In [28]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"][0].round(2),
"at the beginning of the period, became $",df["price"][-1].round(2),"with %",
(100*(df["price"][-1]-df["price"][0])/df["price"][0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s3"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s3"][-1].round(2),
"on the first",len(df),"days.")

************* REPORT *************


The end-of-period price of the stock, which was $ 243.26 at the beginning of the
period, became $ 356.78 with % 46.67 change The model captured % 21.0 of the total
positive return.The investment of $1000 at the beginning of the period became $ 1683.92
on the first 251 days.

In [29]:
f,axarr = plt.subplots(3,sharex=False,figsize=(10,8))
f.suptitle('TESLA Strategy Three(MACD > MACDS)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["buy_s3_ind"] == 1].index,
df.loc[df["buy_s3_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s3_ind"] == 1].index,
df.loc[df["sell_s3_ind"] == 1,"price"].values, color='red', marker='
axarr[0].legend(loc='best')
axarr[1].plot(df["MACD"],label="MACD",color='red',linestyle='--')
axarr[1].plot(df["MACD_signal"],label="MACD Signal",color='green',linestyle='--')

axarr[1].legend(loc='best')
axarr[2].plot(df["value_s3"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
Back-testing is a method for choosing among candidate strategies. It is natural for
price movements to deviate from the direction a strategy points to. Because the
indicators are calculated from the stock's own price movements, they depend heavily
on how the price forms, and many factors affect the price. Although the back-test
favors the Price>EMA10 strategy for 2021, that strategy may not perform the same in
2022 or in the first days of 2023. Relying on a single strategy can be misleading.

So how can we address this? Answer: Regression Toward The Mean

Let's create a stronger signal by combining the buy and sell signals from the
three indicators: buy when at least two of the three indicators give a buy signal,
and sell when at least two of them give a sell signal.

In [30]:
df["BUY"] = np.where((df["buy_s1"]+df["buy_s2"]+df["buy_s3"])>=2,1,0)
df["SELL"] = np.where((df["sell_s1"]+df["sell_s2"]+df["sell_s3"])>=2,1,0)
df["BUY_ind"] = np.where((df["BUY"] > df["BUY"].shift(1)),1, 0)
df["SELL_ind"] = np.where((df["SELL"] > df["SELL"].shift(1)),1, 0)
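
The two-out-of-three rule can be checked on a small hypothetical table of buy states. The values are made up, and only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical buy states from three indicators on four days.
votes = pd.DataFrame({
    "buy_s1": [1, 1, 0, 0],
    "buy_s2": [1, 0, 0, 1],
    "buy_s3": [1, 1, 0, 0],
})

# Combined signal: 1 when at least two of the three indicators agree.
votes["BUY"] = np.where(votes.sum(axis=1) >= 2, 1, 0)
print(votes["BUY"].tolist())  # [1, 1, 0, 0]
```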
In [31]:
df["VALUE"] = 1000*(1+(np.where(df["BUY"]==1,
0.95*df["return_pct"],0)).cumsum())

In [32]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",100*round(df["return_pct"].min(),2),"%")
print("Highest Daily Return ",100*round(df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",100*round(df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",100*df["return_pct"].sum().round(2),"%")

print("************* MODEL PERFORMANCE *************")

print("Return Captured by the Model ",100*sum(np.where((df["BUY"]==1),df["return_pct


print("Loss Maintained by the Model ",100*sum(np.where((df["SELL"]==1),df["return_pc
print("**************************************************")

print("************* REPORT *************")


print("The end-of-period price of the stock, which was $",df["price"][0].round(2),
"at the beginning of the period, became $",df["price"][-1].round(2),"with %",
(100*(df["price"][-1]-df["price"][0])/df["price"][0]).round(2),"change","The mod
100*(sum(np.where((df["BUY"]==1),df["return_pct"],0))/sum(np.where((df["return_p
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["VALUE"][-1].round(2),
"on the first",len(df),"days.")

************* Descriptive Statistics *************


Period 251 days
Highest Daily Loss -12.0 %
Highest Daily Return 20.0 %
Standard Deviation of Return 3.0 %
Total Potential Return 336.0 %
Total Potential Loss -283.0 %
Net Return 53.0 %
************* MODEL PERFORMANCE *************
Return Captured by the Model 135.0 %
Loss Maintained by the Model -82.0 %
**************************************************
************* REPORT *************
The end-of-period price of the stock, which was $ 243.26 at the beginning of the
period, became $ 356.78 with % 46.67 change The model captured % 40.0 of the total
positive return.The investment of $1000 at the beginning of the period became $ 2283.04
on the first 251 days.

In [33]:
f,axarr = plt.subplots(2,sharex=False,figsize=(10,8))
f.suptitle('TESLA Strategy Regression Toward The Mean', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["BUY_ind"] == 1].index,
df.loc[df["BUY_ind"] == 1,"price"].values, color='green', marker='^'

axarr[0].scatter(df.loc[df["SELL_ind"] == 1].index,
df.loc[df["SELL_ind"] == 1,"price"].values, color='red', marker='v',
axarr[0].legend(loc='best')
axarr[1].plot(df["VALUE"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()

Algorithmic Trading Model and Decision Tree for Buy-Sell Signals
Let's create a class variable that separates positive from negative daily returns. This
variable will be our target variable: 0 represents negative returns and 1 represents
positive returns.

In [34]:
df["target_cls"] = np.where(df["return"]>0,1,0)

In [35]:
# Count the days in each class; sort_index() puts class 0 first, so red = 0 and green = 1
cnt = df["target_cls"].value_counts().sort_index()
cnt.plot(kind = 'bar', color=["red","green"])
plt.title("Target Class Distribution")
plt.xlabel("Target Class")
plt.ylabel("Frequency")
plt.show()
In [36]:
print("Negative Return $",sum(df["return"]<0))
print("Positive Return $",sum(df["return"]>0))

Negative Return $ 115


Positive Return $ 135

In [37]:
df["p_ema10"] = np.where(df["price"]>df["EMA10"],1,0)
df["ema10_ema30"] = np.where(df["EMA10"]>df["EMA30"],1,0)
df["macd_macds"] = np.where(df["MACD"]>df["MACD_signal"],1,0)

In [38]:
df.dropna(inplace=True)

In [39]:
predictors = ["p_ema10","ema10_ema30","macd_macds"]

In [40]:
X = df[predictors]
y = df["target_cls"]

In [41]:
cv = GapWalkForward(n_splits=5, gap_size=0, test_size=50)

dt = DecisionTreeClassifier()

param_grid = {"max_depth": np.arange(3, 30),
              "min_samples_split": range(10, 500,20),
              "criterion": ["gini", "entropy"]}
gs = GridSearchCV(dt, param_grid, cv=cv, n_jobs=1, verbose=1).fit(X, y)
Fitting 5 folds for each of 1350 candidates, totalling 6750 fits
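
The point of a walk-forward splitter is that every test window lies after its training window in time. The sketch below illustrates this with scikit-learn's `TimeSeriesSplit`, used here as a stand-in for `GapWalkForward`; the two are not guaranteed to produce identical folds. The sample count and `X_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples = 300                      # hypothetical number of trading days
X_demo = np.arange(n_samples).reshape(-1, 1)

# Five expanding-window folds with 50 test days each.
splitter = TimeSeriesSplit(n_splits=5, test_size=50)
folds = list(splitter.split(X_demo))

for fold, (train_idx, test_idx) in enumerate(folds):
    # Last training day, then first and last test day of the fold.
    print(fold, train_idx[-1], test_idx[0], test_idx[-1])
```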

In [42]:
gs.best_params_

Out[42]: {'criterion': 'gini', 'max_depth': 3, 'min_samples_split': 10}

In [43]:
final_clf = gs.best_estimator_

In [44]:
scores = cross_val_score(estimator=final_clf, X=X, y=y, cv=5)

print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Accuracy: 0.67 (+/- 0.10)

In [45]:
features = predictors
classes = {0:"SELL",1:"BUY"}

dot_data = StringIO()
export_graphviz(final_clf, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names = features,class_names=classes
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tesla.png')

display(Image('tesla.png'))

Model Performance

In [46]:
# Display names ordered by class label: 0 = negative return (Sell), 1 = positive return (Buy)
category = ['Sell', 'Buy']

cm = ConfusionMatrix(final_clf, classes=category, percent=True)

cm.fit(X, y)
cm.score(X, y)
cm.poof()
Out[46]: <AxesSubplot: title={'center': 'DecisionTreeClassifier Confusion Matrix'}, xlabel='Predicted Class', ylabel='True Class'>

In [47]:
cr = ClassificationReport(final_clf, classes=category)

cr.fit(X, y)
cr.score(X, y)
cr.poof()
Out[47]: <AxesSubplot: title={'center': 'DecisionTreeClassifier Classification Report'}>

The model captures the sell signals better. This is because the data set contains
different numbers of buy and sell signals.

In [48]:
rc = ROCAUC(final_clf, classes=category)

rc.fit(X, y)
rc.score(X, y)
rc.poof()
Out[48]: <AxesSubplot: title={'center': 'ROC Curves for DecisionTreeClassifier'}, xlabel='False Positive Rate', ylabel='True Positive Rate'>

Retrospective Prediction

In [49]:
df["prediction_signal"] = final_clf.predict(X)

In [50]:
print("Accuracy of the model is ",accuracy_score(df["target_cls"],df["prediction_sig

Accuracy of the model is 0.692
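
A simple reference point for this accuracy is the share of the more frequent class, which is the accuracy of always predicting that class. The sketch below uses the day counts printed earlier (135 positive, 115 negative). Note that 0.692 is measured on the same rows the model was fitted on.

```python
# Day counts taken from the class distribution printed earlier.
positive_days, negative_days = 135, 115

# Accuracy of always predicting the more frequent class.
baseline_accuracy = max(positive_days, negative_days) / (positive_days + negative_days)
reported_accuracy = 0.692   # in-sample accuracy printed above

print(round(baseline_accuracy, 3), round(reported_accuracy - baseline_accuracy, 3))
```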

Back-Test for Decision Tree


In [51]:
df["buy_dt"] = np.where((df["prediction_signal"]==1) &
                        (df["prediction_signal"].shift(1)==0),1,0)

df["sell_dt"] = np.where((df["prediction_signal"]==0) &
                         (df["prediction_signal"].shift(1)==1),1,0)

df["buy_dt_ind"] = np.where((df["buy_dt"] > df["buy_dt"].shift(1)),1,0)

df["sell_dt_ind"] = np.where((df["sell_dt"] > df["sell_dt"].shift(1)),1,0)


Buy-Sell Signals
In [52]:
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.scatter(df.loc[df["buy_dt_ind"] == 1].index,
df.loc[df["buy_dt_ind"] == 1,"price"].values, color='green', marker='^',

plt.scatter(df.loc[df["sell_dt_ind"] == 1].index,
df.loc[df["sell_dt_ind"] == 1,"price"].values, color='red', marker='v',

plt.legend(loc='best')
plt.grid(True)
plt.xlabel('Trade Days')
plt.title('TESLA \n Decision Tree Classifier \n Buy-Sell Signals', fontsize=20)
plt.show()

The profit of a trader who follows the Decision Tree model's signals over the period:

In [53]:
df["value_dt"] = 1000*(1 + (np.where(df["buy_dt"]==1,
0.95*df["return_pct"],0)).cumsum())
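
Unlike `buy_s1`, `buy_s2` and `buy_s3`, which are 1 on every day their condition holds, `buy_dt` is 1 only on the day the prediction switches from 0 to 1. The value above therefore accumulates returns on entry days only. The sketch below shows the difference on hypothetical predictions and returns. Whether the entry-day version is intended is not stated in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions and returns (not model output).
toy = pd.DataFrame({
    "prediction_signal": [0, 1, 1, 1, 0],
    "return_pct": [0.00, 0.02, 0.03, -0.01, 0.04],
})

# Entry flag as defined above: 1 only on the day the prediction turns 0 -> 1.
toy["buy_dt"] = np.where((toy["prediction_signal"] == 1) &
                         (toy["prediction_signal"].shift(1) == 0), 1, 0)

entry_days_only = toy.loc[toy["buy_dt"] == 1, "return_pct"].sum()
while_predicted_up = toy.loc[toy["prediction_signal"] == 1, "return_pct"].sum()
print(toy["buy_dt"].tolist(), entry_days_only, while_predicted_up)
```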

In [54]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",100*round(df["return_pct"].min(),2),"%")
print("Highest Daily Return ",100*round(df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",100*round(df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",100*df["return_pct"].sum().round(2),"%")

print("************* MODEL PERFORMANCE *************")

print("Return Captured by the Model ",100*sum(np.where((df["buy_dt"]==1),df["return_


print("Loss Maintained by the Model ",100*sum(np.where((df["sell_dt"]==1),df["return
print("**************************************************")

print("************* REPORT *************")


print("The end-of-period price of the stock, which was $",df["price"][0].round(2),
"at the beginning of the period, became $",df["price"][-1].round(2),"with %",
(100*(df["price"][-1]-df["price"][0])/df["price"][0]).round(2),"change","The mod
100*(sum(np.where((df["buy_dt"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_dt"][-1].round(2),
"on the first",len(df),"days.")
************* Descriptive Statistics *************
Period 250 days
Highest Daily Loss -12.0 %
Highest Daily Return 20.0 %
Standard Deviation of Return 3.0 %
Total Potential Return 336.0 %
Total Potential Loss -283.0 %
Net Return 53.0 %
************* MODEL PERFORMANCE *************
Return Captured by the Model 83.0 %
Loss Maintained by the Model -66.0 %
**************************************************
************* REPORT *************
The end-of-period price of the stock, which was $ 245.04 at the beginning of the
period, became $ 356.78 with % 45.6 change The model captured % 25.0 of the total
positive return.The investment of $1000 at the beginning of the period became $ 1789.78
on the first 250 days.

In [55]:
f,axarr = plt.subplots(1,2,figsize=(16,10))
f.suptitle('Algorithmic Trading Gain', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["buy_dt_ind"] == 1].index,
df.loc[df["buy_dt_ind"] == 1,"price"].values, color='green', marker='^',

axarr[0].scatter(df.loc[df["sell_dt_ind"] == 1].index,
df.loc[df["sell_dt_ind"] == 1,"price"].values, color='red', marker='v',

axarr[0].legend(loc='best')
axarr[0].grid(True)
axarr[0].set_xlabel('Trade Days')

axarr[1].plot(df["value_dt"],label="Value",color='blue')
axarr[1].legend(loc='best')
axarr[1].grid(True)
axarr[1].set_title('Algorithmic Trading Gain')
axarr[1].set_xlabel('Trade Days')
plt.show()
In [56]:
viz = FeatureImportances(final_clf, classes=category, relative=False)

viz.fit(X, y)
viz.poof()
Out[56]: <AxesSubplot: title={'center': 'Feature Importances of 3 Features using DecisionTreeClassifier'}, xlabel='feature importance'>

Clearly, two of the three strategies seem to be important, while the strategy
EMA10>EMA30 has a negligible effect. The MACD>MACDS strategy does not seem to
have a significant impact on this model either. When Price>EMA10 gives a buy/sell
signal, the other two strategies don't matter much from a decision point of view.
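
To see how a single dominant predictor shows up in a tree's feature importances, here is a minimal sketch on synthetic data. The feature names are borrowed from the notebook, but the data are made up so that the target depends on the first indicator only. It says nothing about the actual TSLA results.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# All 8 combinations of three binary indicators, repeated 10 times (synthetic data).
grid = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
X_demo = np.tile(grid, (10, 1))
y_demo = X_demo[:, 0]            # target depends on the first indicator only

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_demo)
importances = dict(zip(["p_ema10", "ema10_ema30", "macd_macds"],
                       tree.feature_importances_))
print(importances)
```

The tree splits once on the first feature, so that feature receives all of the importance and the other two receive none.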
