Back Testing For Algorithmic Trading Strategies
Strategies
How should we choose a strategy? A good place to start is to test alternative strategies
retrospectively and find out which of them worked best in the past and produced the
most accurate signals. This is called Back-Testing.
In [1]:
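import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yahoofinancials as yf   # provides YahooFinancials
import ta                      # technical-analysis indicators (EMA, MACD)
from io import StringIO
import pydotplus
from IPython.display import Image, display
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import accuracy_score
from tscv import GapWalkForward  # walk-forward time-series splitter
from yellowbrick.classifier import ConfusionMatrix, ClassificationReport, ROCAUC
from yellowbrick.model_selection import FeatureImportances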
Let's fetch the stock's historical price data by specifying the start and end dates.
In [2]:
history = yf.YahooFinancials('TSLA').get_historical_price_data('2021-01-01', '2021-1
df = pd.DataFrame(history['TSLA']['prices'])
df.head()
In [3]:
df.drop('date', axis=1, inplace=True)
In [4]:
df.index = pd.to_datetime(df['formatted_date'])
df.drop('formatted_date', axis=1, inplace=True)
In [5]:
df.rename(columns={'adjclose': 'price'}, inplace=True)
Let's store the daily price change in 'return' (the daily return) and the daily percentage
change in the price in 'return_pct'.
In [6]:
df["return"] = df["price"].diff()
df["return_pct"] = df["price"].pct_change()
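The two columns follow the usual definitions: 'return' is the absolute change p_t - p_(t-1) and 'return_pct' is p_t / p_(t-1) - 1. A minimal check on a made-up four-day price series (illustrative numbers, not Tesla data). Because the value and report cells below add up daily percentage returns, the sketch also shows that a sum of daily percentage returns is not the same as the compounded return.

```python
import numpy as np
import pandas as pd

# Illustrative closing prices (not real TSLA data)
price = pd.Series([100.0, 110.0, 99.0, 108.9])

ret = price.diff()            # p_t - p_(t-1); the first value is NaN
ret_pct = price.pct_change()  # p_t / p_(t-1) - 1; the first value is NaN

# pct_change is the absolute change divided by the previous price
assert np.allclose(ret_pct.iloc[1:], (ret / price.shift(1)).iloc[1:])

summed = ret_pct.sum()                           # NaN is skipped: 0.1 - 0.1 + 0.1
compounded = (1 + ret_pct.fillna(0)).prod() - 1  # 1.1 * 0.9 * 1.1 - 1
print(round(summed, 4), round(compounded, 4))
```

Here the summed percentage return is 10% while the price actually rose 8.9%, so sums of daily percentages should be read as an approximation of the real gain.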
In [7]:
f, axarr = plt.subplots(2,sharex=False,figsize=(12,7))
f.suptitle('TESLA Price and Return', fontsize=20)
axarr[0].plot(df['price'], color='blue')
axarr[0].grid(True)
axarr[1].plot(df['return'], color='red')
axarr[1].grid(True)
f.legend(['Price', 'Return'], loc='upper left')
plt.show()
Three Alternative Strategies
1) Price > EMA10: If the price rises above the 10-day exponential moving average (EMA10),
it is considered a buy signal.
2) EMA10 > EMA30: When the 10-day exponential moving average rises above the
30-day exponential moving average, it is considered a buy signal.
3) MACD > MACDS: The MACD indicator is the difference between the 12-day and 26-day
exponential moving averages (EMA12 - EMA26). The 9-day exponential moving average
of the MACD is called the MACD Signal (MACDS). If the MACD rises above the
MACDS, it is considered a buy signal.
In each case, the opposite condition (for example, the price falling below EMA10) is
treated as a sell signal.
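For reference, the indicators can be computed with pandas alone. This sketch assumes the common recursive EMA, y_t = a*x_t + (1 - a)*y_(t-1) with a = 2/(n + 1), which is what `ewm(span=n, adjust=False)` computes. It illustrates the definitions and is not meant to reproduce the `ta` library's output exactly (for example, its warm-up handling); `ema` and `macd_lines` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

def ema(x: pd.Series, n: int) -> pd.Series:
    """Recursive EMA: y_t = a*x_t + (1 - a)*y_(t-1), with a = 2 / (n + 1)."""
    return x.ewm(span=n, adjust=False).mean()

def macd_lines(close: pd.Series, fast: int = 12, slow: int = 26, sign: int = 9):
    macd = ema(close, fast) - ema(close, slow)  # MACD  = EMA12 - EMA26
    signal = ema(macd, sign)                    # MACDS = 9-day EMA of the MACD
    return macd, signal

# Steadily rising illustrative prices
close = pd.Series(np.linspace(100.0, 150.0, 60))
macd, signal = macd_lines(close)
print((close > ema(close, 10)).iloc[1:].all())  # True: price stays above its EMA10
print((macd.iloc[1:] > 0).all())                # True: EMA12 stays above EMA26
```

On a steadily rising series the faster average lags the price less, so Price > EMA10 and MACD > 0 hold after the first day; on real data the crossings of these lines are what generate the signals.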
In [8]:
df["EMA10"] = ta.trend.ema_indicator(df["price"],10,fillna=True)
In [9]:
df["buy_s1"] = np.where(df["price"] > df["EMA10"], 1, 0)
df["sell_s1"] = np.where(df["price"] < df["EMA10"], 1, 0)
df["buy_s1_ind"] = np.where((df["buy_s1"] > df["buy_s1"].shift(1)),1, 0)
df["sell_s1_ind"] = np.where((df["sell_s1"] > df["sell_s1"].shift(1)),1, 0)
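The `*_ind` columns mark only the day a condition switches on: `x > x.shift(1)` is true when today's flag is 1 and yesterday's was 0. A small illustration with a made-up flag series; the first day is never marked, because comparing against the NaN produced by `shift(1)` returns False.

```python
import numpy as np
import pandas as pd

# Made-up daily "price above EMA10" flags (1 = condition holds that day)
buy = pd.Series([1, 1, 0, 0, 1, 1, 0, 1])
# Toy simplification: no ties here, so sell is the complement of buy.
# In the notebook a day with price == EMA10 has both flags at 0.
sell = 1 - buy

buy_ind = np.where(buy > buy.shift(1), 1, 0)     # flag switched 0 -> 1 today
sell_ind = np.where(sell > sell.shift(1), 1, 0)

print(buy_ind.tolist())   # [0, 0, 0, 0, 1, 0, 0, 1]
print(sell_ind.tolist())  # [0, 0, 1, 0, 0, 0, 1, 0]
```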
In [10]:
df["date"] = df.index
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.scatter(df.loc[df["buy_s1_ind"] == 1].index,
df["price"][df["buy_s1_ind"] == 1], color='green', marker='^', s=100, la
plt.scatter(df.loc[df["sell_s1_ind"] == 1].index,
df["price"][df["sell_s1_ind"] == 1], color='red', marker='v', s=100, lab
plt.xlabel('Trade Days')
plt.legend(loc='best')
plt.title('TESLA Strategy One(Price>EMA10) Buy-Sell Signals')
plt.show()
Under Strategy One, the profit of a trader who starts with $1000 over the period is
computed below.
Assuming 5% of each day's return goes to transaction costs, we count only 95% of the
percentage return.
In [11]:
df["value_s1"] = 1000*(1+(np.where(df["buy_s1"]==1,
0.95*df["return_pct"],0)).cumsum())
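The `value_s*` columns credit 95% of the day's percentage return whenever the strategy is long, and add these up without compounding. Two properties of this formula are worth keeping in mind: the 95% factor also shrinks the losses on losing days, and the position on day t is decided from day t's closing price and then credited with that same day's return, which is a form of look-ahead bias. The sketch below reproduces the formula as a function; `backtest_value` and its `lag` parameter are hypothetical additions, and with `lag=1` the signal computed at one close is applied to the next day's return.

```python
import numpy as np
import pandas as pd

def backtest_value(returns: pd.Series, position: pd.Series,
                   capital: float = 1000.0, keep: float = 0.95,
                   lag: int = 0) -> pd.Series:
    """Non-compounded equity curve in the style of the value_s* columns.

    returns  -- daily percentage returns, e.g. df["return_pct"]
    position -- 1 on days the strategy is long, else 0, e.g. df["buy_s1"]
    keep     -- share of each daily return kept after costs (0.95 = 5% haircut)
    lag      -- hypothetical option: shift the position forward by `lag` days,
                so lag=1 earns the NEXT day's return after a signal
    """
    pos = position.shift(lag).fillna(0) if lag else position
    daily = np.where(pos == 1, keep * returns.fillna(0), 0.0)
    return capital * (1 + pd.Series(daily, index=returns.index).cumsum())

r = pd.Series([np.nan, 0.10, -0.05, 0.20])   # illustrative daily returns
pos = pd.Series([0, 1, 1, 0])                 # illustrative long/flat flags
print(backtest_value(r, pos).round(2).tolist())         # [1000.0, 1095.0, 1047.5, 1047.5]
print(backtest_value(r, pos, lag=1).round(2).tolist())  # [1000.0, 1000.0, 952.5, 1142.5]
```

On the notebook's data, `backtest_value(df["return_pct"], df["buy_s1"])` should match `df["value_s1"]`, and comparing it with the `lag=1` curve shows how much of the gain depends on same-day information.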
In [13]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"].iloc[0].round(2),
"at the beginning of the period, became $",df["price"].iloc[-1].round(2),"with %",
(100*(df["price"].iloc[-1]-df["price"].iloc[0])/df["price"].iloc[0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s1"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s1"].iloc[-1].round(2),
"on the first",len(df),"days.")
The sum of the percentage gains on the days the stock rose was 336%, and the sum of the
percentage losses on the days it fell was 283%. The Price > EMA10 strategy captured 59%
of the potential positive gain: on the days it gave a buy signal, the summed return was
200%. Someone who bought the stock at the beginning of the period and sold it at the end
made a profit of 46%, whereas a trader following the signals made a profit of 189.67% on
$1000.
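The report's figures can be reproduced by a small helper. The capture figure is the sum of percentage returns on days the strategy was long divided by the sum of percentage returns on all up days, which is how the (truncated) print statement above appears to compute it; `strategy_report` and its key names are hypothetical. As a consistency check on the figures quoted above, 59% of 336% is about 198%, in line with the roughly 200% summed return on buy days.

```python
import numpy as np
import pandas as pd

def strategy_report(price: pd.Series, position: pd.Series) -> dict:
    """Summary figures in the style of the REPORT cells (hypothetical helper)."""
    ret = price.pct_change()
    potential_gain = ret[ret > 0].sum()   # sum of % changes on up days
    potential_loss = ret[ret < 0].sum()   # sum of % changes on down days
    captured = ret[position == 1].sum()   # sum of % changes on days held long
    return {
        "potential_gain_pct": 100 * potential_gain,
        "potential_loss_pct": 100 * potential_loss,
        "capture_pct": 100 * captured / potential_gain,
        "buy_and_hold_pct": 100 * (price.iloc[-1] / price.iloc[0] - 1),
    }

price = pd.Series([100.0, 110.0, 99.0, 108.9])  # illustrative prices
position = pd.Series([0, 1, 0, 0])              # long only on the second day
report = strategy_report(price, position)
print({k: round(float(v), 2) for k, v in report.items()})
```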
In [14]:
df["date"] = df.index
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy One(Price>EMA10)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
axarr[0].scatter(df.loc[df["buy_s1_ind"] == 1].index,
df.loc[df["buy_s1_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s1_ind"] == 1].index,
df.loc[df["sell_s1_ind"] == 1,"price"].values, color='red', marker='
axarr[0].legend(loc='best')
axarr[1].plot(df["value_s1"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
The algorithmic payoff exceeds the buy-and-hold payoff. Why? Because the strategy often
sold high and bought back lower, staying out of the market on many of the down days, and
the cumulative sum of these trades is greater than the gain from simply holding the stock.
Let's Back-Test the other two strategies in the same way.
In [15]:
df["EMA30"] = ta.trend.ema_indicator(df["price"],30,fillna=True)
In [16]:
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
plt.legend(loc='best')
plt.title('TESLA Strategy Two EMA10>EMA30')
plt.show()
In [17]:
df["buy_s2"] = np.where((df["EMA10"] > df["EMA30"]), 1, 0)
df["sell_s2"] = np.where((df["EMA10"] < df["EMA30"]), 1, 0)
df["buy_s2_ind"] = np.where((df["buy_s2"] > df["buy_s2"].shift(1)),1, 0)
df["sell_s2_ind"] = np.where((df["sell_s2"] > df["sell_s2"].shift(1)),1, 0)
In [18]:
df["date"] = df.index
fig1 = plt.figure(figsize=(14,8))
plt.plot(df["price"],label="Price",color='blue')
plt.plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
plt.plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
plt.scatter(df.loc[df["buy_s2_ind"] == 1].index,
df["price"][df["buy_s2_ind"] == 1], color='green', marker='^', s=100, la
plt.scatter(df.loc[df["sell_s2_ind"] == 1].index,
df["price"][df["sell_s2_ind"] == 1], color='red', marker='v', s=100, lab
plt.xlabel('Trade Days')
plt.legend(loc='best')
plt.title('TESLA Strategy Two EMA10>EMA30 Buy-Sell Signals')
plt.show()
Trading Gain according to Strategy Two
In [19]:
df["value_s2"] = 1000*(1+(np.where(df["buy_s2"]==1,
0.95*df["return_pct"],0)).cumsum())
In [21]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"].iloc[0].round(2),
"at the beginning of the period, became $",df["price"].iloc[-1].round(2),"with %",
(100*(df["price"].iloc[-1]-df["price"].iloc[0])/df["price"].iloc[0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s2"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s2"].iloc[-1].round(2),
"on the first",len(df),"days.")
In [22]:
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy Two(EMA10>EMA30)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].plot(df["EMA10"],label="EMA10",color='red',linestyle='--')
axarr[0].plot(df["EMA30"],label="EMA30",color='green',linestyle='--')
axarr[0].scatter(df.loc[df["buy_s2_ind"] == 1].index,
df.loc[df["buy_s2_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s2_ind"] == 1].index,
df.loc[df["sell_s2_ind"] == 1,"price"].values, color='red', marker='
axarr[0].legend(loc='best')
axarr[1].plot(df["value_s2"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
3) MACD > MACDS (s3)
In [23]:
df["MACD"] = ta.trend.macd(df["price"],fillna=True,window_fast=12,window_slow=26)
df["MACD_signal"] = ta.trend.macd_signal(df["price"],
window_fast=12,window_slow=26,window_sign=9,
fillna=True)
In [24]:
df["buy_s3"] = np.where((df["MACD"] > df["MACD_signal"]), 1, 0)
df["sell_s3"] = np.where((df["MACD"] < df["MACD_signal"]), 1, 0)
df["buy_s3_ind"] = np.where((df["buy_s3"] > df["buy_s3"].shift(1)),1, 0)
df["sell_s3_ind"] = np.where((df["sell_s3"] > df["sell_s3"].shift(1)),1, 0)
In [25]:
f,axarr = plt.subplots(2,sharex=False,figsize=(14,10))
f.suptitle('TESLA Strategy Three(MACD > MACDS)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].legend(loc='best')
axarr[0].grid(True)
axarr[1].plot(df["MACD"],label="MACD",color='red',linestyle='--')
axarr[1].plot(df["MACD_signal"],label="MACD Signal",color='green',linestyle='--')
axarr[1].legend(loc='best')
plt.xlabel('Trade Days')
plt.show()
Trading Gain according to Strategy Three
In [26]:
df["value_s3"] = 1000*(1+(np.where(df["buy_s3"]==1,
0.95*df["return_pct"],0)).cumsum())
In [28]:
print("************* REPORT *************")
print("The end-of-period price of the stock, which was $",df["price"].iloc[0].round(2),
"at the beginning of the period, became $",df["price"].iloc[-1].round(2),"with %",
(100*(df["price"].iloc[-1]-df["price"].iloc[0])/df["price"].iloc[0]).round(2),"change","The mod
100*(sum(np.where((df["buy_s3"]==1),df["return_pct"],0))/sum(np.where((df["retur
df["return_pct"]
"of the total positive return.The investment of $1000 at the beginning of the pe
df["value_s3"].iloc[-1].round(2),
"on the first",len(df),"days.")
In [29]:
f,axarr = plt.subplots(3,sharex=False,figsize=(10,8))
f.suptitle('TESLA Strategy Three(MACD > MACDS)', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["buy_s3_ind"] == 1].index,
df.loc[df["buy_s3_ind"] == 1,"price"].values, color='green', marker=
axarr[0].scatter(df.loc[df["sell_s3_ind"] == 1].index,
df.loc[df["sell_s3_ind"] == 1,"price"].values, color='red', marker='
axarr[0].legend(loc='best')
axarr[1].plot(df["MACD"],label="MACD",color='red',linestyle='--')
axarr[1].plot(df["MACD_signal"],label="MACD Signal",color='green',linestyle='--')
axarr[1].legend(loc='best')
axarr[2].plot(df["value_s3"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
Back-Testing is a method we use to choose which strategy to apply now. It is natural for
price movements to deviate from the direction a strategy points to. Since the indicators
are calculated from the stock's own price movements, they depend heavily on how the
price forms, and many factors affect the price.
Although the Back-Test recommends the Price>EMA10 strategy for 2021, that strategy may
not perform as well in 2022 or the first days of 2023. Relying on a single strategy can be
misleading.
So how can we address this? Answer: Regression Toward The Mean
Let's create a stronger signal by combining the buy and sell signals of the three
indicators: buy when at least two of the three indicators give a buy signal, and sell
when at least two of them give a sell signal.
In [30]:
df["BUY"] = np.where((df["buy_s1"]+df["buy_s2"]+df["buy_s3"])>=2,1,0)
df["SELL"] = np.where((df["sell_s1"]+df["sell_s2"]+df["sell_s3"])>=2,1,0)
df["BUY_ind"] = np.where((df["BUY"] > df["BUY"].shift(1)),1, 0)
df["SELL_ind"] = np.where((df["SELL"] > df["SELL"].shift(1)),1, 0)
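The two-of-three rule is a majority vote. A generic sketch (the function name `majority_vote` is hypothetical) that also illustrates why BUY and SELL can never both be 1 on the same day: each strategy's buy and sell flags are mutually exclusive, so two buy votes leave at most one sell vote.

```python
import numpy as np

def majority_vote(*flags: np.ndarray, k: int = 2) -> np.ndarray:
    """1 on days when at least k of the 0/1 flag arrays are 1, else 0."""
    return (np.vstack(flags).sum(axis=0) >= k).astype(int)

# Illustrative flags for three strategies over five days.
# For each strategy, buy and sell are never both 1 on the same day.
buy1, sell1 = np.array([1, 0, 1, 0, 0]), np.array([0, 1, 0, 1, 0])
buy2, sell2 = np.array([1, 1, 0, 0, 0]), np.array([0, 0, 1, 1, 1])
buy3, sell3 = np.array([0, 0, 1, 1, 0]), np.array([1, 1, 0, 0, 0])

BUY = majority_vote(buy1, buy2, buy3)
SELL = majority_vote(sell1, sell2, sell3)
print(BUY.tolist())   # [1, 0, 1, 0, 0]
print(SELL.tolist())  # [0, 1, 0, 1, 0]
```

On a day like the last one, neither BUY nor SELL fires; since the VALUE column only counts returns on BUY days, the trader is simply out of the market.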
In [31]:
df["VALUE"] = 1000*(1+(np.where(df["BUY"]==1,
0.95*df["return_pct"],0)).cumsum())
In [32]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",round(100*df["return_pct"].min(),2),"%")
print("Highest Daily Return ",round(100*df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",round(100*df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",round(100*df["return_pct"].sum(),2),"%")
In [33]:
f,axarr = plt.subplots(2,sharex=False,figsize=(10,8))
f.suptitle('TESLA Strategy Regression Toward The Mean', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["BUY_ind"] == 1].index,
df.loc[df["BUY_ind"] == 1,"price"].values, color='green', marker='^'
axarr[0].scatter(df.loc[df["SELL_ind"] == 1].index,
df.loc[df["SELL_ind"] == 1,"price"].values, color='red', marker='v',
axarr[0].legend(loc='best')
axarr[1].plot(df["VALUE"],label="Algorithmic Gain",color='blue')
plt.grid(True)
plt.xlabel('Trade Days')
plt.show()
In [34]:
df["target_cls"] = np.where(df["return"]>0,1,0)
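`target_cls` labels each day by the sign of that same day's price change, while the predictors below are computed from that same day's close, so the tree learns to describe today's move rather than forecast tomorrow's. If the goal is a forecast, one option (an assumption, not what this notebook does) is to label each day with the next day's direction and leave the last day unlabeled; `next_day_target` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def next_day_target(ret: pd.Series) -> pd.Series:
    """1 if the NEXT day's price change is positive, 0 otherwise; NaN on the last day."""
    nxt = ret.shift(-1)
    target = pd.Series(np.where(nxt > 0, 1, 0), index=ret.index)
    return target.where(nxt.notna())  # the last day has no next day to label

ret = pd.Series([np.nan, 10.0, -11.0, 9.9])  # illustrative daily price changes
print(next_day_target(ret).tolist())         # [1.0, 0.0, 1.0, nan]
```

If such a column were added to `df`, the later `df.dropna(inplace=True)` would drop the unlabeled last row before fitting.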
In [35]:
cnt = df["target_cls"].value_counts().sort_index()  # class 0 (red) first, then class 1 (green)
cnt.plot(kind = 'bar', color=["red","green"])
plt.title("Target Class Distribution")
plt.xlabel("Target Class")
plt.ylabel("Frequency")
plt.show()
In [36]:
print("Days with negative return:",sum(df["return"]<0))
print("Days with positive return:",sum(df["return"]>0))
In [37]:
df["p_ema10"] = np.where(df["price"]>df["EMA10"],1,0)
df["ema10_ema30"] = np.where(df["EMA10"]>df["EMA30"],1,0)
df["macd_macds"] = np.where(df["MACD"]>df["MACD_signal"],1,0)
In [38]:
df.dropna(inplace=True)
In [39]:
predictors = ["p_ema10","ema10_ema30","macd_macds"]
In [40]:
X = df[predictors]
y = df["target_cls"]
In [41]:
cv = GapWalkForward(n_splits=5, gap_size=0, test_size=50)
dt = DecisionTreeClassifier()
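The cell that builds `gs` is not shown in this export, so its search space is unknown; the `best_params_` and `best_estimator_` attributes suggest a scikit-learn search object such as `GridSearchCV`. A minimal sketch of what such a search could look like, using scikit-learn's `TimeSeriesSplit` (with `gap` and `test_size`) in place of `GapWalkForward` so that it runs with scikit-learn alone. The parameter grid, the `fit_tree_search` name and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

def fit_tree_search(X: pd.DataFrame, y: pd.Series) -> GridSearchCV:
    # Walk-forward splits: each 50-row test fold comes after its training rows,
    # so no fold is trained on data from its own future.
    cv = TimeSeriesSplit(n_splits=5, gap=0, test_size=50)
    # Hypothetical search space; the notebook's actual grid is not shown.
    param_grid = {
        "criterion": ["gini", "entropy"],
        "max_depth": [1, 2, 3],
        "min_samples_leaf": [1, 5, 10],
    }
    gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid=param_grid, cv=cv, scoring="accuracy")
    gs.fit(X, y)
    return gs

# Synthetic 0/1 features standing in for the real indicator columns
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.integers(0, 2, size=(400, 3)),
                      columns=["p_ema10", "ema10_ema30", "macd_macds"])
y_demo = X_demo["p_ema10"].copy()  # toy target fully determined by one feature
gs = fit_tree_search(X_demo, y_demo)
print(gs.best_params_, gs.best_score_)
```

With roughly 250 rows for 2021, five 50-row test folds leave very little data for the first training fold, so a smaller `test_size` or fewer splits may be worth considering.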
In [42]:
gs.best_params_
In [43]:
final_clf = gs.best_estimator_
In [44]:
scores = cross_val_score(estimator=final_clf, X=X, y=y, cv=cv)  # walk-forward folds instead of plain 5-fold
In [45]:
features = predictors
classes = {0:"SELL",1:"BUY"}
dot_data = StringIO()
export_graphviz(final_clf, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names = features,class_names=classes
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tesla.png')
display(Image('tesla.png'))
Model Performance
In [46]:
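category = ['Sell', 'Buy']  # class 0 = SELL (down day), class 1 = BUY (up day)
cm = ConfusionMatrix(final_clf, classes=category)
cm.fit(X, y)
cm.score(X, y)
cm.poof()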
Out[46]: <AxesSubplot: title={'center': 'DecisionTreeClassifier Confusion Matrix'}, xlabel='Predicted Class', ylabel='True Class'>
In [47]:
cr = ClassificationReport(final_clf, classes=category)
cr.fit(X, y)
cr.score(X, y)
cr.poof()
Out[47]: <AxesSubplot: title={'center': 'DecisionTreeClassifier Classification Report'}>
The model captures the sell signals better. This is likely due to the imbalance between
the number of buy and sell signals in the data set.
In [48]:
rc = ROCAUC(final_clf, classes=category)
rc.fit(X, y)
rc.score(X, y)
rc.poof()
Out[48]: <AxesSubplot: title={'center': 'ROC Curves for DecisionTreeClassifier'}, xlabel='False Positive Rate', ylabel='True Positive Rate'>
Retrospective Prediction
In [49]:
df["prediction_signal"] = final_clf.predict(X)
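The cells that create `buy_dt`, `sell_dt`, `buy_dt_ind` and `sell_dt_ind` are missing from this export. Assuming they follow the same pattern as the three strategies, with a predicted class of 1 meaning buy and 0 meaning sell (as in the `classes` dictionary above), they could be derived like this; `add_dt_signals` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def add_dt_signals(df: pd.DataFrame, pred_col: str = "prediction_signal") -> pd.DataFrame:
    out = df.copy()
    out["buy_dt"] = np.where(out[pred_col] == 1, 1, 0)
    out["sell_dt"] = np.where(out[pred_col] == 0, 1, 0)
    # Mark only the day each state switches on, as with buy_s1_ind / sell_s1_ind
    out["buy_dt_ind"] = np.where(out["buy_dt"] > out["buy_dt"].shift(1), 1, 0)
    out["sell_dt_ind"] = np.where(out["sell_dt"] > out["sell_dt"].shift(1), 1, 0)
    return out

demo = add_dt_signals(pd.DataFrame({"prediction_signal": [0, 1, 1, 0, 1]}))
print(demo[["buy_dt_ind", "sell_dt_ind"]].values.tolist())
# [[0, 0], [1, 0], [0, 0], [0, 1], [1, 0]]
```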
In [50]:
print("Accuracy of the model is ",accuracy_score(df["target_cls"],df["prediction_sig
plt.scatter(df.loc[df["sell_dt_ind"] == 1].index,
df.loc[df["sell_dt_ind"] == 1,"price"].values, color='red', marker='v',
plt.legend(loc='best')
plt.grid(True)
plt.xlabel('Trade Days')
plt.title('TESLA \n Decision Tree Classifier \n Buy-Sell Signals', fontsize=20)
plt.show()
The profit of a trader who follows the Decision Tree model's signals over the period:
In [53]:
df["value_dt"] = 1000*(1 + (np.where(df["buy_dt"]==1,
0.95*df["return_pct"],0)).cumsum())
In [54]:
print("************* Descriptive Statistics *************")
print("Period",len(df),"days")
print("Highest Daily Loss ",round(100*df["return_pct"].min(),2),"%")
print("Highest Daily Return ",round(100*df["return_pct"].max(),2),"%")
print("Standard Deviation of Return ",round(100*df["return_pct"].std(),2),"%")
print("Total Potential Return ",100*(round(sum(np.where((df["return_pct"]>0),df["ret
print("Total Potential Loss ",100*(round(sum(np.where((df["return_pct"]<0),df["retur
print("Net Return ",round(100*df["return_pct"].sum(),2),"%")
In [55]:
f,axarr = plt.subplots(1,2,figsize=(16,10))
f.suptitle('Algorithmic Trading Gain', fontsize=20)
axarr[0].plot(df["price"],label="Price",color='blue')
axarr[0].scatter(df.loc[df["buy_dt_ind"] == 1].index,
df.loc[df["buy_dt_ind"] == 1,"price"].values, color='green', marker='^',
axarr[0].scatter(df.loc[df["sell_dt_ind"] == 1].index,
df.loc[df["sell_dt_ind"] == 1,"price"].values, color='red', marker='v',
axarr[0].legend(loc='best')
axarr[0].grid(True)
axarr[0].set_xlabel('Trade Days')
axarr[1].plot(df["value_dt"],label="Value",color='blue')
axarr[1].legend(loc='best')
axarr[1].grid(True)
axarr[1].set_title('Algorithmic Trading Gain')
axarr[1].set_xlabel('Trade Days')
plt.show()
In [56]:
viz = FeatureImportances(final_clf, classes=category, relative=False)
viz.fit(X, y)
viz.poof()
Out[56]: <AxesSubplot: title={'center': 'Feature Importances of 3 Features using DecisionTreeClassifier'}, xlabel='feature importance'>
Clearly, only one of the three strategies appears to be important: the EMA10>EMA30
strategy has a negligible effect, and the MACD>MACDS strategy does not seem to have a
significant impact on this model either. When Price>EMA10 gives a buy/sell signal, the
other two strategies don't matter much from a decision point of view.