
Trading Using Decision Tree Classifier Part -1

By Varun Divakar

The strategy in this blog uses no standard technical indicators, but a few of my own creation. We will also compare the strategy's performance on the train and test data, and see how it changes with the size of the train data and the length of the prediction period.

Unlike in my previous blogs, here I will use a dynamic time frame to fetch the data for the past year. But before we begin, let us import the necessary libraries.

from pandas_datareader import data as dt
import fix_yahoo_finance  # note: this package has since been renamed to yfinance
import datetime
import numpy as np

today = datetime.datetime.now().date()
past_date = today - datetime.timedelta(days=365)

stock = 'GLD'
d = dt.get_data_yahoo(stock, end=str(today), start=str(past_date))

I have defined the starting date for the data as the date 365 days (one year) in the past. You could use static dates instead, but this approach fetches the latest data every time you run the code.
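As a quick check (no download needed), the dynamic window can be reproduced on its own; str() on a date gives the ISO 'YYYY-MM-DD' string that get_data_yahoo accepts:

```python
import datetime

# Rebuild the dynamic one-year window used above
today = datetime.datetime.now().date()
past_date = today - datetime.timedelta(days=365)

# str() yields 'YYYY-MM-DD', the format passed to get_data_yahoo
start, end = str(past_date), str(today)
```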


Next, I created a few indicators and renamed some columns to make them easier to refer to later in the code.

d.rename(columns={'Adj Close': 'Adj_Close'}, inplace=True)

d['dif'] = (d.Close - d.Close.shift(1)) / (d.High.shift(1) - d.Low.shift(1))
d['sec_dif'] = (d.dif - d.dif.shift(1)) / (d.High.shift(1) - d.Low.shift(1))
d['diff_v'] = (d.High - d.Low) / (d.High.shift(1) - d.Low.shift(1))

I will briefly explain the indicators I used. First, I created a column called 'dif', which measures the change in the close price relative to yesterday's high-low range. The idea is to check whether today's close is correlated with yesterday's price volatility. Next, I created a column called 'sec_dif': the second-order difference of the close price changes, which measures the change in 'dif' relative to yesterday's range.

After this, I calculated the returns of the market. Please note that these returns are forward-looking: each day's return is computed from the next day's close. Since the prediction is generated at (or just before) the close of the day, we can simply multiply the predicted trend by these returns to measure the strategy's performance.

The third indicator, 'diff_v', measures today's high-low range relative to that of the day before.
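As a quick sanity check, the three indicators can be reproduced on a toy OHLC frame (the prices below are hypothetical):

```python
import pandas as pd

# Hypothetical OHLC data to illustrate the three indicators
d = pd.DataFrame({
    'High':  [101.0, 103.0, 102.5],
    'Low':   [ 99.0, 100.0, 100.5],
    'Close': [100.0, 102.0, 101.0],
})

# 'dif': today's close change scaled by yesterday's high-low range
d['dif'] = (d.Close - d.Close.shift(1)) / (d.High.shift(1) - d.Low.shift(1))
# 'sec_dif': change in 'dif', again scaled by yesterday's range
d['sec_dif'] = (d.dif - d.dif.shift(1)) / (d.High.shift(1) - d.Low.shift(1))
# 'diff_v': today's range relative to yesterday's range
d['diff_v'] = (d.High - d.Low) / (d.High.shift(1) - d.Low.shift(1))
```

On day 1, for example, the close rose by 2.0 against yesterday's range of 2.0, so 'dif' is 1.0, while the range expanded from 2.0 to 3.0, so 'diff_v' is 1.5.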

d['Return'] = (d.Adj_Close.shift(-1) - d.Adj_Close) / d.Adj_Close
d['Signal'] = np.where(d.Return > 0, 1, -1)
d['yesterday_return'] = d.Return.shift(1)
d = d.dropna()

After creating the indicators, we calculate the Return and Signal columns, needed for measuring the strategy's performance and for training the algorithm, respectively. We also use the previous day's return as an indicator, to see whether tomorrow's trend is correlated with today's market performance.
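A small example on hypothetical adjusted closes shows why these returns are forward-looking: each row's Return already uses the next day's close.

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted closes
d = pd.DataFrame({'Adj_Close': [100.0, 102.0, 101.0, 103.0]})

# Tomorrow's close vs today's: the return realised by holding from today's close
d['Return'] = (d.Adj_Close.shift(-1) - d.Adj_Close) / d.Adj_Close
# Signal: +1 if the market rises tomorrow, -1 otherwise
d['Signal'] = np.where(d.Return > 0, 1, -1)
```

Day 0's Return is (102 - 100) / 100 = 0.02 with Signal +1, even though 102 is only known at day 1's close; this is exactly why the predicted trend can be multiplied by Return directly.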

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

scl = MinMaxScaler()
X = scl.fit_transform(d[['dif', 'sec_dif', 'diff_v', 'yesterday_return']])
y = d.Signal

After selecting our indicators, we scale the data using MinMaxScaler, which maps each feature to the range 0 to 1.
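A minimal sketch of what MinMaxScaler does, on made-up feature values: each column is scaled independently, so its minimum becomes 0 and its maximum becomes 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up feature columns with different ranges
X_raw = np.array([[1.0, 10.0],
                  [2.0, 30.0],
                  [3.0, 20.0]])

scl = MinMaxScaler()
X = scl.fit_transform(X_raw)  # each column now spans [0, 1]
```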

test = 20
X_train = X[:-test]
y_train = y[:-test]

Next, we decide the test data size; in this case, I chose 20 days, which is roughly the last month of trading data.

After this, I have used decision tree classifiers of increasing complexity, by adding more depth, to see how well the algorithm performs at prediction.

I will plot the performance of the strategy at each complexity level (1-9, with 9 the most complex) and also measure the accuracy of each algorithm.

plt.figure(figsize=(10, 7))

for i in range(1, 10):
    # note: this max_features expression always evaluates to 4 (all features)
    cls = DecisionTreeClassifier(max_depth=i, max_features=np.max([int(i/5), 1, 4]))
    cls.fit(X_train, y_train)
    d['P_Trend'] = cls.predict(X)
    d['Str_return_{}'.format(i)] = d.P_Trend * d.Return
    plt.plot(d['Str_return_{}'.format(i)].iloc[-test:].cumsum(), label=i)
    print('\nAccuracy for complexity {}:'.format(i),
          accuracy_score(d.Signal.iloc[-test:], d.P_Trend.iloc[-test:]))

plt.legend(loc='best')

There is no clear trend in how the algorithms have performed; the results are very mixed. Now, let us plot the performance on the train data and see.

plt.figure(figsize=(10, 7))

for i in range(1, 10):
    plt.plot(d['Str_return_{}'.format(i)].cumsum(), label=i)

plt.legend(loc='best')
plt.show()

Here, the observations are much clearer: the more complex algorithms do a very good job of predicting the trend in the train data, while the less complex ones are little better than a coin toss. Given this, we might expect the most complex models to perform best on the test data. Unfortunately, that is not the case: the last 20 days' performance in the earlier graph shows that all the algorithms are almost equally bad.

The reason is that we are not using all the data available up to each day to make that day's prediction. In other words, we are not predicting each day with an algorithm trained on all the data prior to it; instead, we are assuming that the past (365 - 20) days will show a trend similar to the next 20 days, which is mostly not the case. To verify this, let us decrease the number of days to be predicted to just 5; we then see a marked difference in the test data performance of the algorithms.

Here, you can see that the algorithms that overfit or underfit the train data have done poorly, while the algorithms that learned best show consistent performance. Please note that only 4 of the 9 algorithms are visible, as most of them overlap. If you want to verify the strategy's accuracy, run the accuracy code again with the new test data. Ideally, with time series data, we should use an LSTM type of approach to make predictions: train the algorithm on the past 'x' days of data and then predict only the next day. This would improve the accuracy considerably. We will look into this in my next blog.
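The retraining idea described above can be sketched with synthetic data (the window size, features, and depth here are hypothetical, not from this post): each day, the tree is refit on only the most recent 'x' days and then predicts the next day.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the scaled indicators and the +1/-1 signals
X = rng.random((120, 4))
y = np.where(rng.random(120) > 0.5, 1, -1)

window = 60  # hypothetical training window of past 'x' days
preds = []
for t in range(window, len(X)):
    # Refit on the most recent `window` days only, then predict day t
    cls = DecisionTreeClassifier(max_depth=3, random_state=0)
    cls.fit(X[t - window:t], y[t - window:t])
    preds.append(cls.predict(X[t:t + 1])[0])

preds = np.array(preds)
```

Each prediction now uses only information available before that day, avoiding the stale-model problem seen with the fixed 20-day test split.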

Next Step

If you want to learn various aspects of Algorithmic trading then check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the required skill sets to be a successful trader. Enroll now!
