Importing the libraries

import quantrautil as q
import numpy as np
from sklearn.ensemble import RandomForestClassifier

The libraries imported above are used as follows:

quantrautil – used to fetch the price data of the BAC stock from Yahoo Finance.
numpy – used for data manipulation on the BAC stock price when computing the input features and the output. You can read more about numpy here.
sklearn – provides tools and implementations of many machine learning models. RandomForestClassifier will be used to create the random forest classifier model.

Fetching the data

The next step is to fetch the price data of the BAC stock using quantrautil. The get_data function from quantrautil is used to get the BAC data for roughly 19 years, from 1 Jan 2000 to 31 Jan 2019, as shown below. The data is stored in the dataframe data.

data = q.get_data('BAC','2000-1-1','2019-2-1')
print(data.tail())

[*********************100%***********************]  1 of 1 downloaded
                 Open       High        Low      Close  Adj Close     Volume  \
Date                                                                           
2019-01-25  29.280001  29.719999  29.139999  29.580000  29.580000   72182100   
2019-01-28  29.320000  29.670000  29.290001  29.629999  29.629999   59963800   
2019-01-29  29.540001  29.700001  29.340000  29.389999  29.389999   51451900   
2019-01-30  29.420000  29.469999  28.950001  29.070000  29.070000   66475800   
2019-01-31  28.750000  28.840000  27.980000  28.469999  28.469999  100201200   

           Source  
Date               
2019-01-25  Yahoo  
2019-01-28  Yahoo  
2019-01-29  Yahoo  
2019-01-30  Yahoo  
2019-01-31  Yahoo

Creating input and output dataset

In this step, I will create the input and output variables.

Input variables: I have used ‘(Open – Close)/Open’, ‘(High – Low)/Low’, the standard deviation of the last 5 days’ returns (std_5), and the average of the last 5 days’ returns (ret_5).
Output variable: If tomorrow’s adjusted close price is greater than today’s adjusted close price, the output variable is set to 1; otherwise it is set to -1. A value of 1 indicates a signal to buy the stock and -1 indicates a signal to sell the stock.

The choice of these features as input and output is arbitrary. If you are interested in learning more about feature selection, you can read here.

# Features construction 
data['Open-Close'] = (data.Open - data.Close)/data.Open
data['High-Low'] = (data.High - data.Low)/data.Low
data['percent_change'] = data['Adj Close'].pct_change()
data['std_5'] = data['percent_change'].rolling(5).std()
data['ret_5'] = data['percent_change'].rolling(5).mean()
data.dropna(inplace=True)

# X is the input variable
X = data[['Open-Close', 'High-Low', 'std_5', 'ret_5']]

# y is the target or output variable
y = np.where(data['Adj Close'].shift(-1) > data['Adj Close'], 1, -1)
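
To make the labelling rule concrete, here is a minimal sketch on made-up prices. The series close and its values are hypothetical and only illustrate how shift(-1) lines up tomorrow's price with today's row. One detail worth noticing: the last row has no next-day price, so the comparison is False and the label falls back to -1.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for five consecutive days (illustrative only)
close = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0])

# Tomorrow's close aligned with today's row; the last row becomes NaN
next_close = close.shift(-1)

# 1 if tomorrow's close is higher than today's, otherwise -1.
# A comparison against NaN is False, so the final row is labelled -1.
labels = np.where(next_close > close, 1, -1)
print(labels)  # [ 1 -1 -1  1 -1]
```

In the article's data this affects only the final row of the dataset, which ends up in the test set.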

# Total dataset length
dataset_length = data.shape[0]

# Training dataset length
split = int(dataset_length * 0.75)
split

3597

# Splitting X and y into train and test datasets
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Print the size of the train and test dataset
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

(3597, 4) (1199, 4)
(3597,) (1199,)

Training the machine learning model

All set with the data! Let’s train a random forest classifier model. A RandomForestClassifier instance from sklearn.ensemble is stored in the variable ‘clf’, and its fit method is then called with the ‘X_train’ and ‘y_train’ datasets as the parameters so that the classifier model can learn the relationship between input and output.

clf = RandomForestClassifier(random_state=5)

# Create the model on train dataset
model = clf.fit(X_train, y_train)

from sklearn.metrics import accuracy_score
print('Correct Prediction (%): ', accuracy_score(y_test, model.predict(X_test), normalize=True)*100.0)

Correct Prediction (%):  52.71059216013344

# Run the code to view the classification report metrics
from sklearn.metrics import classification_report
report = classification_report(y_test, model.predict(X_test))
print(report)

             precision    recall  f1-score   support

         -1       0.52      0.61      0.56       594
          1       0.54      0.44      0.49       605

avg / total       0.53      0.53      0.52      1199
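
To put the accuracy figure in context, it can help to compare it with a naive baseline that always predicts the more frequent class. The sketch below is only arithmetic on numbers already printed above (the support column of the report and the accuracy score). The variable names are mine, not part of the original code.

```python
# Class counts in the test set, taken from the "support" column of the report
support = {-1: 594, 1: 605}
total = sum(support.values())

# Accuracy (%) of always predicting the majority class
majority_baseline = max(support.values()) / total * 100

# Accuracy (%) reported for the random forest on the test set
model_accuracy = 52.71059216013344

# Number of correctly predicted test rows implied by that accuracy
correct = round(model_accuracy / 100 * total)

print(round(majority_baseline, 2))                   # 50.46
print(correct)                                       # 632
print(round(model_accuracy - majority_baseline, 2))  # 2.25
```

Under this comparison the model is about 2.25 percentage points ahead of the naive baseline on this particular test period.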

Strategy Returns

# Strategy return = the next day's return multiplied by today's predicted signal (1 or -1)
data['strategy_returns'] = data.percent_change.shift(-1) * model.predict(X)
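
The calculation above can be illustrated on a few made-up numbers. In this sketch the returns and signals are hypothetical. The point is only that today's signal is paired with the next day's return, so the last row has no return to earn and becomes NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns and model signals (illustrative values only)
returns = pd.Series([0.01, -0.02, 0.03, -0.01])
signals = np.array([-1, 1, 1, 1])

# Today's signal earns tomorrow's return; a -1 signal flips the sign
strategy_returns = returns.shift(-1) * signals
print(strategy_returns.tolist())
```

A short (-1) signal ahead of a -2% day earns +2%, while a long (1) signal ahead of a -1% day loses 1%. Note that in the article's code the predictions cover the whole dataset, including the training rows, which is why the plots that follow slice the series from split onwards.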

Daily returns histogram

%matplotlib inline
import matplotlib.pyplot as plt
data.strategy_returns[split:].hist()
plt.xlabel('Daily strategy returns')
plt.show()

Strategy Returns

(data.strategy_returns[split:]+1).cumprod().plot()
plt.ylabel('Cumulative strategy returns')
plt.show()
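
The plotted curve is the running product of (1 + daily return), so it reads as the growth of one unit of capital rather than as a percentage. A minimal sketch with hypothetical daily returns shows the compounding:

```python
import numpy as np

# Hypothetical daily strategy returns as decimal fractions (illustrative only)
daily = np.array([0.02, 0.03, -0.01])

# Growth of 1 unit of capital after each day: 1.02, 1.0506, 1.040094
growth = np.cumprod(1 + daily)

# Total return over the period, in percent (about 4.01)
total_return_pct = (growth[-1] - 1) * 100
print(growth, total_return_pct)
```

A final value of about 1.04 on the curve therefore corresponds to a gain of roughly 4% over the period.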

Random Forest Algorithm In Trading Using Python

Related Articles

What are Decision Trees?

What is a Random Forest?

Working of Random Forest

Python Code For Random Forest

Importing the libraries

Fetching the data

Creating input and output dataset

Train Test Split

Training the machine learning model

Strategy Returns

Daily returns histogram

Strategy Returns

Advantages

Disadvantages
