By Ishan Shah
Machine Learning Classification Strategy In Python
Now, let’s implement the Machine Learning classification strategy in Python.
Step 1: Import the libraries
In this step, we will import the necessary libraries that will be needed to create the strategy.
# Machine learning classification
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# For data manipulation
import pandas as pd
import numpy as np
# To plot
import matplotlib.pyplot as plt
import seaborn
# To fetch data
from pandas_datareader import data as pdr
Step 2: Fetch data
We will download the S&P500 data (using the SPY ticker) from Google Finance using pandas_datareader.
After that, we will drop the missing values from the data and plot the S&P500 close price series.
Df = pdr.get_data_google('SPY', start="2012-01-01", end="2017-10-01")
Df = Df.dropna()
Df.Close.plot(figsize=(10,5))
plt.ylabel("S&P500 Price")
plt.show()
Step 3: Determine the target variable
The target variable is the variable that the machine learning classification algorithm will predict. In this example, the target variable is whether the S&P500 price will close up or down on the next trading day.
We will first determine the actual trading signal using the following logic: if the next trading day’s close price is greater than today’s close price, we will buy the S&P500 index; otherwise, we will sell it. We will store +1 for the buy signal and -1 for the sell signal.
y = np.where(Df['Close'].shift(-1) > Df['Close'],1,-1)
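To see how this labelling behaves, here is a minimal sketch on a made-up five-day close series. The prices and the names toy and toy_y are illustrative assumptions, not part of the SPY data. Note that the last row has no next-day close, so the comparison is False there and the row is labelled -1 by default.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices, for illustration only
toy = pd.DataFrame({"Close": [100.0, 101.0, 100.5, 102.0, 101.0]})

# +1 if the next day's close is higher than today's close, else -1.
# shift(-1) leaves NaN in the last row, and NaN > x evaluates to False.
toy_y = np.where(toy["Close"].shift(-1) > toy["Close"], 1, -1)
print(toy_y)
```

Since that final label is not a real observation, it may be worth dropping the last row before training.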
Step 4: Create the predictor variables
X is the dataset that holds the predictor variables, which are used to predict the target variable, ‘y’. Here, X consists of two variables: ‘Open – Close’ and ‘High – Low’. These can be understood as indicators based on which the algorithm will predict the next day’s trading signal.
Df['Open-Close'] = Df.Open - Df.Close
Df['High-Low'] = Df.High - Df.Low
X = Df[['Open-Close','High-Low']]
In the later part of the code, the machine learning classification algorithm will use the predictors and target variable in the training phase to create the model and then, predict the target variable in the test dataset.
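The two predictors can be checked by hand on a couple of rows. The sketch below wraps the same arithmetic in a helper; the function name make_predictors and the toy OHLC values are assumptions made for illustration.

```python
import pandas as pd

def make_predictors(prices):
    """Build the two predictor columns used in this post from OHLC data."""
    X = pd.DataFrame(index=prices.index)
    X["Open-Close"] = prices["Open"] - prices["Close"]  # intraday move
    X["High-Low"] = prices["High"] - prices["Low"]      # intraday range
    return X

# Hypothetical OHLC rows, for illustration only
toy = pd.DataFrame({
    "Open": [100.0, 101.0],
    "High": [102.0, 103.0],
    "Low": [99.0, 100.0],
    "Close": [101.0, 100.5],
})
X_toy = make_predictors(toy)
print(X_toy)
```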
Step 5: Test and train dataset split
In this step, we will split data into the train dataset and the test dataset.
- The first 80% of the data (in date order) is used for training and the remaining 20% for testing
- X_train and y_train are train dataset
- X_test and y_test are test dataset
split_percentage = 0.8
split = int(split_percentage*len(Df))
# Train data set
X_train = X[:split]
y_train = y[:split]
# Test data set
X_test = X[split:]
y_test = y[split:]
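The split above is positional rather than random, so every test row comes after every training row in time. A minimal sketch of the same idea as a reusable helper follows; the name chronological_split and the ten-row toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling: earliest rows for training, latest for testing."""
    split = int(train_fraction * len(X))
    return X[:split], X[split:], y[:split], y[split:]

# Hypothetical ten-row dataset, for illustration only
X_toy = pd.DataFrame({
    "Open-Close": np.arange(10.0),
    "High-Low": np.arange(10.0) + 1.0,
})
y_toy = np.where(np.arange(10) % 2 == 0, 1, -1)

X_tr, X_te, y_tr, y_te = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_te))
```

Keeping the rows in date order means the model is never trained on data from after the period it is tested on.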
Step 6: Create the machine learning classification model using the train dataset
We will create the machine learning classification model based on the train dataset. This model will be later used to predict the trading signal in the test dataset.
cls = SVC().fit(X_train, y_train)
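Calling SVC() with no arguments uses scikit-learn's defaults, which include an RBF kernel and C=1.0. The sketch below fits the same kind of model on a tiny, made-up dataset of two well-separated clusters; the data and the names X_toy, y_toy and model are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated, made-up clusters, for illustration only
X_toy = np.array([
    [-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5],  # labelled -1
    [2.0, 2.0], [2.5, 1.5], [1.5, 2.5],        # labelled +1
])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

model = SVC().fit(X_toy, y_toy)
print(model.kernel, model.C)

# Predict the label for one new point near each cluster
predictions = model.predict(np.array([[-2.0, -1.0], [3.0, 2.0]]))
print(predictions)
```

Real price data is far noisier than this, so the defaults are a starting point rather than a tuned choice.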
Step 7: The classification model accuracy
We will compute the accuracy of the classification model on the train and test dataset, by comparing the actual values of the trading signal with the predicted values of the trading signal. The function accuracy_score() will be used to calculate the accuracy.
Syntax: accuracy_score(target_actual_value,target_predicted_value)
- target_actual_value: correct signal values
- target_predicted_value: predicted signal values
accuracy_train = accuracy_score(y_train, cls.predict(X_train))
accuracy_test = accuracy_score(y_test, cls.predict(X_test))
print('\nTrain Accuracy:{: .2f}%'.format(accuracy_train*100))
print('Test Accuracy:{: .2f}%'.format(accuracy_test*100))
A test accuracy above 50% suggests that the classification model predicts the next day’s direction better than a random guess.
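Up and down days are not always equally frequent, so it can help to compare the model against a naive rule that always predicts the most common training label. The sketch below computes that baseline with accuracy_score; the helper name majority_baseline_accuracy and the toy labels are assumptions for illustration, not part of the original strategy.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    labels, counts = np.unique(y_train, return_counts=True)
    majority_label = labels[np.argmax(counts)]
    baseline_pred = np.full(len(y_test), majority_label)
    return accuracy_score(y_test, baseline_pred)

# Made-up signals, for illustration only
y_train_toy = np.array([1, 1, -1, 1, -1, 1])
y_test_toy = np.array([1, -1, 1, 1])

baseline = majority_baseline_accuracy(y_train_toy, y_test_toy)
print('Baseline Accuracy:{: .2f}%'.format(baseline * 100))
```

If the test accuracy from Step 7 is not clearly above this baseline, the model may simply be following the dominant direction of the market.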
Step 8: Prediction
We will predict the signal (buy or sell) for the test data set, using the cls.predict() function. Then, we will compute the strategy returns based on the signal predicted by the model in the test dataset. We save it in the column ‘Strategy_Return’ and then, plot the cumulative strategy returns.
Df['Predicted_Signal'] = cls.predict(X)
# Calculate log returns
Df['Return'] = np.log(Df.Close.shift(-1) / Df.Close)*100
Df['Strategy_Return'] = Df.Return * Df.Predicted_Signal
Df.Strategy_Return.iloc[split:].cumsum().plot(figsize=(10,5))
plt.ylabel("Strategy Returns (%)")
plt.show()
As seen from the graph, the machine learning classification strategy generates a return of around 15% in the test data set.
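The return arithmetic in this step can be checked by hand on a tiny example. The sketch below uses four made-up close prices and a hypothetical signal column: each day's signal is multiplied by the log return from that day's close to the next day's close, and the results are summed cumulatively.

```python
import numpy as np
import pandas as pd

# Hypothetical prices and signals, for illustration only
toy = pd.DataFrame({"Close": [100.0, 101.0, 100.0, 102.0]})
toy["Predicted_Signal"] = [1, -1, -1, 1]

# Next-day log return in percent; the last row is NaN (no next close)
toy["Return"] = np.log(toy.Close.shift(-1) / toy.Close) * 100

# A correct signal earns the absolute return, a wrong one loses it
toy["Strategy_Return"] = toy.Return * toy.Predicted_Signal
cumulative = toy.Strategy_Return.cumsum()
print(cumulative)
```

In this toy series the first two signals are right and the third is wrong, so the cumulative line rises twice and then falls. Transaction costs are not modelled here or in the step above.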
Next Step
If you want to learn various aspects of Algorithmic trading then check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the required skill sets to be a successful trader. Enroll now!