January 25th 2018

When designing algorithmic trading systems, knowing the direction of the market can help a lot in improving the accuracy of the signals. In this post I will try to develop an algorithmic trading system that attempts to predict the market direction using candlestick patterns and machine learning. Candlestick patterns are widely used by professional traders when making buy/sell decisions. I personally use candlestick patterns a lot in opening and closing trades. I use the high and low of the previous candlestick to place the stop loss. But there are problems. Candlestick patterns are vague and imprecise most of the time. You cannot use them alone. You need to use other indicators along with candlestick patterns to improve the accuracy of your buy/sell decisions. It takes experience to interpret these patterns correctly, and just like other patterns there is always a chance that a pattern is false.

How can we develop an algorithm that uses these candlestick patterns to predict the market? Did you read a previous post on how to develop Algorithmic Trading Strategies using R? In this post, I go into great detail and explore how we can use machine learning on candlestick patterns and what level of accuracy we can achieve in our trading signals. We first fuzzify the candlestick patterns and then use machine learning as well as deep learning to forecast future price action. Below is the candlestick chart of GBPUSD on the 1 hour timeframe. Did you notice the candlestick patterns on the left of the chart are predicting price rise?

When we deal with financial data, we are in essence dealing with a time series. Read the post on how to predict financial time series using the GARCH model. Most of the time we are only dealing with the closing price at the end of some interval like 30 minutes, 60 minutes, 240 minutes, a day or a week. Most of the literature on time series analysis just uses the closing price for each interval to develop a model which is then used to make predictions. But we all know that within each interval we also have the open, the high and the low price, which contain a lot of information that we ignore when we only deal with the closing price. If you have been trading for a while and know how to read candlestick charts, you know that the candlestick body, the upper shadow and the lower shadow carry information that we can use in predicting price in the near future. The important question is how to incorporate that information in our forecasting algorithm. This is precisely what we will do now. In the figure below, you can see the real body, the upper shadow and the lower shadow for both a bearish and a bullish candle.

In the above figure you can see both a bullish and a bearish candle, with the upper shadow, the lower shadow and the candle body. Read the post on the only candlestick pattern you will ever need to make winning trades. We will be using fuzzy logic in modeling these candlestick patterns. Fuzzy logic is not known to many traders. In recent years, fuzzy logic has become very popular and a lot of research papers have been written showing how it can be applied to many practical areas. Unlike classical logic where everything is either TRUE or FALSE, in fuzzy logic we have shades of grey in between. I explain fuzzy logic more below. First we need to read the data and then define the variables body, upper shadow and lower shadow. As I said, we will be using R. R is a powerful language that can help you a lot in building algorithmic trading systems. Below I have provided the code that builds a forecasting model based on the candlestick body, the upper shadow and the lower shadow.

# Fuzzy Candlestick Pattern Forecasting Algorithm
# Import the csv file
data1 <- ...
... <- ifelse(... > 0, data2$Open, data2$Close)
# find the min of the open and close
data2$minOC <- ifelse(... > 0, data2$Close, data2$Open)
# calculate the previous trend
data2$Trend <- ...
data2$Lupper <- ifelse(... > 0, 10000*(data2$High-data2$Open), 10000*(data2$High-data2$Close))
data2$Llower <- ifelse(... > 0, 10000*(data2$Close-data2$Low), 10000*(data2$Open-data2$Low))
data2$Lbody <- ifelse(... > 0, 10000*(data2$Open-data2$Close), 10000*(data2$Close-data2$Open))
# calculate the daily variation
data2$Pips <- ...

In the above model, first we calculate the trend, which is just the cumulative return of the past 24 candles. We will be developing a model for predicting the price 24 candles ahead. Since we are using 1 hour candles, 24 candles means we will predict the price after 24 hours. If the price prediction is up, our indicator will give a buy alert, and if the price prediction after 24 hours is down, our indicator will give a sell signal. We can easily use this model on other intraday timeframes like 30 minutes and 240 minutes as well as on higher timeframes like the weekly. So don’t worry about the timeframe for now. Focus on the model. We will be able to use it on any timeframe. After calculating the trend, I have calculated the candle body, which I have given the name Lbody, the candle upper shadow, which I call Lupper, and the lower shadow, which I call Llower. Did you read the post on 8 machine learning algorithms that can help you in building powerful algorithmic trading strategies?
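The trend and the 24 candle target described above can be sketched on a toy series. This is only an illustration under assumptions: the closes are made up, both quantities are measured as a plain price difference in pips (1 pip = 0.0001), and the lookback is shortened from 24 to 3 so that a handful of rows is enough.

```r
# Hypothetical hourly closes; a real run would use the Close column of the price data.
close <- c(1.3000, 1.3010, 1.3005, 1.3020, 1.3015, 1.3030, 1.3040, 1.3025)
lookback <- 3  # the post uses 24 candles; shortened here for the toy series

n <- length(close)
# Price change over `lookback` candles, in pips (assumes 1 pip = 0.0001).
move <- 10000 * (close[(lookback + 1):n] - close[1:(n - lookback)])

# Previous trend: the move that ended at each candle (first rows have no history).
trend <- c(rep(NA, lookback), move)
# Target: the move that starts at each candle (last rows have no future yet).
pips <- c(move, rep(NA, lookback))

print(data.frame(close, trend = round(trend, 1), pips = round(pips, 1)))
```

With a lookback of 24 the same construction leaves 24 missing values in each column, which matches the NA counts in the summaries further down.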

The candle body is just the difference of the open and close price without the sign. If the candle is bullish, we calculate the candle body by subtracting open from close. If the candle is bearish, then we subtract close from open. We keep track of bullish and bearish candles with the variable Color, which is the difference of open and close, so it has a sign. The upper shadow is also without sign. If the candle is bullish we subtract close from high, and if the candle is bearish we subtract open from high. Below I have calculated the median of the candle body, the upper shadow and the lower shadow.
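As a minimal sketch of these definitions, the three candle parts can be computed on a few made-up candles. The data frame, the `pip` multiplier and the use of `abs`, `pmax` and `pmin` in place of `ifelse` are my own choices for illustration; they give the same numbers as the bullish/bearish case split described above.

```r
# Hypothetical OHLC rows: one bullish candle, one bearish candle, one doji.
candles <- data.frame(
  Open  = c(1.3000, 1.3050, 1.3020),
  High  = c(1.3040, 1.3060, 1.3030),
  Low   = c(1.2990, 1.3010, 1.3010),
  Close = c(1.3030, 1.3020, 1.3020)
)
pip <- 10000  # assumes a 4-decimal quote such as GBPUSD

# Color keeps the sign: open minus close, so it is positive for a bearish candle.
candles$Color <- candles$Open - candles$Close
# Body is the unsigned distance between open and close.
candles$Lbody <- pip * abs(candles$Open - candles$Close)
# Upper shadow runs from the top of the body to the high.
candles$Lupper <- pip * (candles$High - pmax(candles$Open, candles$Close))
# Lower shadow runs from the low to the bottom of the body.
candles$Llower <- pip * (pmin(candles$Open, candles$Close) - candles$Low)

print(round(candles[, c("Lbody", "Lupper", "Llower")], 1))
```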

# calculate the universe of discourse
# max(Imax1, Imin1)
# UoD

In the above code, I have calculated the mean as well as the median of the candle body, the upper shadow and the lower shadow. When we do data analysis, it is always a good idea to look at the summary statistics of each variable that we want to use as a feature in our model. I have used GBPUSD daily data. If you check the above code, I have used 600 daily candles in the input dataframe. Below are the summary statistics for the previous trend.

summary(data2$Trend)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-125.60  -22.75   10.95   15.60   47.52  210.10      24

Above are the summary statistics for the previous trend. You can see the minimum is -125.6 pips and the maximum is 210.1 pips. Do you remember the 1192 pip down movement in one day? It took place on 2016-06-24, just when the Brexit vote results got announced. It was a close call. When the result got announced, GBPUSD fell like a stone. In the old days, statisticians would call it an outlier that would skew the results, so they would take it out of the data. But you should keep these black swan events in mind, when the market can move as much as 1192 pips in just 24 hours. In our data the minimum price movement is -125.6 pips and the maximum price movement is 210.1 pips. Extreme values are important in data analysis. Don’t be afraid of them. Below is the summary for Pips!

> summary(data2$Pips)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-125.60  -22.75   10.95   15.60   47.52  210.10      24

Above are the summary statistics for Pips. We will not be using this Pips variable directly. We will use a categorical variable with two values: BUY and SELL. When we have a positive Pips value we will classify it as BUY, and when we have a negative Pips value we will classify it as SELL. Read this post on an AI Forex System. Below are the summary statistics for the candle body, the upper shadow and the lower shadow! It is always a good idea to become familiar with the data before you start developing an algorithmic trading model.
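The BUY/SELL labelling can be sketched in one line. The values below are invented, and the handling of a variation of exactly zero is an assumption, since the text only covers the positive and negative cases.

```r
# Hypothetical 24-candle variations in pips; NA marks rows without a known future.
pips <- c(12.5, -30.1, 47.5, NA, -0.4)

# Positive variation becomes "Buy", negative becomes "Sell", NA stays NA.
# A variation of exactly zero would fall into "Sell" here, which is an assumption.
signal <- factor(ifelse(pips > 0, "Buy", "Sell"), levels = c("Buy", "Sell"))

print(table(signal, useNA = "ifany"))
```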

> summary(data3$Body1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.100   2.800   3.995   5.600  35.800       1 
> summary(data3$UpperShadow1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.100   2.700   4.247   5.400 123.500 
> summary(data3$LowerShadow1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   2.100   4.900   7.222   9.725  54.800

Above you can see that the median candle body is 2.8 pips while the mean candle body is 4 pips. The minimum candle body is zero pips and the maximum candle body is 35.8 pips. In the same manner, the median upper shadow is 2.7 pips and the mean upper shadow is 4.2 pips, while the median lower shadow is 4.9 pips and the mean lower shadow is 7.2 pips. In the candle summary statistics you can again see extreme values. Another extreme event that you should remember is the GBPUSD Flash Crash, which took place on 2016-10-07. Looking at the summary statistics makes you aware of the data. After looking at the above summary statistics for the previous trend, Pips, candle body, upper shadow and lower shadow, we know the data better. Read this post on how to use support vector machines in predicting stock returns. Candlestick patterns are universal. You can use this algorithmic trading strategy on stocks, commodities, currencies and cryptocurrencies. Yes, cryptocurrencies too! Below I have done more data wrangling to get the data into the shape that we will use in building a fuzzy model.

data2 <- ...

The above code is just data wrangling and pre-processing before we build our fuzzy candlestick pattern model. We will be using the previous trend as well as the past 3 candles as input to predict the output that I have called the variation.

How To Fuzzify The Candlestick Patterns?

I have developed a course on how to use fuzzy logic in trading. If you are interested, or if you don’t know anything about fuzzy logic, you can take a look at my course R Fuzzy Logic for Traders. In the course I explain how to fuzzify data using different membership functions. We will fuzzify the input data. In each candlestick there are three variables: the body, the upper shadow and the lower shadow. Do you remember the Doji candle? A Doji has a very small candle body. When the open and close are almost the same we have a Doji candle. A Doji can give you strong clues about trend continuation and trend reversal. In the figure below you can see the fuzzy membership functions for the candle body, the upper shadow and the lower shadow. We will be using the median candle body, upper shadow and lower shadow.

Above is the plot of the fuzzy membership functions for the candle body, the upper shadow and the lower shadow. All have four linguistic values: Equal, Short, Middle and Long. As said above, I have used the median of the candle body, the upper shadow and the lower shadow instead of the mean. The median is not affected by outliers while the mean is. You can use the mean, it won’t make much difference. Everything is relative when we use candlestick patterns. We have used a window of 600 one hour candles to calculate the median. In the same manner, the position of each candle relative to the previous candle is important. We model that using the Open Style and Close Style variables. Below is the chart of the Open Style and Close Style variables.
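To make the idea of the four linguistic values concrete, here is a rough sketch of how a candle body could be fuzzified relative to its median. The triangular shapes and the breakpoints (0.5, 1.5 and 2.5 times the median) are assumptions chosen for illustration and are not taken from the plot; `tri` and `fuzzify_size` are hypothetical helper names. Only the median body of 2.8 pips comes from the summary statistics.

```r
# Triangular membership: 0 at a and c, rising to 1 at b.
tri <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))

# Hypothetical fuzzifier for a candle part measured relative to its median.
fuzzify_size <- function(x, med) {
  r <- x / med  # size as a multiple of the median
  cbind(
    Equal  = pmax(0, 1 - r / 0.5),       # 1 at r = 0, fades out by r = 0.5
    Short  = tri(r, 0, 0.5, 1.5),
    Middle = tri(r, 0.5, 1.5, 2.5),
    Long   = pmin(1, pmax(0, r - 1.5))   # starts at r = 1.5, saturates at r = 2.5
  )
}

bodies <- c(0, 1.4, 2.8, 5.6, 35.8)  # pips, illustrative values
memberships <- fuzzify_size(bodies, med = 2.8)
print(round(memberships, 2))
```

A Doji style body of zero pips belongs fully to Equal, a body equal to the median is half Short and half Middle, and the 35.8 pip maximum is fully Long. With these breakpoints the four degrees always add up to one.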

Above is the fuzzy membership function for the Open Style and the Close Style. Open Style has five linguistic values: OPEN_LOW, OPEN_EQUAL_LOW, OPEN_EQUAL, OPEN_EQUAL_HIGH and OPEN_HIGH. In the same manner, Close Style has five linguistic values: CLOSE_LOW, CLOSE_EQUAL_LOW, CLOSE_EQUAL, CLOSE_EQUAL_HIGH and CLOSE_HIGH.

Above is the fuzzy membership function for the Trend. In this model, we will not be using the Trend variable. But in the above figure you can see how you can fuzzify the Trend variable. On the left is the minimum trend value and on the right is the maximum trend value. I have used the frbs R package. This is a powerful fuzzy rule-based systems package that you can use to develop your algorithmic trading strategies. The important thing is to build the fuzzifier that fuzzifies the input data.

#demo(WM.GasFur)
#demo(FRBS.Mamdani.Manual)
#demo(FRBS.TSK.Manual)
#demo(FRBS.Manual)
## Define number of linguistic terms of input variables.
## PreviousTrend has 6 linguistic terms
## CandleBody has 4 linguistic terms
## CandleUpperShadow has 4 linguistic terms
## CandleLowerShadow has 4 linguistic terms
## OpenStyle has 5 linguistic terms
## CloseStyle has 5 linguistic terms
## FollowingTrend has 8 linguistic terms
## Define number of linguistic terms of input variables.
num.fvalinput <- ...
... > 70) |
# (data2$Trend > 0 &
# data2$Pips) ...
... > 0, 1, 2)
data5[,31] <- ...

We have fuzzified the input data. Below we label each input with the linguistic value that has the maximum membership degree. We will use these linguistic labels instead of the initial numerical values in building our Fuzzy Candlestick Patterns Forecasting Algorithm. Read the post on how to code stop loss and take profit in MQL5.

#View(data3)
#options(error=recover)
#View(varinp.mf)
#View(data4)
#a new matrix with clear labels
data5 <- ...
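The labelling step, picking the linguistic value with the highest membership degree, can be sketched with base R alone. The degrees below are made up, and resolving ties in favour of the first column is an assumption.

```r
# Hypothetical membership degrees: three candles (rows), four linguistic values.
degrees <- matrix(
  c(0.8, 0.2, 0.0, 0.0,
    0.0, 0.4, 0.6, 0.0,
    0.0, 0.0, 0.1, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Equal", "Short", "Middle", "Long"))
)

# max.col gives the column index of each row maximum; ties go to the first column.
labels <- colnames(degrees)[max.col(degrees, ties.method = "first")]
print(labels)
```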

We have turned the fuzzified data into fuzzy labels. Now we need to split the data into train and test. We will use the train dataset for building the model. Once we have the model, we will use the test dataset to test the accuracy of the model. The test dataset is also known as the Validation Set. This is something very important. In the validation dataset we use data that the model has never seen before.

#View(data5)
#split the data into train and test
train <- ...

First we will use the train dataset for training our model and then use that trained model to make predictions on the test data and check the accuracy for the predictions made by our model.
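One way to make such a split is to hold out the most recent rows, so that the test set lies entirely after the training period. This is a sketch under assumptions: the data frame `labelled` is invented, and the test size of 70 is taken from the row totals of the confusion matrices in this post. The split actually used may differ.

```r
# Hypothetical labelled data: 600 rows in time order.
set.seed(1)
labelled <- data.frame(
  id = 1:600,
  signal = sample(c("Buy", "Sell"), 600, replace = TRUE)
)

# Chronological split: the last n_test rows are held out for testing.
n_test <- 70
split_at <- nrow(labelled) - n_test
train <- labelled[seq_len(split_at), ]
test <- labelled[(split_at + 1):nrow(labelled), ]

print(c(train = nrow(train), test = nrow(test)))
```

For time series data a chronological split like this avoids training on candles that come after the ones being predicted, which a random split would allow.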

Data Mining Using AdaBoost

The first model that we build is a Decision Tree model. Data mining algorithms look for repeatable patterns in the data. This is what we also do as technical traders. As technical traders, we look for patterns that can predict the market. The most popular chart patterns are the double top and the double bottom, the triple top and the triple bottom, the head and shoulders pattern, the triangle pattern, the wedge pattern etc. These patterns are important in predicting trend reversal and trend continuation. The AdaBoost Decision Tree model uses boosting.

#using adabag algorithm for data mining
library(rpart)
library(adabag)
#first we try boosting
f <- ...
> pred1$confusion
               Observed Class
Predicted Class Buy Sell
           Buy   36    6
           Sell  22    6
> pred1$error
[1] 0.4

I have used the AdaBoost Decision Tree algorithm to build a model that I then used to make predictions on the test data. The error is 0.4, so our model is giving 40% wrong predictions and 60% correct predictions. The important question is how to improve the accuracy of our model to above 80%. Let’s try that now. One approach is to increase the input data from three candlesticks to five candlesticks and see if this improves the correct prediction percentage.
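The error figure can be checked directly from the confusion matrix printed above. The sketch below only re-enters those four counts and does not rerun the model. The per class breakdown is an extra that is not part of the original output.

```r
# Confusion matrix as printed above: rows are predictions, columns are observed classes.
conf <- matrix(
  c(36, 22, 6, 6), nrow = 2,
  dimnames = list(Predicted = c("Buy", "Sell"), Observed = c("Buy", "Sell"))
)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(conf)) / sum(conf)
error <- 1 - accuracy
# Share of each observed class that the model got right.
recall <- diag(conf) / colSums(conf)

print(round(c(accuracy = accuracy, error = error), 2))
print(round(recall, 2))
```

The 42 correct predictions out of 70 give the 0.4 error reported by `pred1$error`.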

Fuzzy Candlestick Patterns Neural Network Model

Another approach is to use another algorithm like RandomForest or a Deep Learning Neural Network. Below I have used a simple Multilayer Perceptron in building a Fuzzy Candlestick Patterns Neural Network Model.

library(RSNNS)
m4 <- ...

First I train the MLP on the train dataset. Once the MLP model is trained, I use the test data to see how good the model is! Below I have shown the confusion matrix for the validation set. You can see that the model achieved a predictive accuracy of 82%. Keep in mind that this equals the No Information Rate shown in the output: the model predicted Buy for all 70 test candles, and 58 of them were Buy.

Confusion Matrix and Statistics

          test[, 31]
test[, 32] Buy Sell
      Buy   58   12
      Sell   0    0
                                          
               Accuracy : 0.8286          
                 95% CI : (0.7197, 0.9082)
    No Information Rate : 0.8286          
    P-Value [Acc > NIR] : 0.576303        
                                          
                  Kappa : 0               
 Mcnemar's Test P-Value : 0.001496        
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.8286          
         Neg Pred Value :    NaN          
             Prevalence : 0.8286          
         Detection Rate : 0.8286          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : Buy
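The headline figures in this output can be reproduced by hand from the four counts, which helps in reading them. This is a sketch in base R that re-enters the counts from the matrix above; `kappa_hat` is just a local name.

```r
# Counts from the matrix above: rows are predictions, columns are the true labels.
conf <- matrix(
  c(58, 0, 12, 0), nrow = 2,
  dimnames = list(Predicted = c("Buy", "Sell"), Reference = c("Buy", "Sell"))
)
n <- sum(conf)

accuracy <- sum(diag(conf)) / n
# No information rate: accuracy of always predicting the most frequent class.
nir <- max(colSums(conf)) / n
# Cohen's kappa: agreement beyond what the row and column totals give by chance.
expected <- sum(rowSums(conf) * colSums(conf)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, nir = nir, kappa = kappa_hat), 4))
```

Accuracy and no information rate are both 58/70, and kappa is zero, so on this test set the network does no better than always answering Buy. Balanced accuracy of 0.5 in the output says the same thing.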

We may be able to improve the predictive accuracy of the model by using deep learning. Did you take a look at my course R Deep Learning for Traders? Deep neural networks can model more complex patterns than a simple multilayer perceptron.

Conclusion

In this post I developed a fuzzy candlestick forecasting algorithm. I used five candlesticks in my model. I did not use any other indicator and achieved a predictive accuracy of 82% on the test set, although the neural network got there by predicting Buy for every test candle. This model needs more work and I think we can improve the predictive accuracy above 90% using deep learning. What this model suggests is that we can use candlestick patterns in predicting price. As technical traders we have known this for a long time by just looking at the charts. Now we can use machine learning to test whether candlestick patterns do have predictive power.

This post first appeared on Trading Ninja 2.0, please read the original post: here