Anomaly Detection (One Class SVM) in R with MicrosoftML

In my previous post, I described text featurization using MicrosoftML.
In this post, I give a brief introduction to anomaly detection with MicrosoftML.

Note: As I mentioned in the previous post, MicrosoftML is currently available on Windows only (not on Linux, including Spark clusters). Please wait for a future update.

MicrosoftML provides a one-class support vector machine (OC-SVM) function named rxOneClassSvm, which is useful for unbalanced binary classification. This function is an unsupervised learner, i.e., it doesn’t need any anomalous examples in the training data. (Only normal data is used for training; the data is mapped into a high-dimensional space and separated there by an optimal hyperplane.)

First, here is a brief example of this function.

library(MicrosoftML)

# train data with normal data
train_count 
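A minimal sketch of a toy example along these lines, not the original listing. The seed, the data values, and the column names x1 and x2 are assumptions chosen for illustration, with rows 3 and 7 of the test set deliberately placed far from the training cloud. Because MicrosoftML is Windows-only, the model call is guarded so that the data setup still runs elsewhere.

```r
set.seed(123)  # assumed seed, only for reproducibility

# Training data: normal points only, clustered around (1, 1) (assumed values)
train_count <- 200
traindata <- data.frame(
  x1 = rnorm(train_count, mean = 1, sd = 0.1),
  x2 = rnorm(train_count, mean = 1, sd = 0.1))

# Test data: ten points near the cluster, then rows 3 and 7 moved far away
testdata <- data.frame(
  x1 = rnorm(10, mean = 1, sd = 0.1),
  x2 = rnorm(10, mean = 1, sd = 0.1))
testdata$x1[c(3, 7)] <- c(3.0, -1.5)
testdata$x2[c(3, 7)] <- c(3.0, 2.5)

if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  library(MicrosoftML)
  # A one-class SVM takes a formula with no response variable
  model <- rxOneClassSvm(formula = ~ x1 + x2, data = traindata)
  # rxPredict returns one score per test row; keep the inputs alongside
  scores <- rxPredict(model, data = testdata, extraVarsToWrite = c("x1", "x2"))
  print(scores)
}
```

With data like this, the scores for rows 3 and 7 should stand apart from the other eight.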

As you can see, rows #3 and #7 in the test data are outliers.
The following figure maps the data, with the normal data shown as blue dots and the outliers as red dots.

The following is the result. The outliers in rows #3 and #7 are scored as follows.

Let’s look at a real scenario.
Here I use the “Breast Cancer Wisconsin Data Set” (see here). This data includes the patient id, the diagnosis (M = malignant, B = benign), and many attributes computed from a digitized image of a breast mass (radius, texture, perimeter, etc.). The sample is high-dimensional.

This dataset is already well-formed for analysis, but in a real application you must do some work before training, such as selecting appropriate attributes, vectorizing, cleaning the data, and eliminating dependencies between attributes.
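As a rough illustration of two of those preparation steps (data cleaning and eliminating dependencies), here is a minimal base R sketch. The helper name prepare_attributes, the 0.95 correlation cutoff, and the toy columns are all assumptions made for the example. They are not part of the dataset or of MicrosoftML.

```r
# Drop rows with missing values, then drop the later column of each highly
# correlated pair. `threshold` is an assumed cutoff, not a recommendation.
prepare_attributes <- function(data, threshold = 0.95) {
  data <- data[complete.cases(data), , drop = FALSE]
  cors <- abs(cor(data))
  cors[lower.tri(cors, diag = TRUE)] <- 0  # look at each pair once (upper triangle)
  drop <- apply(cors, 2, function(col) any(col > threshold, na.rm = TRUE))
  data[, !drop, drop = FALSE]
}

# Toy data (assumed values): perimeter is almost a function of radius
set.seed(1)
toy <- data.frame(radius = rnorm(50, 14, 3), texture = rnorm(50, 19, 4))
toy$perimeter <- 2 * pi * toy$radius + rnorm(50, sd = 0.1)
toy$texture[5] <- NA

cleaned <- prepare_attributes(toy)
print(names(cleaned))
print(nrow(cleaned))
```

Here the incomplete row is removed and perimeter is dropped because it is nearly redundant with radius. Which attributes to keep in a real application is a domain decision that a correlation filter alone cannot make.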

8510426, B, 13.54, 14.36, 87.46, ...
8510653, B, 13.08, 15.71, 85.63, ...
8510824, B, 9.504, 12.44, 60.34, ...

...

Here I train and predict with the following steps.

  1. Split the original data into a training set and a test set.
  2. Train a model with rxOneClassSvm on the training data. We use all the attributes except the patient id and the diagnosis (‘M’ or ‘B’) for training.
  3. Predict on the test data with the trained model, and evaluate the results. (Here I use the ROCR package.)

The code is as follows:

library("MicrosoftML")
library("ROCR")

# read data
alldata 
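A minimal sketch of the three steps, under several stated assumptions rather than as the original listing: the data is saved locally as "wdbc.data" (a hypothetical path) with no header row, so read.csv names the columns V1 (id), V2 (diagnosis), V3, and so on. Only benign rows are used for training, and the 70% split ratio is arbitrary. The sketch also assumes the score column returned by rxPredict is named Score and that larger values mean more anomalous. Check both against your own output.

```r
# Step 1: training rows are a random sample of the normal ("B") rows;
# everything else, benign and malignant, becomes the test set.
split_normal <- function(data, label_col, train_frac = 0.7) {
  normal_idx <- which(data[[label_col]] == "B")
  k <- floor(train_frac * length(normal_idx))
  train_idx <- normal_idx[sample.int(length(normal_idx), k)]
  test_idx <- setdiff(seq_len(nrow(data)), train_idx)
  list(train = data[train_idx, ], test = data[test_idx, ])
}

# Steps 2 and 3: train on the attributes only, score the test set,
# plot the ROC curve, and return the area under it.
train_and_evaluate <- function(parts, id_col, label_col) {
  # Loaded here so that split_normal() above runs without these packages
  library(MicrosoftML)
  library(ROCR)
  features <- setdiff(names(parts$train), c(id_col, label_col))
  form <- as.formula(paste("~", paste(features, collapse = " + ")))
  model <- rxOneClassSvm(formula = form, data = parts$train)
  scored <- rxPredict(model, data = parts$test, extraVarsToWrite = label_col)
  # Malignant ("M") is treated as the positive (anomalous) class
  pred <- prediction(scored$Score, as.integer(scored[[label_col]] == "M"))
  plot(performance(pred, "tpr", "fpr"))
  performance(pred, "auc")@y.values[[1]]
}

# Usage (hypothetical file path):
# alldata <- read.csv("wdbc.data", header = FALSE)
# set.seed(1)
# parts <- split_normal(alldata, label_col = "V2")
# train_and_evaluate(parts, id_col = "V1", label_col = "V2")
```

Training on benign rows only follows the idea that a one-class SVM learns what normal data looks like. If the ROC curve falls below the diagonal, the score direction is the opposite of what is assumed here.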

The following is the result plotted by ROCR. The result seems to match the diagnosis results fairly well.

rxOneClassSvm uses the radial basis function (RBF) as the SVM kernel by default. For more complex cases, you can specify another kernel function (linear, polynomial, or sigmoid) with appropriate parameters.

model <- rxOneClassSvm(
  formula = ...,
  kernel = polynomialKernel(a = .2, deg = 2),
  data = traindata)
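To compare the kernel choices side by side, a sketch like the following fits one model per kernel on the same training data. The helper name fit_kernels and the parameter values (other than the polynomial ones taken from the text) are illustrative assumptions. The formula and the training data are passed in by the caller.

```r
# Fit one rxOneClassSvm model per kernel type and return them as a named list.
# gamma = 0.1 and coef0 = 0 are assumed values for illustration only.
fit_kernels <- function(form, traindata) {
  library(MicrosoftML)  # loaded inside so that defining the helper needs no package
  kernels <- list(
    rbf        = rbfKernel(gamma = 0.1),
    linear     = linearKernel(),
    polynomial = polynomialKernel(a = .2, deg = 2),
    sigmoid    = sigmoidKernel(gamma = 0.1, coef0 = 0))
  lapply(kernels, function(k) {
    rxOneClassSvm(formula = form, data = traindata, kernel = k)
  })
}

# Usage, assuming `traindata` has numeric columns x1 and x2:
# models <- fit_kernels(~ x1 + x2, traindata)
# names(models)
```

Scoring the same test set with each model, and comparing the ROC curves, is one way to judge which kernel suits the data.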
