
Scale your deep learning workloads on MXNet R (scoring phase)

In my previous post, I described the basics of scaling statistical R computing with Azure Hadoop (HDInsight) and R Server.
Some folks asked me “What about deep learning workloads?”

This post answers that question.

Overall recap

The machine learning team provides some useful resources about these concerns, listed below. Please refer to these documents for the technical background and details.
Here MXNet is used for implementing deep neural networks with R.

Machine learning blog – Building Deep Neural Networks in the Cloud with Azure GPU VMs, MXNet and Microsoft R Server
https://blogs.technet.microsoft.com/machinelearning/2016/09/15/building-deep-neural-networks-in-the-cloud-with-azure-gpu-vms-Mxnet-and-microsoft-r-server/

Channel9 – Deep Learning in Microsoft R Server Using MXNet on High-Performance GPUs in the Public Cloud
https://channel9.msdn.com/Events/Machine-Learning-and-Data-Sciences-Conference/Data-Science-Summit-2016/MSDSS21

Machine learning blog – Applying Deep Learning at Cloud Scale, with Microsoft R Server & Azure Data Lake
https://blogs.technet.microsoft.com/machinelearning/2016/10/31/applying-cloud-deep-learning-at-scale-with-microsoft-r-server-azure-data-lake/

In this post I show you the programming code and the how-to steps that go along with these useful resources.

For the training perspective, MXNet natively has the capability of data parallelization across multiple devices, including the utilization of the massive power of GPUs. (See “MXNet How-To – Run MXNet on Multiple CPU/GPUs with Data Parallel“.) The key-value store of MXNet handles the synchronization across the multiple devices.
You can easily run GPU-enabled Virtual Machines (Data Science Virtual Machines or N-Series Virtual Machines) in Microsoft Azure and see how it works. I will show you this scenario (the training scenario) in the next post.
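For example, data-parallel training in MXNet R only requires passing a list of devices as the ctx argument; the kvstore argument selects the synchronization scheme. The following is a minimal sketch, assuming a machine with two GPUs and the same network symbol and data as the MNIST example below.

require(mxnet)

# train with data parallelism across two GPUs
# (the "local" key-value store aggregates the gradients across devices)
devices <- list(mx.gpu(0), mx.gpu(1))
model <- mx.model.FeedForward.create(
  softmax,                    # network symbol (defined as in the MNIST example below)
  X = train.x, y = train.y,   # training data (prepared as in the MNIST example below)
  ctx = devices,
  kvstore = "local",
  num.round = 10,
  array.batch.size = 100,
  learning.rate = 0.07,
  momentum = 0.9)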

For the scoring perspective, you can run the scoring task for each piece of data independently (unlike the training task), so you can easily scale the workloads across a series of devices or machines.
In this post I show you a step-by-step tutorial for that scoring scenario. (Here we also use Spark and R Server on Azure.) The sample I show here is trivial code, but in a real system the scoring might use extremely large data.

Our sample

In this post we use the familiar MNIST example (handwritten digit recognition) for the deep neural networks.
You can easily copy the script and download the sample data from the official tutorial “MXNet R – Handwritten Digits Classification Competition“. It uses a large number of 28 x 28 = 784 pixel images, i.e., 784 input neurons of numeric data.

Please see the following script.

Here I don’t focus on the deep learning algorithms (networks) themselves. Note that the code below uses a traditional network (a feedforward neural network, which always feeds forward without feedback or loops) as a brief example, instead of a real Convolutional Neural Network (CNN) or some other well-refined network. If you are interested, please see the LeNet example for CNN in the tutorial.

Note : When you’re using Windows or Mac and run install.packages() to install the MXNet package (from the DMLC repository), the latest package (mxnet 0.9.4) requires visNetwork 1.0.3.
So currently you must first install the latest visNetwork package as follows.
install.packages("visNetwork", repos="https://cran.revolutionanalytics.com/")

R MNIST Complete Code – Standalone

require(mxnet)

#####
# read training data
#
# train.csv is:
# (label, pixel0, pixel1, ..., pixel783)
# 1, 0, 0, ..., 0
# 4, 0, 0, ..., 0
# ...
#####
train <- read.csv("train.csv", header = TRUE)
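The rest of the listing follows the official tutorial: prepare the matrices, define a simple feedforward network, train, and score. Here is a minimal sketch along those lines (the layer sizes and hyperparameters are illustrative).

# prepare matrices : rows = pixels, columns = images, values in [0, 1]
train <- data.matrix(train)
train.x <- t(train[, -1] / 255)
train.y <- train[, 1]

# define a simple feedforward network (784 -> 128 -> 64 -> 10)
data <- mx.symbol.Variable("data")
fc1  <- mx.symbol.FullyConnected(data, name = "fc1", num_hidden = 128)
act1 <- mx.symbol.Activation(fc1, name = "relu1", act_type = "relu")
fc2  <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 64)
act2 <- mx.symbol.Activation(fc2, name = "relu2", act_type = "relu")
fc3  <- mx.symbol.FullyConnected(act2, name = "fc3", num_hidden = 10)
softmax <- mx.symbol.SoftmaxOutput(fc3, name = "sm")

# train on CPU
mx.set.seed(0)
model <- mx.model.FeedForward.create(
  softmax, X = train.x, y = train.y,
  ctx = mx.cpu(), num.round = 10, array.batch.size = 100,
  learning.rate = 0.07, momentum = 0.9,
  eval.metric = mx.metric.accuracy,
  initializer = mx.init.uniform(0.07))

# score : test.csv has the same layout without the label column
test <- read.csv("test.csv", header = TRUE)
test <- t(data.matrix(test) / 255)
preds <- predict(model, test)
pred.label <- max.col(t(preds)) - 1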

Setting up Spark clusters

As I described in my previous post, you can easily create your R Server and Spark clusters on Azure.
Here I skip the setup steps; please see my previous post for details (using Azure Data Lake Store, RStudio setup, etc.).

Moreover, here you have one more thing that needs to be done.
Currently R Server on an Azure HDInsight (Hadoop) cluster does not include MXNet. For this reason, you must install MXNet on all worker nodes (Ubuntu 16) using an HDInsight script action. (You only need to create the installation script and set this script on the Azure Portal. See the following screenshot.)

Note : If you want to apply the script action on the edge node, you must use the HDInsight Premium SKU. So it’s better to run the MXNet workloads only on the worker nodes. (The edge node is just for orchestrating.)

Below is my script action (.sh) for the MXNet installation.
As you can see, you don’t need to utilize the GPU in the scoring phase, so the installation (compilation) is much simpler.

#!/usr/bin/env bash
##########
#
# HDInsight script action
# for installing MXNet
# without GPU utilized
#
##########

# install the build dependencies and register gcc as the default cc
sudo apt-get install -y libatlas-base-dev libopencv-dev libprotoc-dev python-numpy python-scipy make unzip git gcc g++ libcurl4-openssl-dev libssl-dev
sudo update-alternatives --install "/usr/bin/cc" "cc" "/usr/bin/gcc" 50

MXNET_HOME="$HOME/mxnet/"

# get the MXNet source (with all submodules)
git clone https://github.com/dmlc/mxnet.git "$HOME/mxnet/" --recursive

cd "$MXNET_HOME"

# build the MXNet core library (CPU-only build)
make -j$(nproc)

sudo Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"

# install the dependencies of the MXNet R package
cd R-package
sudo Rscript -e "library(devtools); library(methods); options(repos=c(CRAN='https://cran.rstudio.com')); install_deps(dependencies = TRUE)"
sudo Rscript -e "install.packages(c('curl', 'httr'))"
sudo Rscript -e "install.packages(c('Rcpp', 'DiagrammeR', 'data.table', 'jsonlite', 'magrittr', 'stringr', 'roxygen2'), repos = 'https://cran.rstudio.com')"

# build and install the MXNet R package
cd ..
sudo make rpkg

sudo R CMD INSTALL mxnet_current_r.tar.gz

Note : If your script action encounters errors, you can see the logs using the Ambari UI at https://{your cluster name}.azurehdinsight.net.

Prepare your trained model

Before running the scoring workloads, we prepare the trained model and save it to the local disk as follows. (See the following script.)
This script saves the MXNet trained model in C:\tmp. Two files, mymodel-symbol.json and mymodel-0100.params, will be created.

R MNIST Train and Save model

require(mxnet)

#####
# train.csv is:
# (label, pixel0, pixel1, ..., pixel783)
# 1, 0, 0, ..., 0
# 4, 0, 0, ..., 0
# ...
#####

# read input data
train <- read.csv("train.csv", header = TRUE)
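The training part is the same as in the standalone script above; the only addition is saving the model at the end. A minimal sketch (the iteration argument determines the "-0100" suffix of the params file):

# ... prepare the matrices, define the network, and train
#     exactly as in the standalone script above ...

# save the trained model to C:\tmp
# (this creates mymodel-symbol.json and mymodel-0100.params)
mx.model.save(model, prefix = "C:/tmp/mymodel", iteration = 100)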

After running this script, upload these model files (mymodel-symbol.json, mymodel-0100.params) to a downloadable location. (The scoring program will retrieve these files.)
In my example, I uploaded them to Azure Data Lake Store (the adl://mltest.azuredatalakestore.net/dnndata folder), which is the same storage as the primary storage of the Hadoop cluster.
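If you work from the Linux edge node, one way to do this is with the RevoScaleR Hadoop helper. This is a minimal sketch; it assumes the model files were first transferred to the edge node, and the destination folder follows my example above.

# copy the model files into the Azure Data Lake Store folder
rxHadoopCopyFromLocal("mymodel-symbol.json", "adl://mltest.azuredatalakestore.net/dnndata")
rxHadoopCopyFromLocal("mymodel-0100.params", "adl://mltest.azuredatalakestore.net/dnndata")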

R Script for scaling

Everything is ready. Now let’s start programming for scaling.

As I mentioned in my previous post, we can use the “rx”-prefixed ScaleR functions for Spark cluster scaling.

“So… we must rewrite all the functions with ScaleR functions!?”

Don’t worry! Of course not.
In this case, you can just use the rxExec() function, which distributes arbitrary chunks of R code across the cluster.

Let’s see the following sample code. (As I described in my previous post, this code should be run on the edge node.)

R MNIST scoring on Spark cluster (R Server)

# Set Spark clusters context
# (minimal settings; see my previous post for the full connection options)
spark <- RxSpark(consoleOutput = TRUE)
rxSetComputeContext(spark)
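The scoring function and the distributed call look like the following. This is a minimal sketch: the ten input file names (testdata1.csv, …) are hypothetical, and it assumes each worker has already downloaded the model files into its working directory.

# scoring function : load the trained model and classify one image file
image.score <- function(filename) {
  require(mxnet)

  # load the trained model (mymodel-symbol.json, mymodel-0100.params)
  model <- mx.model.load("mymodel", iteration = 100)

  # read rows of 784 pixel values and normalize
  data <- read.csv(filename, header = TRUE)
  data <- t(data.matrix(data) / 255)

  # predict and return the recognized digit(s)
  preds <- predict(model, data)
  max.col(t(preds)) - 1
}

# distribute 10 scoring workloads across the worker nodes
files <- paste0("testdata", 1:10, ".csv")
result <- rxExec(image.score, filename = rxElemArg(files))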

As you can see, here we pass 10 “filename” arguments to the image.score function using rxExec(). Each image.score workload (10 workloads in total) is then distributed across the worker nodes of the Spark cluster.
The return value of rxExec() is the aggregated result of the 10 image.score executions.
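In concrete terms, it is a plain R list with one element per execution; with the names in the sketch above, unlist(result) would return the 10 predicted digits.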

(Screenshot: the scoring result in RStudio – http://i1155.photobucket.com/albums/p551/tsmatsuz/20170209_RStudio_Web_zpsymgpfl1l.jpg)

If needed, you can easily scale your cluster (e.g., 4 nodes -> 16 nodes) on the Azure Portal and get massive computing resources for your workloads.

The Azure Hadoop (Data Lake, HDInsight) team has also written a recent post about Caffe on Spark clusters. (It’s also a very useful resource.)

Azure Data Lake & Azure HDInsight Blog : Distributed Deep Learning on HDInsight with Caffe on Spark
https://blogs.msdn.microsoft.com/azuredatalake/2017/02/02/distributed-deep-learning-on-hdinsight-with-caffe-on-spark/
