
Amazon Machine Learning for Kaggle

Amazon recently released Machine Learning as a service on AWS. In this post I try using it to make a submission to Kaggle, taking the Titanic: Machine Learning from Disaster problem as the example. Read the problem statement and download the train and test data (train.csv and test.csv) from Kaggle.


The goal is to get a sufficiently good model as quickly as possible. If you are not familiar with AWS or have not set up your account yet, you might want to look at my previous post on AWS. Let’s get started:

Uploading data:

  1. Log in to the AWS console and go to S3.
  2. Click “Create Bucket”, give the bucket a name, and click Create.
  3. In the bucket list, click the newly created bucket, then Actions -> Upload -> Add Files, and select the train.csv that you downloaded from Kaggle. Click Start Upload.
  4. Do the same with test.csv.
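
If you would rather script the upload than click through the console, the steps above can be sketched in Python. This is only a sketch under assumptions: `upload_files` is a helper name made up for this post, the bucket name in the usage comment is hypothetical, and the client is expected to be a boto3 S3 client (the helper only calls its `upload_file` method).

```python
import os


def upload_files(s3_client, bucket_name, local_paths):
    """Upload each local file to the bucket, keyed by its file name.

    Returns the "<bucket_name>/<filename>" locations of the uploaded files.
    """
    uploaded = []
    for path in local_paths:
        key = os.path.basename(path)  # e.g. "train.csv"
        s3_client.upload_file(path, bucket_name, key)
        uploaded.append(f"{bucket_name}/{key}")
    return uploaded


# Example usage (needs boto3 installed and AWS credentials configured):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_files(s3, "my-titanic-bucket", ["train.csv", "test.csv"])
# "my-titanic-bucket" is a hypothetical name; the bucket must already exist.
```

The returned strings are in the same <bucket_name>/<filename> form that the datasource step asks for.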

Creating datasource:

  1. In the AWS console, go to Machine Learning. If it says that Machine Learning is not available in your region, select the listed region and click Get Started. Then click Launch next to Standard Setup.
  2. Enter the path of train.csv in the “S3 location” field, in the form <bucket_name>/<filename>. Give the datasource a name and click Verify. If it asks for read permission, say yes and proceed. Click Continue.
  3. Glance over the schema to check that all field types are correctly identified. You can change any type that you think was guessed wrongly; I changed a few data types here. Then click Yes on “Does the first line in your CSV contain the column names?”.
  4. In the Target step, select the field named “Survived” and click Continue -> Review -> Continue.
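
Before reviewing the field types in the console, it can help to have your own rough opinion of each column. The sketch below makes a quick local guess from train.csv. The heuristic is my own simplification, not how Amazon infers types, and `guess_column_types` is a made-up helper name.

```python
import csv


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_column_types(csv_path):
    """Guess a rough type for every column of a CSV file with a header row.

    Empty cells are ignored. Columns holding only 0/1 are called "binary",
    columns whose values all parse as numbers are "numeric", and everything
    else is "categorical or text".
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        values = {name: set() for name in columns}
        for row in reader:
            for name in columns:
                cell = (row.get(name) or "").strip()
                if cell:
                    values[name].add(cell)

    guesses = {}
    for name in columns:
        seen = values[name]
        if seen and seen <= {"0", "1"}:
            guesses[name] = "binary"
        elif seen and all(_is_number(v) for v in seen):
            guesses[name] = "numeric"
        else:
            guesses[name] = "categorical or text"
    return guesses
```

On the Titanic training data, “Survived” should come out as binary, which is what you want for the target. A column such as Pclass will be reported as numeric even though you may prefer to treat it as categorical, and that is exactly the kind of type worth changing by hand.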

Machine Learning Model:

  1. After the above step, you are taken to ML Model Settings. Click Review, let it come up with a default recipe, and click Finish.
  2. Building the model takes several minutes, so be patient. Then click ML models at the top of the AWS console window, click the model name in the list, and click Generate Batch Predictions.
  3. Select “My data is in S3, and I need to create a datasource”. Enter the S3 location of the test file in the same way as before and check “Does the first line in your CSV contain the column names?”. Click Verify and give read permission.
  4. Move ahead, give a filename in the S3 bucket, click Verify, and give write permission. Click Finish.

Download the generated prediction file (.csv.gz) from the S3 bucket. Adjust it into the proper submission format by adding the ID column from test.csv and putting in proper headers. Now submit it on Kaggle. I got a score of 0.77033 on the leaderboard, which is certainly not very good. You can improve it by setting a few parameters in the evaluation of the ML model in AWS and by choosing a different evaluation strategy.
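
The file adjustment can also be scripted instead of done by hand. Here is a minimal sketch, under two assumptions you should check against your own download: that the prediction file has a header row with a `bestAnswer` column holding the predicted 0/1 label, and that its rows come in the same order as the rows of test.csv. The helper name `build_submission` is made up for this post; the `PassengerId,Survived` header is the format Kaggle expects for Titanic.

```python
import csv
import gzip


def build_submission(predictions_gz, test_csv, out_csv):
    """Pair each PassengerId from test.csv with the predicted label.

    Writes a two-column CSV (PassengerId, Survived) and returns the
    number of data rows written.
    """
    with open(test_csv, newline="") as f:
        ids = [row["PassengerId"] for row in csv.DictReader(f)]

    # Assumption: the predicted label is in a column named "bestAnswer".
    with gzip.open(predictions_gz, "rt", newline="") as f:
        labels = [row["bestAnswer"] for row in csv.DictReader(f)]

    # Assumption: predictions are in the same row order as test.csv.
    if len(ids) != len(labels):
        raise ValueError(
            f"{len(ids)} test rows but {len(labels)} predictions"
        )

    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Survived"])
        writer.writerows(zip(ids, labels))
    return len(ids)
```

The row count check is there because a mismatch usually means a header was miscounted somewhere, and a silently misaligned submission would still upload but score poorly.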


The post Amazon Machine Learning for Kaggle appeared first on Harsh Tech Talk.


