In case you are joining the series midway, here is the link to the introduction blog of the Machine Learning in a Box series. At the end of that introduction you will find the links to each part of the series.
Before we get started with week 3, here is a quick recap of last week!
In my opinion, there is a series of tasks that deserve a lot of attention:
- Business Understanding:
Assessing the current situation (the business pain) and defining a business objective (what we think could be done to address it) are tasks I really like.
This is where I get to challenge my customers and help them better define their pain: how they think it could and should be solved, what the expected impact is, how it will be measured, what the impact of not doing it would be, etc.
For example, if someone tells you that they need to reduce “churn” to maintain revenue growth, you may end up looking not only at who is likely to churn, but also at why they are churning, what the best way to retain them is, and whether a given churner is worth retaining.
In other words, one business pain or one business objective may translate into several data mining goals.
When defining the data mining goals, it is important to define success criteria (which are not the same as the business success criteria).
- Data Understanding & Preparation:
This is where your “Data Engineering” and SQL skills will start to pay off. You will also need to be realistic: “datasets are never clean”, whatever your customer says.
During this phase, you have to quickly assess whether you have the relevant data to achieve your data mining goals: you can’t predict churn if you don’t have an existing churn flag or a way to “build” one.
This phase should also allow you to gain insights and put together some early feedback, but above all I use it to make sure I will be modeling the right question with the right data.
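To make “a way to build it” a little more concrete, here is a minimal sketch that derives a churn flag from activity data. It assumes a customer counts as churned when their last recorded activity is more than 90 days before a snapshot date; the 90-day window, the `build_churn_flags` helper and the toy data are all hypothetical, and the right definition of churn is something to agree on with the business.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical inactivity window (assumption): more than 90 days without
# activity before the snapshot date means the customer is labeled as churned.
CHURN_WINDOW = timedelta(days=90)


def build_churn_flags(last_activity: dict[str, date], snapshot: date) -> dict[str, int]:
    """Return {customer_id: 1 if churned else 0} as of the snapshot date."""
    return {
        customer_id: int(snapshot - last_seen > CHURN_WINDOW)
        for customer_id, last_seen in last_activity.items()
    }


if __name__ == "__main__":
    # Toy data: last recorded activity per customer (illustrative only).
    activity = {
        "c1": date(2017, 9, 20),  # 11 days before the snapshot
        "c2": date(2017, 5, 1),   # 153 days before the snapshot
        "c3": date(2017, 7, 10),  # 83 days before the snapshot
    }
    print(build_churn_flags(activity, snapshot=date(2017, 10, 1)))
    # {'c1': 0, 'c2': 1, 'c3': 0}
```

If no such rule can be agreed on, or the activity history simply isn’t there, that is exactly the kind of gap this phase is meant to surface before any modeling starts.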
- Modeling & Evaluation:
Here you will leverage your algorithm skills and your modeling techniques. That’s what this blog series is all about. And today, we will have a closer look at the different types/families of algorithms.
An important part of the Modeling phase is dedicated to how you will test and evaluate your models’ performance.
It is a good practice to pick a single metric to measure your model performance. Once it is selected (usually before you start modeling), your goal will be to optimize it. You may also include a series of other metrics that you will try to satisfice (such as execution time or resources consumed). Andrew Ng describes this as “Satisficing and Optimizing metrics” in his recent Machine Learning Strategy MOOC on Coursera.
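The optimizing/satisficing rule can be sketched in a few lines: keep only the candidate models that meet every satisficing constraint, then pick the one with the best optimizing metric. In this minimal sketch, accuracy plays the optimizing metric and latency the satisficing one; the candidate names, scores and the 100 ms budget are made up for illustration, not real benchmarks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # optimizing metric: higher is better
    latency_ms: float  # satisficing metric: only needs to stay under a budget


def select_model(candidates: list[Candidate], max_latency_ms: float) -> Candidate | None:
    """Return the most accurate candidate that meets the latency budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no model satisfies the constraint
    return max(feasible, key=lambda c: c.accuracy)


if __name__ == "__main__":
    # Hypothetical evaluation results (illustrative only).
    candidates = [
        Candidate("logistic_regression", accuracy=0.81, latency_ms=2.0),
        Candidate("gradient_boosting", accuracy=0.86, latency_ms=40.0),
        Candidate("big_ensemble", accuracy=0.88, latency_ms=900.0),
    ]
    best = select_model(candidates, max_latency_ms=100.0)
    print(best.name if best else "no feasible model")  # gradient_boosting
```

Note that the most accurate candidate loses here because it misses the latency budget: once a satisficing metric is “good enough”, improving it further earns nothing, while the optimizing metric decides among the remaining models.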
In many Machine Learning projects, this phase is not given the importance it deserves.
You have probably all heard about the Netflix Recommendation challenge; what you may not know is that the original winning solution couldn’t be rolled out to production as-is.
So always keep in mind that, just as perfection doesn’t exist in prediction, a good model is one that can go live!
Like a lot of people, I failed when I started with Machine Learning. Applying a “method” really helped me be more successful and, most importantly, be seen as a trusted resource for running Data Science projects, despite not having a PhD in Mathematics.
(Remember, sharing is caring!)
UPDATE: Here are the links to all the Machine Learning in a Box weekly blogs:
- Introducing “Project: Machine Learning in a Box”
- Machine Learning in a Box (week 2) : Project Methodologies
- Recap Machine Learning in a Box (week 2) : Project Methodologies
- Machine Learning in a Box (week 3) : Algorithms Learning Styles