Machine Learning Problem Framing and Feature extraction and Feature engineering

Machine Learning Problem Framing 

Beginning work on a machine learning competition is a simulation of a real machine learning problem. The competition provides a brief description, for example, announcing that an insurance company would like to predict the loss rates on its automobile policies. As a competitor, your first step is to take a close look at the dataset and identify what form a prediction needs to take to be useful. The data can also give insight into which approaches are likely to work.

The above figure starts from a general-language statement of the objective and moves towards an arrangement of data that will serve as input for a machine learning algorithm.
The generalized statement, pictured as "Let's get better results", first has to be converted into specific goals that can be measured and optimized.
For example, for a website owner, the specific performance goal might be improved click-through rates or more sales.
The next step is to assemble data that might make it possible to predict how likely a given customer is to click various links or to purchase various products offered online. These data are arranged as a matrix of attributes. For the website example, they might include other pages the visitor has viewed or items the visitor has purchased in the past. In addition to the attributes that will be used to make predictions, the machine learning algorithms for this type of problem need correct answers to use for training. These are called targets.
Usually, several aspects of the problem formulation can be done in more than one way. This leads to some iteration between framing the problem, selecting and training a model, and producing performance estimates.
The problem may come with specific quantitative training objectives, or part of the job might be extracting these data (called targets or labels).
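As a minimal sketch of the attribute matrix and targets described above, assume a hypothetical website dataset. The field names (pages_viewed, past_purchases, clicked) and all values are invented for illustration and do not come from any real site.

```python
# Hypothetical visitor records; every name and value here is illustrative only.
visitors = [
    {"pages_viewed": 3, "past_purchases": 0, "clicked": 0},
    {"pages_viewed": 7, "past_purchases": 2, "clicked": 1},
    {"pages_viewed": 1, "past_purchases": 0, "clicked": 0},
    {"pages_viewed": 5, "past_purchases": 1, "clicked": 1},
]

attribute_names = ["pages_viewed", "past_purchases"]
target_name = "clicked"

# X: one row per visitor, one column per attribute used for prediction.
X = [[row[name] for name in attribute_names] for row in visitors]

# y: the correct answer (target, or label) for each row, used for training.
y = [row[target_name] for row in visitors]

print(X)
print(y)
```

Changing the goal (say, from clicks to sales) would change only the target column, which is one reason framing and modeling tend to be iterated together.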

Feature extraction and Feature Engineering

Deciding which variables to use for making predictions can also involve experimentation. This process is known as feature extraction and feature engineering.
Feature extraction is the process of taking data from a free-form arrangement, such as words in a document or on a web page, and arranging them into rows and columns of numbers.
For example, a spam filtering problem begins with the text of emails and might extract things like the number of capital letters in the document, the number of words in all caps, the number of times the word "buy" appears in the document, and other numeric features selected to highlight the difference between spam and non-spam emails.
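The spam example can be sketched roughly as follows. The function name extract_features, the feature names, and the two sample emails are assumptions made for illustration; a real filter would use many more features.

```python
import re

def extract_features(email_text):
    """Turn free-form email text into a row of numeric features."""
    # Split the text into words made of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", email_text)
    return {
        # Number of capital letters anywhere in the text.
        "capital_letters": sum(1 for ch in email_text if ch.isupper()),
        # Number of words written entirely in capitals (ignoring one-letter words like "I").
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        # Number of times the word "buy" appears, in any letter case.
        "buy_count": sum(1 for w in words if w.lower() == "buy"),
    }

# Hypothetical sample emails.
emails = [
    "BUY now! Buy cheap watches, LIMITED offer",
    "Meeting moved to Tuesday, see you then",
]

# Each email becomes one row; each feature becomes one column.
rows = [extract_features(text) for text in emails]
for row in rows:
    print(row)
```

Stacking these rows gives the rows-and-columns arrangement of numbers that a learning algorithm expects.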
Feature engineering is the process of manipulating and combining features to arrive at more informative ones.
For example, building a system for trading securities involves both feature extraction and feature engineering. Feature extraction would be deciding what things will be used to predict prices. Past prices, prices of related securities, interest rates, and features extracted from news releases have all been incorporated into various trading systems that have been discussed publicly.
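To illustrate feature engineering on past prices, here is a small sketch. The price series is invented, and the particular engineered features (daily returns, a 3-day moving average, and price relative to that average) are common choices assumed here for illustration, not ones named in this post.

```python
# Hypothetical daily closing prices; values are invented for illustration.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]

def daily_returns(series):
    """Relative change from each day to the next."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]

def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

returns = daily_returns(prices)
ma3 = moving_average(prices, 3)

# An engineered feature that combines two others:
# each price divided by the 3-day average ending on that day.
price_vs_ma3 = [p / m for p, m in zip(prices[2:], ma3)]

print(returns)
print(ma3)
print(price_vs_ma3)
```

The raw prices are the extracted features; the returns and ratios are engineered from them and may be more informative to a model than the prices alone.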
After a reasonable set of features is developed, you train a predictive model, assess its performance, and make a decision about deploying the model. Generally, you will want to make changes to the features used, if for no other reason than to confirm that your model's performance is adequate. One way to determine which features to use is to try all combinations, but that can take a lot of time. Inevitably, you will face competing pressures to improve performance but also to get a trained model into use quickly.
Data preparation and feature engineering are estimated to take 80 to 90% of the time required to develop a machine learning model.
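A short sketch shows why trying all combinations of features takes so long. The feature names below are hypothetical labels for the trading example, and all_feature_subsets is a helper written for this illustration.

```python
from itertools import combinations

def all_feature_subsets(features):
    """Yield every non-empty subset of the given feature names."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset

# Hypothetical feature names.
features = ["past_prices", "related_prices", "interest_rates", "news"]

subsets = list(all_feature_subsets(features))
print(len(subsets))  # 2**4 - 1 = 15 candidate feature sets

# The number of subsets roughly doubles with every added feature,
# and each subset means training and evaluating one more model.
for n in (10, 20, 30):
    print(n, "features ->", 2**n - 1, "subsets")
```

With n features there are 2^n - 1 non-empty subsets, so exhaustive search is only practical for small feature sets.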

Determining performance of a trained model

The fitness of a model is determined by how well it performs on data that were not used to train it. Set aside some of the data and do not use it in training. After training is done, use the data you set aside to measure the performance of your algorithm.
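The hold-out procedure can be sketched as follows, assuming synthetic data and a simple least-squares line as the model. The helper names, the 25% hold-out fraction, and the data are all illustrative assumptions, not something this post prescribes.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and set aside a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(train):
    """Least-squares fit of y = a * x + b using the training rows only."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mean_squared_error(model, rows):
    """Average squared prediction error of the model on the given rows."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data: y = 2x + 1 plus Gaussian noise (invented for illustration).
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(40)]

train, test = train_test_split(data)
model = fit_line(train)  # the held-out rows are never seen here

print("training MSE:", mean_squared_error(model, train))
print("held-out MSE:", mean_squared_error(model, test))
```

The held-out error is the number to report, since the training error tends to be optimistic about how the model will do on new data.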





This post first appeared on Big Data Basics; please read the original post: here
