
Model Hyperparameter Tuning In Scikit Learn Using GridSearch

Choosing the right hyperparameters for a machine learning model is an essential step towards getting the best generalization performance. Hyperparameters control things like the amount of regularization applied to the model, and hence help prevent it from overfitting. Grid search does an exhaustive search over the set of hyperparameter values provided, finding the best combination of values using cross-validation (or some other evaluation method) and a scoring function.

Let us see how to do hyperparameter optimization in scikit-learn using GridSearchCV:

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Feature selection (chi2) followed by the classifier
clf_LR = Pipeline([('chi2', SelectKBest(chi2)), ('lr', LogisticRegression())])
# Candidate values, keyed as <step_name>__<hyperparameter_name>
params = {
          'chi2__k': [800, 1000, 1200, 1400, 1600, 1800, 2000],
          'lr__C': [0.0001, 0.001, 0.01, 0.5, 1, 10, 100, 1000],
          'lr__class_weight': [None, 'balanced'],  # 'auto' in older scikit-learn
          'lr__tol': [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
          }
gs = GridSearchCV(clf_LR, params, cv=5, scoring='f1')
gs.fit(X, y)
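Note that this grid already contains 7 × 8 × 2 × 6 = 672 candidate combinations, and with cv=5 each candidate is fit five times, so the search performs 3360 model fits in total. As a quick sanity check, you can count the candidates with scikit-learn's ParameterGrid, and you can pass n_jobs=-1 to GridSearchCV to spread the fits across all CPU cores:

from sklearn.model_selection import ParameterGrid

print(len(ParameterGrid(params)))  # 672 candidates, i.e. 3360 fits with cv=5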

Here, I'm using Logistic Regression as the classifier and chi2 as the feature selection method. clf_LR represents the standard pipeline, where my data (represented as a set of feature vectors) first goes through a feature selection step (chi2) and then through the classifier (LogisticRegression).
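To make the pipeline mechanics concrete, here is a minimal sketch of the two steps the pipeline chains together, assuming X and y are already defined and with k and C fixed by hand rather than tuned; grid search simply repeats this for every combination in the grid:

selector = SelectKBest(chi2, k=1000)      # keep the 1000 highest-scoring features
X_reduced = selector.fit_transform(X, y)  # chi2 requires non-negative feature values
clf = LogisticRegression(C=1.0)
clf.fit(X_reduced, y)                     # train on the reduced feature matrix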

params is where I've defined all the values of the hyperparameters that I want to try out. Any hyperparameter not listed here keeps its default value. It is represented as a dictionary, where keys are the hyperparameter names (written as <step_name>__<hyperparameter_name>) and values are the lists of values that we want grid search to try. For the keys, the step name comes from the Pipeline ('chi2' or 'lr' above); it is followed by two underscores and then by the hyperparameter name. The hyperparameter name should be taken from the model's page in scikit-learn's documentation, listed under the "Parameters:" section, like here. Remember that some hyperparameters depend on each other, so if you fix two hyperparameters and try to vary a third that depends on the first two, the third hyperparameter can only take a limited range of values. Trying values outside that range raises an error, which is perfectly fine; in that case, just change the values appropriately.
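As an illustration of such a dependency (a hypothetical example, not part of the grid above): LogisticRegression's penalty and solver parameters constrain each other, and the 'lbfgs' solver supports only the 'l2' penalty, so the following combination fails:

# Incompatible combination: 'lbfgs' supports only the 'l2' penalty
clf = LogisticRegression(penalty='l1', solver='lbfgs')
clf.fit(X_reduced, y)  # raises ValueError at fit time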

Lastly, you can see how I called GridSearchCV. cv defines the number of cross-validation folds to use; in this case I'm doing 5-fold cross-validation. scoring defines the scoring function to optimize. You can see the different scoring values possible, here. Now, just fit the grid search on the dataset with the labels: X is my sample feature matrix and y is the vector of corresponding labels.
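For completeness, here is one way to build an X and y that suit this pipeline (my own sketch, not from the original setup). Since chi2 needs non-negative feature values, raw term counts from text data are a natural fit, and two newsgroup categories give a binary problem that works with scoring='f1':

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

news = fetch_20newsgroups(subset='train', categories=['sci.space', 'rec.autos'])
X = CountVectorizer().fit_transform(news.data)  # sparse non-negative counts
y = news.target                                 # binary labels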

Now, we can get the best combination of parameters from the set provided above, based on the scoring and evaluation criteria:

print(gs.best_estimator_.get_params())

We can also get the best score (here, the F1 measure) for this combination of parameters:

print(gs.best_score_)
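You can also pull out just the tuned values via best_params_, and because GridSearchCV (with the default refit=True) refits the best estimator on the whole dataset, the fitted gs object can be used directly for prediction. X_new below is a hypothetical matrix of unseen samples, vectorized the same way as X:

print(gs.best_params_)           # only the tuned hyperparameters
predictions = gs.predict(X_new)  # delegates to the refit best estimator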
