How to Tune Algorithm Parameters with Scikit-Learn

Machine learning models are parameterized so that their behavior can be tuned for a given problem.

Models can have many parameters and finding the best combination of parameters can be treated as a search problem.

In this post, you will discover how to tune the parameters of machine learning algorithms in Python using the scikit-learn library.

  • Update Jan/2017: Updated to reflect changes to the scikit-learn API in version 0.18.

Tuning an algorithm is like tuning a piano.
Photo by Katie Fricker, some rights reserved

Machine Learning Algorithm Parameters

Algorithm tuning is a final step in the process of applied machine learning before presenting results.

It is sometimes called hyperparameter optimization, where the algorithm parameters are referred to as hyperparameters, whereas the coefficients found by the machine learning algorithm itself are referred to as parameters. The term optimization suggests the search nature of the problem.
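
For example, scikit-learn’s Ridge regression takes an alpha hyperparameter that you set before training, while the coefficients it estimates from the data are its parameters. The short sketch below is my own illustration of this distinction, not one of the recipes from this post:

# Hyperparameters vs. parameters: a minimal illustration
from sklearn import datasets
from sklearn.linear_model import Ridge

# load the diabetes dataset
dataset = datasets.load_diabetes()

# alpha is a hyperparameter: chosen by the practitioner before training
model = Ridge(alpha=1.0)
model.fit(dataset.data, dataset.target)

# coef_ and intercept_ are parameters: learned from the data by the algorithm
print(model.coef_)
print(model.intercept_)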

Phrased as a search problem, you can use different search strategies to find a good and robust parameter or set of parameters for an algorithm on a given problem.

Two simple and effective search strategies are grid search and random search. Scikit-learn provides both methods for algorithm parameter tuning, and examples of each are provided below.

Grid Search Parameter Tuning

Grid search is an approach to parameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.

The recipe below evaluates different alpha values for the Ridge Regression algorithm on the standard diabetes dataset. This is a one-dimensional grid search.
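
A minimal sketch of this recipe, assuming scikit-learn 0.18+ (where GridSearchCV lives in sklearn.model_selection) and using the alpha values discussed in the comments below, might look like this:

# Grid Search for Algorithm Tuning (sketch, assuming scikit-learn 0.18+)
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a one-dimensional grid of alpha values to test
alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])
# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(dataset.data, dataset.target)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)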

For more information, see the API for GridSearchCV and the Exhaustive Grid Search section in the user guide.

Random Search Parameter Tuning

Random search is an approach to parameter tuning that will sample algorithm parameters from a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed and evaluated for each combination of parameters chosen.

The recipe below evaluates random alpha values between 0 and 1 for the Ridge Regression algorithm on the standard diabetes dataset.
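
A minimal sketch along the same lines, again assuming scikit-learn 0.18+ and using SciPy’s uniform distribution to sample alpha between 0 and 1:

# Randomized Search for Algorithm Tuning (sketch, assuming scikit-learn 0.18+)
from scipy.stats import uniform as sp_rand
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# load the diabetes dataset
dataset = datasets.load_diabetes()
# sample alpha uniformly from [0, 1)
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100, random_state=7)
rsearch.fit(dataset.data, dataset.target)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)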

For more information, see the API for RandomizedSearchCV and the Randomized Parameter Optimization section in the user guide.

Summary

Algorithm parameter tuning is an important step for improving algorithm performance right before presenting results or preparing a system for production.

In this post, you discovered algorithm parameter tuning and two methods that you can use right now in Python with the scikit-learn library to improve your algorithm results: grid search and random search.

13 Responses to How to Tune Algorithm Parameters with Scikit-Learn

  1. Harsh October 23, 2014 at 4:59 pm #

    Nice summary. I think that, due to the dependency of some parameters on each other, you cannot choose just any combination of them in GridSearch, or it will error out. I’ve written a post exclusively on GridSearch http://harshtechtalk.com/model-hyperparameter-tuning-scikit-learn-using-gridsearch/

  2. Alex September 5, 2016 at 6:29 pm #

    Sir, this is an excellent introduction to hyperparameter optimization.

    I’m now thinking there must be a process for determining an optimal range of values for a particular parameter. For example, when demonstrating GridSearchCV, you used alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0]). What principles guided you in selecting those particular values? And where can I read more about those principles — do they have their roots in statistics, probability theory, or something else?

    One more thing, I’m still a machine learning novice and the parameters used to tune Scikit-learn algorithms hardly make sense to me. For example, the Ridge model has parameters “alpha”, “fit_intercept”, “normalize”, “copy_X”, “max_iter”, “tol”, “solver”, and “random_state”. Those parameters don’t make sense to me because I understand I lack the background necessary to make sense of them. What is this background that I am missing?

    By the way, I’m subscribed to your newsletter with the same email I’ve used to post this comment. I like your mail, very insightful. I’ll appreciate it if you can also send a copy of your response to my mailbox.

    • Jason Brownlee September 6, 2016 at 9:44 am #

      Hi Alex, I just chose popular values for alpha as a starting point for the search, which is a good practice.

      You could use random search on a suite of similar problems and try to deduce cause-effect for the parameter settings or heuristics, but you will always find a problem that breaks the rules. It is always good to use a mix of random and grid searching to expose “good” regions of the hyperparameter search space.

      Often only a few parameters make a big difference when tuning an algorithm. You can research a given algorithm and figure out what each parameter does and the normal ranges for its values. A difficulty is that different implementations may expose different parameters and may require careful reading of the implementation’s documentation as well. Basically, lots of hard work is required.

      I hope this helps.

  3. Chris Knowles September 17, 2016 at 6:55 pm #

    What exactly do you mean by ‘a mix of random and grid search’? Can you please elaborate? Thanks.

    • Jason Brownlee September 18, 2016 at 7:58 am #

      Great question Chris.

      You can use random search to find good starting points, then grid search to zoom in and find the local optima (or close to it) for those good starting points. Use the two approaches interchangeably, like a manual optimization algorithm. If you have a lot of resources, you could just use a genetic algorithm or similar.

  4. Himanshu Rai September 28, 2016 at 3:32 am #

    Hey Jason,
    Can you suggest any relevant material on the implementation of accelerated random search? Thanks.

    • Jason Brownlee September 28, 2016 at 7:42 am #

      No, sorry. Using lots of cores with random search has always worked well for me.

  5. Aizzaac October 6, 2016 at 7:01 am #

    When does the tuning have to be done: before or after feature selection (I mean forward feature selection, recursive feature elimination, etc.)?

    • Jason Brownlee October 6, 2016 at 9:42 am #

      Hi Aizzaac,

      I recommend tuning a model after you have spot checked a number of methods. I think it is an activity to improve what is working and get the most out of it, not to find what might work.

      This step-by-step process for working through a problem might make things clearer:
      http://machinelearningmastery.com/start-here/#process

  6. Ehsan October 8, 2016 at 6:48 am #

    Thanks Jason.
    Let’s say we optimized our parameters by grid search or random search and got an accuracy of 0.98; how do we know whether it overfit or not?
    I mean, I remember with a poly kernel I used grid search and got very high accuracy, but then I realized it might be overfitting.

    • Jason Brownlee October 8, 2016 at 10:46 am #

      Really great question Ehsan.

      You must develop a robust test harness. Try really hard to falsify any results you get.

      For example:
      – use k-fold cross validation
      – use multiple repeats of your cross validation
      – look at the graph of performance of an algorithm while it learns over each epoch/iteration and check for test accuracy > train accuracy
      – hold back a validation dataset for final confirmation
      – and so on.

      I hope that gives you some ideas.

  7. Robin CABANNES February 24, 2017 at 8:33 pm #

    Hi, Thank you for these explanations.

    However, when I used grid search parameter tuning with my model, it always returned the first value from the param_grid dictionary. For example, if I write
    param_grid = {
        'solver': ['lbfgs', 'sgd', 'adam'],
        'alpha': [0.0001, 0.00001, 0.1, 1],
        'activation': ['relu', 'tanh'],
        'hidden_layer_sizes': [(20)],
        'learning_rate_init': [1, 0.01, 0.1, 0.001],
        'learning_rate': ['invscaling', 'constant'],
        'beta_1': [0.9],
        'max_iter': [1000],
        'momentum': [0.2, 0.6, 1],
    }
    it will return as best_params
    {'max_iter': 1000, 'activation': 'relu', 'hidden_layer_sizes': 20, 'learning_rate': 'invscaling', 'alpha': 0.0001, 'learning_rate_init': 1, 'beta_1': 0.9, 'solver': 'sgd', 'momentum': 0.2}

    but if I just change the order of learning_rate_init, for example 'learning_rate_init': [0.001, 0.01, 0.1, 1], it will return:

    {'max_iter': 1000, 'activation': 'relu', 'hidden_layer_sizes': 20, 'learning_rate': 'invscaling', 'alpha': 0.0001, 'learning_rate_init': 0.001, 'beta_1': 0.9, 'solver': 'sgd', 'momentum': 0.2}

    Have you already had this issue?

    I don’t know if I was clear,

    Thanks,
