How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

Hyperparameter optimization is a big part of deep learning.

The reason is that neural networks are notoriously difficult to configure, and a lot of parameters need to be set. On top of that, individual models can be very slow to train.

In this post, you will discover how to use the grid search capability from the scikit-learn Python machine learning library to tune the hyperparameters of Keras’s deep learning models.

After reading this post, you will know:

  • How to wrap Keras models for use in scikit-learn and how to use grid search
  • How to grid search common neural network parameters, such as learning rate, dropout rate, epochs, and number of neurons
  • How to define your own hyperparameter tuning experiments on your own projects


Let’s get started.

  • Aug/2016: First published
  • Update Nov/2016: Fixed minor issue in displaying grid search results in code examples
  • Update Oct/2016: Updated examples for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
  • Update Sept/2017: Updated example to use Keras 2 “epochs” instead of Keras 1 “nb_epochs”
  • Update March/2018: Added alternate link to download the dataset
  • Update Oct/2019: Updated for Keras 2.3.0 API
  • Update Jul/2022: Updated for TensorFlow/Keras and SciKeras 0.8


In this post, you will discover how you can use the scikit-learn grid search capability. You will be given a suite of examples that you can copy and paste into your own project as a starting point.

Below is a list of the topics this post will cover:

  1. How to use Keras models in scikit-learn
  2. How to use grid search in scikit-learn
  3. How to tune batch size and training epochs
  4. How to tune optimization algorithms
  5. How to tune learning rate and momentum
  6. How to tune network weight initialization
  7. How to tune activation functions
  8. How to tune dropout regularization
  9. How to tune the number of neurons in the hidden layer

How to Use Keras Models in scikit-learn

Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class from the module SciKeras. You may need to run the command pip install scikeras first to install the module.

To use these wrappers, you must define a function that creates and returns your Keras sequential model, then pass this function to the model argument when constructing the KerasClassifier class.

For example:
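A minimal sketch of such a function and its wrapper might look like the following. The layer sizes (8 inputs, 12 hidden units) are illustrative assumptions, and the imports are placed inside the functions only so the file can be loaded before TensorFlow and SciKeras are installed.

```python
def create_model():
    """Build and compile a small binary classifier (layer sizes are illustrative)."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))               # 8 input features (assumed)
    model.add(Dense(12, activation="relu"))    # single hidden layer
    model.add(Dense(1, activation="sigmoid"))  # binary output
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def wrap_model():
    """Wrap the model-building function so scikit-learn can treat it as an estimator."""
    from scikeras.wrappers import KerasClassifier

    # The function itself is passed, not a model instance.
    return KerasClassifier(model=create_model, verbose=0)
```

Calling wrap_model() returns an object with the usual scikit-learn fit() and predict() methods.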

The constructor for the KerasClassifier class can take default arguments that are passed on to the calls to model.fit(), such as the number of epochs and the batch size.

For example:

The constructor for the KerasClassifier class can also take new arguments that can be passed to your custom create_model() function. These new arguments must also be defined in the signature of your create_model() function with default parameters.

For example:
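Both kinds of constructor argument can be sketched together as below. The specific values (150 epochs, batch size 10, a dropout rate of 0.2) and the dropout_rate argument are assumptions chosen for illustration, and the model__ prefix is the SciKeras routing convention for arguments meant for the model-building function.

```python
# Keyword arguments for the wrapper: fit() defaults plus one routed model argument.
wrapper_kwargs = {
    "epochs": 150,               # passed on to model.fit()
    "batch_size": 10,            # passed on to model.fit()
    "model__dropout_rate": 0.2,  # routed to create_model(dropout_rate=...)
}


def create_model(dropout_rate=0.0):
    """Build a small classifier; dropout_rate must have a default value."""
    from tensorflow.keras.layers import Dense, Dropout, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))  # 8 input features (assumed)
    model.add(Dense(12, activation="relu"))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def wrap_model():
    """Create the wrapper with the defaults defined above."""
    from scikeras.wrappers import KerasClassifier

    return KerasClassifier(model=create_model, verbose=0, **wrapper_kwargs)
```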

You can learn more about these from the SciKeras documentation.

How to Use Grid Search in scikit-learn

Grid search is a model hyperparameter optimization technique.

In scikit-learn, this technique is provided in the GridSearchCV class.

When constructing this class, you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map of the model parameter name and an array of values to try.

By default, the estimator's own score() method is used, which is accuracy for classifiers, but other scores can be specified in the scoring argument of the GridSearchCV constructor.

By default, the grid search will only use one thread. By setting the n_jobs argument in the GridSearchCV constructor to -1, the process will use all cores on your machine. However, sometimes this may interfere with the main neural network training process.

The GridSearchCV process will then construct and evaluate one model for each combination of parameters. Cross-validation is used to evaluate each individual model. Recent scikit-learn releases (0.22 and later) default to 5-fold cross-validation, where older releases defaulted to 3-fold; you can override this by specifying the cv argument to the GridSearchCV constructor.
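To get a feel for how quickly the work grows, the short sketch below enumerates a grid by hand using only the standard library. The grid values and the choice of 3 folds are assumptions for illustration; GridSearchCV does this enumeration internally.

```python
from itertools import product

# Hypothetical grid: 6 batch sizes x 3 epoch settings.
param_grid = {"batch_size": [10, 20, 40, 60, 80, 100], "epochs": [10, 50, 100]}
cv_folds = 3  # assumed number of cross-validation folds

names = sorted(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

# Every combination is trained once per fold.
total_fits = len(combinations) * cv_folds
print(len(combinations), "combinations,", total_fits, "model fits")
```

With refit=True (the scikit-learn default), one more model is trained on all of the data using the best combination.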

Below is an example of defining a simple grid search:
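A minimal sketch, assuming model is an already wrapped Keras model and that three epoch settings and 3-fold cross-validation are wanted (both are illustrative choices):

```python
# Map of parameter name to the list of values to try.
param_grid = dict(epochs=[10, 20, 30])


def build_grid_search(model):
    """Return a GridSearchCV that evaluates every entry of param_grid."""
    from sklearn.model_selection import GridSearchCV

    # n_jobs=-1 uses all cores; cv=3 requests 3-fold cross-validation.
    return GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
```

The search itself only runs when fit(X, Y) is called on the returned object.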

Once completed, you can access the outcome of the grid search in the result object returned from grid.fit(). The best_score_ member provides access to the best score observed during the optimization procedure, and the best_params_ describes the combination of parameters that achieved the best results.
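One way to report the outcome is sketched below. The helper name summarize is hypothetical; it relies only on the best_score_, best_params_, and cv_results_ attributes of a fitted GridSearchCV.

```python
def summarize(grid_result):
    """Return a text report: best configuration first, then every grid entry."""
    lines = ["Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_)]
    results = grid_result.cv_results_
    rows = zip(results["mean_test_score"], results["std_test_score"], results["params"])
    for mean, stdev, params in rows:
        # Mean and standard deviation of the score across the folds.
        lines.append("%f (%f) with: %r" % (mean, stdev, params))
    return "\n".join(lines)
```

Printing summarize(grid_result) shows the whole grid, which makes it easier to spot trends rather than a single winner.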

You can learn more about the GridSearchCV class in the scikit-learn API documentation.

Problem Description

Now that you know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let’s look at a bunch of examples.

All examples will be demonstrated on a small standard machine learning dataset called the Pima Indians onset of diabetes classification dataset. This is a small dataset with all numerical attributes that is easy to work with.

  1. Download the dataset and place it in your current working directory with the name pima-indians-diabetes.csv (update: download from here).

As you proceed through the examples in this post, you will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.

Note on Parallelizing Grid Search

All examples are configured to use parallelism (n_jobs=-1).

If you get an error when running the grid search in parallel, kill the process and change the code to not perform the grid search in parallel; set n_jobs=1.


How to Tune Batch Size and Number of Epochs

In this first simple example, you will look at tuning the batch size and number of epochs used when fitting the network.

The batch size in iterative gradient descent is the number of patterns shown to the network before the weights are updated. It is also an optimization in the training of the network, defining how many patterns to read at a time and keep in memory.

The number of epochs is the number of times the entire training dataset is shown to the network during training. Some networks are sensitive to the batch size, such as LSTM recurrent neural networks and Convolutional Neural Networks.
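The relationship between batch size and the number of weight updates can be worked through with a little arithmetic. The sketch below assumes a dataset of 768 rows evaluated with 3-fold cross-validation, so each fold trains on 512 rows; both figures are assumptions for illustration.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the training data."""
    return math.ceil(n_samples / batch_size)


n_rows = 768                  # assumed dataset size
train_rows = n_rows * 2 // 3  # rows used for training in each of 3 folds
for batch_size in [10, 20, 40, 60, 80, 100]:
    print(batch_size, updates_per_epoch(train_rows, batch_size))
```

Smaller batches mean many more updates per epoch, which is one reason batch size and the number of epochs are usually tuned together.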

Here you will evaluate a suite of different mini-batch sizes from 10 to 100 in steps of 20.

The full code listing is provided below:
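What follows is a reconstruction sketch rather than the original listing. The network shape, the epoch values, and cv=3 are assumptions; the batch sizes follow the range described above. Imports sit inside the functions so the grid can be inspected without loading TensorFlow.

```python
# Grid to search: batch sizes from 10 to 100 and a few epoch counts (assumed).
param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])


def create_model():
    """Small compiled network for the 8-input binary classification problem."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Load the data, run the grid search, and return the fitted GridSearchCV."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]  # 8 input columns, then the class column
    model = KerasClassifier(model=create_model, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```

After grid_result = run_search(), the best configuration is available as grid_result.best_params_ and its score as grid_result.best_score_.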

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

You can see that the batch size of 10 and 100 epochs achieved the best result of about 70% accuracy.

How to Tune the Training Optimization Algorithm

Keras offers a suite of different state-of-the-art optimization algorithms.

In this example, you will tune the optimization algorithm used to train the network, each with default parameters.

This is an odd example because often, you will choose one approach a priori and instead focus on tuning its parameters on your problem (see the next example).

Here, you will evaluate the suite of optimization algorithms supported by the Keras API.

The full code listing is provided below:
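A reconstruction sketch of this experiment is given here, not the original listing. It assumes the same small network as before, a fixed 100 epochs and batch size of 10, and cv=3. The model is returned uncompiled and the loss is passed to the wrapper, so SciKeras compiles it with each optimizer in turn.

```python
# Optimizer names to compare, each with its default settings.
optimizers = ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"]
param_grid = dict(optimizer=optimizers)


def create_model():
    """Return an UNCOMPILED model; the wrapper compiles it."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over the optimizer argument of the wrapper."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, loss="binary_crossentropy",
                            epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```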

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Note that the create_model() function defined above does not return a compiled model like the one in the previous example. This is because the optimizer for a Keras model is set in the compile() call, so it is better to leave compilation to the KerasClassifier wrapper and let GridSearchCV vary the optimizer. Also note that loss="binary_crossentropy" is specified in the wrapper, as it must also be set during the compile() call.

Running this example produces the following output:

The KerasClassifier wrapper will not compile your model again if the model is already compiled. Hence the other way to run GridSearchCV is to set the optimizer as an argument to the create_model() function, which returns an appropriately compiled model like the following:
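A sketch of this alternative, under the same assumptions as before (small 8-input network, 100 epochs, batch size 10, cv=3), might look like this:

```python
optimizers = ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"]
# The model__ prefix routes the value to create_model(optimizer=...).
param_grid = dict(model__optimizer=optimizers)


def create_model(optimizer="adam"):
    """Return a model already compiled with the requested optimizer."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over the optimizer argument of create_model()."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```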

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Note that in the above, you have the prefix model__ in the parameter dictionary param_grid. This is required for the KerasClassifier in the SciKeras module to make clear that the parameter needs to route into the create_model() function as arguments, rather than some parameter to set up in compile() or fit(). See also the routed parameter section of SciKeras documentation.

Running this example produces the following output:

The results suggest that the Adam optimization algorithm is the best with a score of about 70% accuracy.

How to Tune Learning Rate and Momentum

It is common to pre-select an optimization algorithm to train your network and tune its parameters.

By far, the most common optimization algorithm is plain old Stochastic Gradient Descent (SGD) because it is so well understood. In this example, you will look at optimizing the SGD learning rate and momentum parameters.

The learning rate controls how much to update the weight at the end of each batch, and the momentum controls how much to let the previous update influence the current weight update.
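The interplay of the two parameters can be seen in a tiny worked example. The sketch below applies the classical momentum update to the toy function f(w) = w squared, whose gradient is 2w; the function, the starting point, and the parameter values are all assumptions chosen to keep the arithmetic easy to follow.

```python
def sgd_momentum_step(weight, velocity, gradient, learning_rate, momentum):
    """One classical momentum update: the old velocity decays, the gradient is added."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


weight, velocity = 1.0, 0.0
history = []
for _ in range(3):
    gradient = 2 * weight  # derivative of w**2
    weight, velocity = sgd_momentum_step(weight, velocity, gradient,
                                         learning_rate=0.1, momentum=0.9)
    history.append(weight)
print(history)
```

With momentum set to 0.0 the velocity term disappears and each step depends only on the current gradient.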

You will try a suite of small standard learning rates and momentum values from 0.2 to 0.8 in steps of 0.2, as well as 0.9 (because it can be a popular value in practice). In Keras, the way to set the learning rate and momentum is the following:

In the SciKeras wrapper, you will route the parameters to the optimizer with the prefix optimizer__.
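Both steps can be sketched as follows. The learning rate values and the helper names make_sgd and wrap_model are assumptions for illustration; the momentum values follow the range described above, with 0.0 added as a no-momentum baseline.

```python
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]  # assumed candidate values
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
# The optimizer__ prefix routes each value to the optimizer's constructor.
param_grid = dict(optimizer__learning_rate=learn_rate, optimizer__momentum=momentum)


def make_sgd(learning_rate=0.01, momentum=0.2):
    """Plain Keras: build an SGD optimizer with explicit settings."""
    from tensorflow.keras.optimizers import SGD

    return SGD(learning_rate=learning_rate, momentum=momentum)


def wrap_model(create_model):
    """SciKeras: name the optimizer here and let the grid fill in its parameters."""
    from scikeras.wrappers import KerasClassifier

    # create_model is expected to return an uncompiled Keras model.
    return KerasClassifier(model=create_model, loss="binary_crossentropy",
                           optimizer="SGD", epochs=100, batch_size=10, verbose=0)
```

Passing wrap_model(create_model) and param_grid to GridSearchCV then evaluates all 30 combinations.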

Generally, it is a good idea to also include the number of epochs in an optimization like this as there is a dependency between the amount of learning per batch (learning rate), the number of updates per epoch (batch size), and the number of epochs.

The full code listing is provided below:

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

You can see that SGD is not very good on this problem; nevertheless, the best results were achieved using a learning rate of 0.001 and a momentum of 0.0 with an accuracy of about 68%.

How to Tune Network Weight Initialization

Neural network weight initialization used to be simple: use small random values.

Now there is a suite of different techniques to choose from. Keras provides a laundry list.

In this example, you will look at tuning the selection of network weight initialization by evaluating all the available techniques.

You will use the same weight initialization method on each layer. Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. In the example below, you will use a rectifier for the hidden layer and a sigmoid for the output layer because the predictions are binary. The weight initialization is now an argument to the create_model() function, so you need to use the model__ prefix to ask the KerasClassifier to route the parameter to the model creation function.

The full code listing is provided below:
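The following is a reconstruction sketch, not the original listing. The initializer names are the standard Keras string identifiers, and the network shape, 100 epochs, batch size of 10, and cv=3 are assumptions.

```python
init_modes = ["random_uniform", "lecun_uniform", "random_normal", "zeros",
              "glorot_normal", "glorot_uniform", "he_normal", "he_uniform"]
param_grid = dict(model__init_mode=init_modes)


def create_model(init_mode="random_uniform"):
    """Compiled network using the same initializer on both layers."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, kernel_initializer=init_mode, activation="relu"))
    model.add(Dense(1, kernel_initializer=init_mode, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over the weight initialization scheme."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```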

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

We can see that the best results were achieved with a uniform weight initialization scheme achieving a performance of about 72%.

How to Tune the Neuron Activation Function

The activation function controls the non-linearity of individual neurons and when to fire.

Generally, the rectifier activation function is the most popular. However, it used to be the sigmoid and the tanh functions, and these functions may still be more suitable for different problems.

In this example, you will evaluate the suite of different activation functions available in Keras. You will only use these functions in the hidden layer, as a sigmoid activation function is required in the output for the binary classification problem. Similar to the previous example, this is an argument to the create_model() function, and you will use the model__ prefix for the GridSearchCV parameter grid.

Generally, it is a good idea to prepare data to the range of the different transfer functions, which you will not do in this case.

The full code listing is provided below:
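A reconstruction sketch of the experiment follows; it is not the original listing. The activation names are standard Keras identifiers, and the network shape, training settings, and cv=3 are assumptions.

```python
activations = ["softmax", "softplus", "softsign", "relu", "tanh",
               "sigmoid", "hard_sigmoid", "linear"]
param_grid = dict(model__activation=activations)


def create_model(activation="relu"):
    """Compiled network; only the hidden layer's activation is varied."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, activation=activation))
    model.add(Dense(1, activation="sigmoid"))  # fixed for binary classification
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over the hidden-layer activation function."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```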

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

Surprisingly (to me at least), the “linear” activation function achieved the best results with an accuracy of about 71%.

How to Tune Dropout Regularization

In this example, you will look at tuning the dropout rate for regularization in an effort to limit overfitting and improve the model’s ability to generalize.

For the best results, dropout is best combined with a weight constraint such as the max norm constraint.

This involves fitting both the dropout percentage and the weight constraint. We will try dropout percentages between 0.0 and 0.9 (1.0 does not make sense) and maxnorm weight constraint values between 0 and 5.

The full code listing is provided below.
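A reconstruction sketch follows, not the original listing. The weight constraint values start at 1 here, because a max norm of 0 would force every weight to zero; that choice, the network shape, and the training settings are assumptions.

```python
dropout_rates = [round(0.1 * i, 1) for i in range(10)]  # 0.0, 0.1, ..., 0.9
weight_constraints = [1.0, 2.0, 3.0, 4.0, 5.0]          # assumed max-norm values
param_grid = dict(model__dropout_rate=dropout_rates,
                  model__weight_constraint=weight_constraints)


def create_model(dropout_rate=0.0, weight_constraint=3.0):
    """Compiled network with dropout after a max-norm constrained hidden layer."""
    from tensorflow.keras.constraints import MaxNorm
    from tensorflow.keras.layers import Dense, Dropout, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(12, activation="linear",
                    kernel_constraint=MaxNorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over dropout rate and weight constraint together."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```

Because two parameters are searched jointly, this grid has 50 combinations and is noticeably slower than the single-parameter searches.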

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output.

We can see that the dropout rate of 20% and the MaxNorm weight constraint of 3 resulted in the best accuracy of about 77%. You may notice that some of the results are nan. This is probably because the input is not normalized, so a model can occasionally degenerate by chance.

How to Tune the Number of Neurons in the Hidden Layer

The number of neurons in a layer is an important parameter to tune. Generally the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.

Also, generally, a large enough single layer network can approximate any other neural network, at least in theory.

In this example, we will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5.

A larger network requires more training and at least the batch size and number of epochs should ideally be optimized with the number of neurons.

The full code listing is provided below.
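A reconstruction sketch follows, not the original listing. The neuron counts follow the range described above; the network shape, training settings, and cv=3 are assumptions.

```python
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(model__neurons=neurons)


def create_model(neurons=1):
    """Compiled network whose single hidden layer has the requested width."""
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(8,)))
    model.add(Dense(neurons, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def run_search(csv_path="pima-indians-diabetes.csv"):
    """Grid search over the width of the hidden layer."""
    import numpy as np
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    dataset = np.loadtxt(csv_path, delimiter=",")
    X, Y = dataset[:, 0:8], dataset[:, 8]
    model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    return grid.fit(X, Y)
```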

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output.

We can see that the best results were achieved with a network with 30 neurons in the hidden layer with an accuracy of about 73%.

Tips for Hyperparameter Optimization

This section lists some handy tips to consider when tuning hyperparameters of your neural network.

  • k-fold Cross Validation. You can see that the results from the examples in this post show some variance. The examples used 3-fold cross-validation, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.
  • Review the Whole Grid. Do not just focus on the best result; review the whole grid of results and look for trends to support configuration decisions.
  • Parallelize. Use all your cores if you can. Neural networks are slow to train, and we often want to try a lot of different parameters. Consider spinning up a lot of AWS instances.
  • Use a Sample of Your Dataset. Because networks are slow to train, try training them on a smaller sample of your training dataset, just to get an idea of general directions of parameters rather than optimal configurations.
  • Start with Coarse Grids. Start with coarse-grained grids and zoom into finer grained grids once you can narrow the scope.
  • Do not Transfer Results. Results are generally problem specific. Try to avoid favorite configurations on each new problem that you see. It is unlikely that optimal results you discover on one problem will transfer to your next project. Instead look for broader trends like number of layers or relationships between parameters.
  • Reproducibility is a Problem. Although we set the seed for the random number generator in NumPy, the results are not 100% reproducible. There is more to reproducibility when grid searching wrapped Keras models than is presented in this post.


In this post, you discovered how you can tune the hyperparameters of your deep learning networks in Python using Keras and scikit-learn.

Specifically, you learned:

  • How to wrap Keras models for use in scikit-learn and how to use grid search.
  • How to grid search a suite of different standard neural network parameters for Keras models.
  • How to design your own hyperparameter optimization experiments.

Do you have any experience tuning hyperparameters of large neural networks? Please share your stories below.

Do you have any questions about hyperparameter optimization of neural networks or about this post? Ask your questions in the comments and I will do my best to answer.

811 Responses to How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

  1. Yanbo August 9, 2016 at 9:10 am

    As always, excellent post. I’ve been doing some hyper-parameter optimization by hand, but I’ll definitely give Grid Search a try.

    Is it possible to set up a different threshold for sigmoid output in Keras? Rather than using 0.5, I was thinking of trying 0.7 or 0.8.

    • Jason Brownlee August 15, 2016 at 11:10 am

      Thanks Yanbo.

      I don’t think so, but you could implement your own activation function and do anything you wish.

      • Shudhan September 5, 2016 at 6:20 pm

        My question is related to this thread. How do I get the probabilities as the output? I don't want the class output. I read that for a regression problem no activation function is needed in the output layer. Will a similar implementation get me the probabilities, or will the output fall outside the range 0 to 1?

        • Jason Brownlee September 6, 2016 at 9:41 am

          Hi Shudhan, you can use a sigmoid activation and treat the outputs like probabilities (they will be in the range of 0-1).

      • Swapna November 2, 2017 at 11:51 pm

        excellent post

  2. eclipsedu August 18, 2016 at 5:55 pm

    Sounds awesome! Will this grid search method use the full CPU (which can be 8/16 cores)?

    • Jason Brownlee August 19, 2016 at 5:23 am

      It can if you set n_jobs=-1

      • Hemanth Naidu S August 20, 2019 at 10:52 pm

        Hi Jason,

        In grid search, we do get a train score, right?
        Why is it not displayed in model.cv_results_? We are only getting the test score.

        • Jason Brownlee August 21, 2019 at 6:42 am

          You get a cross-validation score for each configuration tested.

  3. Reza August 18, 2016 at 6:00 pm

    Great post,
    Can I use these tips on CNNs in Keras as well?

    • Jason Brownlee August 19, 2016 at 5:24 am

      They can be a start, but remember it is a good idea to use a repeating structure in a large CNN and you will need to tune the number of filters and pool size.

      • maxv April 29, 2019 at 3:30 am

        Hi Jason thanks for everything.
        Could you explain what you mean by repeating structure in your reply, please?

        Quick question on the GridSearchCV for CNN, param_grid=param_grid using the sklearn wrapper gives this error : ”ValueError: filters is not a legal parameter ”
        How can we use the wrapper for the filters params of Conv1D ?

      • Salvin Sanjesh Prasad April 8, 2021 at 1:02 pm

        Dear Jason,

        This is an excellent post. I have a question: how can we grid search the optimum number of filters in three different layers of a CNN? For example: [60, 70, 80] in layer 1, [20, 30, 40] in layer 2 and [5, 10, 20] in layer 3. I have searched everywhere for code using grid search but could not find this. I really need to use grid search for this. I would be highly grateful for your kind advice. If possible, also reply via the email address that I have provided (as this was a requirement for me to comment).

        • Jason Brownlee April 9, 2021 at 5:16 am


          You might need to write some for-loops, e.g. do the search manually.

          Also, we never find an “optimal” configuration, just a good enough configuration given the time/resources available.

  4. Prashant August 22, 2016 at 4:55 pm

    Hi Jason, First of all great post! I applied this by dividing the data into train and test and used train dataset for grid fit. Plan was to capture best parameters in train and apply them on test to see accuracy. But it seems and applied with same parameters on same dataset (in this case train) give different accuracy results. Any idea why this happens. I can share the code if it helps.

    • Jason Brownlee August 23, 2016 at 6:00 am

      You will see small variation in the performance of a neural net with the same parameters from run to run. This is because of the stochastic nature of the technique and how very hard it is to fix the random number seed successfully in python/numpy/theano.

      You will also see small variation due to the data used to train the method.

      Generally, you could use all of your data to grid search to try to reduce the second type of variation (slower). You could store results and use statistical significance tests to compare populations of results to see if differences are significant to sort out the first type of variation.

      I hope that helps.

  5. vinay August 22, 2016 at 9:05 pm

    Hi, I think this is the best tutorial I have ever found on the web. Thanks for sharing. Is it possible to use these tips on LSTM, BiLSTM, and CNN-LSTM?

    • Jason Brownlee August 23, 2016 at 5:57 am

      Thanks Vinay, I’m glad it’s useful.

      Absolutely, you could use these tactics on other algorithm types.

  6. shudhan September 2, 2016 at 3:26 pm

    Best place to learn the tuning.. my question – is it good to follow the order you mentioned to tune the parameters? I know the most significant parameters should be tuned first

    • Jason Brownlee September 3, 2016 at 6:56 am

      Thanks. The order is a good start. It is best to focus on areas where you think you will get the biggest improvement first – which is often the structure of the network (layers and neurons).

      • Reed Guo September 2, 2018 at 5:59 pm

        Hi, Jason

        Thanks for your post. It is excellent.

        I have a question.

        You tune batch size and epochs first. But if you set an inappropriate number of neurons or activation function, then batch size and epoch tuning won’t make sense.

        So I think we should tune all of these hyper-parameters at the same time.

        What do you think about it?

  7. Satheesh September 27, 2016 at 12:24 am

    when I am using the categorical_entropy loss function and running the grid search with n_jobs more than 1 its throwing error “cannot pickle object class”, but the same thing is working fine with binary_entropyloss. Can you tell me if I am making any mistake in my code:
    def create_model(optimizer='adam'):
        # create model
        model.add(Dense(30, input_dim=59, init='normal', activation='relu'))
        model.add(Dense(15, init='normal', activation='sigmoid'))
        model.add(Dense(3, init='normal', activation='sigmoid'))
        # Compile model
        model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    # Create Keras Classifier
    print "--------------------- Running Grid Search on Keras Classifier for epochs and batch ------------------"
    clf = model = KerasClassifier(build_fn = create_model, verbose=0)
    param_grid = {"batch_size":range(10, 30, 10), "nb_epoch":range(50, 150, 50)}
    optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=4)
    grid_result = grid.fit(X_train, y_train)
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

    • Jason Brownlee September 27, 2016 at 7:44 am

      Strange Satheesh, I have not seen that before.

      Let me know if you figure it out.

      • Kai September 18, 2017 at 10:01 pm

        I came across and solved the problem several days ago. Please use “epochs” instead of “nb_epoch” in the param_grid dict. Personally, I guess “cannot pickle object class” means the neural network cannot be built because of some errors. Open to discussion.

        • Jason Brownlee September 19, 2017 at 7:40 am

          Glad to hear it.

          I updated the example to use “epochs” to work with Keras 2.

  8. L Fenu November 9, 2016 at 7:47 pm

    excellent post, thanks. It’s been very helpful to get me started on hyperparameterisation.

    One thing I haven’t been able to do yet is to grid search over parameters which belong not to the NN but to the training set. For example, I can fine-tune the input_dim parameter by creating a function generator which takes care of creating the function that will create the model, like this:

    # fp_subset is a subset of columns of my whole training set.

    create_basic_ANN_model = kt.ANN_model_gen( # defined elsewhere
        input_dim=len(fp_subset), output_dim=1, layers_num=2, layers_sizes=[len(fp_subset)/5, len(fp_subset)/10, ],
        loss='mean_squared_error', optimizer='adadelta', metrics=['mean_squared_error', 'mean_absolute_error']
    )

    model = KerasRegressor(build_fn=create_basic_ANN_model, verbose=1)
    # define the grid search parameters
    batch_size = [10, 100]
    epochs = [5, 10]

    param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=7)

    grid_results = grid.fit(trX, trY)

    this works but only as a for loop over the different fp_subset, which I must define manually.
    I could easily pick the best out of every run, but it would be great if I could fold them all inside a big grid definition and fit, so as to automatically pick the largest.

    However, until now haven’t been able to figure out a way to get that in my head.
    If the wrapper function is useful to anyone, I can post a generalised version here.

    • Jason Brownlee November 10, 2016 at 7:42 am

      Good question.

      You might just need to use a loop around the whole lot for different projections/views of your training data.

      • L Fenu November 11, 2016 at 1:05 am

        Thanks. I ended up coding my own for loop, saving the results of each grid in a dict, sorting the hash by the performance metrics, and picking the best model.

        Now, the next question is: How do I save the model’s architecture and weights to a .json .hdf5 file? I know how to do that for a simple model. But how do I extract the best model out of the gridsearch results?

        • Jason Brownlee November 11, 2016 at 10:04 am

          Well done.

          No need. Once you know the parameters, you can use them to train a new standalone model on all of your training data and start making predictions.

          • Fenu Luca November 15, 2016 at 3:23 am

            I may have found a way. How about this?

            best_model = grid_result.best_estimator_.model
            best_model_file_path = 'your_pick_here'
            model2json = best_model.to_json()
            with open(best_model_file_path + '.json', 'w') as json_file:

  9. volador November 14, 2016 at 6:21 pm

    Hi Jason, I think this is very best deep learning tutorial on the web. Thanks for your work. I have a question is :how to use the heuristic algorithm to optimize Hyperparameters for Deep Learning Models in Python With Keras, these algorithms like: Genetic algorithm, Particle swarm optimization, and Cuckoo algorithm etc. If the idea could be experimented, could you give an example

    • Jason Brownlee November 15, 2016 at 7:50 am

      Thanks for your support volador.

      You could search the hyperparameter space using a stochastic optimization algorithm like a genetic algorithm and use the mean performance as the cost function or fitness function. I don’t have a worked example, but it would be relatively easy to set up.

  10. Jan de Lange November 15, 2016 at 6:50 am

    Hi Jason, very helpful intro into gridsearch for Keras. I have used your guidance in my code, but rather than using the default ‘accuracy’ to be optimized, my model requires a specific evaluation function to be optimized. You hint at this possibility in the introduction, but there is no example of it. I have followed the SciKit-learn documentation, but I fail to come up with the correct syntax.

    I have posted my question at StackOverflow, but since it is quite specific, it requires understanding of SciKit-learn in combination with Keras.

    Perhaps you can have a look? I think it would nicely extend your tutorial.

    Thanks, Jan

  11. Jan de Lange November 16, 2016 at 7:31 am

    Yup, same sources as I referenced in my post at Stackoverflow.

  12. Anthony Ohazulike December 6, 2016 at 12:46 am

    Good tutorial again Jason…keep on the good job!

  13. nrcjea001 December 13, 2016 at 10:48 pm

    Hi Jason

    First off, thank you for the tutorial. It’s very helpful.

    I was also hoping you would assist on how to adapt the keras grid search to stateful lstms as discussed in

    I’ve coded the following:

    # create model
    model = KerasRegressor(build_fn=create_model, nb_epoch=1, batch_size=bats,
    verbose=2, shuffle=False)

    # define the grid search parameters
    h1n = [5, 10] # number of hidden neurons
    param_grid = dict(h1n=h1n)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=5)

    for i in range(100): grid.fit(trainX, trainY)

    Is grid.reset_states() correct? Or would you suggest creating a callback function to reset states?


    • Jason Brownlee December 14, 2016 at 8:27 am

      Great question.

      With stateful LSTMs we must control the resetting of states after each epoch. The sklearn framework does not open this capacity to us – at least it looks that way to me off the cuff.

      I think you may have to grid search stateful LSTM params manually with a ton of for loops. Sorry.

      If you discover something different, let me know, i.e. there may be a way in the back door to the sklearn grid search functionality that lets us inject our own custom epoch handling.
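
      The manual approach mentioned above might look roughly like the sketch below. It only illustrates the control flow: ToyStatefulModel, fit_one_epoch, and the made-up loss are hypothetical stand-ins (not Keras APIs) so that the loops run without a deep learning library. With a real stateful LSTM you would build and fit the Keras model in their place and call its reset_states() after each epoch.

```python
from itertools import product


class ToyStatefulModel:
    """Hypothetical stand-in for a stateful LSTM; not a Keras class."""

    def __init__(self, neurons):
        self.neurons = neurons
        self.state = 0.0  # pretend internal state carried across batches
        self.loss = 1.0

    def fit_one_epoch(self, X, y):
        # Toy "training": accumulate state and shrink a made-up loss.
        self.state += sum(X)
        self.loss /= 1.0 + 0.1 * self.neurons
        return self.loss

    def reset_states(self):
        self.state = 0.0


def manual_grid_search(X, y, grid, n_epochs=3):
    """Try every parameter combination, resetting state after each epoch."""
    keys = sorted(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = ToyStatefulModel(**params)
        loss = None
        for _ in range(n_epochs):
            loss = model.fit_one_epoch(X, y)
            model.reset_states()  # explicit reset, under our control
        results.append((loss, params))
    # Lowest loss first.
    return sorted(results, key=lambda item: item[0])


ranked = manual_grid_search([0.1, 0.2, 0.3], [0.2, 0.3, 0.4], {"neurons": [5, 10]})
best_loss, best_params = ranked[0]
print(best_params)
```

      Adding more keys to the grid dictionary adds more nested combinations without changing the loop itself.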

  14. Avatar
    Thomas Maier December 21, 2016 at 2:53 am #

    Hi Jason

    Thanks a lot for this and all the other great tutorials!

    I tried to combine this gridsearch/keras approach with a pipeline. It works if I tune nb_epoch or batch_size, but I get an error if I try to tune the optimizer or something else in the keras building function (I did not forget to include the variable as an argument):

    def keras_model(optimizer='adam'):
        model = Sequential()
        model.add(Dense(80, input_dim=79, init='normal'))
        model.add(Dense(1, init='normal'))
        model.compile(optimizer=optimizer, loss='mse')
        return model

    kRegressor = KerasRegressor(build_fn=keras_model, nb_epoch=500, batch_size=10, verbose=0)

    estimators = []
    estimators.append(('imputer', preprocessing.Imputer(strategy='mean')))
    estimators.append(('scaler', preprocessing.StandardScaler()))
    estimators.append(('kerasR', kRegressor))
    pipeline = Pipeline(estimators)

    param_grid = dict(kerasR__optimizer=['adam', 'rmsprop'])

    grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')

    Do you know this problem?

    Thanks, Thomas

    • Avatar
      Jason Brownlee December 21, 2016 at 8:44 am #

      Thanks Thomas. I’ve not seen this issue.

      I think we’re starting to push the poor Keras sklearn wrapper to the limit.

      Maybe the next step is to build out a few functions to do manual grid searching across network configs.

      • Avatar
        James April 14, 2018 at 12:26 am #

        Has there been a blog post on this?

    • Avatar
      Anastasiya December 12, 2018 at 9:41 pm #

      Have you solved this issue? I’m exploring Keras now as well and came across exactly the same problem.

  15. Avatar
    Jimi December 21, 2016 at 3:26 pm #

    Great resource!

    Any thoughts on how to get the “history” objects out of grid search? It could be beneficial to plot the loss and accuracy to see when a model starts to flatten out.

    • Avatar
      Jason Brownlee December 22, 2016 at 6:30 am #

      Not sure off the cuff Jimi, perhaps repeat the run standalone for the top performing configuration.

  16. Avatar
    DeepLearning January 4, 2017 at 6:08 am #

    Thanks for the post. Can we optimize the number of hidden layers as well on top of number of neurons in each layers?

    • Avatar
      Jason Brownlee January 4, 2017 at 9:00 am #

      Yes, it just may be very time consuming depending on the size of the dataset and the number of layers/nodes involved.

      Try it on some small datasets from the UCI ML Repo.

      • Avatar
        DeepLearning January 4, 2017 at 12:02 pm #

        Thanks. Would you mind looking at below code?

        def create_model(neurons1=1, neurons2=1):
            # create model
            model = Sequential()
            model.add(Dense(neurons1, input_dim=8))
            model.add(Dense(1, init='uniform', activation='sigmoid'))
            # Compile model
            model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
            return model

        # define the grid search parameters
        neurons1 = [1, 3, 5, 7]
        param_grid = dict(neurons1=neurons1, neurons2=neurons2)
        grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
        grid_result = grid.fit(X, Y)

        This code runs without error (I excluded certain X, y parts for brevity), but when I run “grid.fit(X, Y)”, it gives an AssertionError.

        I’d appreciate if you can show me where I am wrong.

        • Avatar
          DeepLearning January 4, 2017 at 12:26 pm #

          Update: It worked when I deleted 0 from neurons2. Thanks

        • Avatar
          Jason Brownlee January 5, 2017 at 9:16 am #

          A Dense() with a value of 0 neurons might blow up. Try removing the 0 from your neurons2 array.

          A good debug strategy is to cut code back to the minimum, make it work, then add complexity. Here, try searching a grid of 1 and 1 neurons, make it all work, then expand the grid you search.

          Let me know how you go.

  17. Avatar
    DeepLearning January 9, 2017 at 11:04 am #

    I keep getting error messages, and I tried big for loops that scan all possible combinations of layer numbers, neuron numbers, and other optimization settings within defined limits. It is very time-consuming code, but I could not figure out how to adjust the layer structure and other optimization parameters in the same code using GridSearch. If you would provide code for that in your blog one day, that would be much appreciated. Thanks.

  18. Avatar
    Rajneesh January 11, 2017 at 10:48 am #

    Hi Jason,
    Many thanks for this awesome tutorial !

  19. Avatar
    Andy January 22, 2017 at 1:02 pm #

    Hi Jason,

    Great tutorial! I’m running into a slight issue. I tried running this on my own variation of the code and got the following error:

    TypeError: get_params() got an unexpected keyword argument ‘deep’

    I copied and pasted your code using the given data set and got the same error. The code is showing an error on the grid_result = grid.fit(X, Y) line. I looked through the other comments and didn’t see anyone with the same issue. Do you know where this could be coming from?

    Thanks for your help!

    • Avatar
      YechiBechi January 23, 2017 at 2:18 am #

      same issue here,

      great tutorial, life saver.

    • Avatar
      Jason Brownlee January 23, 2017 at 8:35 am #

      Hi Andy, sorry to hear that.

      Is this happening with a specific example or with all of them?

      Are you able to check your version of Python/sklearn/keras/tf/theano?


      I can confirm the first example still works fine with Python 2.7, sklearn 0.18.1, Keras 1.2.0 and TensorFlow 0.12.1.

      • Avatar
        Andy January 25, 2017 at 7:12 am #

        The only differences are I am running Python 3.5 and Keras 1.2.1. The example I ran previously was the grid search for the number of neurons in a layer. But I just ran the first example and got the same error.

        Do you think the issue is due to the next version of Python? If so, what should my next steps be?

        Thanks for your help and quick response!

  20. Avatar
    kono February 8, 2017 at 3:14 am #


    Can you use early_stopping to decide n_epoch?

    • Avatar
      Jason Brownlee February 8, 2017 at 9:36 am #

      Yes, that is a good method to find a generalized model.

  21. Avatar
    Jayant February 23, 2017 at 4:33 am #

    Hi Jason,

    Really great article. I am a big fan of your blog and your books. Can you please explain your following statement?

    “A default cross-validation of 3 was used, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.”

    I didn’t see anywhere cross-validation being used.

    • Avatar
      Jason Brownlee February 23, 2017 at 8:56 am #

      Hi Jayant,

      Grid search uses k-fold cross-validation to evaluate the performance of each combination of parameters on unseen data.
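
      To make this concrete, here is a minimal sketch of how k-fold cross-validation partitions the rows of a dataset. kfold_indices is a hypothetical helper written for illustration, not the scikit-learn implementation (which also offers shuffling and stratification). Each configuration in the grid is trained on the training indices of every fold and scored on the held-out indices.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra row.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


splits = list(kfold_indices(n_samples=9, k=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

      The score reported for a configuration is then the mean of its k held-out scores, so every row is used for testing exactly once.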

  22. Avatar
    Jing February 28, 2017 at 2:09 am #

    Hi Jason,
    thanks for this awesome tutorial !
    I have two questions: 1. In “model.compile(loss=’binary_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])”, accuracy is used to evaluate the results. But GridSearchCV also has a scoring parameter; if I set “scoring=’f1’”, which one is used to evaluate the results of the grid search? 2. How can I set two evaluation metrics, e.g. ‘accuracy’ and ‘f1’, for evaluating the results of the grid search?

    • Avatar
      Jason Brownlee February 28, 2017 at 8:13 am #

      Hi Jing,

      You can set the “scoring” argument for GridSearchCV with a string of the performance measure to use, or the name of your own scoring function. You can learn about this argument here:

      You can see a full list of supported scoring measures here:

      As far as I know you can only grid search using a single measure.

      • Avatar
        Jing February 28, 2017 at 12:50 pm #

        Thank you so much for your help!

      • Avatar
        Jing February 28, 2017 at 1:54 pm #

        I find that no matter which evaluation metric is passed to the GridSearchCV “scoring” argument, “metrics” in “model.compile” must be [‘accuracy’]; otherwise the program gives “ValueError: The model is not configured to compute accuracy. You should pass ‘metrics=[“accuracy”]’ to the ‘model.compile()’ method.” So, if I set:
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='recall')
        then grid_result.best_score_ = 0.72. My question is: is 0.72 the accuracy or the recall? Thank you!

        • Avatar
          Jason Brownlee March 1, 2017 at 8:31 am #

          Hi Jing,

          When using GridSearchCV with Keras, I would suggest not specifying any metrics when compiling your Keras model.

          I would suggest only setting the “scoring” argument on the GridSearchCV. I would expect the metric reported by GridSearchCV to be the one that you specified.

          I hope that helps.
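
          As a small worked example of why it matters which metric a reported score refers to, the sketch below computes accuracy and recall by hand for the same predictions. The labels and predictions are invented for illustration, and the two helper functions are plain-Python assumptions rather than library code.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

print("accuracy:", accuracy(y_true, y_pred))  # 7 of 10 correct
print("recall:", recall(y_true, y_pred))      # 2 of 4 positives found
```

          The same predictions give 0.7 accuracy but 0.5 recall, so a single number such as 0.72 only makes sense once you know which measure produced it.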

  23. Avatar
    Dan March 8, 2017 at 4:13 am #

    Great blog post. Love it. You are awesome, Jason. I have one question about GridSearchCV. As far as I understand, the cross-validation already takes place in there. That’s why we do not need any kfold anymore.
    But with this technique we would have no validation set correct? e.g. with a default value of 3 we would have 2 training sets and one test set.

    That means in kfold as well as in GridsearchCV there is no requirement for creating a validation set anymore?


    • Avatar
      Jason Brownlee March 8, 2017 at 9:44 am #

      Hi Dan,

      Yes, GridSearchCV performs cross validation and you must specify the number of folds. You can hold back a validation set to double check the parameters found by the search if you like. This is optional.

      • Avatar
        Dan March 9, 2017 at 3:25 am #

        Thank you for the quick response Jason. Especially considering the huge amount of questions you get.

  24. Avatar
    Johan Steunenberg March 22, 2017 at 8:25 pm #

    What I’m missing in the tutorial is how to get the best params into the model with Keras. Do I pick up the best parameters and call ‘create_model’ again with those parameters, or can I call the GridSearchCV’s ‘predict’ function? (I will try it out for myself, but for completeness it would be good to have it in the tutorial as well.)

    • Avatar
      Jason Brownlee March 23, 2017 at 8:49 am #

      I see, but we don’t know the best parameters, we must search for them.

  25. Avatar
    Maycown Miranda April 5, 2017 at 2:09 am #

    Hi, Jason. I am getting
    /usr/local/lib/python2.7/dist-packages/keras/wrappers/ in check_params(self=, params={‘batch_size’: 10, ‘epochs’: 10})
    80 legal_params += inspect.getargspec(fn)[0]
    81 legal_params = set(legal_params)
    83 for params_name in params:
    84 if params_name not in legal_params:
    —> 85 raise ValueError(‘{} is not a legal parameter’.format(params_name))
    params_name = ‘epochs’
    87 def get_params(self, _):
    88 “””Gets parameters for this estimator.

    ValueError: epochs is not a legal parameter

    • Avatar
      Jason Brownlee April 9, 2017 at 2:32 pm #

      It sounds like you need to upgrade to Keras v2.0 or higher.

      • Avatar
        Chandra Sutrisno Tjhong November 28, 2017 at 10:46 am #

        I experienced the same problem. I upgraded my Keras and the same problem still occurs.

    • Avatar
      neumatron11 February 5, 2019 at 12:42 pm #

      I was getting the ‘not a legal parameter’ error when I was trying to pass required inputs into my create_model function in the wrapper.

      model = KerasClassifier(build_fn=create_model(input_dim = x ), verbose=0)

      when I removed it and included it in the grid search instead it ran fine, I just added it to the dictionary of parameters

      input_dim = [x]
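
      The difference described above comes down to calling a function versus passing the function itself. A minimal sketch in plain Python, assuming a hypothetical create_model that returns a dict instead of a Keras model so it runs without any deep learning library:

```python
def create_model(input_dim=8, neurons=4):
    """Hypothetical build function; returns a plain dict instead of a Keras model."""
    return {"input_dim": input_dim, "neurons": neurons}


# Calling the function runs it immediately and hands over its result.
already_built = create_model(input_dim=20)

# Passing the bare name hands over the function itself, to be called later.
build_fn = create_model

print(callable(already_built))  # False
print(callable(build_fn))       # True

# Values the build function needs can travel through the parameter grid instead,
# as a one-element list when they should stay fixed.
param_grid = dict(input_dim=[20], neurons=[4, 8])
models = [build_fn(input_dim=d, neurons=n)
          for d in param_grid["input_dim"]
          for n in param_grid["neurons"]]
print(models)
```

      With build_fn=create_model(input_dim=x) the wrapper receives an already built object rather than something it can call with new arguments for each grid point, which is consistent with the error reported here.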

  26. Avatar
    Usman May 3, 2017 at 7:56 am #

    Nice tutorial. I would like to optimize the number of hidden layers in the model. Can you please guide in this regard, thanks

    • Avatar
      Jason Brownlee May 4, 2017 at 7:59 am #

      Thanks Usman.

      Consider exploring specific patterns, e.g. small-big-small, etc.
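
      One way to explore such patterns is to treat the whole layer layout as a single grid parameter. The sketch below only generates the candidate layouts; layer_candidates, is_small_big_small, and the hidden_layers parameter name are assumptions for illustration. A build function would then need to accept hidden_layers and add one Dense layer per entry.

```python
from itertools import product


def layer_candidates(sizes, max_layers):
    """All hidden-layer layouts with 1..max_layers layers drawn from `sizes`."""
    layouts = []
    for depth in range(1, max_layers + 1):
        layouts.extend(product(sizes, repeat=depth))
    return layouts


def is_small_big_small(layout):
    """True for three-layer layouts whose middle layer is the widest."""
    return len(layout) == 3 and layout[0] < layout[1] > layout[2]


all_layouts = layer_candidates(sizes=[4, 8, 16], max_layers=3)
sbs_layouts = [layout for layout in all_layouts if is_small_big_small(layout)]

# Each tuple is one grid point, e.g. (4, 16, 8) means three hidden layers.
param_grid = dict(hidden_layers=sbs_layouts)
print(len(all_layouts), len(sbs_layouts))
print(sbs_layouts)
```

      Filtering to a pattern keeps the grid small: here 39 possible layouts shrink to 5 candidates.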

  27. Avatar
    Carl May 5, 2017 at 12:58 pm #

    Do you know any way this could be possible using a network with multiple inputs?

    • Avatar
      Sukhpal December 16, 2019 at 2:18 am #

      Are the optimizations of network topology, learning rate, batch size, and epochs done in stages? Sir, please tell me why these were done in stages.

      • Avatar
        Jason Brownlee December 16, 2019 at 6:18 am #

        To make the explanation to the reader simpler.

        • Avatar
          Dan Thomas May 28, 2020 at 7:21 am #

          Also probably to reduce search space, and thus computational time.
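
          The effect on the search space is easy to quantify. The sketch below counts model fits for a joint search versus a staged search that tunes one parameter at a time; the grid values and fold count are invented for illustration, not taken from the tutorial.

```python
from math import prod

# Hypothetical grid, for counting purposes only.
grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adam"],
    "neurons": [1, 5, 10, 15, 20],
}
cv_folds = 3

# Joint search: every combination of every value.
joint_configs = prod(len(values) for values in grid.values())
# Staged search: each parameter is tuned on its own, one stage per parameter.
staged_configs = sum(len(values) for values in grid.values())

print("joint fits:", joint_configs * cv_folds)
print("staged fits:", staged_configs * cv_folds)
```

          The trade-off is that a staged search cannot see interactions between parameters, so it may miss a combination that a joint search would find.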

  28. Avatar
    DanielP May 9, 2017 at 4:26 pm #

    Hi Jason, great to see posts like this – amazing job!

    Just noticed, when you tune the optimisation algorithm SGD performs at 34% accuracy. As no parameters are being passed to the SGD function, I’d assume it takes the default configuration, lr=0.01, momentum=0.0.

    Later on, as you look for better configurations for SGD, best result (68%) is found when {‘learn_rate’: 0.01, ‘momentum’: 0.0}.

    It seems to me that these two experiments use exactly the same network configuration (including the same SGD parameters), yet their resulting accuracies differ significantly. Do you have any intuition as to why this may be happening?

  29. Avatar
    Pradanuari May 14, 2017 at 3:13 am #

    Hi Jason!
    Absolutely love your tutorial! But would you mind giving a tutorial on how to tune the number of hidden layers?


  30. Avatar
    Pradanuari May 14, 2017 at 11:32 pm #

    Thank you so much Jason!

  31. Avatar
    Ibrahim El-Fayoumi May 17, 2017 at 12:53 pm #

    Hello Jason
    I tried to use your idea in a similar problem but I am getting error : AttributeError: ‘NoneType’ object has no attribute ‘loss’
    it looks like the model does not define loss function?

    This is the error I get:
    b\site-packages\keras-2.0.4-py3.5.egg\keras\wrappers\ in fit(self=, x=memmap([[[ 0., 0., 0., …, 0., 0., 0.],
    …, 0., 0., …, 0., 0., 0.]]], dtype=float32), y=array([[ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 1., 0.]]), **kwargs={})
    135 self.model = self.build_fn(
    136 **self.filter_sk_params(self.build_fn.__call__))
    137 else:
    138 self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
    –> 140 loss_name = self.model.loss
    loss_name = undefined
    self.model.loss = undefined
    141 if hasattr(loss_name, ‘__name__’):
    142 loss_name = loss_name.__name__
    143 if loss_name == ‘categorical_crossentropy’ and len(y.shape) != 2:
    144 y = to_categorical(y)

    AttributeError: ‘NoneType’ object has no attribute ‘loss’

    Process finished with exit code 1


    • Avatar
      Jason Brownlee May 18, 2017 at 8:26 am #

      Does the example in the blog post work on your system?

      • Avatar
        Ibrahim El-Fayoumi May 18, 2017 at 12:18 pm #

        Ok, I think your code needs to be placed after
        if __name__ == ‘__main__’:

        to work with multiprocess…

        But thanks, the post is great…

        • Avatar
          Jason Brownlee May 19, 2017 at 8:12 am #

          Not on Linux and OS X when I tested it, but thanks for the tip.

        • Avatar
          Gautam August 25, 2017 at 11:33 pm #

          n_jobs=-1 doesn’t work on Windows.

          @Ibrahim: Can you please explain what part of the code needs to be behind
          if __name__ == ‘__main__’: ?

          • Avatar
            Martin October 19, 2019 at 4:52 am #

            Assuming you have got several functions (I have a single Python script acting as the main file and the other stuff in a separate file, but at least functions like Jason does), you need to put this at the very beginning of your main routine where everything comes together and is set up. Note that since it is an if-condition, you need to indent everything below the condition.

            @Jason maybe you can add this in the section where you talk about the problems on parallelization as a hint for windows users.
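
            A minimal sketch of the layout described above. On Windows, Python's multiprocessing starts fresh interpreters that import the main script again, so anything that launches parallel work should sit under the guard. The evaluate function and its scoring rule are hypothetical placeholders for fitting and scoring a model.

```python
def evaluate(config):
    """Hypothetical placeholder for fitting and scoring one configuration."""
    return 1.0 / (1 + abs(config["batch_size"] - 40))


def main():
    configs = [{"batch_size": b} for b in (10, 20, 40, 80)]
    # In the tutorial's setting, this is where grid.fit(X, Y) with n_jobs=-1 would go.
    return max(configs, key=evaluate)


if __name__ == "__main__":
    # Worker processes that re-import this file skip this block,
    # so they do not start a new search of their own.
    print(main())
```

            Function definitions and imports stay at module level; only the code that actually starts the search moves under the guard.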

          • Avatar
            Jason Brownlee October 19, 2019 at 6:53 am #

            Thanks. I really don’t know about windows.

            I’ve not seen a windows box in a long time and I’m impressed people use them for software development.

  32. Avatar
    Edward May 21, 2017 at 3:17 am #

    Hello Jason!
    I do the first step – try to tune Batch Size and Number of Epochs and get
    print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
    Best: 0.707031 using {‘epochs’: 100, ‘batch_size’: 40}
    After that I do the same and get
    print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
    Best: 0.688802 using {‘epochs’: 100, ‘batch_size’: 20}
    And so on
    The problem is in the grid_result.best_score_

    I expect that in the second step (for example, tuning the optimizer) I will get a better grid_result.best_score_ than in the first step (in the second step I use grid_result.best_params_ from the first step). But that is not the case.
    Tuning all hyperparameters together takes a very long time.

    How to fix it?

    • Avatar
      Jason Brownlee May 21, 2017 at 6:01 am #

      Consider tuning different parameters, like network structure or number of input features.

      • Avatar
        Edward May 21, 2017 at 7:18 pm #

        Thanks a lot Jason!

  33. Avatar
    pattijane May 21, 2017 at 7:44 am #


    I’d like to have your opinion about a problem:

    I have two loss function plots, with SGD and Adamax as optimizer with same learning rate.
    Loss function of SGD looks like the red one, whereas Adamax’s looks like blue one.

    I have better scores with Adamax on validation data. I’m confused about how to proceed, should I choose Adamax and play with learning rates a little more, or go on with SGD and somehow try to improve performance?


    • Avatar
      Jason Brownlee May 22, 2017 at 7:49 am #

      Explore both, but focus on the validation score of interest (e.g. accuracy, RMSE, etc.) over loss.

      For example, you can get very low loss and get worse accuracy.

      • Avatar
        pattijane May 22, 2017 at 6:35 pm #

        Thanks for your response! I experimented with different learning rates and found out a reasonable one, (good for both Adamax and SGD) and now I try to fix learning rate and optimizer and focus on other hyperparameters such as batch-size and number of neurons. Or would be better if I set those first?

        • Avatar
          Jason Brownlee May 23, 2017 at 7:49 am #

          Number of neurons will have a big effect along with learning rate.

          Batch size will have a smaller effect and could be optimized last.

  34. Avatar
    Lotem May 23, 2017 at 1:47 am #

    Thanks for this post!

    One question – why not use grid search on all the parameters together, rather than performing several grid searches and finding each parameter separately? Surely the results are not the same…

    • Avatar
      Jason Brownlee May 23, 2017 at 7:54 am #

      Great question,

      In practice, the datasets are large and it can take a long time and require a lot of RAM.

  35. Avatar
    StatsSorceress May 25, 2017 at 6:52 am #

    Hi Jason,

    Excellent post!

    It seems to me that if you use the entire training set during your cross-validation, then your cross-validation error is going to give you an optimistically biased estimate of your validation error. I think this is because when you train the final model on the entire dataset, the validation set you create to estimate test performance comes out of the training set.

    My question is: assuming we have a lot of data, should we use perhaps only 50% of the training data for cross-validation for the hyperparameters, and then use the remaining 50% for fitting the final model (and a portion of that remaining 50% would be used for the validation set)? That way we wouldn’t be using the same data twice. I am assuming in this case that we would also have a separate test set.

    • Avatar
      Jason Brownlee June 2, 2017 at 11:38 am #

      Yes, it is a good idea to hold back a test set when tuning.

  36. Avatar
    Yang May 27, 2017 at 5:35 am #

    Thanks for your valuable post. I learned a lot from it.
    When I wrote my code for grid search, I encountered a question:

    I use fit_generator instead of fit in keras.
    Is it possible to use grid search with fit_generator ?

    I have some Merge layers in my deep learning model.
    Hence, the input of the neural network is not a single matrix.
    For example:
    Suppose we have 1,000 samples
    Input = [Input1,Input2]
    Input1 is a 1,000 *3 matrix
    Input2 is a 1,000*3*50*50 matrix (image)

    When I use the fit in your post, there is a bug….because the input1 and input2 don’t have the same dimension. So I wonder whether the fit_generator can work with grid search ?

    Thanks in advance!

  37. Avatar
    Yang May 27, 2017 at 6:46 am #

    Please ignore my previous reply.
    I find an answer here:
    Right now, the GridsearchCV using the scikit wrapper for network with multiple inputs is not available.

  38. Avatar
    Kate liu May 28, 2017 at 4:31 pm #

    Hi Jason, thank you for your good tutorial on grid search with Keras. I followed your example with my own dataset, and it ran. But when I use an autoencoder structure instead of the sequential structure to grid search the parameters with my own data, it does not run. I don’t know the reason. Could you help me? Are there any differences between grid searching the Sequential structure and grid searching the Model structure?

    My code follows:

    from keras.models import Sequential
    from keras.layers import Dense, Input
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import GridSearchCV
    import numpy as np
    from keras.optimizers import SGD, Adam, RMSprop, Adagrad
    from keras.regularizers import l1,l2
    from keras.models import Model
    import pandas as pd
    from keras.models import load_model


    def create_model(optimizer='rmsprop'):

        # encoder layers
        encoding_dim = 140
        input_img = Input(shape=(6,))
        encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(input_img)
        encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
        encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
        encoder_output = Dense(encoding_dim, activation='relu', W_regularizer=l1(0.01))(encoded)

        # decoder layers
        decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoder_output)
        decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
        decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
        decoded = Dense(6, activation='relu', W_regularizer=l1(0.01))(decoded)

        # construct the autoencoder model
        autoencoder = Model(input_img, decoded)

        # construct the encoder model for plotting
        encoder = Model(input_img, encoder_output)

        # Compile model
        autoencoder.compile(optimizer='RMSprop', loss='mean_squared_error', metrics=['accuracy'])

        return autoencoder

    • Avatar
      Jason Brownlee June 2, 2017 at 12:09 pm #

      I’m surprised, I would not think the network architecture would make a difference.

      Sorry, I have no good suggestions other than try to debug the cause of the fault.

  39. Avatar
    Kate liu May 28, 2017 at 4:36 pm #

    The autoencoder.compile command is modified as follows:
    # Compile model
    autoencoder.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])

  40. Avatar
    Rahul May 30, 2017 at 12:07 am #

    Can we do this for functional API as well ?

  41. Avatar
    Ian Worthington May 30, 2017 at 10:36 pm #

    Thanks for a great tutorial Jason, appreciated.

    n_jobs=-1 didn’t work very well on my Windows 10 machine: took a very long time and never finished. seems to suggest this is (or at least was in 2015) a known problem under Windows so I changed to n_jobs=1, which also allowed me to see throughput using verbose=10.

  42. Avatar
    Ian Worthington May 31, 2017 at 1:56 am #

    Jason —

    Given all the parameters it is possible to adjust, is there any recommendation for which should be fixed first before exploring others, or can ALL results for one change when others are changed?

  43. Avatar
    Mario June 9, 2017 at 12:10 am #

    Hi and thank you for the resource.

    Am I right in my understanding that this only works on one machine?

    Any hints / pointers on how to run this on a cluster? I have found as a potential avenue using Spark (no Keras though).

    Any comment at all? Information on the subject is scarce.

    • Avatar
      Jason Brownlee June 9, 2017 at 6:26 am #

      Yes, this example is for a single machine. Sorry, I do not have examples for running on a cluster.

  44. Avatar
    Shaun June 16, 2017 at 11:54 pm #

    Hi Jason,

    I’m a little bit confused about the definition of the “score” or “accuracy”. How are they made? I believe that they are not simply comparing the results with target, otherwise it will be the overfitting model being the best (like the more neurons the better).

    But on the other hand, they are just using those combinations of parameters to train the model, so what is the difference between I manually set the parameters and see my result good or not, with risk of overfitting and the grid search that creates an accuracy score to determine which one is the best?

    Best regards,

    • Avatar
      Jason Brownlee June 17, 2017 at 7:30 am #

      The grid search will provide an estimate of the skill of the model with a set of parameters.

      Any one configuration in the grid search can be set and evaluated manually.

      Neural networks are stochastic and will give different predictions/skill when trained on the same data.

      Ideally, if you have the time/compute the grid search should use repeated k-fold cross validation to provide robust estimates of model skill. More here:

      Does that help?

      • Avatar
        Shaun June 20, 2017 at 2:30 am #

        I’m new to NNs and a little bit puzzled. So say, if I have too many neurons, which leads to overfitting (good on the train set, bad on the validation or test set), can grid search detect it by the score?

        My guess is yes, because there is a validation set in the GridsearchCV. Is that correct?

        • Avatar
          Jason Brownlee June 20, 2017 at 6:39 am #

          A larger network can overfit.

          The idea is to find a config that does well on the train and validation sets. We require a robust test harness. With enough resources, I’d recommend repeated k-fold cross validation within the grid search.
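
          A rough sketch of what repeated k-fold cross-validation does with the row indices is shown below. repeated_kfold is a hypothetical helper written for illustration; scikit-learn ships RepeatedKFold and RepeatedStratifiedKFold, which can be passed as the cv argument instead.

```python
import random


def repeated_kfold(n_samples, k, n_repeats, seed=0):
    """Yield (train_idx, test_idx) for k folds, repeated with fresh shuffles."""
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        # A different seed per repeat gives a different partition each time.
        random.Random(seed + repeat).shuffle(order)
        for fold in range(k):
            test_idx = order[fold::k]  # every k-th shuffled row
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


splits = list(repeated_kfold(n_samples=10, k=5, n_repeats=3))
print(len(splits))  # 5 folds x 3 repeats
```

          Each configuration would be fit and scored once per split, and the mean and standard deviation over all splits reported, which smooths out the run-to-run variance of neural networks at the cost of more training time.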

  45. Avatar
    Huyen June 19, 2017 at 4:21 pm #

    One more very useful tutorial, thanks Jason.

    One question about GridSearch in my case. I have tried to tune the parameters of my neural network for regression with 18 inputs of size 800, but GridSearch takes an extremely long time, seemingly forever, even though I have limited the number of values. I saw in your code:

    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)

    Normally, n_jobs=1, can I increase that number to improve the performances?

    • Avatar
      Jason Brownlee June 20, 2017 at 6:36 am #

      We often cannot grid search with neural nets because it takes so long!

      Consider running on a large computer in the cloud over the weekend.

  46. Avatar
    Bobo June 21, 2017 at 4:57 am #

    Hi Jason,

    Any idea how to use GridSearchCV if you don’t want cross validation?
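
    One possible approach to this question, sketched under assumptions: the cv argument of GridSearchCV also accepts an iterable of (train, test) index pairs, so a list holding a single split makes the search evaluate each configuration once on a fixed hold-out set. The helper single_holdout_split below is hypothetical and only builds the indices.

```python
import random


def single_holdout_split(n_samples, test_fraction=0.33, seed=0):
    """Return one (train_idx, test_idx) pair from a seeded shuffle."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return sorted(order[n_test:]), sorted(order[:n_test])


train_idx, test_idx = single_holdout_split(n_samples=12, test_fraction=0.25)

# A list with exactly one split; this is what would be passed as cv=...
cv = [(train_idx, test_idx)]
print(cv)
```

    A single split is faster than k folds, but the resulting score is a noisier estimate of model skill.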

  47. Avatar
    makis June 28, 2017 at 11:54 pm #

    Hello. Thank you for the nice tutorial.

    I am trying to combine pipeline and gridsearch.

    Inside my keras model i use kernel_initializer=init_mode.
    Then I am trying to assign values to the init_mode dictionary in order to perform the gridsearch.

    I get the following error: ValueError: init_mode is not a legal parameter

    My code is here:

    Any tip? Thank you

  48. Avatar
    Abhijith Darshan Ravindra July 11, 2017 at 6:31 am #

    Hi Dr. Brownlee,

    When I run this in Spyder IDE nothing happens after

    It just appears to do nothing.

    Any suggestions as to why?

    • Avatar
      Jason Brownlee July 11, 2017 at 10:34 am #

      Consider running from the command line.

      The grid search may take a long time.

      • Avatar
        DY July 14, 2017 at 6:11 am #

        Hello Dr Brownlee,

        I saved your example codes into .py file and run it. Nothing happens after However, if I run line by line from your example codes it works. Do you know why?

        • Avatar
          Jason Brownlee July 14, 2017 at 8:36 am #

          It may take a long time. Consider reducing the scope of the search to see if you can get results sooner.

    • Avatar
      Tryfon September 18, 2017 at 11:46 pm #

      I had the same issue as you (using Spyder and Python 3.6), but after changing the parameter to n_jobs = 1 it worked fine. Also, n_jobs = 2 got stuck although Spyder showed it was running in the background (I checked the CPU usage and it was down to 1% vs the 55-80% when it is actually running).

      Don’t ask me why that is. My guess would be that it has to do with your system and the fact that it might not support parallelization (no CUDA GPU).

      • Avatar
        Jason Brownlee September 19, 2017 at 7:47 am #

        Consider running the example from the command line instead.

  49. Avatar
    Kamal Thapa July 27, 2017 at 3:46 pm #

    How can I do Hyper-parameter optimization for MLPRegressor in scikit learn?

  50. Avatar
    Josep August 3, 2017 at 2:31 am #

    Hi Jason,
    I’m unable to apply the grid search to a seq-to-seq LSTM network (Keras Regressor model in the scikit API). When I set the GridSearchCV scoring algorithm to r^2 (or any scoring function for regression problems), it expects a 2-dim input vector, not the 3-dim one used in Keras.
    Otherwise, if I leave the default scoring algorithm named “_passthrough_scorer” (I don’t know what it does, I don’t even know what it is), it works, but the best_score doesn’t match the real best parametrization. I’m really confused… I’ll have to write the grid search manually…

    • Avatar
      Josep August 3, 2017 at 2:42 am #

      I’ve solved it; I’m sharing it in case someone has the same issue: if you set the gridsearch scoring function to “None”, it uses the scoring metrics of the Keras model.

      • Avatar
        Josep August 3, 2017 at 2:49 am #

        Sorry for bothering, but the results of the approach I’ve said are incorrect. I don’t know what to do.

    • Avatar
      Jason Brownlee August 3, 2017 at 6:54 am #

      Hi Josep,

      Consider writing your own for loop to iterate over params and run a Cross Validation for the params within the loop.

      This is how I do it now for large/complex models.

  51. Avatar
    kotb August 8, 2017 at 7:10 pm #

    Can I use this grid search without using a Keras model?

  52. Avatar
    Aman Garg August 19, 2017 at 3:35 am #

    Hello Jason,

    Thanks for such a nice tutorial.

    Instead of getting an output such as ‘Best: 0.720052 using {‘init_mode’: ‘uniform’}’, it would be really nice if you could show us how to visualize this result with matplotlib so that it is easier to read.

  53. Avatar
    Michael August 20, 2017 at 4:42 am #

    Hi, Jason. Thanks, again, for all of the blog posts and example code. I’m trying to tune my binary classification Keras neural network. My dataset includes about 50,000 entries with 52 (numeric) variables. Using Grid Search, I’ve tested all sorts of combinations of layer size, number of epochs, batch size, optimizers, activations, learning rates, dropout rates, and L2 regularization parameters. My grid search shows every combination performs the same. For example, here is a snippet from my latest results:

    Best: 0.876381 using {‘act’: ‘relu’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘SGD’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adagrad’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adadelta’}
    0.876361 (0.003880) with: {‘act’: ‘tanh’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘tanh’, ‘opt’: ‘SGD’}

    But I also get 0.876381 whether I have 1000 nodes or 1 node, and for every other combo I’ve tested. I’ve also tried different ways of scaling or transforming my input data with no impact.

    Do you have any thoughts on why I’m having trouble finding different combinations of parameters that actually have a difference in performance?

    Thank you for your help! You rock!
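One thing that may be worth checking when every configuration returns the same score is whether that score equals the accuracy of always predicting the most frequent class. The sketch below is an assumption about a possible cause, not a diagnosis; the labels are made up, and `majority_class_accuracy` and `looks_like_constant_prediction` are hypothetical helper names.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def looks_like_constant_prediction(cv_score, labels, tolerance=0.01):
    """True when a CV accuracy is within `tolerance` of the majority baseline."""
    return abs(cv_score - majority_class_accuracy(labels)) <= tolerance


# Hypothetical labels: 876 negatives and 124 positives.
y = [0] * 876 + [1] * 124
print(majority_class_accuracy(y))
print(looks_like_constant_prediction(0.876381, y))
```

If the check comes back true, the network may be predicting a single class regardless of the hyperparameters, which would explain why nothing in the grid changes the result.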

  54. Avatar
    Shubham Kumar September 3, 2017 at 11:54 am #

    Hey Jason.
    I was using grid search to tune hyperparameters for a CNN-LSTM classification problem.
    I used the code template on your blog about sequence classification.
    My original data has 38932 instances, but for tuning I am using only 1000 to save time.
    But even then, I am not sure how to best search for those parameters and save time.

    Is it a bad idea to search for hyperparameters on a small subset (almost 1/40th of the training data in my case)?
    Will the result vary largely when I use the actual data size?
    Also, I passed in several parameters for the grid search. Left it overnight and it still hadn’t made enough progress, so I stopped the execution.
    How can I speed up this process?

    • Avatar
      Jason Brownlee September 3, 2017 at 3:44 pm #

      The result will be biased, but perhaps might give you an idea of the direction in which to proceed – this could be enough for you.

      I often run a lot of sanity check grid searches on small samples to get ideas on which direction to push.

      More data will result in less biased estimates of model skill, often proportionately to some point of diminishing returns.
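A search that is still running after a night is easier to reason about once the number of model fits is counted. The sketch below uses a made-up grid; `count_fits` and `count_fits_one_at_a_time` are hypothetical helpers, and the second one assumes each hyperparameter is tuned separately while the others are held fixed.

```python
from math import prod


def count_fits(param_grid, n_folds):
    """Model fits for a full grid: every combination times every fold."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations * n_folds


def count_fits_one_at_a_time(param_grid, n_folds):
    """Model fits when each hyperparameter is searched on its own."""
    return sum(len(values) for values in param_grid.values()) * n_folds


# Hypothetical grid, for illustration only.
grid = {
    "batch_size": [10, 20, 40],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "Adam", "RMSprop", "Adagrad"],
    "dropout_rate": [0.0, 0.2, 0.4, 0.6],
}
print(count_fits(grid, n_folds=3))                # full grid
print(count_fits_one_at_a_time(grid, n_folds=3))  # one parameter at a time
```

Multiplying the count by the time of a single fit on the small sample gives a rough estimate of the total run time before committing to the search.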

      • Avatar
        Shubham Kumar September 4, 2017 at 3:10 am #

        Great !
        I did read that one of the sanity checks is to check whether the model overfits on a small sample! If yes, then we are good to go…
        I am slightly new to building proper models and find this part exciting but a little intimidating at the same time !
        I am going to use only a few hyper parameters at a time, and keep the rest constant and check what happens !

        Love your posts ! They are amazingly helpful .
        Does the Python LSTM book have code snippets in Python 3 as well?
        Coz it becomes a little difficult to search for the right modules and attributes otherwise :/

        • Avatar
          Jason Brownlee September 4, 2017 at 4:39 am #


          Yes, the code in my LSTM book was tested with Python 2.7 and Python 3.5.

  55. Avatar
    Kaushal Shetty September 8, 2017 at 12:24 am #

    Hi Jason, Is this a valid approach to decide the number of layers?
    def neural_train(layer1=1, layer2=1, layer3=1, layers=1):
        input_tensor = Input(shape=(2001,))
        x = Dense(units=layer1, activation='relu')(input_tensor)
        if layers == 2:
            x = Dense(layer2, activation='relu')(x)
        if layers == 3:
            x = Dense(layer2, activation='relu')(x)
            x = Dense(layer3, activation='relu')(x)

        output_tensor = Dense(10, activation='softmax')(x)
        model = Model(input_tensor, output_tensor)
        model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
        return model

    layer1 = [1024, 512]
    layer2 = [256, 100]
    layer3 = [60, 40]
    epochs = [10, 11]
    layers = [2, 3]
    param_grid = dict(epochs=epochs, layer1=layer1, layer2=layer2, layer3=layer3, layers=layers)
    model = KerasClassifier(build_fn=neural_train)
    gsv_model = GridSearchCV(model, param_grid=param_grid).fit(X_train, y_train)

  56. Avatar
    ari September 9, 2017 at 1:29 am #

    Very helpful post Jason. Thanks for this. Are there any advantages for using gridsearch over something like hyperas/hyperopt ? To your best knowledge is one faster than the other?

    • Avatar
      Jason Brownlee September 9, 2017 at 11:58 am #

      Depends on your data and model. Use the tool that you prefer.

  57. Avatar
    Shubham Kumar September 10, 2017 at 4:38 am #

    {‘split0_test_score’: array([ 0.6641791, 0.6641791, 0.6641791, 0.6641791]), ‘split1_test_score’: array([ 0.65413534, 0.65413534, 0.65413534, 0.65413534]), ‘split2_test_score’: array([ 0.69924811, 0.69924811, 0.69924811, 0.69924811]), ‘mean_test_score’: array([ 0.6725, 0.6725, 0.6725, 0.6725]), ‘std_test_score’: array([ 0.01931902, 0.01931902, 0.01931902, 0.01931902]), ‘rank_test_score’: array([1, 1, 1, 1]), ‘split0_train_score’: array([ 0.67669174, 0.67669174, 0.67669174, 0.67669174]), ‘split1_train_score’: array([ 0.68164794, 0.68164794, 0.68164794, 0.68164794]), ‘split2_train_score’: array([ 0.65917602, 0.65917602, 0.65917602, 0.65917602]), ‘mean_train_score’: array([ 0.67250523, 0.67250523, 0.67250523, 0.67250523]), ‘std_train_score’: array([ 0.00963991, 0.00963991, 0.00963991, 0.00963991]), ‘mean_fit_time’: array([ 36.72573058, 37.0244147 , 38.12670692, 40.71116368]), ‘std_fit_time’: array([ 0.4829061 , 0.35207924, 0.13746276, 2.71443639]), ‘mean_score_time’: array([ 1.49508754, 1.76741695, 2.14029002, 2.67426189]), ‘std_score_time’: array([ 0.04907801, 0.11919153, 0.07953362, 0.13931651]), ‘param_dropout’: masked_array(data = [0.2 0.5 0.6 0.7],
    mask = [False False False False],
    fill_value = ?)
    , ‘params’: ({‘dropout’: 0.2}, {‘dropout’: 0.5}, {‘dropout’: 0.6}, {‘dropout’: 0.7})}

    Hey. I was tuning a model on 4 different choices of a hyperparameter. However, in the grid_results_ dictionary, the rank_test_score key has an array with all the same values. I find that confusing. Shouldn’t it have 4 different values?
    Something like [1,3,2,4] ?
    What could be the explanation for this?

    • Avatar
      Shubham Kumar September 10, 2017 at 4:50 am #

      It must have something to do with all the mean_test_score values being the same.
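The identical ranks are consistent with tied scores sharing a rank. The sketch below assumes the ranking follows the "min" convention, where every tied score receives the lowest rank of its group; `min_rank_descending` is a hypothetical helper written for illustration, not scikit-learn code.

```python
def min_rank_descending(scores):
    """Rank scores so the highest gets 1 and tied scores share the lowest rank."""
    return [1 + sum(other > score for other in scores) for score in scores]


# Four identical mean test scores, as in the output above.
print(min_rank_descending([0.6725, 0.6725, 0.6725, 0.6725]))  # [1, 1, 1, 1]

# Distinct scores with one tie, for comparison.
print(min_rank_descending([0.70, 0.65, 0.68, 0.65]))  # [1, 3, 2, 3]
```

Under that assumption, the more interesting question is why four dropout rates produce identical fold scores in the first place.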

    • Avatar
      Jason Brownlee September 11, 2017 at 12:03 pm #

      If you are testing 4 different values for one parameter, then you must build 4 models/complete 4 runs.

      Does that help?

      • Avatar
        Shubham Kumar September 13, 2017 at 5:20 am #

        I am sorry, that’s confusing. What does “4 models or complete 4 runs” mean?

        Are things different if we are grid searching/random searching for just one hyperparameter?

        Does it have something to do with the actual code used to write TensorFlow /keras ?

        • Avatar
          Jason Brownlee September 13, 2017 at 12:36 pm #

          If you have one parameter and you want to test 4 values, each value needs one run. Ideally, we would run many times for each parameter value and take the average skill score given the stochastic nature of ML algorithms.

          For a random search, you run for as long as you like.

          Does that help?

  58. Avatar
    Shubham Kumar September 13, 2017 at 11:17 pm #

    What I understand is that when we have more than 1 (say 2) hyper-parameters in a grid, then for each combination, the code will complete as many epochs as I have specified, with as many training-cross-validation sets as specified (the CV in GridSearchCV). So, going through all those epochs, for each training-cross-validation set, we get the avg accuracy over all the cross-validation sets for every combination.

    So when you say 1 run only in the case of a single hyperparameter, that means only 1 training-crossvalidation set? Because only in this case, there won’t be any averaging involved.

    Is that what I have to do? Change the training-crossValidation set to just 1?

  59. Avatar
    Rishi September 18, 2017 at 5:18 am #

    Would you please post an example of inheriting from KerasClassifier (or KerasRegressor) to create your own class? I’m attempting to do this and it works for the most part:

    class MLP_Regressor(KerasRegressor):

        def __init__(self, **sk_params):
            super().__init__(build_fn=None, **sk_params)

        def __call__(self, optimizer='adam', loss='mean_squared_error', **kwargs):
            # more code goes here (that was previously in 'build_fn')

    I can include this in a pipeline and it runs perfectly:
    MLP Pipeline(memory=None,
    steps=[(‘MLP’, )])

    Only thing is: The Keras documentation includes the ‘build_fn’ keyword argument:

    keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)

    While the actual KerasClassifier class definition shows the following in its __init__ method:

    def __init__(self, model, optimizer='adam', loss='categorical_crossentropy', **kwargs):
        super(KerasClassifier, self).__init__(model, optimizer, loss, **kwargs)

    I’m not sure if my __init__ in MLP_Regressor has been setup correctly (to avoid hidden bugs in the future).

    Would greatly appreciate it! (I’ve searched, but couldn’t find a single example of KerasClassifier inheritance).

    • Avatar
      Jason Brownlee September 18, 2017 at 5:49 am #

      Thanks for the suggestion, I have not done this but perhaps in the future.

      • Avatar
        Rishi November 21, 2017 at 12:54 pm #

        Jason, managed to get the inherited class working perfectly now:

        class MLP_Classifier(KerasClassifier):

            def __init__(self, build_fn=None, **sk_params):
                self.sk_params = sk_params
                super().__init__(build_fn=None, **sk_params)

            def __call__(self, callbacks=None, layer_sizes=None, activations=None, input_dim=0, init='normal', optimizer='adam', metrics='accuracy', loss='binary_crossentropy', use_dropout_input=False, use_dropout_hidden=False):
                """
                Constructs, compiles and returns a Keras model.
                Implements the "build_fn" function.

                Returns a "Sequential" model.
                """
                # Code to build a model (that would typically go in "build_fn") goes here.
                return model

  60. Avatar
    Tmn September 20, 2017 at 2:45 am #

    Hi Jason,

    I can not thank you enough. I am sure that there are many people like me who have learnt a lot from your tutorials on both “R” and “Python”. I have been following your tutorials for more than 3 years now. Before, I was using R; however, I recently moved to Python for deep learning. And I find your tutorials, as usual, exceptional. I think the theoretical courses by Andrew Ng and CS231n (Andrej Karpathy) and your programming course on deep learning are among the best in the world. You rock! Thanks a lot.

    I do have a question 🙂 as well.
    The grid search parameter tuning works perfectly with CPU. I agree with your suggestion not to tune everything at once. Now I have moved to a GPU implementation. I was able to execute the code when I chose the option n_jobs=1. However, if I run in parallel with n_jobs=-1, I get “CUDA_ERROR_OUT_OF_MEMORY”. I have a GeForce GTX 1080. Did you happen to encounter a similar kind of error? I will post the error log if needed.

    Once again, thank you.

    • Avatar
      Jason Brownlee September 20, 2017 at 6:00 am #

      Thanks for all of your support!

      Yes, I have the same and I would recommend using a “single thread” and let the GPU do its thing for a given single run.

      In general, I’d recommend contrasting different approaches to grid searching (cpu/gpu) and use the approach that is overall faster for your specific tests.

      • Avatar
        Tmn September 20, 2017 at 11:33 pm #

        Hi Jason,
        Thank you for the response. The parameter search using CPU (n_jobs=-1) takes 2.961489-4.977758 seconds, while using GPU (n_jobs=1) it takes 140.101048-142.151023 seconds.

        One more thing: after the grid search I have values for the parameters {batch_size, activation, neurons, learn_rate..} and an accuracy of around 90%. However, I wonder why reusing these model parameters does not give the same results; the accuracy is now 52%. Even though I ran it many times with the same parameters, the accuracy remains the same (52%). I could not achieve the accuracy shown in the grid search using the best model parameters. I am doing 5-fold CV. I do not expect the accuracy to be identical since it is a stochastic process, but it should be within SD±5%. What do you think? Did you happen to encounter the same thing?

        Also, the best parameter values change in each execution, with an accuracy SD±5%.


        The code below is something I am doing to limit GPU memory usage and run multiple grid searches. However, we need to know the memory usage in advance. Let me know if it makes sense.

        Also, we can use n_jobs. I tried with n_jobs = 2; however, the GPU memory is allocated as a fraction. I am looking into how to allocate memory in MB. I will do more research on this “CUDA_ERROR_OUT_OF_MEMORY” and update you.

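        import tensorflow as tf
        from keras.backend.tensorflow_backend import set_session
        # cap this process at 30% of the GPU memory (TensorFlow 1.x API)
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.3
        set_session(tf.Session(config=config))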


        • Avatar
          Jason Brownlee September 21, 2017 at 5:42 am #

          The results for the standalone model should fit into the distribution of the grid search results – if you repeated each grid search result many times, e.g. 10-30. See this post on evaluating model skill of neural networks:

          Nice, sorry, I cannot give you good advice on grid searching with the GPU, it is not something I do generally. I am more likely to run instances serially or across AWS instances.

          • Avatar
            TMN October 6, 2017 at 2:12 am #

            Hi Jason,

            Could you please help with how to do feature normalization while doing the grid search and cross-validation? Is normalization done automatically here, GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=15,cv=rkf)? If I normalize the features before training with X = scaler.transform(X_train), this will introduce bias in cross-validation. Also, if possible, can you please provide references on using the scikit-learn wrapper with Keras for advanced options? Are there any limitations of the wrapper?

            Without normalization:
            Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}

            With normalization:
            Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}

          • Avatar
            Jason Brownlee October 6, 2017 at 5:37 am #

            Perhaps you can normalize your data prior to the grid search?

          • Avatar
            TMN October 6, 2017 at 10:59 am #

            I normalize my data prior to the grid search using X = scaler.transform(X_train), but don’t you think it would introduce bias in the performance? Normally, I expect to normalize the training set and use that normalization factor to normalize the test or validation set before prediction. Maybe I did not understand you properly; how do you do normalization prior to the grid search?


          • Avatar
            Jason Brownlee October 6, 2017 at 11:07 am #

            Yes, it’s a struggle or trade-off.

            Perhaps you can see if a Pipeline will work in the grid search, it may, but I expect it will error.

            Perhaps the bias is minor and you can ignore it.

            Perhaps you can implement your own grid search loop to only use training data to calculate data scaling coefficients.
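The last option, computing the scaling coefficients from the training rows of each fold only, can be sketched as follows. This is a minimal standard-library illustration; `fit_scaler`, `apply_scaler` and `scaled_folds` are hypothetical names, the data is made up, and the folds are assigned by a simple interleaving rule rather than by any particular library's splitter.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize rows using coefficients computed elsewhere."""
    return [[(value - m) / s for value, m, s in zip(row, means, stds)] for row in rows]


def scaled_folds(X, n_folds):
    """Yield (X_train_scaled, X_test_scaled, train_idx, test_idx) per fold."""
    n = len(X)
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        # The test rows never influence the coefficients.
        means, stds = fit_scaler([X[i] for i in train_idx])
        yield (apply_scaler([X[i] for i in train_idx], means, stds),
               apply_scaler([X[i] for i in test_idx], means, stds),
               train_idx, test_idx)


# Made-up data: six rows, two features on very different scales.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
for X_train, X_test, train_idx, test_idx in scaled_folds(X, n_folds=3):
    print(test_idx, X_test)
```

scikit-learn's Pipeline is designed for the same purpose: a scaler placed in a Pipeline ahead of the model is refit on the training portion of each fold when the Pipeline is passed to GridSearchCV, with grid keys written as `stepname__parameter`. Whether that combination runs cleanly with a given Keras wrapper version would need to be tested.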

          • Avatar
            TMN October 6, 2017 at 6:44 pm #

            I started looking at the pipeline and how it has been used for SVM; let’s see. I would expect the pipeline to work for Keras as well, as this is a classical problem in machine learning. Why do you expect an error here? I wanted to take full advantage of the automatic grid search. Well, the final option will be to implement my own grid search.

            The bias is really significant in 5-repeated 10-fold CV. Thanks

            Without normalization:
            Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}

            With normalization:
            Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}

          • Avatar
            Jason Brownlee October 7, 2017 at 5:51 am #

            If it works, that is great. I have seen cases where when grid search + keras gets fancy it causes errors.

            I have a tutorial on Pipeline here that might help:

  61. Avatar
    HWU September 22, 2017 at 6:52 am #

    This is such a great, thorough tutorial. Thanks for keeping your tutorials up to date! It’s so nice finding a resource with examples that you know will work because they’ve been tested on recent versions of required packages.

  62. Avatar
    Marjan September 29, 2017 at 1:08 pm #

    Thank you for your great tutorial. I tried to use it for my model with multiple inputs, but it didn’t work. I found that the scikit-learn wrapper does not work for multiple inputs. It gives me an error for model.fit([input1, input2], y).
    Do you have any suggestion to handle it?

    • Avatar
      Jason Brownlee September 30, 2017 at 7:34 am #

      Sorry I do not. Perhaps run the grid search manually (e.g. your own for loop)?

  63. Avatar
    Buz Fifer October 5, 2017 at 7:06 am #

    When I run your code to tune the dropout_rate, I get the following error:
    ValueError: dropout_rate is not a legal parameter

    In fact, I get this error for all labels except epochs and batch_size. Both of these were recognized and ran fine. I could not find a reference to valid labels anywhere, even in API docs. Any suggestions?

    • Avatar
      Jason Brownlee October 5, 2017 at 5:16 pm #

      What do you mean by valid labels exactly?

      • Avatar
        Buz Fifer October 6, 2017 at 3:02 am #

        Sorry, I should have included the code in the first place. I have added comments in the code to show exactly what I tried for each parameter.

        # ------------ Define Keras Classifier Wrapper
        model1 = KerasClassifier(build_fn=kerasModel1, epochs=5, batch_size=10, verbose=0)

        # ----------- define the grid search parameters
        mybatchs = [10, 20, 128]
        myepochs = [5, 10, 20, 50, 60, 80, 100]
        mylearn = [0.001, 0.002, 0.0025, 0.003]
        myopts = ['Adam', 'Nadam', 'RMSprop']
        myinits = ['uniform', 'normal', 'lecun_uniform', 'lecun_normal', 'glorot_uniform', 'glorot_normal']
        mydrop = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80]

        # ------------- Not Recognized
        #param_grid = dict(optimizer=myopts)
        #param_grid = dict(learn_rate=mylearn)
        #param_grid = dict(learning_rate=mylearn)
        #param_grid = dict(init=myinits)
        #param_grid = dict(init_mode=myinits)
        #param_grid = dict(dropout_rate=mydrop)

        # ------------ Recognized
        #param_grid = dict(epochs=myepochs) # ----- OK
        #param_grid = dict(batch_size=mybatchs) # ----- OK

        I removed comment # and ran each one separately. For example, running the first param_grid values resulted in: Error – optimizer is not a valid parameter. They all got the same rejection notice except for epochs and batch_size.
        I hope that helps.

  64. Avatar
    Buz Fifer October 6, 2017 at 3:09 am #

    Just to be clearer, each parameter had its own name in the error message as follows:

    Error – optimizer is not a valid parameter
    Error – learn_rate is not a valid parameter
    Error – learning_rate is not a valid parameter
    Error – init is not a valid parameter
    Error – init_mode is not a valid parameter
    Error – dropout_rate is not a valid parameter

    • Avatar
      Jason Brownlee October 6, 2017 at 5:39 am #

      That is odd, I don’t have any good ideas, other than continue to debug and try different variations to see if you can expose the cause of the issue.

      Double check all of your python libraries are up to date.

  65. Avatar
    ritika October 6, 2017 at 11:49 pm #

    Hi Jason, Very nice tutorial..very well explained

  66. Avatar
    TC October 17, 2017 at 10:27 am #

    Hi Jason thanks for the great post.

    Let’s say I’m using 5 fold CV on a relatively small dataset (not necessarily for a deep learning model). In this case, the variance of the performance metric might be quite high, and just by chance, a point on the grid that is in reality far from optimal, might be selected as the “best”.

    So are there any approaches to smooth out the response surface of the grid search, to deal with “spikes” in performance due to variance?

    • Avatar
      Jason Brownlee October 17, 2017 at 4:05 pm #

      Wonderful question.

      Yes, we can approach this problem by increasing the number of repeats (not folds) of each param combination.
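Repeating, as opposed to adding folds, can be sketched as reshuffling the rows before each round of k-fold splitting and averaging over all splits. The code below is a minimal standard-library illustration; `repeated_k_fold`, `summarize` and `noisy_score` are hypothetical names, and the noisy score stands in for fitting and evaluating a real model.

```python
import random
from statistics import mean, stdev


def repeated_k_fold(n_samples, n_folds, n_repeats, seed=0):
    """Yield (repeat, train_idx, test_idx), reshuffling the rows for every repeat."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, train_idx, test_idx


def summarize(scores):
    """Mean and sample standard deviation over all splits of all repeats."""
    return mean(scores), stdev(scores)


def noisy_score(params, train_idx, test_idx, rng):
    # Placeholder (assumption): a real version would fit and evaluate a model.
    return params["quality"] + rng.uniform(-0.05, 0.05)


rng = random.Random(42)
scores = [noisy_score({"quality": 0.8}, train_idx, test_idx, rng)
          for _, train_idx, test_idx in repeated_k_fold(100, n_folds=5, n_repeats=10)]
avg, spread = summarize(scores)
print(f"{avg:.3f} ({spread:.3f}) over {len(scores)} splits")
```

scikit-learn provides `RepeatedKFold` and `RepeatedStratifiedKFold` splitters for this, which can be passed to GridSearchCV through its `cv` argument.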

      • Avatar
        TC October 20, 2017 at 8:40 am #

        Hi Jason, by “number of repeats” do you mean to just repeat the process many times, with perhaps a different random seed?

  67. Avatar
    Lea October 20, 2017 at 10:09 pm #

    Thank you for this great tutorial! I tried to adapt the code for a CNN, but I am running constantly in the same error. May anyone help?

    That is the code:

    def create_model(nb_filters=3, nb_conv=2, pool=20):
        model = Sequential()
        model.add(Convolution1D(nb_filters, nb_conv, activation='relu',
                                input_shape=(X.shape[1], X.shape[2]), padding="same"))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    model = KerasClassifier(build_fn=create_model(), verbose=0)

    nb_conv = [2, 4, 6, 8, 10]
    pool = [10, 20, 30, 50]
    param_grid = dict(nb_conv=nb_conv, epochs=pool)
    grid = GridSearchCV(estimator=model, param_grid=param_grid)
    grid_result = grid.fit(X, y)

    And the error I am getting is “nb_conv is not a legal parameter”. Unfortunately, I do not understand why.
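One possible reading of the “not a legal parameter” message, assuming the wrapper validates each grid key against the arguments of whatever is passed as build_fn: in the code above, `build_fn=create_model()` passes the return value of the function rather than the function itself, so the `nb_conv` argument is no longer visible. The sketch below illustrates the difference with `inspect`; `accepts_argument` and `FakeModel` are hypothetical stand-ins, not part of Keras.

```python
import inspect


def accepts_argument(fn, name):
    """True if the callable `fn` has a parameter called `name`."""
    return name in inspect.signature(fn).parameters


class FakeModel:
    """Hypothetical placeholder for a compiled model object."""

    def __call__(self, inputs):
        return inputs


def create_model(nb_filters=3, nb_conv=2, pool=20):
    # Stand-in for a function that would build and compile a Keras model.
    return FakeModel()


# Passing the function itself exposes its arguments to a validator.
print(accepts_argument(create_model, "nb_conv"))

# Calling it first hands over the returned object, whose call signature
# has no `nb_conv` argument.
print(accepts_argument(create_model(), "nb_conv"))
```

Under that assumption, `build_fn=create_model` (without parentheses) would be the form to try. The same reasoning would apply to any grid key that is not an argument of the model-building function.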

  68. Avatar
    went October 22, 2017 at 2:55 am #

    Hi Jason,

    Great post and Thank you.

    What do you think is the best sequence for tuning all those hyperparameters? I think a different sequence will lead to different final hyperparameters.

  69. Avatar
    Bgie October 23, 2017 at 6:28 am #

    Hi Jason,

    What a great blog, I very much appreciate you sharing some of your expertise!

    I want to grid search the hyperparams of my CNN, but I’m using data augmentation with ImageDataGenerator. So I’m not calling model.fit but model.fit_generator for the actual training.
    This does not seem to be supported through the grid search..
    Am I forced to write my own KerasClassifier implementation?

    Would you advise to just fall back to using (nested) for loops instead, or would I be missing some ‘magic’ from the existing scikit gridsearch?

    • Avatar
      Jason Brownlee October 23, 2017 at 4:10 pm #

      I would recommend writing your own for loops to grid search instead.

  70. Avatar
    Shubham Kumar October 26, 2017 at 3:58 am #

    Hey Jason!

    Needed help with model improvement!
    Can you help me in understanding how to realize whether your model is suffering from
    bad local minima, vanishing/exploding gradient problem?

    • Avatar
      Jason Brownlee October 26, 2017 at 5:34 am #

      If you have exploding or vanishing gradients, then you will have NaN outputs.

      This post will give you ideas on how to lift skill:

      This post will give you advice on how to effectively evaluate your model:

      • Avatar
        Shubham Kumar November 6, 2017 at 7:05 am #

        NaN outputs as in my predictions ?
        Or the weights ?
        If exploding gradient then weight will be very large (probably NaN) hence output would also be NaN.
        But how would this logic be used for vanishing gradients? In this case the weights basically stop changing, right?

        • Avatar
          Shubham Kumar November 6, 2017 at 7:07 am #

          Should I use some kind of code that checks by how much the weights at each layer are changing…and if after a certain threshold they haven’t changed by a certain amount, I’ll declare vanishing gradient !

        • Avatar
          Jason Brownlee November 7, 2017 at 9:44 am #

          Try gradient clipping on the optimization algorithm.
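The idea raised above of checking how much each layer's weights change between epochs can be sketched as follows. This is an illustration under stated assumptions: the snapshots are plain dictionaries of flattened weight lists (in Keras they could be derived from `model.get_weights()`), `relative_change` and `stalled_layers` are hypothetical helpers, and the threshold is arbitrary. A small change is only a hint, since a converged layer also stops moving.

```python
def relative_change(before, after, eps=1e-12):
    """Mean absolute weight change, relative to the mean weight magnitude."""
    diff = sum(abs(a - b) for a, b in zip(after, before)) / len(before)
    scale = sum(abs(b) for b in before) / len(before)
    return diff / (scale + eps)


def stalled_layers(snapshot_before, snapshot_after, threshold=1e-4):
    """Names of layers whose weights changed by less than `threshold`."""
    return [name for name in snapshot_before
            if relative_change(snapshot_before[name], snapshot_after[name]) < threshold]


# Made-up snapshots taken before and after one epoch.
before = {"dense_1": [0.5, -0.3, 0.8], "dense_2": [0.1, 0.2]}
after = {"dense_1": [0.5, -0.3, 0.80000001], "dense_2": [0.15, 0.18]}
print(stalled_layers(before, after))
```

If the early layers stall consistently while later layers keep moving, that pattern would be more suggestive of vanishing gradients than NaN outputs, which point toward exploding gradients instead.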

  71. Avatar
    Mustafa Murat ARAT October 28, 2017 at 12:45 am #

    I have a question for you, Jason, and for the general audience. I tried to find the optimal number of neurons for one of the hidden layers. I looped over my function, which contains my deep learning model. It is fast enough for the values I define, and I get a result based on accuracy. However, when I use your code, it is extremely slow and never finishes. How long does it take on your computer?

    • Avatar
      Jason Brownlee October 28, 2017 at 5:14 am #

      You could try to test fewer parameters or try to search on a smaller dataset?

      • Avatar
        Mustafa Murat ARAT October 29, 2017 at 4:12 am #

        Hey Jason,

        Thank you for your quick reply. I tried a grid search over the number of neurons on the Iris data set for the purpose of learning. I scale the data first and then transform and encode the dependent variable. However, first of all, even though I use a small data set and few parameters, it is slow; second of all, when I get the results, they are all zero. This is a very basic example and I am pretty sure that my code is correct, but I guess I am missing something.

        Best: 0.000000 using {‘neurons’: 3}
        0.000000 (0.000000) with: {‘neurons’: 3}
        0.000000 (0.000000) with: {‘neurons’: 5}

        THE CODE:

        from pandas import read_csv
        import numpy
        from sklearn.preprocessing import LabelEncoder
        from sklearn.preprocessing import StandardScaler
        from keras.wrappers.scikit_learn import KerasClassifier
        from keras.models import Sequential
        from keras.layers import Dense
        from keras.utils import np_utils
        from sklearn.model_selection import GridSearchCV

        dataframe = read_csv("iris.csv", header=None)


        # encode class values as integers
        encoder = LabelEncoder()
        encoder.fit(Y)
        encoded_Y = encoder.transform(Y)
        # one-hot encoding
        dummy_y = np_utils.to_categorical(encoded_Y)

        scaler = StandardScaler()
        X = scaler.fit_transform(X)

        def create_model(n_neurons):
            model = Sequential()
            model.add(Dense(n_neurons, input_dim=X.shape[1], activation='relu'))  # hidden layer
            model.add(Dense(3, activation='softmax'))  # output layer
            model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
            return model

        model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, initial_epoch=0, verbose=0)
        # define the grid search parameters
        neurons = [3, 5]

        # this does 3-fold cross-validation. One can change k.
        param_grid = dict(n_neurons=neurons)
        grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
        grid_result = grid.fit(X, dummy_y)
        # summarize results
        print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))

        • Avatar
          Jason Brownlee October 29, 2017 at 5:59 am #

          Sorry, I cannot debug your code/problem for you.

          • Avatar
            Mustafa Murat ARAT October 30, 2017 at 8:29 am #

            I totally understand you. Thank you so much, though. I figured out my mistake. Iris dataset is very well balanced so I need to shuffle the data because GridSearchCV is using 3-Fold Cross Validation.

          • Avatar
            Jason Brownlee October 30, 2017 at 3:49 pm #

            Glad to hear it.
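A likely mechanism behind the all-zero scores, assuming the file lists the 150 rows grouped by class and the folds are taken in file order without shuffling: each test fold then holds a class the model never saw during training. The sketch below illustrates this with labels only; `unshuffled_folds` and `folds_with_unseen_classes` are hypothetical helpers.

```python
import random


def unshuffled_folds(n_samples, n_folds):
    """Yield (train_idx, test_idx) for contiguous folds taken in file order."""
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        test_idx = list(range(fold * fold_size, (fold + 1) * fold_size))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


def folds_with_unseen_classes(labels, n_folds):
    """Count folds whose test rows contain a class absent from the training rows."""
    count = 0
    for train_idx, test_idx in unshuffled_folds(len(labels), n_folds):
        train_classes = {labels[i] for i in train_idx}
        test_classes = {labels[i] for i in test_idx}
        if not test_classes <= train_classes:
            count += 1
    return count


# Labels in file order: 50 rows per class, grouped by class.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
print(folds_with_unseen_classes(labels, n_folds=3))

# Shuffling the rows once before splitting mixes the classes across folds.
order = list(range(len(labels)))
random.Random(7).shuffle(order)
shuffled_labels = [labels[i] for i in order]
print(folds_with_unseen_classes(shuffled_labels, n_folds=3))
```

Shuffling X and the targets with the same permutation before the grid search, or passing a shuffling splitter through the `cv` argument, are two ways to avoid the problem.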

  72. Avatar
    jenny November 8, 2017 at 4:12 am #

    Thanks for sharing such a wonderful tutorial. Learnt many new things.

    How can I save all the models that the grid search is generating, with identifiers for each model?

    I am an R user. This is how I do it in R, saving models with the parameter values passed into their names:

    xgb.object <- paste0('/path/xgb_disc20_new_',
                         sample.sizes[i], '_', s, '_', nrounds[j], '_', max.depth[k], '_', eta[l], '.RData')

    write.table(cbind(sample.sizes[i], s, nrounds[j], max.depth[k], eta[l], tpr, tnr, acc, roc.area, concordance),
                paste0('/path/xgb_disc20_new_', min.sample.size, '_', max.sample.size, '.csv'),
                append=TRUE, sep=",", row.names=FALSE, col.names=FALSE)

    How can this be achieved in Python for Keras (neural networks) and for other models in other libraries?

    • Avatar
      Jason Brownlee November 8, 2017 at 9:29 am #

      I would recommend using grid search to find the parameters for a well performing model then train a new standalone model with those parameters that you can then save.
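For the record-keeping part of the question, one way to mirror the R approach is to derive an identifier from each parameter combination and log it next to its scores. The sketch below is illustrative only: `model_identifier` and `write_results` are hypothetical helpers, the rows are made up, and the output goes to an in-memory buffer rather than a real file. The rows could be assembled from the `params`, `mean_test_score` and `std_test_score` entries of `cv_results_`.

```python
import csv
import io


def model_identifier(prefix, params):
    """Build an identifier such as 'mlp_batch_size-10_epochs-50'."""
    parts = [f"{name}-{params[name]}" for name in sorted(params)]
    return "_".join([prefix] + parts)


def write_results(handle, rows):
    """Write one CSV row per parameter combination: identifier, mean, std."""
    writer = csv.writer(handle)
    writer.writerow(["identifier", "mean_score", "std_score"])
    for params, mean_score, std_score in rows:
        writer.writerow([model_identifier("mlp", params), mean_score, std_score])


# Made-up rows in the shape (params, mean score, std of score).
rows = [
    ({"epochs": 50, "batch_size": 10}, 0.71, 0.02),
    ({"epochs": 100, "batch_size": 10}, 0.73, 0.03),
]
buffer = io.StringIO()
write_results(buffer, rows)
print(buffer.getvalue())
```

The same identifier could serve as the file name when the standalone model trained with the chosen parameters is saved, for example with the Keras `model.save` method.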

  73. Avatar
    jenny November 8, 2017 at 9:34 pm #

    Thank you, Jason, for your quick reply. I will try that way.

  74. Avatar
    Wassim November 16, 2017 at 11:26 pm #

    Hi Jason,
    Thank you for the great tutorial. I just have an issue when using exactly your code: when I try to parallelize the grid search with n_jobs=-1, I end up with the error “AttributeError: Can’t get attribute ‘create_model’ on ” while it works well without parallelization. Any idea where the issue comes from?
    Thank you,

    • Avatar
      Jason Brownlee November 17, 2017 at 9:25 am #

      I’m not sure, perhaps you cannot parallelize the grid search with Keras models.

  75. Avatar
    Sangwon Chae November 28, 2017 at 9:51 pm #

    Hi Jason,

    The example code calculates the best score for accuracy to obtain the hyperparameter.

    In my problem, I want to find RMSE rather than accuracy because it is regression problem (numerical prediction).

    However, ‘grid_result.cv_results_’ only provides ‘fit_time’ and ‘score’, so it can not calculate RMSE.

    What should I do?

    Thank you.
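One way to approach this, assuming GridSearchCV is given `scoring="neg_mean_squared_error"`: the reported scores are then negated MSE values (negated so that higher is better), and RMSE follows by flipping the sign and taking the square root. The sketch below shows the arithmetic with made-up numbers; `rmse` and `rmse_from_neg_mse` are hypothetical helpers.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_from_neg_mse(neg_mse_scores):
    """Convert per-fold negated MSE scores into per-fold RMSE values."""
    return [sqrt(-score) for score in neg_mse_scores]


# Made-up predictions for one fold.
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))

# Made-up per-fold scores in the negated-MSE convention.
print(rmse_from_neg_mse([-4.0, -9.0]))  # [2.0, 3.0]
```

Note that the square root of an averaged MSE is not the same as the average of per-fold RMSE values, so it helps to decide which of the two is being reported.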

  76. Avatar
    Estelle December 5, 2017 at 7:37 am #

    Hi Jason,

    Thank you for this post.

    Is there anything that prevents me from using Grid Search with train_on_batch() instead of fit()?

    Thank you for letting me know.

    All the best,


    • Avatar
      Jason Brownlee December 5, 2017 at 10:26 am #

      I think the wrapper is quite limited and does not offer this facility via sklearn.

      • Avatar
        Estelle December 6, 2017 at 8:15 am #

        Thanks for your quick answer.

        All the best,


  77. Avatar
    Peter December 8, 2017 at 1:56 pm #

    Thanks very much for the tutorial. It is extremely helpful for my work. I came across a problem with grid search with Keras (TensorFlow backend). I want to run the same grid search on different datasets. Everything works fine on the first dataset. But when I fit the grid search to the second dataset, the program gets stuck there. I run the grid search with n_jobs=-1 and put keras.backend.clear_session() between the two fits. You can replicate this issue by fitting to the data twice in your examples. Could you please help me with this issue?

    • Avatar
      Jason Brownlee December 8, 2017 at 2:30 pm #

      I’m sorry to hear that, perhaps change n_jobs to 1?

      • Avatar
        Peter December 8, 2017 at 2:40 pm #

        Thanks for the quick reply. It works when n_jobs=1, but I do need parallel threads for speed.

        • Avatar
          Jason Brownlee December 9, 2017 at 5:34 am #

          The neural network will be using all the cores, so running multiple threads may not offer any benefit.

          • Avatar
            Peter December 11, 2017 at 9:53 am #

            I got it to work by fitting just one dataset in the Python script and looping the script over multiple datasets in a bash script. I am still not clear why the second fit fails in Python, but this is a not-so-beautiful workaround.

          • Avatar
            Jason Brownlee December 11, 2017 at 4:52 pm #

            Glad to hear that you made some progress.

  78. Avatar
    Daniel Pamplona December 13, 2017 at 1:37 am #

    Hi Jason

    Thank you so much for sharing your knowledge.
    I am trying to optimize the number of hidden layers.
    I can't figure out how to do it with Keras (actually, I am wondering how to set up the function create_model in order to optimize the number of hidden layers).
    Could you please help me?
    Thank you

    • Avatar
      Jason Brownlee December 13, 2017 at 5:42 am #

      Perhaps the number of layers could be a parameter to your function.
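
      A minimal sketch of that idea, assuming a binary classifier with 8 inputs like the tutorial's dataset. The names `n_layers`, `neurons`, and `hidden_layer_sizes` are illustrative choices, not part of the original post, and Keras is imported inside the function only so the file loads without TensorFlow installed.

```python
def hidden_layer_sizes(n_layers, neurons):
    """Return the width of each hidden layer, e.g. (3, 12) -> [12, 12, 12]."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    return [neurons] * n_layers


def create_model(n_layers=1, neurons=12, input_dim=8):
    """Build a dense network whose depth is controlled by n_layers."""
    # Lazy import: TensorFlow is only needed when a model is actually built.
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(input_dim,)))
    # One Dense layer per requested hidden layer.
    for width in hidden_layer_sizes(n_layers, neurons):
        model.add(Dense(width, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


# Candidate depths and widths to search. With the SciKeras wrapper the keys
# would typically need a "model__" prefix (an assumption about your setup).
param_grid = {"n_layers": [1, 2, 3], "neurons": [8, 12]}
```

      The grid can then be passed to GridSearchCV in the same way as the other examples in the post.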

  79. Avatar
    Sean December 15, 2017 at 1:43 am #

    Hi Jason,

    Thanks for this insightful and useful tutorial as always

    No doubt your blog posts are arguably the best in the field of data sciences

    Best wishes

  80. Avatar
    Sean December 16, 2017 at 12:06 am #

    Hello Jason,
    I decided to try the code on a textual data of about 3000 tweets having binary classification (Y) and the text corpus as (X). Started off with tuning the batch size and number of epochs

    but got the following error:

    Here’s the modified code below:


    • Avatar
      Jason Brownlee December 16, 2017 at 5:29 am #

      Sorry to hear that, it’s not clear to me. Perhaps post to stackoverflow to get help debugging your code?

  81. Avatar
    Olivier Blais December 16, 2017 at 3:24 am #

    Hi Jason, first thanks for your articles! Super useful!

    I tried to execute the grid search but came up against parallelism issues. I am on Windows and I get this error when I try to run the script on multiple CPUs:

    ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using “if __name__ == ‘__main__'”. Please see the joblib documentation on Parallel for more information.

    Do you know how I should address that?

    Thanks in advance

    • Avatar
      Jason Brownlee December 16, 2017 at 5:35 am #

      Perhaps try setting the number of jobs to 1?

      • Avatar
        Olivier Blais December 28, 2017 at 5:30 am #

        Hi Jason! Yes this works but it is very slow as this is not parallel. Do you understand why it cannot run in parallel and how to fix that?

        Thanks again !

        • Avatar
          Jason Brownlee December 28, 2017 at 2:10 pm #

          The backend is parallelized and the two levels of parallelization are in conflict.

  82. Avatar
    Shabnam December 18, 2017 at 2:21 pm #

    Thanks a lot for such a wonderful post. Overall, there are a lot of parameters that need to be tuned. I was thinking of using RandomizedSearchCV instead of GridSearchCV. Still, it will be time-consuming for a lot of simulations. Do you have any suggestions for fast parameter tuning? For example, can we say that specific parameters have more effect on scores, so let's try Grid/RandomizedSearchCV on them first?

  83. Avatar
    Henry December 20, 2017 at 10:51 pm #

    Dear Jason,

    Fantastic post, thank you for this wonderful tutorial.

    I was wondering if it would be more appropriate to tune all the hyperparameters at one go instead of breaking it up into various parts as shown above – you may be doing it for the sake of visibility of how each component is tuned but would it be better to tune everything together since there might be “interactions between the hyperparameters” which would not be captured if they were tuned separately?

  84. Avatar
    Hao January 3, 2018 at 2:26 am #

    Hi Jason,

    Many thanks for a series of excellent posts!

    I have an extremely imbalanced data set to study, in which #negative : #positive is about 100:1. When I built the first model, I performed 10-fold validation, and in each validation round I used oversampling to add positive samples to the training data, but not to the testing data. Now my question is: if I want to perform a hyperparameter search, how do I tell GridSearchCV() to do oversampling for each round of cross-validation?

    Many thanks

    • Avatar
      Jason Brownlee January 3, 2018 at 5:40 am #

      Good question, you might need to use a Pipeline and have data prep happen within it.
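
      The principle can be sketched without a Pipeline: oversample only the training indices of each fold and leave the test fold untouched. This is a plain-Python illustration rather than the Pipeline approach suggested above; the helper names are hypothetical, and the folds are interleaved without shuffling so the result is deterministic.

```python
import random


def kfold_indices(n_samples, k):
    """Split sample indices into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until both classes match."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(indices)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(indices) + extra


def cv_splits_with_oversampling(labels, k=5, seed=0):
    """Yield (train, test) index lists; only the training part is oversampled."""
    folds = kfold_indices(len(labels), k)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield oversample(train_idx, labels, seed), test_idx


# Toy labels: 10 positives among 100 samples, spread across the folds.
labels = [1 if i % 11 == 0 else 0 for i in range(100)]
splits = list(cv_splits_with_oversampling(labels, k=5))
```

      scikit-learn's `cv` argument accepts an iterable of (train, test) index pairs, so splits built this way could in principle be handed to GridSearchCV, although that has not been tested here with a Keras model.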

  85. Avatar
    Justin Solms January 7, 2018 at 11:24 pm #

    Hello Jason

    A good 2018 to you. I have a question about how Keras early stopping callbacks might be able to use the GridSearchCV k-fold generated validation data set as their val_loss or val_acc. The question I posted on StackOverflow but I wished to call your attention to it – should you so wish.

    Kind regards,

    • Avatar
      Jason Brownlee January 8, 2018 at 5:43 am #

      I would suggest not combining CV and early stopping.

      • Avatar
        James March 11, 2018 at 6:16 am #

        Could early stopping be used as a substitute for grid searching epoch size?

        • Avatar
          Jason Brownlee March 11, 2018 at 6:30 am #

          Yes, but you might need to code it up yourself. sklearn might blow up.
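
          A rough sketch of coding it yourself, assuming you train one epoch at a time and record a validation loss after each. The function name, `patience`, and `min_delta` are illustrative, and the list of losses stands in for real training.

```python
def train_with_early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, best_loss), stopping after `patience` epochs
    without an improvement of more than `min_delta`."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training early
    return best_epoch, best_loss


# Stand-in validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.55, 0.50]
stopped_early = train_with_early_stopping(losses, patience=2)
ran_longer = train_with_early_stopping(losses, patience=3)
```

          The best epoch found this way plays the role of the tuned epochs value, so epochs can be dropped from the grid.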

  86. Avatar
    shwetabh shekhar January 19, 2018 at 12:03 am #

    Hello sir
    If I have a large dataset, can we still do this hyperparameter tuning?
    I have 70 to 80 feature columns and about 50,000 rows.
    Can we apply this tuning?

    • Avatar
      Jason Brownlee January 19, 2018 at 6:31 am #

      Sure, you might need a large computer or to split the work up across many computers.

      Perhaps you can work with a sample of your data.

  87. Avatar
    shwetabh shekhar January 19, 2018 at 1:13 am #

    How do I select the hidden layers if I have a large dataset like the one mentioned above?

  88. Avatar
    Kafeel Basha January 29, 2018 at 5:48 pm #

    Very good post.

    Hyperparameter tuning: how can I grid search the number of neurons, epochs, or batch size using the Keras interface in R?

  89. Avatar
    neha February 2, 2018 at 6:34 am #

    Hi, I have a basic query. I have a training set and a test set. I built an LSTM on the training set and fit my model using history = model.fit(trainX, trainY, epochs=100, batch_size=50,
    validation_data=(testX, testY), verbose=0, shuffle=False).
    After this I called model.predict(testX) to get the predicted Y values. That was the basic code. I am now trying to apply grid search. What changes do I have to make to the history statement to apply it?
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(testX, testY, verbose=0, shuffle=False)

  90. Avatar
    neha February 2, 2018 at 6:50 am #

    can gridsearchcv work for time series as well?

    • Avatar
      Jason Brownlee February 2, 2018 at 8:25 am #

      Not really. You will have to write your own for loops and perform walk forward validation.
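
      A minimal sketch of such a loop, using a moving-average forecast as a stand-in for a real model. The function name, the toy series, and the candidate window sizes are assumptions for illustration only.

```python
def walk_forward_mae(series, window, n_test):
    """Mean absolute error of one-step moving-average forecasts,
    evaluated by walking forward over the last n_test observations."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        history = series[:t]  # only data observed before time t
        forecast = sum(history[-window:]) / window
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


# Toy series with a linear trend: 1, 2, ..., 20.
series = [float(x) for x in range(1, 21)]

# "Grid search" over the single hyperparameter of the stand-in model.
results = {w: walk_forward_mae(series, w, n_test=5) for w in [1, 2, 4]}
best_window = min(results, key=results.get)
```

      For a Keras model, the forecast line would be replaced by refitting (or updating) the network on `history` and predicting the next step.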

  91. Avatar
    Jack February 2, 2018 at 8:01 pm #

    Hi Jason, thank you for your great tutorial! My question here is about ‘grid_result.best_score’. In this article the best score seems to be the best mean score, but in a regression problem, the mean score is irrelevant, so I have to look for the best std score. Is that correct?

    • Avatar
      Jason Brownlee February 3, 2018 at 8:35 am #

      Mean score in regression will be mean error. Not irrelevant.

      • Avatar
        Jack February 3, 2018 at 8:15 pm #

        I see. But when I run the code, the ‘grid_result.best_score’ printed out the biggest score. I don’t think that’s right, cause in a regression problem I should look for the smallest mean error. Am I understanding this right?
        Below are the results:
        Best: 0.062234 using {‘optimizer’: ‘Nadam’}
        0.059561 (0.017101) with: {‘optimizer’: ‘SGD’}
        0.056818 (0.013662) with: {‘optimizer’: ‘RMSprop’}
        0.059617 (0.014734) with: {‘optimizer’: ‘Adagrad’}
        0.061506 (0.014503) with: {‘optimizer’: ‘Adadelta’}
        0.059331 (0.014835) with: {‘optimizer’: ‘Adam’}
        0.057696 (0.014828) with: {‘optimizer’: ‘Adamax’}
        0.062234 (0.010834) with: {‘optimizer’: ‘Nadam’}
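
        One way to remove the ambiguity is to use a scorer where larger is always better, such as scikit-learn's `scoring="neg_mean_squared_error"`, and convert back to an error afterwards. The sketch below uses made-up scores to show the sign convention; it does not reproduce the results above.

```python
from math import sqrt

# Hypothetical mean CV scores as reported under neg_mean_squared_error:
# values are negated MSE, so the largest (closest to zero) is the best.
mean_test_score = {"SGD": -0.0400, "Adam": -0.0225, "Nadam": -0.0625}

# GridSearchCV-style selection: pick the highest score.
best = max(mean_test_score, key=mean_test_score.get)

# Convert negated MSE back to RMSE for reporting.
rmse = {name: sqrt(-score) for name, score in mean_test_score.items()}
```

        Note that the square root of a mean MSE is not identical to the mean of per-fold RMSE values; scikit-learn also offers `neg_root_mean_squared_error` if per-fold RMSE is what you want.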

  92. Avatar
    Mohamed Abd-Allah February 4, 2018 at 9:09 am #

    very good tutorial, But I have a small question. can I tune all these hyperparameters together or I should take a part of the dataset and tune them separately like the examples you mentioned.

    • Avatar
      Jason Brownlee February 5, 2018 at 7:43 am #

      Ideally, you would tune them all together, but this is often too computationally expensive.
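
      To get a feel for the cost, the number of model fits can be counted before starting. The grid below reuses values that appear in the post and comments; the 3-fold setting is an assumption, so substitute your own `cv`.

```python
from math import prod

grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"],
}


def joint_fits(grid, cv):
    """Fits needed when every combination is evaluated together."""
    return prod(len(values) for values in grid.values()) * cv


def separate_fits(grid, cv):
    """Fits needed when each parameter is tuned on its own."""
    return sum(len(values) for values in grid.values()) * cv


joint = joint_fits(grid, cv=3)        # 6 * 3 * 7 combinations, 3 folds each
separate = separate_fits(grid, cv=3)  # (6 + 3 + 7) settings, 3 folds each
```

      The joint search captures interactions between parameters, but it grows multiplicatively with every parameter added.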

  93. Avatar
    Vidar February 8, 2018 at 7:40 am #

    Is there a way to do similar things in R using the Caret package? Or other package that can help you with hyperparameter grid search when using Keras in R?

    • Avatar
      Jason Brownlee February 8, 2018 at 8:33 am #

      I don’t know if Keras and caret are compatible, sorry.

  94. Avatar
    joseph February 8, 2018 at 3:17 pm #

    hi Jason,

    do i need to split the training data for cross validation, or only perform splitting on the input data.

    • Avatar
      Jason Brownlee February 9, 2018 at 8:59 am #

      Why do you want to split exactly? Your goals will help me answer your question.

      • Avatar
        joseph February 9, 2018 at 10:36 am #

        Thanks Jason for the quick reply…i will figure that out.. Just another minor question, is there any way to perform data preprocessing on 3d input (due to the input shape for lstm)

        • Avatar
          Jason Brownlee February 10, 2018 at 8:49 am #

          Sure, but it might be easier (or make more sense) to perform data prep prior to shaping data for the LSTM.

          • Avatar
            joseph February 10, 2018 at 12:12 pm #

            Thanks Jason..i will try that out.. Is it a good idea to tune the hyperparameter using the keras wrapper, then apply those tuned parameters on lstm model? Hope to get some comments on it. Thank you.

          • Avatar
            Jason Brownlee February 11, 2018 at 7:51 am #

            You can. Or you can write your own for loop and tune the model directly.

          • Avatar
            joseph February 12, 2018 at 1:10 pm #

            Thanks a lot Jason.. i will definitely try that one out..

  95. Avatar
    Boris Branson February 15, 2018 at 7:27 am #

    Hi Jason, wonderful post. I love your books – amazing.

    I wish to include callbacks in the Grid Search (one for TensorBoard and one for logging losses on every combination over the params).

    I have something like:

    loggerCB = keras.callbacks.TensorBoard(log_dir=’logs’, histogram_freq=0, write_graph=True)
    class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
    self.losses = []
    def on_batch_end(self, batch, logs={}):

    historyCB = LossHistory()

    grid_search = GridSearchCV(estimator=model,
    grid_search =, y_train, fit_params={‘callbacks’: [loggerCB, historyCB]})

    BUT I got this error:
    TypeError: Unrecognized keyword arguments: {‘fit_params’: {‘callbacks’: [, ]}}

    How can I pass callbacks using Grid Search?

    Boris Branson

    • Avatar
      Jason Brownlee February 15, 2018 at 8:52 am #

      Sorry, I have not used callbacks with a grid search. You might need to write your own for-loops for the search.

  96. Avatar
    Alessandro February 17, 2018 at 4:02 am #

    Hello Jason,
    let me congratulate you on the good post.

    I am curious about the use of CV . Each time you call
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    you are compiling a new keras model with the new set of parameters.

    Are these different models of keras, compiled one after another, accumulating in the memory? Would this imply a memory usage problem in the case of an extensive grid search with bigger models? Any tips?


    • Avatar
      Jason Brownlee February 17, 2018 at 8:50 am #

      Yes, each model is evaluated and discarded.

      For larger models, you could run each fold on a different machine (e.g. run the eval manually).

  97. Avatar
    Boris Branson February 23, 2018 at 9:26 pm #

    Hello Jason,

    I see you have used only SGD in the example of learning rate parameterization. Is it possible to combine different values for lr with different optimizers (not only SGD) in one grid search or i’d need a for loop?

    • Avatar
      Jason Brownlee February 24, 2018 at 9:11 am #

      Yes, but the more parameters you grid search at once, the slower the search.

  98. Avatar
    Priyansh February 25, 2018 at 7:39 pm #

    Hi Jason, your article is super useful, but I am having a problem using it for the MNIST dataset, which is three-dimensional. When I try to ‘fit’, it gives me a dimension error. Can you do one for the MNIST dataset? Thanks a lot.

  99. Avatar
    TonyWang February 27, 2018 at 10:20 pm #

    Hi Jason, great tutorial, I always learn a lot from your posts. I have a question: is it possible to combine all the parameters in one grid search? That seems to be more than thousands of combinations. For some models it will cost a few days or weeks. Is there any better solution for this? randomgridsearch or something else? Thanks again!

    • Avatar
      Jason Brownlee February 28, 2018 at 6:04 am #

      Yes, but as you say, you will need a lot of time or a lot of parallel compute resources to get a result.

      Random search is often preferred because you can uniformly sample the domain and get good enough results quickly.
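
      A small sketch of random search over a mixed space. The parameter ranges and the `toy_score` function are stand-ins invented for illustration; in practice the score would come from cross-validating a model built with each sampled configuration.

```python
import random

# Tuples are continuous ranges (low, high); lists are discrete choices.
space = {
    "learn_rate": (0.001, 0.3),
    "dropout_rate": (0.0, 0.9),
    "neurons": [1, 5, 10, 15, 20, 25, 30],
}


def sample_config(space, rng):
    """Draw one configuration uniformly from the search space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(domain)
    return config


def random_search(space, evaluate, n_iter, seed=0):
    """Evaluate n_iter random configurations and return the best one."""
    rng = random.Random(seed)
    trials = [sample_config(space, rng) for _ in range(n_iter)]
    return max(trials, key=evaluate), trials


def toy_score(config):
    """Stand-in objective: higher is better, peaks near lr=0.01, dropout=0.2."""
    return -abs(config["learn_rate"] - 0.01) - abs(config["dropout_rate"] - 0.2)


best, trials = random_search(space, toy_score, n_iter=20)
```

      The budget is fixed by `n_iter` rather than by the size of the grid, which is what makes it practical when there are many parameters.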

      • Avatar
        TonyWang March 1, 2018 at 3:18 am #

        Thanks for your reply. I Googled a lot but didn’t find any method to search over optimizers and their parameters together, say different optimizers such as Adam and its learning rates. Are there any suggestions? Thanks!

        • Avatar
          Jason Brownlee March 1, 2018 at 6:16 am #

          Yes, just start searching for viable params on your model/data. No need to find confirmation.

  100. Avatar
    Johnson Muthii March 2, 2018 at 8:56 am #

    Hello Jason,

    Thanks for this awesome tutorial. Am very fresh in machine learning and your tutorials are so simplified and easy to follow.

    Am encountering an error when i run the epochs and batch size tuning code. Kindly help

    This is the part of the code that raises the error…

    # create model
    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs= 1)
    grid_result = grid.fit(X_train, y_train)

    TypeError: __call__() missing 1 required positional argument: ‘inputs’

    • Avatar
      Jason Brownlee March 2, 2018 at 3:20 pm #

      Sorry, I have not seen this error. Are you able to confirm that you have copied all of the code and that your development environment is up to date?

    • Avatar
      Nathan Rasch August 27, 2018 at 5:52 am #

      I ran into this over the weekend, and I am posting this to hopefully save someone else some pain down the road:

      I kept getting the following error when working the prediction section of my code, which frankly was driving me nuts:

      TypeError: call() missing 1 required positional argument: ‘inputs’

      After researching the error message I came upon this comment, which led me to the resolution:

      _The thing here is that KerasRegressor expects a callable that builds a model, rather than the model itself. By wrapping your function in this way you can return the build function (without calling it)._ [Source](

      Solution: I needed to **wrap** my buildModel() function! 🙁

      Once I ‘wrapped’ the buildModel() function, the prediction code blocks finally started working. Give it a try, and it should resolve your issue. The link I provided above should give you a working code example. If not, let me know, and I’ll post my working example for you.
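
      The difference between passing a builder and passing its result can be sketched in plain Python. `BuilderWrapper` below is a hypothetical, simplified imitation of a wrapper that stores a build function and calls it at fit time; it is not the real KerasRegressor, and `build_model` returns a dict instead of a compiled network.

```python
from functools import partial


def build_model(n_predictors, hidden_layer_neurons):
    """Stand-in builder: returns a plain dict rather than a Keras model."""
    return {"inputs": n_predictors, "hidden": list(hidden_layer_neurons)}


class BuilderWrapper:
    """Hypothetical wrapper that builds a fresh model each time fit() runs."""

    def __init__(self, build_fn):
        if not callable(build_fn):
            raise TypeError("build_fn must be a callable that returns a model")
        self.build_fn = build_fn

    def fit(self):
        # The wrapper, not the caller, decides when the model is created.
        self.model_ = self.build_fn()
        return self


# Correct: pass something callable. partial() pre-fills the arguments
# without calling build_model yet.
build_fn = partial(build_model, n_predictors=4, hidden_layer_neurons=[8, 12])
wrapper = BuilderWrapper(build_fn).fit()

# Incorrect: this would pass the already-built model instead of a builder.
# BuilderWrapper(build_model(4, [8, 12]))
```

      A lambda or a small wrapper function that returns the build function's result would serve the same purpose as `partial` here.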


      • Avatar
        Jason Brownlee August 27, 2018 at 6:15 am #

        It might be easier to write your own for loops to grid search Keras models.

  101. Avatar
    sonia March 7, 2018 at 2:05 am #

    dear jason
    How long does this program take to run while tuning, for example when tuning epochs and batch size?

    • Avatar
      Jason Brownlee March 7, 2018 at 6:16 am #

      It depends on the size of the dataset, the size of the model and the speed of your system.

  102. Avatar
    Yumlembam Rahul March 12, 2018 at 1:59 pm #


    As you mention in your blog, “As we proceed through the examples in this post, we will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.” Does this mean we should do the hyperparameter search in one grid instead of dividing it up?


    Yumlembam Rahul

  103. Avatar
    jessy March 15, 2018 at 8:51 pm #


    I have tried above code. it is executing ,but not displaying results..i don’t know the reason ..

    • Avatar
      Jason Brownlee March 16, 2018 at 6:17 am #

      Perhaps try from the command line, then be patient.

      Perhaps try to reduce the data set size or use fewer combinations?

  104. Avatar
    Yumlembam Rahul March 16, 2018 at 5:25 pm #

    hi, in your example optimizer parameter are not specified while doing grid search do they assume default values if not specified??

    and for reproducibility of result i added the following code and have been able to get same result

    import os
    os.environ[‘PYTHONHASHSEED’] = ‘0’
    session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    from keras import backend as K
    # The below tf.set_random_seed() will make random number generation
    # in the TensorFlow backend have a well-defined initial state.
    # For further details, see:
    sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

  105. Avatar
    jessy March 16, 2018 at 6:05 pm #


    I have a doubt: can the LSTM concept be used for prediction on the diabetes dataset (Pima Indians dataset)? I don’t know how an LSTM learns from data. Is it possible to work through a hands-on calculation?

  106. Avatar
    jessy March 19, 2018 at 7:32 pm #

    Is it possible to do a hands-on calculation, particularly for hidden layers and LSTM layers? Is it possible to calculate the weights manually (how weights are transferred from one layer to another)?

    • Avatar
      Jason Brownlee March 20, 2018 at 6:15 am #

      Sure, but you will need to code these as extensions to the Keras library.

  107. Avatar
    jessy March 23, 2018 at 9:26 pm #

    sir ,
    I have tried the above code without the n_jobs=-1 parameter and it is working. I have a doubt: can the above code be run using an LSTM model? Is that possible?

    • Avatar
      Jason Brownlee March 24, 2018 at 6:26 am #

      Perhaps set it to 1 thread and let Keras have all of the cores?

  108. Avatar
    Max March 25, 2018 at 1:12 am #

    Hi Jason,

    I’m sure it’s possible – but I can’t figure it out.
    The above code gives me as a result the best hyper-parameters as measured on the cross-validation.
    Now which adjustments to the code would be necessary to additionally calculate the optimum hyper-parameters on a test set?
    The optimum hyper-parameters seem to lead to significantly different results when applied to my model that I use to predict values.


  109. Avatar
    jessy March 28, 2018 at 7:12 pm #

    sir ,
    I have an doubt that is multivariate time series data can be used for classification or prediction .whether we can use that data for prediction or classification or both.

  110. Avatar
    jessy March 28, 2018 at 7:18 pm #

    In the LSTM model you are using only the RMSE loss function. Why have you not used other loss functions? In particular, for sequence prediction problems (forecasting) you used only the RMSE loss function. Why, sir?

    • Avatar
      Jason Brownlee March 29, 2018 at 6:33 am #

      I use MSE not RMSE. You can try other loss functions if you prefer. I find MSE loss function works well for most problems.

  111. Avatar
    hamidi March 29, 2018 at 4:57 am #

    Thanks for your nice post.

    Could you please let me know how to incorporate class_weight and tune it?

  112. Avatar
    Prabha April 13, 2018 at 8:58 pm #

    Hello, great post as always!
    I had a query regarding this. So I have a training set and a test set, and I am using a stacking ensemble for predictions.
    So when I run GridSearchCV on this, should I fit just the training set on this and print CV score on the training set ONLY? And not touch the test set at all?
    Also should I fit the new grid classifier on the set before printing the CV score or after?

  113. Avatar
    Aditya Jain April 17, 2018 at 1:53 am #

    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20]
    epochs = [10, 20, 30]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X_train, y_train)

    When I am running this code snippet I am getting error as
    AttributeError: ‘NoneType’ object has no attribute ‘loss’

    Can you please help me on that ?

  114. Avatar
    Marshall April 28, 2018 at 5:28 am #

    Hi Jason,

    First and foremost, this is an incredible writeup – very informative.

    I’m getting an error that reads “can’t pickle _thread.RLock objects”

    When I use the following code:


    def build_neural_network(n_predictors, hidden_layer_neurons):
    Builds a Multi-Layer-Perceptron utilizing Keras.

    x_train: (2D numpy array) A n x p matrix, with n observations
    and p features
    y_train: (1D numpy array) A numpy array of length n with the
    target training values.
    hidden_layer_neurons: (list) List of ints for the number of
    neurons in each hidden layer.

    model: A MLP with 2 hidden layers
    model = Sequential()
    input_layer_neurons = n_predictors





    return model

    # columns variable defined elsewhere, works as expected

    mlp = build_neural_network(len(columns), [8, 12])

    model = KerasRegressor(build_fn=mlp)

    # create parameter lists for GridSearchCV
    batch_size = list(np.arange(10, 250, 10))
    epochs = list(np.arange(5, 20, 5))

    neural_net_grid_dict = {‘batch_size’: batch_size,
    ‘epochs’: epochs}

    neural_net_grid = GridSearchCV(estimator=model,

    mask = df[‘Date’] == ‘2006-11-06’
    X, y = create_X_y(df[mask], columns)

    grid_result = neural_net_grid.fit(X, y)


    Any idea what might be going on?

    • Avatar
      Jason Brownlee April 29, 2018 at 6:21 am #

      Sorry, I have not seen this error. Perhaps try posting to stackoverflow?

  115. Avatar
    Cristiana April 29, 2018 at 4:52 am #

    Thanks so much ! This post helped me a lot !

    • Avatar
      Jason Brownlee April 29, 2018 at 6:28 am #

      I’m glad to hear that.

    • Avatar
      Sandra July 26, 2018 at 4:56 am #

      I am experiencing the same error “can’t pickle _thread.RLock objects”, may I know how you solved it?

  116. Avatar
    Juan May 9, 2018 at 4:16 am #

    Hi Jason,

    How can I tune your model to find hyperparameters (learning rate, epochs, and output dim in the hidden layer) using RandomizedSearchCV?

    Thanks !!


    • Avatar
      Jason Brownlee May 9, 2018 at 6:28 am #

      Specify ranges and search. What is the problem exactly?

  117. Avatar
    June May 12, 2018 at 6:18 am #

    Hi Jason, I got help from this blog post. Thank you very much!

    I have one question though. What if I want to test optimizers that have customized parameters rather than the default parameters? In your example, it’s just an array of strings of optimizer names.

    Do you know how I can do this?


    • Avatar
      Jason Brownlee May 12, 2018 at 6:52 am #

      You can provide lists of strings with optimizer names if you wish.

      • Avatar
        June May 12, 2018 at 7:14 am #

        Yes. Isn’t this what’s provided in the example code?
        optimizer = [‘SGD’, ‘RMSprop’, ‘Adagrad’, ‘Adadelta’, ‘Adam’, ‘Adamax’, ‘Nadam’]

        What I meant was not with default ones but like when I have my own optimizer defined as follows:

        sgd_custom = SGD(lr=0.7)
        adam_custom = Adam(decay=0.005)

        How can I give optimizer list for this setting? optimizer=[sgd_custom, adam_custom]?

        • Avatar
          Jason Brownlee May 13, 2018 at 6:02 am #

          Good question.

          Yes, you could provide a list of pre-configured objects to use instead of strings.

  118. Avatar
    Philipp May 15, 2018 at 3:52 pm #

    Hi Jason,

    Your posts are really helpful – thanks a lot!

    1. I’m using grid search on my own Keras CNN and everything is working. One thing keeps confusing me though: the F1 measures reported by grid search are always a bit (3-4%) higher than when running the same network configurations in Keras directly. I know that Keras isn’t using CV, but I think this should lead to deviations in both directions rather than systematic deviations in one direction.

    2. Also I found that my network is always performing slightly better (accuracy) when using the TF-Layers API instead of Keras, even though the network configurations are exactly the same (as far as I can control this in Keras).

    Any ideas why Keras seems to perform poorer? Have others experienced the same issues with Keras? I just can’t figure it out…


    • Avatar
      Jason Brownlee May 16, 2018 at 5:58 am #

      No good idea sorry. It might be statistical chance, or it might be real. See if you can tease this out with some hypothesis tests on the results.

  119. Avatar
    Philipp May 19, 2018 at 4:33 am #

    Thanks, Jason.

    Just to let you know: Apparently it has something to do with the F1 score. Accuracy scores reported by grid search are pretty much the same as my results in Keras.

  120. Avatar
    Ng Minh Hieu May 28, 2018 at 3:43 am #

    Hi Jason, thank you for very detailed and interesting tutorial.
    1. I tried to grid search the epochs and batch size hyperparameters using your code. No result was produced and no error message appeared. After that, I changed n_jobs to 1, and Python gave me the result. I do not understand why n_jobs=-1 prevented the calculation.

    2. If i have complicated network (with two layers for example), could you tell me how grid can be implemented with number of epochs and batch size?

    Thank you a lot!

    • Avatar
      Jason Brownlee May 28, 2018 at 6:03 am #

      Might have caused a deadlock internally.

      I don’t understand your second question sorry, perhaps you can rephrase it?

  121. Avatar
    Sumit May 28, 2018 at 7:21 pm #

    Hi, Jason, excellent post and help lot for improving my predictive model.

    I have one question: is there any way I can optimise the number of layers in the network?

    • Avatar
      Jason Brownlee May 29, 2018 at 6:25 am #

      Yes, use a grid search and choose the configuration with the lowest loss.

  122. Avatar
    John May 30, 2018 at 2:20 pm #

    I tried the grid search but got this error

    ipython-input-49-ea7e264ec276> in ()
    3 param_grid = dict(batch_size=batch_size, epochs=epochs)
    4 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    —-> 5 grid_result = grid.fit(testX, testY)
    6 # summarize results
    7 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\model_selection\ in fit(self, X, y, groups, **fit_params)
    612 refit_metric = ‘score’
    –> 614 X, y, groups = indexable(X, y, groups)
    615 n_splits = cv.get_n_splits(X, y, groups)
    616 # Regenerate parameter iterable for each fit

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\ in indexable(*iterables)
    196 else:
    197 result.append(np.array(X))
    –> 198 check_consistent_length(*result)
    199 return result

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\ in check_consistent_length(*arrays)
    171 if len(uniques) > 1:
    172 raise ValueError(“Found input variables with inconsistent numbers of”
    –> 173 ” samples: %r” % [int(l) for l in lengths])

    ValueError: Found input variables with inconsistent numbers of samples: [17, 1]

  123. Avatar
    amina May 31, 2018 at 1:38 am #

    What does the 8 in the input dim refer to? I have a time series problem with a dataset of 41 observations. How could I deal with this?

    • Avatar
      Jason Brownlee May 31, 2018 at 6:20 am #

      It refers to 8 input variables.

      You could define a window of lag obs as input features. Perhaps experiment with different window sizes.
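
      A minimal sketch of turning a series into lag-window features. The function name and the window size of 3 are arbitrary choices for illustration; the toy series simply has 41 observations like the one in the question.

```python
def make_lag_features(series, n_lags):
    """Turn a 1D series into (X, y) where each row of X holds the
    previous n_lags observations and y is the next observation."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


series = list(range(1, 42))  # stand-in for 41 observations
X, y = make_lag_features(series, n_lags=3)
```

      With `n_lags=3`, the network's input dimension would be 3 instead of 8, and the window size itself becomes a value worth experimenting with.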

  124. Avatar
    lara May 31, 2018 at 2:08 am #

    Could we use only one hidden layer that contains an LSTM block? I want to grid search hyperparameters for my LSTM architecture. How could I specify this in code?

    • Avatar
      Jason Brownlee May 31, 2018 at 6:23 am #

      Yes, you could adapt the above examples to search layers/nodes in an LSTM.

  125. Avatar
    Angelo June 18, 2018 at 5:06 am #

    Astounding post, thank you! I wonder how I could evaluate the loss and accuracy evolution of the KerasClassifier according to epoch. Is there something like the history class returned from the method from SciKitLearn?

    • Avatar
      Jason Brownlee June 18, 2018 at 6:45 am #

      Not that I am aware, I believe you would need to use the Keras API directly and collect history objects from each run.

  126. Avatar
    Babu July 2, 2018 at 6:49 pm #

    Dear Jason,

    I found this article as very useful for my research. Thank you very much.

    Is it possible to find the best CNN architecture (No.of layers, Kernel size, Kernel initialization, Pooling Technique etc) for a given dataset by using GridSearch or RandomSearch?

    • Avatar
      Jason Brownlee July 3, 2018 at 6:23 am #

      There is no “best”, just good enough based on the time and resources we have available.

      • Avatar
        prateek bhadauria July 13, 2018 at 8:50 pm #

        Hello Jason Sir, I want to know how I could apply the CNN concept to non-image data consisting of large datasets in rows and columns, and how I could apply padding to 50,000 rows and 20 columns. Kindly suggest an approach.

        • Avatar
          Jason Brownlee July 14, 2018 at 6:17 am #

          CNN is not appropriate unless there is some spatial relationship between the observations, e.g. time or space.

          • Avatar
            maxv April 29, 2019 at 3:01 am #


            thanks for this post and the replies to questions.

            I have a question on the properties of the cnn, if you have a dataset like the pollution dataset.

            If we have one binary variable as target in a classification with 10 exogenous variables and it is a daily forecast.
            Let us say we have 500 days of data.

            I can create a multivariate timeseries forecast and have 5 timesteps in my window so that my train shape will be (500,5,10)

            If I apply Conv1D, it should extract features out of all the 10 variables, right?
            Or does it apply a Conv1D on each exogenous variable separately?

            What I am trying to understand is: does it capture interactions of exogenous variables?

            Does Conv2D only work for images, or for time series too?

            For each window, we have 5 timesteps and 10 exogenous variables, so we could think of this as 2D.

            Thanks J

          • Avatar
            maxv May 1, 2019 at 6:52 am #

            I think you are pointing me again to the same tutorial but my questions come from this one.

            Questions see above.

            Question 1 :
            If I apply Conv1D, it should extract features out of all the 10 variables, right?
            Or does it apply a Conv1D on each exogenous variable separately?

            Question 2: does it capture interactions of exogenous variables?

            Question 3:

            Does Conv2D only work for images, or for time series too?

          • Avatar
            Jason Brownlee May 1, 2019 at 7:11 am #

            If you have multiple parallel time series, you can either use a separate Conv1D layer for each one and merge them into the model, OR use one Conv1D layer and treat each time series as a separate channel.

            Test both, but I recommend the latter.

            In both cases, the model will capture interactions.

            No, Conv2D can work for any data that has a temporal or spatial relationship in two dimensions.
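            To make the "separate channel" option concrete, here is a minimal pure-Python sketch of what a single Conv1D filter computes on one window of shape (5 timesteps, 10 variables). It is not Keras code: the helper name and the kernel size of 3 are assumptions for illustration, and bias and activation are left out. The point is only that each output value is a weighted sum over all variables in the kernel's span, so changing one variable changes the filter's output.

```python
import random


def conv1d_single_filter(window, kernel):
    """Valid 1D cross-correlation for one filter over a multi-channel window.

    window: list of timesteps, each a list of channel values -> shape (T, C)
    kernel: list of kernel rows, each a list of channel weights -> shape (K, C)
    Returns a list with T - K + 1 output values (no bias, no activation).
    """
    timesteps, kernel_size = len(window), len(kernel)
    outputs = []
    for start in range(timesteps - kernel_size + 1):
        total = 0.0
        for k in range(kernel_size):
            # Every channel (variable) contributes to the same output value.
            for value, weight in zip(window[start + k], kernel[k]):
                total += value * weight
        outputs.append(total)
    return outputs


random.seed(0)
T, C, K = 5, 10, 3  # 5 timesteps, 10 variables; kernel size 3 is an assumption
window = [[random.random() for _ in range(C)] for _ in range(T)]
kernel = [[random.random() for _ in range(C)] for _ in range(K)]

baseline = conv1d_single_filter(window, kernel)

# Change only variable 7 at timestep 2, then recompute.
perturbed = [row[:] for row in window]
perturbed[2][7] += 1.0
changed = conv1d_single_filter(perturbed, kernel)

print(len(baseline))  # number of output positions: T - K + 1
print([a != b for a, b in zip(baseline, changed)])  # which outputs moved
```

            Timestep 2 falls inside every kernel position here, so all outputs shift when that one variable changes. In Keras the per-filter weights have the same (kernel_size, channels) layout, with one such kernel per filter.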

  127. Avatar
    James July 10, 2018 at 12:34 am #

    Thanks for the tutorial Jason, very informative. I wonder if you know of a relatively un-intrusive way of reducing the memory footprint of Grid (or equivalently Random) SearchCV, since they seem to store every model produced during the search in memory, instead of e.g just the best. I’m handling 3d data and trying 3d cnns, so the models quickly get too big to have e.g 25 in memory at once.

    I wondered about hacky divide and conquer strategies on a higher level, e.g. if the full space for a parameter is

    [1,5,10,15,20,25]

    do a grid search of [1,5,10], keep the best model (m1) and discard the rest, search [15,20,25], keep the best (m2), then keep the best of [m1,m2]. But this would still be fiddly and somewhat arbitrary to get right for a given amount of memory and parameter space. I'd rather not have to implement my own parameter search, but if I go too far down this route I may as well end up doing so.


    • Avatar
      Jason Brownlee July 10, 2018 at 6:49 am #

      Split the search across multiple scripts and machines or implement the for-loops of the search yourself (preferred).
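      As a rough sketch of the for-loop approach, the search below walks every combination but only ever holds a reference to the best model so far. The `evaluate` callable and `toy_evaluate` are hypothetical stand-ins for "build, train, and score a Keras model"; nothing here is part of Keras or scikit-learn.

```python
from itertools import product


def manual_grid_search(param_grid, evaluate):
    """Try every combination in param_grid, keeping only the best model.

    param_grid: dict mapping parameter name -> list of candidate values
    evaluate:   callable(params) -> (score, model); higher score is better
    """
    names = list(param_grid)
    best_score, best_params, best_model = float("-inf"), None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score, model = evaluate(params)
        if score > best_score:
            # Rebinding best_model drops the reference to the previous best.
            best_score, best_params, best_model = score, params, model
        del model  # losing candidates are not referenced after this point
    return best_score, best_params, best_model


def toy_evaluate(params):
    """Hypothetical scorer: pretends 15 neurons and 0.2 dropout are ideal."""
    score = -abs(params["neurons"] - 15) - abs(params["dropout"] - 0.2)
    model = {"trained_with": params}  # placeholder for a real model object
    return score, model


grid = {"neurons": [1, 5, 10, 15, 20, 25], "dropout": [0.0, 0.2, 0.4]}
best_score, best_params, best_model = manual_grid_search(grid, toy_evaluate)
print(best_params)
```

      With real Keras models, it may also help to call `keras.backend.clear_session()` between candidates so that state from discarded models is released; whether that is needed depends on the backend and version.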

  128. Avatar
    Kemas Farosi July 11, 2018 at 8:50 pm #

    Hi Jason,

    Great tutorial. I have a question: is it possible to find how many hidden layers to use in my deep neural network by grid search? I want to find the best number of layers for my DNN.


  129. Avatar
    Vugar Bayramov July 19, 2018 at 11:33 pm #

    Hi Jason!!

    Awesome content. Thanks very much for your effort.

    I have a question regarding a model with multidimensional output. What I mean is that my y_train is an array of [value1, value2, value3], which I am trying to predict. While using the example above to select the best activation function for my problem, I got the error below:

    ValueError: y_true and y_pred have different number of output

    How can i solve this issue?



  130. Avatar
    Nick July 27, 2018 at 9:37 pm #

    While doing the grid search some combinations lead to a:
    ValueError: Input contains NaN, infinity or a value too large for dtype(‘float32’).

    so the grid search stops. Do you know if it's possible to just skip these combinations to prevent the search from stopping, or why this happens with some NN hyperparameters?


    • Avatar
      Jason Brownlee July 28, 2018 at 6:35 am #

      Perhaps. It might be easier to run the grid search yourself with some for-loops.
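      A minimal sketch of such a loop, which records failing combinations and moves on instead of stopping. The `evaluate` callable and `toy_evaluate` are hypothetical placeholders for training and scoring a real model; the toy version simply pretends that large learning rates diverge to NaN, which is one common way this error can arise.

```python
from itertools import product
import math


def grid_search_skipping_failures(param_grid, evaluate):
    """Evaluate every combination; skip those that raise or score non-finite.

    evaluate: callable(params) -> float score; higher is better.
    Returns (results sorted best-first, list of skipped combinations).
    """
    names = list(param_grid)
    results, skipped = [], []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        try:
            score = evaluate(params)
            if math.isnan(score) or math.isinf(score):
                raise ValueError("non-finite score")
        except ValueError as err:
            # Record the failure and carry on with the next combination.
            skipped.append((params, str(err)))
            continue
        results.append((score, params))
    results.sort(key=lambda item: item[0], reverse=True)
    return results, skipped


def toy_evaluate(params):
    """Hypothetical scorer: learning rates of 0.3 or more 'diverge'."""
    if params["learn_rate"] >= 0.3:
        return float("nan")
    return 1.0 - abs(params["learn_rate"] - 0.01) - 0.001 * params["batch_size"]


grid = {"learn_rate": [0.001, 0.01, 0.1, 0.3], "batch_size": [10, 20]}
results, skipped = grid_search_skipping_failures(grid, toy_evaluate)
print(results[0])
print(len(skipped), "combinations skipped")
```

      scikit-learn's GridSearchCV also has an `error_score` parameter that assigns a fallback score when fitting fails; whether it covers this particular error may depend on the scikit-learn version and on where the error is raised.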

    • Avatar
      Pramod Hankare May 20, 2020 at 2:45 pm #

      Hi Nick, did you eventually find a solution for this?

  131. Avatar
    billa July 30, 2018 at 10:29 pm #

    Is it possible to tune the neurons inside the convolution layer for image classification?

  132. Avatar
    Zenon Uchida July 31, 2018 at 8:56 pm #

    Does filters (in the code below) denote the number of neurons?
    conv = Conv1D(filters=64, kernel_size=5, activation='relu')(embedding)
    If not, should filters also be tuned?
    I'm pretty sure kernel_size should be tuned.

    • Avatar
      Jason Brownlee August 1, 2018 at 7:43 am #

      No, they are the number of filters.

      Yes, the number of filters and the kernel size can and should be tuned.

  133. Avatar
    Khaw August 7, 2018 at 11:53 pm #

    Thank you for your awesome explanation.

    Is it possible to do the same grid search for hyperparameters in the R package Keras? I cannot find the equivalent of the gridCV function.

  134. Avatar
    Beatriz August 15, 2018 at 10:05 am #

    Hi Jason,

    I’m trying to do a grid search in my Seq2Seq model.

    I’m not sure if I understand the values X,Y I should put inside the function.

    In my case, I tried two numpy arrays with three dimensions (samples, max length of words, number of characters)

    Anyway, I’m not sure if that is the reason it is not working for me. I get the following error:

    TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

    What do you think is going wrong?

    • Avatar
      Jason Brownlee August 15, 2018 at 1:53 pm #

      You might need to implement the for-loops of your grid search manually in order to have more control over the process.

  135. Avatar
    ammara August 15, 2018 at 9:16 pm #

    Thanks for such a great content!!
    I have a query: what is the “random_state” used in deep models? Is it a hyperparameter? If it is, how important is it for model training? Kindly guide me.
    Thanks in advance.

  136. Avatar
    Hoo Yu Heng August 21, 2018 at 4:17 am #

    For those who face the error of ‘cannot pickle object class’, make sure you pass create_model and not create_model() to the KerasClassifier constructor.

    Correct (passes the function itself):

    model = KerasClassifier(build_fn=create_model, verbose=0, epochs=100)

    Incorrect (calls the function and passes the built model):

    model = KerasClassifier(build_fn=create_model(), verbose=0, epochs=100)

  137. Avatar
    Natanos August 21, 2018 at 8:31 pm #

    Sorry, but when I run this program, it stops at “Using TensorFlow backend” and has not finished after almost 3 hours.

    Is this normal? if not, what should I do? thanks

    • Avatar
      Jason Brownlee August 22, 2018 at 6:11 am #

      Perhaps try searching fewer parameters?

    • Avatar
      clemm September 19, 2018 at 7:00 pm #


      Same problem here with a grid search reduced to one epoch and one batch_size: the fit function never ends (Keras version: 2.2.2). But the same code worked on another computer (Keras version: 2.0.5).

      • Avatar
        Jason Brownlee September 20, 2018 at 7:56 am #

        Perhaps run the grid search manually? Just some for-loops.

  138. Avatar
    Nathan Rasch August 27, 2018 at 10:28 am #

    Has anyone had a chance to combine RandomizedSearchCV with SelectKBest?

    I have a “FeatureUnion” that includes “SelectKBest”, but then the “model.add(Dense….” call in the model build function complains about the “input_dim” being incorrect. I’m not sure how to attach to the value “SelectKBest” is currently considering as part of the random search, so that I can feed it to the build model function as a param for “input_dim”.

    features = []
    features.append(('Scaler', StandardScaler()))
    features.append(('SelectKBest', SelectKBest(k=5)))
    featureUnion = FeatureUnion(features)

    def buildModel(optimizer='Adam', lr=0.001, decay=0.0, epsilon=None):
        opt = None
        model = Sequential()
        model.add(Dense(20, input_dim = ???? …)

    We get a nice, juicy error about the input dim when running this. 🙁

    If anyone has a working example or link to some one who does I’d be very grateful.


    • Avatar
      Nathan Rasch August 27, 2018 at 11:36 am #

      OK, solved my own issue:

      The key is just to remove the “input_dim” param from the “model.add” method call. Then you can pass whatever values you want to test with as part of the params dict.


      # Notice we don't have an "input_dim" param on the model.add call anymore
      def buildModel():
          model = Sequential()
          model.add(Dense(20, kernel_initializer='normal', activation='relu'))

      # We add the SelectKBest__k values we want to test to the "params" dict:
      params = {
          'housingModel__epochs': [1, 2],
          'housingModel__batch_size': [15, 30, 65],
          'FeatureUnion__SelectKBest__k': [5, 6, 7, 8, 9, 10]
      }

      # And create the FeatureUnion
      features = []
      features.append(('Scaler', StandardScaler()))
      features.append(('SelectKBest', SelectKBest()))
      featureUnion = FeatureUnion(features)

      And that’s that. 🙂


    • Avatar
      Jason Brownlee August 27, 2018 at 1:56 pm #

      Perhaps write your own for-loop or use regularization to let the model ignore irrelevant features?

  139. Avatar
    Piyush September 11, 2018 at 2:15 am #

    @Jason Brownlee

    Great tutorial, though I suggest combining all the chunks of code into one final example that tunes all hyperparameters at once, e.g., defining a grid with all hyperparameters rather than focusing on them one by one.

    Also, once the tuned hyperparameters are found, provide code for a predictive model with the tuned hyperparameters that can be used on an actual problem to predict class labels.
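    One practical consideration with a single combined grid is its size, since the number of combinations is the product of the list lengths. The short calculation below uses illustrative value lists (assumed here, not prescribed by anyone in this thread) and an assumed 3-fold cross-validation to compare a combined grid with tuning one hyperparameter at a time.

```python
from math import prod

# Illustrative candidate values; the exact lists are assumptions.
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "learn_rate": [0.001, 0.01, 0.1, 0.2, 0.3],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "neurons": [1, 5, 10, 15, 20, 25, 30],
}
cv_folds = 3  # assumed number of cross-validation folds

# One combined grid: every value is paired with every other value.
n_combinations = prod(len(values) for values in param_grid.values())
n_fits = n_combinations * cv_folds

# Tuning one hyperparameter at a time: the list lengths simply add up.
one_at_a_time = sum(len(values) for values in param_grid.values())

print(n_combinations, "combinations in the combined grid")
print(n_fits, "model fits with", cv_folds, "folds")
print(one_at_a_time, "settings when tuning one parameter at a time")
```

    The one-at-a-time total ignores interactions between hyperparameters, which is the trade-off: a combined grid can find interactions but costs far more model fits.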

  140. Avatar
    Michael Pappas September 29, 2018 at 7:23 am #

    Does anyone else have two problems with the first example? I'm using Theano as the backend and I run into two errors:

    1) RuntimeError: You can’t initialize the GPU in a subprocess if the parent process already did it (goes away when I change .theanorc to cpu instead of cuda0)

    2) sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

    Any ideas?

    • Avatar
      Jason Brownlee September 30, 2018 at 5:59 am #

      Perhaps try running on the CPU as a first step?

      • Avatar
        Michael Pappas October 2, 2018 at 1:04 am #

        Then I get the second error as mentioned above.

        • Avatar
          FERNANDO FREGAPANE SCALIA October 6, 2018 at 2:52 am #

          I have the same error with all libraries updated.

          Any ideas, please?

  141. Avatar
    Vasileios Papanikolaou October 13, 2018 at 11:23 am #

    Hey Jason, thank you for this excellent post and your whole contribution to the ML/DL community! It really means a lot. I have a quick question: let's say that you define the model architecture and perform your first grid search over, say, one hyperparameter. How can you then redefine the model using the optimal hyperparameter without rewriting the ‘create_model’ function? Thanks a lot in advance.

    • Avatar
      Jason Brownlee October 14, 2018 at 5:59 am #

      You can create the model directly, using the hyperparameters found via the search.
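      One way to do this without rewriting the function is to give `create_model` keyword arguments and unpack the best parameters into it. The sketch below uses a hypothetical stand-in for `create_model` (it returns a dict rather than a Keras model) and a made-up `best_params` dict; the `model__` prefix follows the SciKeras naming convention, which is an assumption about the wrapper in use (the older Keras wrapper used unprefixed names).

```python
def create_model(neurons=1, dropout_rate=0.0):
    """Hypothetical stand-in for a Keras model-building function."""
    return {"neurons": neurons, "dropout_rate": dropout_rate}


# What grid_result.best_params_ might look like after a search (made up).
best_params = {
    "model__neurons": 20,
    "model__dropout_rate": 0.2,
    "batch_size": 40,
    "epochs": 100,
}

prefix = "model__"
# Arguments routed to the model-building function (prefix stripped).
model_kwargs = {
    key[len(prefix):]: value
    for key, value in best_params.items()
    if key.startswith(prefix)
}
# Everything else is assumed to be a fit-time setting.
fit_kwargs = {
    key: value for key, value in best_params.items() if not key.startswith(prefix)
}

final_model = create_model(**model_kwargs)
# With a real Keras model, training would then look like:
#     final_model.fit(X, Y, **fit_kwargs)
print(model_kwargs)
print(fit_kwargs)
```

      If GridSearchCV was run with its default `refit=True`, the already refitted best model is also available as `best_estimator_` on the fitted search object.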

      Perhaps I’m missing something in your question?

  142. Avatar
    Janosh Riebesell November 4, 2018 at 9:04 pm #

    Slight correction:

    > We can see that the dropout rate of 0.2% and the maxnorm weight constraint of 4 resulted in the best accuracy of about 72%.

    Should be either 0.2 or 20 %.

  143. Avatar
    Robert Guenther November 6, 2018 at 5:51 am #


    Ditto all the good things said above. You are definitely fulfilling your mission of making us (data scientists) better at machine learning.

    Thank you,

  144. Avatar
    sukhpal November 15, 2018 at 12:43 am #

    When I run the above code, I get this message:

    model = Sequential()
    IndentationError: expected an indented block
    Kindly help me remove this error.

  145. Avatar
    Long November 15, 2018 at 2:03 pm #

    Great tutorial as always,

    I also had one experience with the Keras and scikit-learn wrapper when doing the train-test split. It turned out that I should not use params like validation_split/validation_data in Keras, because cross validation from GridSearchCV already takes care of that.

    I would like to ask: should I use scoring metrics from Keras itself, or should I use metrics provided by GridSearchCV?
    The docs here are not really clear.

    And how about other parameters (if available) that appear to be overridden by the scikit-learn wrapper? Which ones should I pick, Keras or scikit-learn?

    Thank you so much Jason.

    • Avatar
      Jason Brownlee November 16, 2018 at 6:11 am #

      Probably use sklearn’s metrics.

      What other parameters exactly?

  146. Avatar
    sukhpal November 16, 2018 at 9:12 pm #

    When I run the code, I receive this message instead of output. Kindly help me.

    runfile(‘C:/Users/sukhpal/’, wdir=’C:/Users/sukhpal’)
    Using Theano backend.
    C:\Users\sukhpal\Anaconda2\lib\site-packages\sklearn\ DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
    “This module will be removed in 0.20.”, DeprecationWarning)

    • Avatar
      Jason Brownlee November 17, 2018 at 5:46 am #

      Looks like a warning; you can ignore it for now.