How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

Hyperparameter optimization is a big part of deep learning.

The reason is that neural networks are notoriously difficult to configure, and a lot of parameters need to be set. On top of that, individual models can be very slow to train.

In this post, you will discover how to use the grid search capability from the scikit-learn Python machine learning library to tune the hyperparameters of Keras’s deep learning models.

After reading this post, you will know:

  • How to wrap Keras models for use in scikit-learn and how to use grid search
  • How to grid search common neural network parameters, such as learning rate, dropout rate, epochs, and number of neurons
  • How to define your own hyperparameter tuning experiments on your own projects

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Aug/2016: First published
  • Update Oct/2016: Updated examples for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18
  • Update Nov/2016: Fixed minor issue in displaying grid search results in code examples
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
  • Update Sept/2017: Updated example to use Keras 2 “epochs” instead of Keras 1 “nb_epochs”
  • Update March/2018: Added alternate link to download the dataset
  • Update Oct/2019: Updated for Keras 2.3.0 API
  • Update Jul/2022: Updated for TensorFlow/Keras and SciKeras 0.8

How to grid search hyperparameters for deep learning models in Python with Keras
Photo by 3V Photo, some rights reserved.

Overview

In this post, you will discover how you can use the scikit-learn grid search capability. You will be given a suite of examples that you can copy and paste into your own project as a starting point.

Below is a list of the topics this post will cover:

  1. How to use Keras models in scikit-learn
  2. How to use grid search in scikit-learn
  3. How to tune batch size and training epochs
  4. How to tune optimization algorithms
  5. How to tune learning rate and momentum
  6. How to tune network weight initialization
  7. How to tune activation functions
  8. How to tune dropout regularization
  9. How to tune the number of neurons in the hidden layer

How to Use Keras Models in scikit-learn

Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class from the SciKeras module. You may need to run pip install scikeras first to install the module.

To use these wrappers, you must define a function that creates and returns your Keras sequential model, then pass this function to the model argument when constructing the KerasClassifier class.

For example:
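
A minimal sketch of this pattern (assuming TensorFlow 2.x and SciKeras are installed; the create_model() name and the layer sizes are illustrative):

from scikeras.wrappers import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
    # define and compile a small fully connected network
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model)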

The constructor for the KerasClassifier class can take default arguments that are passed on to the calls to model.fit(), such as the number of epochs and the batch size.

For example:
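
A sketch of passing the fit defaults through the wrapper constructor (the values are illustrative):

model = KerasClassifier(model=create_model, epochs=10, batch_size=32, verbose=0)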

The constructor for the KerasClassifier class can also take new arguments that can be passed to your custom create_model() function. These new arguments must also be defined in the signature of your create_model() function with default parameters.

For example:
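
A sketch, assuming a custom neurons argument has been added to create_model(); SciKeras routes it to the function via the model__ prefix:

def create_model(neurons=12):
    # 'neurons' is a custom argument with a default value
    model = Sequential()
    model.add(Dense(neurons, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, model__neurons=20, epochs=10, batch_size=32, verbose=0)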

You can learn more about these from the SciKeras documentation.

How to Use Grid Search in scikit-learn

Grid search is a model hyperparameter optimization technique.

In scikit-learn, this technique is provided in the GridSearchCV class.

When constructing this class, you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map of the model parameter name and an array of values to try.

By default, accuracy is the score that is optimized, but other scores can be specified in the scoring argument of the GridSearchCV constructor.

By default, the grid search will only use one thread. By setting the n_jobs argument in the GridSearchCV constructor to -1, the process will use all cores on your machine. However, sometimes this may interfere with the main neural network training process.

The GridSearchCV process will then construct and evaluate one model for each combination of parameters. Cross-validation is used to evaluate each individual model; the default is 3-fold cross-validation in older versions of scikit-learn and 5-fold in more recent versions, although you can override this by specifying the cv argument to the GridSearchCV constructor.

Below is an example of defining a simple grid search:
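
A sketch of a simple grid search over epochs and batch size (it assumes model is the wrapped KerasClassifier from above and that X and Y are the input and output NumPy arrays; the grid values are illustrative):

from sklearn.model_selection import GridSearchCV

param_grid = dict(epochs=[10, 50, 100], batch_size=[10, 20, 40])
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))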

Once completed, you can access the outcome of the grid search in the result object returned from grid.fit(). The best_score_ member provides access to the best score observed during the optimization procedure, and the best_params_ describes the combination of parameters that achieved the best results.

You can learn more about the GridSearchCV class in the scikit-learn API documentation.

Problem Description

Now that you know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let’s look at a bunch of examples.

All examples will be demonstrated on a small standard machine learning dataset called the Pima Indians onset of diabetes classification dataset. This is a small dataset with all numerical attributes that is easy to work with.

  1. Download the dataset and place it in your current working directory with the name pima-indians-diabetes.csv (update: download from here).

As you proceed through the examples in this post, you will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.

Note on Parallelizing Grid Search

All examples are configured to use parallelism (n_jobs=-1).

If you get an error like the one below:

Kill the process and change the code to not perform the grid search in parallel; set n_jobs=1.

Need help with Deep Learning in Python?

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

How to Tune Batch Size and Number of Epochs

In this first simple example, you will look at tuning the batch size and number of epochs used when fitting the network.

The batch size in iterative gradient descent is the number of patterns shown to the network before the weights are updated. It is also an efficiency consideration in training, defining how many patterns are read at a time and held in memory.

The number of epochs is the number of times the entire training dataset is shown to the network during training. Some networks are sensitive to the batch size, such as LSTM recurrent neural networks and Convolutional Neural Networks.

Here you will evaluate a suite of different mini-batch sizes from 10 to 100 in steps of 20.

The full code listing is provided below:
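
A minimal sketch of the full experiment (it assumes TensorFlow 2.x, SciKeras, and pima-indians-diabetes.csv in the working directory; the grid values are illustrative):

# grid search batch size and epochs with scikit-learn and SciKeras
import numpy as np
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
    # single hidden layer MLP for the 8-input Pima Indians dataset
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# fix random seed for reproducibility
np.random.seed(7)

# load the dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:, 0:8]
Y = dataset[:, 8]

# wrap the model so scikit-learn can use it
model = KerasClassifier(model=create_model, verbose=0)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))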

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

You can see that the batch size of 10 and 100 epochs achieved the best result of about 70% accuracy.

How to Tune the Training Optimization Algorithm

Keras offers a suite of different state-of-the-art optimization algorithms.

In this example, you will tune the optimization algorithm used to train the network, each with default parameters.

This is an odd example because you will often choose one approach a priori and instead focus on tuning its parameters on your problem (see the next example).

Here, you will evaluate the suite of optimization algorithms supported by the Keras API.

The full code listing is provided below:
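
A sketch of the parts that change relative to the previous listing (the imports, data loading, and results summary stay the same):

def create_model():
    # the model is defined but NOT compiled here; the SciKeras wrapper
    # compiles it using the loss and optimizer given to its constructor
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

model = KerasClassifier(model=create_model, loss="binary_crossentropy",
                        epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)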

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Note that the create_model() function defined above does not return a compiled model like the one in the previous example. This is because setting an optimizer for a Keras model is done in the compile() function call; hence it is better to leave it to the KerasClassifier wrapper and the GridSearchCV model. Also note that you specified loss="binary_crossentropy" in the wrapper, as it should also be set during the compile() function call.

Running this example produces the following output:

The KerasClassifier wrapper will not compile your model again if the model is already compiled. Hence the other way to run GridSearchCV is to set the optimizer as an argument to the create_model() function, which returns an appropriately compiled model like the following:
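
A sketch of that alternative, where the optimizer is a routed argument of create_model() (note the model__ prefix in the parameter grid):

def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # the model is compiled here, so the wrapper will not compile it again
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(model__optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)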

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Note that in the above, you have the prefix model__ in the parameter dictionary param_grid. This is required for the KerasClassifier in the SciKeras module to make clear that the parameter should be routed to the create_model() function as an argument, rather than being a parameter to set in compile() or fit(). See also the routed parameters section of the SciKeras documentation.

Running this example produces the following output:

The results suggest that the Adam optimization algorithm is the best, with a score of about 70% accuracy.

How to Tune Learning Rate and Momentum

It is common to pre-select an optimization algorithm to train your network and tune its parameters.

By far, the most common optimization algorithm is plain old Stochastic Gradient Descent (SGD) because it is so well understood. In this example, you will look at optimizing the SGD learning rate and momentum parameters.

The learning rate controls how much to update the weight at the end of each batch, and the momentum controls how much to let the previous update influence the current weight update.

You will try a suite of small standard learning rates and momentum values from 0.2 to 0.8 in steps of 0.2, as well as 0.9 (because it is a popular value in practice). In Keras, the way to set the learning rate and momentum is the following:
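
A sketch in standalone Keras (model here is the Sequential model built inside create_model(); the values are illustrative):

from tensorflow.keras.optimizers import SGD

opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])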

In the SciKeras wrapper, you will route the parameters to the optimizer with the prefix optimizer__.

Generally, it is a good idea to also include the number of epochs in an optimization like this as there is a dependency between the amount of learning per batch (learning rate), the number of updates per epoch (batch size), and the number of epochs.

The full code listing is provided below:
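
A sketch of the relevant parts (data loading and the results summary are unchanged; note the optimizer__ prefix used to route the parameters to the SGD optimizer):

def create_model():
    # not compiled here; the wrapper supplies the loss and optimizer
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

model = KerasClassifier(model=create_model, loss="binary_crossentropy", optimizer="SGD",
                        epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(optimizer__learning_rate=learn_rate, optimizer__momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)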

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

You can see that SGD is not very good on this problem; nevertheless, the best results were achieved using a learning rate of 0.001 and a momentum of 0.0 with an accuracy of about 68%.

How to Tune Network Weight Initialization

Neural network weight initialization used to be simple: use small random values.

Now there is a suite of different techniques to choose from. Keras provides a laundry list.

In this example, you will look at tuning the selection of network weight initialization by evaluating all the available techniques.

You will use the same weight initialization method on each layer. Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. In the example below, you will use a rectifier for the hidden layer and a sigmoid for the output layer because the predictions are binary. The weight initialization is now an argument to the create_model() function, where you need to use the model__ prefix to ask the KerasClassifier to route the parameter to the model creation function.

The full code listing is provided below:
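
A sketch of the changed parts (init_mode is an illustrative argument name, routed to create_model() with the model__ prefix; the list of candidate initializers is illustrative):

def create_model(init_mode='uniform'):
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), kernel_initializer=init_mode, activation='relu'))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zeros',
             'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(model__init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)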

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

We can see that the best results were achieved with a uniform weight initialization scheme, achieving an accuracy of about 72%.

How to Tune the Neuron Activation Function

The activation function controls the non-linearity of individual neurons and when they fire.

Generally, the rectifier activation function is the most popular. However, it used to be the sigmoid and the tanh functions, and these functions may still be more suitable for different problems.

In this example, you will evaluate the suite of different activation functions available in Keras. You will only use these functions in the hidden layer, as a sigmoid activation function is required in the output for the binary classification problem. Similar to the previous example, this is an argument to the create_model() function, and you will use the model__ prefix for the GridSearchCV parameter grid.

Generally, it is a good idea to prepare data to the range of the different transfer functions, which you will not do in this case.

The full code listing is provided below:
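
A sketch of the changed parts (activation is routed to create_model() with the model__ prefix; the list of candidate activations is illustrative):

def create_model(activation='relu'):
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), kernel_initializer='uniform', activation=activation))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(model__activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)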

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output:

Surprisingly (to me at least), the “linear” activation function achieved the best results with an accuracy of about 71%.

How to Tune Dropout Regularization

In this example, you will look at tuning the dropout rate for regularization in an effort to limit overfitting and improve the model’s ability to generalize.

For the best results, dropout is best combined with a weight constraint such as the max norm constraint.

For more on using dropout in deep learning models with Keras, see the post:

This involves fitting both the dropout percentage and the weight constraint. We will try dropout percentages between 0.0 and 0.9 (1.0 does not make sense) and maxnorm weight constraint values between 0 and 5.

The full code listing is provided below.
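
A sketch of the changed parts (dropout_rate and weight_constraint are illustrative argument names, routed with the model__ prefix):

from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm

def create_model(dropout_rate=0.0, weight_constraint=3.0):
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu',
                    kernel_constraint=MaxNorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
weight_constraint = [1.0, 2.0, 3.0, 4.0, 5.0]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(model__dropout_rate=dropout_rate, model__weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)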

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output.

We can see that the dropout rate of 20% and the MaxNorm weight constraint of 3 resulted in the best accuracy of about 77%. You may notice that some of the results are nan. This is probably because the input is not normalized, and you may obtain a degenerate model by chance.

How to Tune the Number of Neurons in the Hidden Layer

The number of neurons in a layer is an important parameter to tune. Generally, the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.

Also, generally, a large enough single layer network can approximate any other neural network, at least in theory.

In this example, we will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5.

A larger network requires more training, and at least the batch size and number of epochs should ideally be optimized along with the number of neurons.

The full code listing is provided below.
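
A sketch of the changed parts (neurons is routed to create_model() with the model__ prefix; the dropout and weight constraint from the previous example are kept fixed here):

def create_model(neurons=1):
    model = Sequential()
    model.add(Dense(neurons, input_shape=(8,), activation='relu', kernel_constraint=MaxNorm(4)))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

# define the grid search parameters
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(model__neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)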

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example produces the following output.

We can see that the best results were achieved with a network with 30 neurons in the hidden layer with an accuracy of about 73%.

Tips for Hyperparameter Optimization

This section lists some handy tips to consider when tuning hyperparameters of your neural network.

  • k-fold Cross Validation. You can see that the results from the examples in this post show some variance. A default cross-validation of 3 was used, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.
  • Review the Whole Grid. Do not just focus on the best result, review the whole grid of results and look for trends to support configuration decisions.
  • Parallelize. Use all your cores if you can; neural networks are slow to train, and we often want to try a lot of different parameters. Consider spinning up a lot of AWS instances.
  • Use a Sample of Your Dataset. Because networks are slow to train, try training them on a smaller sample of your training dataset, just to get an idea of general directions of parameters rather than optimal configurations.
  • Start with Coarse Grids. Start with coarse-grained grids and zoom into finer grained grids once you can narrow the scope.
  • Do not Transfer Results. Results are generally problem specific. Try to avoid carrying over favorite configurations to each new problem you see. It is unlikely that the optimal results you discover on one problem will transfer to your next project. Instead, look for broader trends like the number of layers or relationships between parameters.
  • Reproducibility is a Problem. Although we set the seed for the random number generator in NumPy, the results are not 100% reproducible. There is more to reproducibility when grid searching wrapped Keras models than is presented in this post.

Summary

In this post, you discovered how you can tune the hyperparameters of your deep learning networks in Python using Keras and scikit-learn.

Specifically, you learned:

  • How to wrap Keras models for use in scikit-learn and how to use grid search.
  • How to grid search a suite of different standard neural network parameters for Keras models.
  • How to design your own hyperparameter optimization experiments.

Do you have any experience tuning hyperparameters of large neural networks? Please share your stories below.

Do you have any questions about hyperparameter optimization of neural networks or about this post? Ask your questions in the comments and I will do my best to answer.

815 Responses to How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

  1. Avatar
    Yanbo August 9, 2016 at 9:10 am #

    As always, excellent post. I've been doing some hyper-parameter optimization by hand, but I'll definitely give Grid Search a try.

    Is it possible to set up a different threshold for the sigmoid output in Keras? Rather than using 0.5, I was thinking of trying 0.7 or 0.8

    • Avatar
      Jason Brownlee August 15, 2016 at 11:10 am #

      Thanks Yanbo.

      I don’t think so, but you could implement your own activation function and do anything you wish.

      • Avatar
        Shudhan September 5, 2016 at 6:20 pm #

        My question is related to this thread. How do I get the probabilities as the output? I don't want the class output. I read that for a regression problem no activation function is needed in the output layer. Will a similar implementation get me the probabilities, or will the output exceed 0 and 1?

        • Avatar
          Jason Brownlee September 6, 2016 at 9:41 am #

          Hi Shudhan, you can use a sigmoid activation and treat the outputs like probabilities (they will be in the range of 0-1).

      • Avatar
        Swapna November 2, 2017 at 11:51 pm #

        excellent post

  2. Avatar
    eclipsedu August 18, 2016 at 5:55 pm #

    Sounds awesome! Will this grid search method use the full CPU (which can be 8/16 cores)?

    • Avatar
      Jason Brownlee August 19, 2016 at 5:23 am #

      It can if you set n_jobs=-1

      • Avatar
        Hemanth Naidu S August 20, 2019 at 10:52 pm #

        Hi Jason,

        In grid search, we do get train score right?
        Why it’s not displaying in model.cv_results_ only test score we are getting..

        • Avatar
          Jason Brownlee August 21, 2019 at 6:42 am #

          You get a cross-validation score for each configuration tested.

  3. Avatar
    Reza August 18, 2016 at 6:00 pm #

    Hi,
    Great post,
    Can I use this tips on CNNs in keras as well?
    Thanks!

    • Avatar
      Jason Brownlee August 19, 2016 at 5:24 am #

      They can be a start, but remember it is a good idea to use a repeating structure in a large CNN and you will need to tune the number of filters and pool size.

      • Avatar
        maxv April 29, 2019 at 3:30 am #

        Hi Jason thanks for everything.
        Could you explain what you mean by a repeating structure in your reply, please?

        Quick question on GridSearchCV for CNN: param_grid=param_grid using the sklearn wrapper gives this error: "ValueError: filters is not a legal parameter"
        How can we use the wrapper for the filters params of Conv1D?
        Thanks

      • Avatar
        Salvin Sanjesh Prasad April 8, 2021 at 1:02 pm #

        Dear Jason,

        This is an excellent post. I have a question: how can we grid search the optimal number of filters in three different layers of a CNN? For example: [60, 70, 80] in layer 1, [20, 30, 40] in layer 2 and [5, 10, 20] in layer 3. I have searched everywhere for code using grid search but could not find this. I really need to use grid search for this. I would be highly grateful for your kind advice. If possible, also reply via my email address that I have provided (as this was a requirement for me to comment)

        • Avatar
          Jason Brownlee April 9, 2021 at 5:16 am #

          Thanks.

          You might need to write some for-loops, e.g. do the search manually.

          Also, we never find an “optimal” configuration, just a good enough configuration given the time/resources available.

  4. Avatar
    Prashant August 22, 2016 at 4:55 pm #

    Hi Jason, First of all great post! I applied this by dividing the data into train and test and used train dataset for grid fit. Plan was to capture best parameters in train and apply them on test to see accuracy. But it seems grid.fit and model.fit applied with same parameters on same dataset (in this case train) give different accuracy results. Any idea why this happens. I can share the code if it helps.

    • Avatar
      Jason Brownlee August 23, 2016 at 6:00 am #

      You will see small variation in the performance of a neural net with the same parameters from run to run. This is because of the stochastic nature of the technique and how very hard it is to fix the random number seed successfully in python/numpy/theano.

      You will also see small variation due to the data used to train the method.

      Generally, you could use all of your data to grid search to try to reduce the second type of variation (slower). You could store results and use statistical significance tests to compare populations of results to see if differences are significant to sort out the first type or variation.

      I hope that helps.

  5. Avatar
    vinay August 22, 2016 at 9:05 pm #

    hi, I think this will best tutorial i ever found on web….Thanks for sharing….is it possible to use these tips on LSTM, Bilstm cnnlstm

    • Avatar
      Jason Brownlee August 23, 2016 at 5:57 am #

      Thanks Vinay, I’m glad it’s useful.

      Absolutely, you could use these tactics on other algorithm types.

  6. Avatar
    shudhan September 2, 2016 at 3:26 pm #

    Best place to learn the tuning.. my question – is it good to follow the order you mentioned to tune the parameters? I know the most significant parameters should be tuned first

    • Avatar
      Jason Brownlee September 3, 2016 at 6:56 am #

      Thanks. The order is a good start. It is best to focus on areas where you think you will get the biggest improvement first – which is often the structure of the network (layers and neurons).

      • Avatar
        Reed Guo September 2, 2018 at 5:59 pm #

        Hi, Jason

        Thanks for your post. It is excellent.

        I have a question.

        You tune batch size and epoch first. But if you set a inappropriate number of neurons or activation function, then batch size and epoch tuning won’t make sense.

        So I think we should tune all of these hyper-parameters at the same time.

        How do you think about it?

  7. Avatar
    Satheesh September 27, 2016 at 12:24 am #

    when I am using the categorical_entropy loss function and running the grid search with n_jobs more than 1 its throwing error “cannot pickle object class”, but the same thing is working fine with binary_entropyloss. Can you tell me if I am making any mistake in my code:
    def create_model(optimizer='adam'):
    # create model
    model.add(Dense(30, input_dim=59, init='normal', activation='relu'))
    model.add(Dense(15, init='normal', activation='sigmoid'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

    # Create Keras Classifier
    print "——————— Running Grid Search on Keras Classifier for epochs and batch ——————"
    clf = model = KerasClassifier(build_fn=create_model, verbose=0)
    param_grid = {"batch_size": range(10, 30, 10), "nb_epoch": range(50, 150, 50)}
    optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=4)
    grid_result = grid.fit(x_train, y_train)
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

    • Avatar
      Jason Brownlee September 27, 2016 at 7:44 am #

      Strange Satheesh, I have not seen that before.

      Let me know if you figure it out.

      • Avatar
        Kai September 18, 2017 at 10:01 pm #

        I came across and solved the problem several days ago. Please use "epochs" instead of "nb_epoch" in the param_grid dict. Personally, I guess "cannot pickle object class" means the neural network cannot be built because of some errors. Open to discussion.

        • Avatar
          Jason Brownlee September 19, 2017 at 7:40 am #

          Glad to hear it.

          I updated the example to use “epochs” to work with Keras 2.

  8. Avatar
    L Fenu November 9, 2016 at 7:47 pm #

    excellent post, thanks. It’s been very helpful to get me started on hyperparameterisation.

    One thing I haven't been able to do yet is to grid search over parameters which are not proper to the NN but to the training set. For example, I can fine-tune the input_dim parameter by creating a function generator which takes care of creating the function that will create the model, like this:

    # fp_subset is a subset of columns of my whole training set.

    create_basic_ANN_model = kt.ANN_model_gen( # defined elsewhere
    input_dim=len(fp_subset), output_dim=1, layers_num=2, layers_sizes=[len(fp_subset)/5, len(fp_subset)/10, ],
    loss='mean_squared_error', optimizer='adadelta', metrics=['mean_squared_error', 'mean_absolute_error']
    )

    model = KerasRegressor(build_fn=create_basic_ANN_model, verbose=1)
    # define the grid search parameters
    batch_size = [10, 100]
    epochs = [5, 10]

    param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=7)

    grid_results = grid.fit(trX, trY)

    this works, but only as a for loop over the different fp_subset values, which I must define manually.
    I could easily pick the best out of every run, but it would be great if I could fold them all inside one big grid definition and fit, so as to automatically pick the best.

    However, until now haven’t been able to figure out a way to get that in my head.
    If the wrapper function is useful to anyone, I can post a generalised version here.

    • Avatar
      Jason Brownlee November 10, 2016 at 7:42 am #

      Good question.

      You might just need to us a loop around the whole lot for different projections/views of your training data.

      • Avatar
        L Fenu November 11, 2016 at 1:05 am #

        Thanks. I ended up coding my own for loop, saving the results of each grid in a dict, sorting the dict by the performance metrics, and picking the best model.

        Now, the next question is: How do I save the model’s architecture and weights to a .json .hdf5 file? I know how to do that for a simple model. But how do I extract the best model out of the gridsearch results?

        • Avatar
          Jason Brownlee November 11, 2016 at 10:04 am #

          Well done.

          No need. Once you know the parameters, you can use them to train a new standalone model on all of your training data and start making predictions.

          • Avatar
            Fenu Luca November 15, 2016 at 3:23 am #

            I may have found a way. How about this?

            best_model = grid_result.best_estimator_.model
            best_model_file_path = 'your_pick_here'
            model2json = best_model.to_json()
            with open(best_model_file_path + '.json', 'w') as json_file:
            json_file.write(model2json)
            best_model.save_weights(best_model_file_path + '.h5')

  9. Avatar
    volador November 14, 2016 at 6:21 pm #

    Hi Jason, I think this is very best deep learning tutorial on the web. Thanks for your work. I have a question is :how to use the heuristic algorithm to optimize Hyperparameters for Deep Learning Models in Python With Keras, these algorithms like: Genetic algorithm, Particle swarm optimization, and Cuckoo algorithm etc. If the idea could be experimented, could you give an example

    • Avatar
      Jason Brownlee November 15, 2016 at 7:50 am #

      Thanks for your support volador.

      You could search the hyperparameter space using a stochastic optimization algorithm like a genetic algorithm and use the mean performance as the cost function or fitness function. I don't have a worked example, but it would be relatively easy to set up.

  10. Avatar
    Jan de Lange November 15, 2016 at 6:50 am #

    Hi Jason, very helpful intro into gridsearch for Keras. I have used your guidance in my code, but rather than using the default ‘accuracy’ to be optimized, my model requires a specific evaluation function to be optimized. You hint at this possibility in the introduction, but there is no example of it. I have followed the SciKit-learn documentation, but I fail to come up with the correct syntax.

    I have posted my question at StackOverflow, but since it is quite specific, it requires understanding of SciKit-learn in combination with Keras.

    Perhaps you can have a look? I think it would nicely extend your tutorial.

    http://stackoverflow.com/questions/40572743/scikit-learn-grid-search-own-scoring-object-syntax

    Thanks, Jan

  11. Avatar
    Jan de Lange November 16, 2016 at 7:31 am #

    Yup, same sources as I referenced in my post at Stackoverflow.

  12. Avatar
    Anthony Ohazulike December 6, 2016 at 12:46 am #

    Good tutorial again Jason…keep on the good job!

  13. Avatar
    nrcjea001 December 13, 2016 at 10:48 pm #

    Hi Jason

    First off, thank you for the tutorial. It’s very helpful.

    I was also hoping you would assist on how to adapt the keras grid search to stateful lstms as discussed in

    https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

    I’ve coded the following:

    # create model
    model = KerasRegressor(build_fn=create_model, nb_epoch=1, batch_size=bats,
    verbose=2, shuffle=False)

    # define the grid search parameters
    h1n = [5, 10] # number of hidden neurons
    param_grid = dict(h1n=h1n)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=5)

    for i in range(100):
    grid.fit(trainX, trainY)
    grid.reset_states()

    Is grid.reset_states() correct? Or would you suggest creating a function callback to reset states?

    Thanks,

    • Avatar
      Jason Brownlee December 14, 2016 at 8:27 am #

      Great question.

      With stateful LSTMs we must control the resetting of states after each epoch. The sklearn framework does not open this capacity to us – at least it looks that way to me off the cuff.

      I think you may have to grid search stateful LSTM params manually with a ton of for loops. Sorry.

      If you discover something different, let me know, i.e., there may be a back door into the sklearn grid search functionality through which we can inject our own custom epoch handling.

  14. Avatar
    Thomas Maier December 21, 2016 at 2:53 am #

    Hi Jason

    Thanks a lot for this and all the other great tutorials!

    I tried to combine this gridsearch/keras approach with a pipeline. It works if I tune nb_epoch or batch_size, but I get an error if I try to tune the optimizer or something else in the keras building function (I did not forget to include the variable as an argument):

    def keras_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(80, input_dim=79, init='normal'))
    model.add(Activation('relu'))
    model.add(Dense(1, init='normal'))
    model.add(Activation('linear'))
    model.compile(optimizer=optimizer, loss='mse')
    return model

    kRegressor = KerasRegressor(build_fn=keras_model, nb_epoch=500, batch_size=10, verbose=0)

    estimators = []
    estimators.append(('imputer', preprocessing.Imputer(strategy='mean')))
    estimators.append(('scaler', preprocessing.StandardScaler()))
    estimators.append(('kerasR', kRegressor))
    pipeline = Pipeline(estimators)

    param_grid = dict(kerasR__optimizer=['adam', 'rmsprop'])

    grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')

    Do you know this problem?

    Thanks, Thomas

    • Avatar
      Jason Brownlee December 21, 2016 at 8:44 am #

      Thanks Thomas. I’ve not seen this issue.

      I think we’re starting to push the poor Keras sklearn wrapper to the limit.

      Maybe the next step is to build out a few functions to do manual grid searching across network configs.

      • Avatar
        James April 14, 2018 at 12:26 am #

        Has there been a blog post on this?

    • Avatar
      Anastasiya December 12, 2018 at 9:41 pm #

      Have you solved this issue? I’m exploring Keras now as wel and came across exactly the same problem.

  15. Avatar
    Jimi December 21, 2016 at 3:26 pm #

    Great resource!

    Any thoughts on how to get the “history” objects out of grid search? It could be beneficial to plot the loss and accuracy to see when a model starts to flatten out.

    • Avatar
      Jason Brownlee December 22, 2016 at 6:30 am #

      Not sure off the cuff Jimi, perhaps repeat the run standalone for the top performing configuration.

  16. Avatar
    DeepLearning January 4, 2017 at 6:08 am #

    Thanks for the post. Can we optimize the number of hidden layers as well on top of number of neurons in each layers?
    Thanks

    • Avatar
      Jason Brownlee January 4, 2017 at 9:00 am #

      Yes, it just may be very time consuming depending on the size of the dataset and the number of layers/nodes involved.

      Try it on some small datasets from the UCI ML Repo.

      • Avatar
        DeepLearning January 4, 2017 at 12:02 pm #

        Thanks. Would you mind looking at below code?

        def create_model(neurons1=1, neurons2=1):
        # create model
        model = Sequential()
        model.add(Dense(neurons1, input_dim=8))
        model.add(Dense(neurons2))
        model.add(Dense(1, init='uniform', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model
        # define the grid search parameters
        neurons1 = [1, 3, 5, 7]
        neurons2 = [0, 1, 2]
        param_grid = dict(neurons1=neurons1, neurons2=neurons2)
        grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
        grid_result = grid.fit(X, Y)

        This code runs without error (I excluded certain X, y parts for brevity), but when I run grid.fit(X, Y), it gives an AssertionError.

        I’d appreciate if you can show me where I am wrong.

        • Avatar
          DeepLearning January 4, 2017 at 12:26 pm #

          Update” It worked when I deleted 0 from neurons2. Thanks

        • Avatar
          Jason Brownlee January 5, 2017 at 9:16 am #

          A Dense() with a value of 0 neurons might blow up. Try removing the 0 from your neurons2 array.

          A good debug strategy is to cut code back to the minimum, make it work, then add complexity. Here, try searching a grid of 1 and 1 neurons, make it all work, then expand the grid you search.

          Let me know how you go.

  17. Avatar
    DeepLearning January 9, 2017 at 11:04 am #

    I keep getting error messages, and I tried a big for loop that scans all possible combinations of layer numbers, neuron numbers, and other optimization settings within defined limits. It is very time-consuming code, but I could not figure out how to adjust the layer structure and other optimization parameters in the same code using GridSearch. If you would provide code for that in your blog one day, that would be much appreciated. Thanks.

  18. Avatar
    Rajneesh January 11, 2017 at 10:48 am #

    Hi Jason,
    Many thanks for this awesome tutorial !

  19. Avatar
    Andy January 22, 2017 at 1:02 pm #

    Hi Jason,

    Great tutorial! I’m running into a slight issue. I tried running this on my own variation of the code and got the following error:

    TypeError: get_params() got an unexpected keyword argument ‘deep’

    I copied and pasted your code using the given data set and got the same error. The code is showing an error on the grid_result = grid.fit(X, Y) line. I looked through the other comments and didn’t see anyone with the same issue. Do you know where this could be coming from?

    Thanks for your help!

    • Avatar
      YechiBechi January 23, 2017 at 2:18 am #

      same issue here,

      great tutorial, life saver.

    • Avatar
      Jason Brownlee January 23, 2017 at 8:35 am #

      Hi Andy, sorry to hear that.

      Is this happening with a specific example or with all of them?

      Are you able to check your version of Python/sklearn/keras/tf/theano?

      UPDATE:

      I can confirm the first example still works fine with Python 2.7, sklearn 0.18.1, Keras 1.2.0 and TensorFlow 0.12.1.

      • Avatar
        Andy January 25, 2017 at 7:12 am #

        The only differences are I am running Python 3.5 and Keras 1.2.1. The example I ran previously was the grid search for the number of neurons in a layer. But I just ran the first example and got the same error.

        Do you think the issue is due to the next version of Python? If so, what should my next steps be?

        Thanks for your help and quick response!

  20. Avatar
    kono February 8, 2017 at 3:14 am #

    Jason,

    Can you use early_stopping to decide n_epoch?

    • Avatar
      Jason Brownlee February 8, 2017 at 9:36 am #

      Yes, that is a good method to find a generalized model.

  21. Avatar
    Jayant February 23, 2017 at 4:33 am #

    Hi Jason,

    Really great article. I am a big fan of your blog and your books. Can you please explain your following statement?

    “A default cross-validation of 3 was used, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.”

    I didn’t see anywhere cross-validation being used.

    • Avatar
      Jason Brownlee February 23, 2017 at 8:56 am #

      Hi Jayant,

      Grid search uses k-fold cross-validation to evaluate the performance of each combination of parameters on unseen data.

  22. Avatar
    Jing February 28, 2017 at 2:09 am #

    Hi Jason,
    thanks for this awesome tutorial !
    I have two questions: 1. In “model.compile(loss=’binary_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])”, accuracy is used for evaluate results. But GridSearchCV also has scoring parameter, if I set “scoring=’f1’”,which one is used for evaluate the results of grid search? 2.How to set two evaluate parameters ,e.g. ‘accuracy’and ’f1’ evaluating the results of grid search?

    • Avatar
      Jason Brownlee February 28, 2017 at 8:13 am #

      Hi Jing,

      You can set the “scoring” argument for GridSearchCV with a string of the performance measure to use, or the name of your own scoring function. You can learn about this argument here:
      http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

      You can see a full list of supported scoring measures here:
      http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

      As far as I know you can only grid search using a single measure.

      • Avatar
        Jing February 28, 2017 at 12:50 pm #

        Thank you so much for your help!

      • Avatar
        Jing February 28, 2017 at 1:54 pm #

        I find that no matter what scoring parameter is used in GridSearchCV, "metrics" in model.compile must be ['accuracy'], otherwise the program gives "ValueError: The model is not configured to compute accuracy. You should pass 'metrics=["accuracy"]' to the 'model.compile()' method." So, if I set:
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='recall')
        the grid_result.best_score_ = 0.72. My question is: is 0.72 accuracy or recall? Thank you!

        • Avatar
          Jason Brownlee March 1, 2017 at 8:31 am #

          Hi Jing,

          When using GridSearchCV with Keras, I would suggest not specifying any metrics when compiling your Keras model.

          I would suggest only setting the “scoring” argument on the GridSearchCV. I would expect the metric reported by GridSearchCV to be the one that you specified.

          I hope that helps.

  23. Avatar
    Dan March 8, 2017 at 4:13 am #

    Great Blogpost. Love it. You are awesome Jason. I got one question to GridsearchCV. As far as i understand the crossvalidation already takes place in there. That’s why we do not need any kfold anymore.
    But with this technique we would have no validation set correct? e.g. with a default value of 3 we would have 2 training sets and one test set.

    That means in kfold as well as in GridsearchCV there is no requirement for creating a validation set anymore?

    Thanks

    • Avatar
      Jason Brownlee March 8, 2017 at 9:44 am #

      Hi Dan,

      Yes, GridSearchCV performs cross validation and you must specify the number of folds. You can hold back a validation set to double check the parameters found by the search if you like. This is optional.

      • Avatar
        Dan March 9, 2017 at 3:25 am #

        Thank you for the quick response Jason. Especially considering the huge amount of questions you get.

  24. Avatar
    Johan Steunenberg March 22, 2017 at 8:25 pm #

    What I’m missing in the tutorial is the info, how to get the best params in the model with KERAS. Do I pickup the best parameters and call ‘create_model’ again with those parameters or can I call the GridSearchCV’s ‘predict’ function? (I will try out for myself but for completeness it would be good to have it in the tutorial as well.)

    • Avatar
      Jason Brownlee March 23, 2017 at 8:49 am #

      I see, but we don’t know the best parameters, we must search for them.

  25. Avatar
    Maycown Miranda April 5, 2017 at 2:09 am #

    Hi, Jason. I am getting
    /usr/local/lib/python2.7/dist-packages/keras/wrappers/scikit_learn.py in check_params(self=, params={‘batch_size’: 10, ‘epochs’: 10})
    80 legal_params += inspect.getargspec(fn)[0]
    81 legal_params = set(legal_params)
    82
    83 for params_name in params:
    84 if params_name not in legal_params:
    —> 85 raise ValueError(‘{} is not a legal parameter’.format(params_name))
    params_name = ‘epochs’
    86
    87 def get_params(self, _):
    88 “””Gets parameters for this estimator.
    89

    ValueError: epochs is not a legal parameter

    • Avatar
      Jason Brownlee April 9, 2017 at 2:32 pm #

      It sounds like you need to upgrade to Keras v2.0 or higher.

      • Avatar
        Chandra Sutrisno Tjhong November 28, 2017 at 10:46 am #

        I experienced the same problem.I upgraded my keras and the same problem still occurs.

    • Avatar
      neumatron11 February 5, 2019 at 12:42 pm #

      I was getting the ‘not a legal paramater’ error when I was trying to pass required inputs into my create_model function in the wrapper.

      model = KerasClassifier(build_fn=create_model(input_dim = x ), verbose=0)

      when I removed it and included it in the grid search instead it ran fine, I just added it to the dictionary of parameters

      input_dim = [x]

  26. Avatar
    Usman May 3, 2017 at 7:56 am #

    Nice tutorial. I would like to optimize the number of hidden layers in the model. Can you please guide in this regard, thanks

    • Avatar
      Jason Brownlee May 4, 2017 at 7:59 am #

      Thanks Usman.

      Consider exploring specific patterns, e.g. small-big-small, etc.

  27. Avatar
    Carl May 5, 2017 at 12:58 pm #

    Do you know any way this could be possible using a network with multiple inputs?

    http://imgur.com/a/JJ7f1

    • Avatar
      Sukhpal December 16, 2019 at 2:18 am #

      The optimization of network topology, learning rate, batch size, and epochs is done in stages? Sir, please tell me why these were done in stages.

      • Avatar
        Jason Brownlee December 16, 2019 at 6:18 am #

        To make the explanation to the reader simpler.

        • Avatar
          Dan Thomas May 28, 2020 at 7:21 am #

          Also probably to reduce search space, and thus computational time.

  28. Avatar
    DanielP May 9, 2017 at 4:26 pm #

    Hi Jason, great to see posts like this – amazing job!

    Just noticed, when you tune the optimisation algorithm SGD performs at 34% accuracy. As no parameters are being passed to the SGD function, I’d assume it takes the default configuration, lr=0.01, momentum=0.0.

    Later on, as you look for better configurations for SGD, best result (68%) is found when {‘learn_rate’: 0.01, ‘momentum’: 0.0}.

    It seems to me that these two experiments use exactly the same network configuration (including the same SGD parameters), yet their resulting accuracies differ significantly. Do you have any intuition as to why this may be happening?

  29. Avatar
    Pradanuari May 14, 2017 at 3:13 am #

    Hi Jason!
    absolutely love your tutorial! But would you mind to give tutorial for how to tune the number of hidden layer?

    Thanks

  30. Avatar
    Pradanuari May 14, 2017 at 11:32 pm #

    Thank you so much Jason!

  31. Avatar
    Ibrahim El-Fayoumi May 17, 2017 at 12:53 pm #

    Hello Jason
    I tried to use your idea in a similar problem but I am getting error : AttributeError: ‘NoneType’ object has no attribute ‘loss’
    it looks like the model does not define loss function?

    This is the error I get:
    b\site-packages\keras-2.0.4-py3.5.egg\keras\wrappers\scikit_learn.py in fit(self=, x=memmap([[[ 0., 0., 0., …, 0., 0., 0.],
    …, 0., 0., …, 0., 0., 0.]]], dtype=float32), y=array([[ 0., 0., 0., …, 0., 0., 0.],
    …0.],
    [ 0., 0., 0., …, 0., 1., 0.]]), **kwargs={})
    135 self.model = self.build_fn(
    136 **self.filter_sk_params(self.build_fn.__call__))
    137 else:
    138 self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
    139
    –> 140 loss_name = self.model.loss
    loss_name = undefined
    self.model.loss = undefined
    141 if hasattr(loss_name, ‘__name__’):
    142 loss_name = loss_name.__name__
    143 if loss_name == ‘categorical_crossentropy’ and len(y.shape) != 2:
    144 y = to_categorical(y)

    AttributeError: ‘NoneType’ object has no attribute ‘loss’
    ___________________________________________________________________________

    Process finished with exit code 1

    Regards
    Ibrahim

    • Avatar
      Jason Brownlee May 18, 2017 at 8:26 am #

      Does the example in the blog post work on your system?

      • Avatar
        Ibrahim El-Fayoumi May 18, 2017 at 12:18 pm #

        Ok, I think your code needs to be placed after
        if __name__ == ‘__main__’:

        to work with multiprocess…

        But thanks for the post is great…

        • Avatar
          Jason Brownlee May 19, 2017 at 8:12 am #

          Not on Linux and OS X when I tested it, but thanks for the tip.

        • Avatar
          Gautam August 25, 2017 at 11:33 pm #

          n_jobs=-1 doesnt work on Windows.

          @Ibrahim: Can you please explain, what part of the code needs to be behind
          if __name__ == ‘__main__’: )

          • Avatar
            Martin October 19, 2019 at 4:52 am #

            Assuming you have got several functions (i have a single python script acting as main file and the other stuff in a separate file, but at least functions like Jason does) you need to put this at the very begining of your main routine where everything comes together and is set-up. Note, since it is an if-condition, you need to tab everything below the condition.

            @Jason maybe you can add this in the section where you talk about the problems on parallelization as a hint for windows users.

          • Avatar
            Jason Brownlee October 19, 2019 at 6:53 am #

            Thanks. I really don’t know about windows.

            I’ve not seen a windows box in a long time and I’m impressed people use them for software development.

  32. Avatar
    Edward May 21, 2017 at 3:17 am #

    Hello Jason!
    I do the first step – try to tune Batch Size and Number of Epochs – and get
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    Best: 0.707031 using {'epochs': 100, 'batch_size': 40}
    After that I do the same and get
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    Best: 0.688802 using {'epochs': 100, 'batch_size': 20}
    And so on
    The problem is in the grid_result.best_score_

    I expect that in the second step (for example, tuning the optimizer) I will get a better grid_result.best_score_ than in the first step (in the second step I use grid_result.best_params_ from the first step). But it is not true.
    Tuning all hyperparameters at once takes a very long time.

    How to fix it?

    • Avatar
      Jason Brownlee May 21, 2017 at 6:01 am #

      Consider tuning different parameters, like network structure or number of input features.

      • Avatar
        Edward May 21, 2017 at 7:18 pm #

        Thanks a lot Jason!

  33. Avatar
    pattijane May 21, 2017 at 7:44 am #

    Hello,

    I’d like to have your opinion about a problem:

    I have two loss function plots, with SGD and Adamax as optimizer with same learning rate.
    Loss function of SGD looks like the red one, whereas Adamax’s looks like blue one.
    (http://cs231n.github.io/assets/nn3/learningrates.jpeg)

    I have better scores with Adamax on validation data. I’m confused about how to proceed, should I choose Adamax and play with learning rates a little more, or go on with SGD and somehow try to improve performance?

    Thanks!

    • Avatar
      Jason Brownlee May 22, 2017 at 7:49 am #

      Explore both, but focus on the validation score of interest (e.g. accuracy, RMSE, etc.) over loss.

      For example, you can get very low loss and get worse accuracy.

      • Avatar
        pattijane May 22, 2017 at 6:35 pm #

        Thanks for your response! I experimented with different learning rates and found out a reasonable one, (good for both Adamax and SGD) and now I try to fix learning rate and optimizer and focus on other hyperparameters such as batch-size and number of neurons. Or would be better if I set those first?

        • Avatar
          Jason Brownlee May 23, 2017 at 7:49 am #

          Number of neurons will have a big effect along with learning rate.

          Batch size will have a smaller effect and could be optimized last.

  34. Avatar
    Lotem May 23, 2017 at 1:47 am #

    Thanks for this post!

    One question – why not use grid search on all the parameters together, rather than performing several grid searches and finding each parameter separately? Surely the results are not the same…

    • Avatar
      Jason Brownlee May 23, 2017 at 7:54 am #

      Great question,

      In practice, the datasets are large and it can take a long time and require a lot of RAM.

  35. Avatar
    StatsSorceress May 25, 2017 at 6:52 am #

    Hi Jason,

    Excellent post!

    It seems to me that if you use the entire training set during your cross-validation, then your cross-validation error is going to give you an optimistically biased estimate of your validation error. I think this is because when you train the final model on the entire dataset, the validation set you create to estimate test performance comes out of the training set.

    My question is: assuming we have a lot of data, should we use perhaps only 50% of the training data for cross-validation for the hyperparameters, and then use the remaining 50% for fitting the final model (and a portion of that remaining 50% would be used for the validation set)? That way we wouldn’t be using the same data twice. I am assuming in this case that we would also have a separate test set.

    • Avatar
      Jason Brownlee June 2, 2017 at 11:38 am #

      Yes, it is a good idea to hold back a test set when tuning.

  36. Avatar
    Yang May 27, 2017 at 5:35 am #

    Thanks for your valuable post. I learned a lot from it.
    When I wrote my code for grid search, I encountered a question:

    I use fit_generator instead of fit in keras.
    Is it possible to use grid search with fit_generator ?

    I have some Merge layers in my deep learning model.
    Hence, the input of the neural network is not a single matrix.
    For example:
    Suppose we have 1,000 samples
    Input = [Input1,Input2]
    Input1 is a 1,000 *3 matrix
    Input2 is a 1,000*3*50*50 matrix (image)

    When I use the fit in your post, there is a bug….because the input1 and input2 don’t have the same dimension. So I wonder whether the fit_generator can work with grid search ?

    Thanks in advance!

  37. Avatar
    Yang May 27, 2017 at 6:46 am #

    Please ignore my previous reply.
    I find an answer here: https://github.com/fchollet/keras/issues/6451
    Right now, the GridsearchCV using the scikit wrapper for network with multiple inputs is not available.

  38. Avatar
    Kate liu May 28, 2017 at 4:31 pm #

    Hi Jason, thank you for your good tutorial on grid search with Keras. I followed your example with my own dataset, and it could be run. But when I used the autoencoder structure, instead of the sequential structure, to grid search the parameters with my own data, it could not be run. I don't know the reason. Could you help me? Are there any differences between the grid search of a sequential structure and the grid search of a Model structure?

    The following is my code:

    from keras.models import Sequential
    from keras.layers import Dense, Input
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import GridSearchCV
    import numpy as np
    from keras.optimizers import SGD, Adam, RMSprop, Adagrad
    from keras.regularizers import l1,l2
    from keras.models import Model
    import pandas as pd
    from keras.models import load_model

    np.random.seed(2017)

    def create_model(optimizer='rmsprop'):

    # encoder layers
    encoding_dim = 140
    input_img = Input(shape=(6,))
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(input_img)
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
    encoder_output = Dense(encoding_dim, activation='relu', W_regularizer=l1(0.01))(encoded)

    # decoder layers
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoder_output)
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
    decoded = Dense(6, activation='relu', W_regularizer=l1(0.01))(decoded)

    # construct the autoencoder model
    autoencoder = Model(input_img, decoded)

    # construct the encoder model for plotting
    encoder = Model(input_img, encoder_output)

    # Compile model
    autoencoder.compile(optimizer='RMSprop', loss='mean_squared_error', metrics=['accuracy'])

    return autoencoder

    • Avatar
      Jason Brownlee June 2, 2017 at 12:09 pm #

      I’m surprised, I would not think the network architecture would make a difference.

      Sorry, I have no good suggestions other than try to debug the cause of the fault.

  39. Avatar
    Kate liu May 28, 2017 at 4:36 pm #

    The autoencoder.compile command was modified as follows:
    # Compile model
    autoencoder.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])

  40. Avatar
    Rahul May 30, 2017 at 12:07 am #

    Can we do this for functional API as well ?

  41. Avatar
    Ian Worthington May 30, 2017 at 10:36 pm #

    Thanks for a great tutorial Jason, appreciated.

    n_jobs=-1 didn't work very well on my Windows 10 machine: it took a very long time and never finished.

    https://stackoverflow.com/questions/28005307/gridsearchcv-no-reporting-on-high-verbosity seems to suggest this is (or at least was in 2015) a known problem under Windows so I changed to n_jobs=1, which also allowed me to see throughput using verbose=10.

  42. Avatar
    Ian Worthington May 31, 2017 at 1:56 am #

    Jason —

    Given all the parameters it is possible to adjust, is there any recommendation for which should be fixed first before exploring others, or can the best value for one parameter change when the others are changed?

  43. Avatar
    Mario June 9, 2017 at 12:10 am #

    Hi and thank you for the resource.

    Am I right in my understanding that this only works on one machine?

    Any hints / pointers on how to run this on a cluster? I have found https://goo.gl/Q9Xy7B as a potential avenue using Spark (no Keras though).

    Any comment at all? Information on the subject is scarce.

    • Avatar
      Jason Brownlee June 9, 2017 at 6:26 am #

      Yes, this example is for a single machine. Sorry, I do not have examples for running on a cluster.

  44. Avatar
    Shaun June 16, 2017 at 11:54 pm #

    Hi Jason,

    I'm a little bit confused about the definition of the "score" or "accuracy". How are they computed? I believe they are not simply comparing predictions with the targets, otherwise the most overfit model would always win (like the more neurons the better).

    But on the other hand, they are just using those combinations of parameters to train the model, so what is the difference between manually setting the parameters and checking whether my result is good (with the risk of overfitting), and the grid search creating an accuracy score to determine which one is the best?

    Best regards,

    • Avatar
      Jason Brownlee June 17, 2017 at 7:30 am #

      The grid search will provide an estimate of the skill of the model with a set of parameters.

      Any one configuration in the grid search can be set and evaluated manually.

      Neural networks are stochastic and will give different predictions/skill when trained on the same data.

      Ideally, if you have the time/compute the grid search should use repeated k-fold cross validation to provide robust estimates of model skill. More here:
      https://machinelearningmastery.com/evaluate-skill-deep-learning-models/

      Does that help?

      • Avatar
        Shaun June 20, 2017 at 2:30 am #

        I'm new to neural networks, so I'm a little bit puzzled. Say I have too many neurons and that leads to overfitting (good on the train set, bad on the validation or test set); can grid search detect it by the score?

        My guess is yes, because there is a validation set in GridSearchCV. Is that correct?

        • Avatar
          Jason Brownlee June 20, 2017 at 6:39 am #

          A larger network can overfit.

          The idea is to find a config that does well on the train and validation sets. We require a robust test harness. With enough resources, I’d recommend repeated k-fold cross validation within the grid search.

  45. Avatar
    Huyen June 19, 2017 at 4:21 pm #

    One more very useful tutorial, thanks Jason.

    One question about GridSearch in my case. I have tried to tune the parameters of my neural network for regression (18 inputs, 800 samples), but the grid search takes extremely long, practically forever, even though I have limited the number of values. I saw in your code:

    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)

    Normally n_jobs=1; can I increase that number to improve performance?

    • Avatar
      Jason Brownlee June 20, 2017 at 6:36 am #

      We often cannot grid search with neural nets because it takes so long!

      Consider running on a large computer in the cloud over the weekend.

  46. Avatar
    Bobo June 21, 2017 at 4:57 am #

    Hi Jason,

    Any idea how to use GridSearchCV if you don’t want cross validation?

  47. Avatar
    makis June 28, 2017 at 11:54 pm #

    Hello. Thank you for the nice tutorial.

    I am trying to combine pipeline and gridsearch.

    Inside my keras model i use kernel_initializer=init_mode.
    Then I am trying to assign values to the init_mode dictionary in order to perform the gridsearch.

    I get the following error: ValueError: init_mode is not a legal parameter

    My code is here: https://www.dropbox.com/s/57n777j9w8bxf4t/keras_grid.py?dl=0

    Any tip? Thank you

  48. Avatar
    Abhijith Darshan Ravindra July 11, 2017 at 6:31 am #

    Hi Dr. Brownlee,

    When I run this in Spyder IDE nothing happens after grid.fit.

    It just appears to do nothing.

    Any suggestions as to why?

    • Avatar
      Jason Brownlee July 11, 2017 at 10:34 am #

      Consider running from the command line.

      The grid search may take a long time.

      • Avatar
        DY July 14, 2017 at 6:11 am #

        Hello Dr Brownlee,

        I saved your example codes into .py file and run it. Nothing happens after grid.fit. However, if I run line by line from your example codes it works. Do you know why?

        • Avatar
          Jason Brownlee July 14, 2017 at 8:36 am #

          It may take a long time. Consider reducing the scope of the search to see if you can get results sooner.

    • Avatar
      Tryfon September 18, 2017 at 11:46 pm #

      I had the same issue (using Spyder and Python 3.6), but after changing the parameter to n_jobs = 1 it worked fine. Also, n_jobs = 2 was stuck, although Spyder showed it was running in the background (I checked the CPU usage and it was down to 1% vs the 55-80% when it is actually running).

      Don't ask me why. My guess would be that it has to do with your system and the fact that it might not support parallelization (no CUDA GPU).

      • Avatar
        Jason Brownlee September 19, 2017 at 7:47 am #

        Consider running the example from the command line instead.

  49. Avatar
    Kamal Thapa July 27, 2017 at 3:46 pm #

    How can I do Hyper-parameter optimization for MLPRegressor in scikit learn?

  50. Avatar
    Josep August 3, 2017 at 2:31 am #

    Hi Jason,
    I'm unable to apply the grid search to a seq-to-seq LSTM network (KerasRegressor model in the scikit-learn API). When I set the GridSearchCV scoring algorithm to r^2 (or any scoring function for regression problems), model.fit expects a 2-dim input vector, not the 3-dim one used in Keras.
    Otherwise, if I leave the default scoring algorithm named "_passthrough_scorer" (I don't know what it does, I don't even know what it is), it works, but the best_score doesn't match the real best parametrization. I'm really confused. I'll have to write the grid search manually.

    • Avatar
      Josep August 3, 2017 at 2:42 am #

      I've solved it, and I'm sharing it in case someone has the same issue: if you set the grid search scoring function to None, it uses the scoring metrics of the Keras model.

      • Avatar
        Josep August 3, 2017 at 2:49 am #

        Sorry for bothering, but the results of the approach I described are incorrect. I don't know what to do.

    • Avatar
      Jason Brownlee August 3, 2017 at 6:54 am #

      Hi Josep,

      Consider writing your own for loop to iterate over params and run a Cross Validation for the params within the loop.

      This is how I do it now for large/complex models.
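      For example, a rough sketch of such a manual loop (assuming a create_model() function and X, y arrays as in the post; an illustration, not a definitive implementation):

      from sklearn.model_selection import StratifiedKFold
      import numpy as np

      results = {}
      for batch_size in [10, 20, 40]:
          for epochs in [50, 100]:
              scores = []
              kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)
              for train_ix, test_ix in kfold.split(X, y):
                  model = create_model()
                  model.fit(X[train_ix], y[train_ix], epochs=epochs, batch_size=batch_size, verbose=0)
                  _, acc = model.evaluate(X[test_ix], y[test_ix], verbose=0)  # assumes metrics=['accuracy']
                  scores.append(acc)
              results[(batch_size, epochs)] = np.mean(scores)
      print('Best:', max(results, key=results.get))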

  51. Avatar
    kotb August 8, 2017 at 7:10 pm #

    Can I use this grid search without using a Keras model?

  52. Avatar
    Aman Garg August 19, 2017 at 3:35 am #

    Hello Jason,

    Thanks for such a nice tutorial.

    Instead of getting an output like "Best: 0.720052 using {'init_mode': 'uniform'}", it would be really nice if you could show us how to visualize the results with matplotlib so that they are easier to interpret.

  53. Avatar
    Michael August 20, 2017 at 4:42 am #

    Hi, Jason. Thanks, again, for all of the blog posts and example code. I’m trying to tune my binary classification Keras neural network. My dataset includes about 50,000 entries with 52 (numeric) variables. Using Grid Search, I’ve tested all sorts of combinations of layer size, number of epochs, batch size, optimizers, activations, learning rates, dropout rates, and L2 regularization parameters. My grid search shows every combination performs the same. For example, here is a snippet from my latest results:

    Best: 0.876381 using {‘act’: ‘relu’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘SGD’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adagrad’}
    0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adadelta’}
    0.876361 (0.003880) with: {‘act’: ‘tanh’, ‘opt’: ‘Adam’}
    0.876381 (0.003878) with: {‘act’: ‘tanh’, ‘opt’: ‘SGD’}

    But I also get 0.876381 whether I have 1000 nodes or 1 node, and for every other combo I’ve tested. I’ve also tried different ways of scaling or transforming my input data with no impact.

    Do you have any thoughts on why I’m having trouble finding different combinations of parameters that actually have a difference in performance?

    Thank you for your help! You rock!

  54. Avatar
    Shubham Kumar September 3, 2017 at 11:54 am #

    Hey Jason.
    I was using grid search to tune hyperparameters for a CNN-LSTM classification problem.
    I used the code template on your blog about sequence classification.
    My original data has 38,932 instances, but for tuning I am using only 1,000 to save time.
    But even then, I am not sure how to best search for those parameters and save time.

    Is it a bad idea to search for hyperparameters in a small subset (almost 1/40th of the training data in my case)?
    Will the result vary largely when I use actual data size?
    Also, I passed in several parameters for the grid search. Left it overnight and it still hadn’t made enough progress, so I stopped the execution.
    How can I speed up this process?

    • Avatar
      Jason Brownlee September 3, 2017 at 3:44 pm #

      The result will be biased, but perhaps might give you an idea of the direction in which to proceed – this could be enough for you.

      I often run a lot of sanity check grid searches on small samples to get ideas on which direction to push.

      More data will result in less biased estimates of model skill, often proportionately to some point of diminishing returns.

      • Avatar
        Shubham Kumar September 4, 2017 at 3:10 am #

        Great !
        I did read that one of the sanity checks is to check whether the model overfits on a small sample! If yes, then we are good to go…
        I am slightly new to building proper models and find this part exciting but a little intimidating at the same time !
        I am going to use only a few hyper parameters at a time, and keep the rest constant and check what happens !

        Love your posts ! They are amazingly helpful .
        Does the Python LSTM book have code snippets in Python 3 as well?
        Because it becomes a little difficult to search for the right modules and attributes otherwise :/

        • Avatar
          Jason Brownlee September 4, 2017 at 4:39 am #

          Thanks.

          Yes, the code in my LSTM book was tested with Python 2.7 and Python 3.5.

  55. Avatar
    Kaushal Shetty September 8, 2017 at 12:24 am #

    Hi Jason, Is this a valid approach to decide the number of layers?
    def neural_train(layer1=1, layer2=1, layer3=1, layers=1):

        input_tensor = Input(shape=(2001,))
        x = Dense(units=layer1, activation='relu')(input_tensor)
        if layers == 2:
            x = Dense(layer2, activation='relu')(x)
        if layers == 3:
            x = Dense(layer2, activation='relu')(x)
            x = Dense(layer3, activation='relu')(x)

        output_tensor = Dense(10, activation='softmax')(x)
        model = Model(input_tensor, output_tensor)
        model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
        return model

    layer1 = [1024, 512]
    layer2 = [256, 100]
    layer3 = [60, 40]
    epochs = [10, 11]
    layers = [2, 3]
    param_grid = dict(epochs=epochs, layer1=layer1, layer2=layer2, layer3=layer3, layers=layers)
    model = KerasClassifier(build_fn=neural_train)
    gsv_model = GridSearchCV(model, param_grid=param_grid)
    gsv_model.fit(x_train, y_train)

  56. Avatar
    ari September 9, 2017 at 1:29 am #

    Very helpful post Jason, thanks for this. Are there any advantages to using grid search over something like hyperas/hyperopt? To the best of your knowledge, is one faster than the other?

    • Avatar
      Jason Brownlee September 9, 2017 at 11:58 am #

      Depends on your data and model. Use the tool that you prefer.

  57. Avatar
    Shubham Kumar September 10, 2017 at 4:38 am #

    {‘split0_test_score’: array([ 0.6641791, 0.6641791, 0.6641791, 0.6641791]), ‘split1_test_score’: array([ 0.65413534, 0.65413534, 0.65413534, 0.65413534]), ‘split2_test_score’: array([ 0.69924811, 0.69924811, 0.69924811, 0.69924811]), ‘mean_test_score’: array([ 0.6725, 0.6725, 0.6725, 0.6725]), ‘std_test_score’: array([ 0.01931902, 0.01931902, 0.01931902, 0.01931902]), ‘rank_test_score’: array([1, 1, 1, 1]), ‘split0_train_score’: array([ 0.67669174, 0.67669174, 0.67669174, 0.67669174]), ‘split1_train_score’: array([ 0.68164794, 0.68164794, 0.68164794, 0.68164794]), ‘split2_train_score’: array([ 0.65917602, 0.65917602, 0.65917602, 0.65917602]), ‘mean_train_score’: array([ 0.67250523, 0.67250523, 0.67250523, 0.67250523]), ‘std_train_score’: array([ 0.00963991, 0.00963991, 0.00963991, 0.00963991]), ‘mean_fit_time’: array([ 36.72573058, 37.0244147 , 38.12670692, 40.71116368]), ‘std_fit_time’: array([ 0.4829061 , 0.35207924, 0.13746276, 2.71443639]), ‘mean_score_time’: array([ 1.49508754, 1.76741695, 2.14029002, 2.67426189]), ‘std_score_time’: array([ 0.04907801, 0.11919153, 0.07953362, 0.13931651]), ‘param_dropout’: masked_array(data = [0.2 0.5 0.6 0.7],
    mask = [False False False False],
    fill_value = ?)
    , ‘params’: ({‘dropout’: 0.2}, {‘dropout’: 0.5}, {‘dropout’: 0.6}, {‘dropout’: 0.7})}

    Hey. I was hypertuning a model on 4 different choices of hyper parameters. However, in the grid_results_ dictionary, the rank_test_score key has array with all same values. I find that confusing. Shouldn’t it have 4 different values in each place?
    Something like [1,3,2,4] ?
    What could be the explanation for this?

    • Avatar
      Shubham Kumar September 10, 2017 at 4:50 am #

      It must have something to do with all of the mean_test_score values being the same.

    • Avatar
      Jason Brownlee September 11, 2017 at 12:03 pm #

      If you are testing 4 different values for one parameter, then you must build 4 models/complete 4 runs.

      Does that help?

      • Avatar
        Shubham Kumar September 13, 2017 at 5:20 am #

        I am sorry, that's confusing. What do 4 models or 4 complete runs mean?

        Are things different if we are grid searching/random searching just one hyperparameter?

        Does it have something to do with the actual code used to write TensorFlow/Keras?

        • Avatar
          Jason Brownlee September 13, 2017 at 12:36 pm #

          If you have one parameter and you want to test 4 values, each value needs one run. Ideally, we would run many times for each parameter value and take the average skill score given the stochastic nature of ML algorithms.

          For a random search, you run for as long as you like.

          Does that help?

  58. Avatar
    Shubham Kumar September 13, 2017 at 11:17 pm #

    What I understand is that when we have more than 1 (say 2) hyper-parameters in a grid, then for each combination, the code will complete as many epochs as I have specified, with as many training-cross-validation sets as specified (the CV in GridSearchCV). So, going through all those epochs, for each training-cross-validation set, we get the avg accuracy over all the cross-validation sets for every combination.

    So when you say 1 run only in the case of a single hyperparameter, that means only 1 training-crossvalidation set? Because only in this case, there won’t be any averaging involved.

    Is that what I have to do? Change the training-crossValidation set to just 1?

  59. Avatar
    Rishi September 18, 2017 at 5:18 am #

    Jason,
    would you please post an example of inheriting from KerasClassifier (or KerasRegressor) to create your own class? I’m attempting to do this and it works for the most part:

    class MLP_Regressor(KerasRegressor):

        def __init__(self, **sk_params):
            super().__init__(build_fn=None, **sk_params)

        def __call__(self, optimizer='adam', loss='mean_squared_error', **kwargs):
            # more code goes here (that was previously in 'build_fn')
            ...

    I can include this in a pipeline and it runs perfectly:

    MLP Pipeline(memory=None,
        steps=[('MLP', )])

    Only thing is: The Keras documentation includes the 'build_fn' keyword argument:

    keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)

    While the actual KerasClassifier class definition shows the following in its __init__ method:

    def __init__(self, model, optimizer='adam', loss='categorical_crossentropy', **kwargs):
        super(KerasClassifier, self).__init__(model, optimizer, loss, **kwargs)

    I’m not sure if my __init__ in MLP_Regressor has been setup correctly (to avoid hidden bugs in the future).

    Would greatly appreciate it! (I’ve searched, but couldn’t find a single example of KerasClassifier inheritance).

    • Avatar
      Jason Brownlee September 18, 2017 at 5:49 am #

      Thanks for the suggestion, I have not done this but perhaps in the future.

      • Avatar
        Rishi November 21, 2017 at 12:54 pm #

        Jason, managed to get the inherited class working perfectly now:

        class MLP_Classifier(KerasClassifier):

            def __init__(self, build_fn=None, **sk_params):
                self.sk_params = sk_params
                super().__init__(build_fn=None, **sk_params)

            def __call__(self, callbacks=None, layer_sizes=None, activations=None, input_dim=0, init='normal', optimizer='adam', metrics='accuracy', loss='binary_crossentropy', use_dropout_input=False, use_dropout_hidden=False):
                """
                Constructs, compiles and returns a Keras model.
                Implements the "build_fn" function.

                Returns a "Sequential" model.
                """
                # Code to build a model (that would typically go in "build_fn") goes here.
                return model

  60. Avatar
    Tmn September 20, 2017 at 2:45 am #

    Hi Jason,

    I cannot thank you enough. I am sure there are many people like me who have learnt a lot from your tutorials on both R and Python. I have been following your tutorials for more than 3 years now. Before, I was using R; however, recently I moved to Python for deep learning. And I find your tutorials, as usual, exceptional. I think Andrew Ng's and CS231n's (Andrej Karpathy) theoretical courses and your programming course on deep learning are among the best in the world. You rock! Thanks a lot.

    I do have a question 🙂 as well.
    The grid search parameter tuning works perfectly with the CPU. I agree with your suggestion not to tune everything at once. Now I moved to a GPU implementation. I was able to execute the code if I chose n_jobs=1. However, if I do multi-threading with n_jobs=-1, I get "CUDA_ERROR_OUT_OF_MEMORY". I have a GeForce GTX 1080. Did you happen to encounter a similar kind of error? I will post the error log if needed.

    Once, again thank you.

    • Avatar
      Jason Brownlee September 20, 2017 at 6:00 am #

      Thanks for all of your support!

      Yes, I have the same and I would recommend using a “single thread” and let the GPU do its thing for a given single run.

      In general, I’d recommend contrasting different approaches to grid searching (cpu/gpu) and use the approach that is overall faster for your specific tests.

      • Avatar
        Tmn September 20, 2017 at 11:33 pm #

        Hi Jason,
        Thank you for the response. The parameter search using the CPU (n_jobs=-1) takes 2.961489-4.977758 seconds, while using the GPU (n_jobs=1) it takes 140.101048-142.151023 seconds.

        One more thing: after the grid search I have values for the parameters {batch_size, activation, neurons, learn_rate, ...} and accuracy around 90%. However, I wonder why reusing these model parameters does not provide the same results; now the accuracy is 52%. Even though I executed it many times with the same parameters, the accuracy remains the same (52%). I could not achieve the accuracy shown in the grid search using the best model parameters. I am doing 5-fold CV, and I do not expect the accuracy to be identical since it is a stochastic process, but it should be within about ±5%. What do you think? Did you also happen to encounter the same thing?

        Also, the best parameter values change in each execution, with an accuracy SD of about ±5%.

        Thanks

        P.S:
        The code below is something I am doing to limit GPU memory usage and run multiple grid searches. However, we should know the memory usage in advance (cs231n.github.io/convolutional-networks/#case). Let me know if it makes sense.

        Also, we can use n_jobs. I tried with n_jobs = 2; however, the GPU memory is allocated based on a fraction. I am searching for how to allocate memory in MB. I will do more research on this "CUDA_ERROR_OUT_OF_MEMORY" and update you.

        import tensorflow as tf
        from keras.backend.tensorflow_backend import set_session
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.3
        set_session(tf.Session(config=config))

        Thanks!

        • Avatar
          Jason Brownlee September 21, 2017 at 5:42 am #

          The results for the standalone model should fit into the distribution of the grid search results – if you repeated each grid search result many times, e.g. 10-30. See this post on evaluating model skill of neural networks:
          https://machinelearningmastery.com/evaluate-skill-deep-learning-models/

          Nice, sorry, I cannot give you good advice on grid searching with the GPU, it is not something I do generally. I am more likely to run instances serially or across AWS instances.

          • Avatar
            TMN October 6, 2017 at 2:12 am #

            Hi Jason,

            Could you please help on how to do feature normalization while doing the grid search and cross-validation? Is normalization done automatically here, GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=15, cv=rkf)? If I normalize the features during training with X = scaler.transform(X_train), this will introduce bias in cross-validation. Also, if possible, can you please provide references on using the scikit-learn wrapper with Keras for advanced options? Are there any limitations of the wrapper?
            Thanks

            Without normalization:
            Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}

            With normalization:
            Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}

          • Avatar
            Jason Brownlee October 6, 2017 at 5:37 am #

            Perhaps you can normalize your data prior to the grid search?

          • Avatar
            TMN October 6, 2017 at 10:59 am #

            I normalize my data prior to the grid search using X = scaler.transform(X_train), but don't you think it would introduce bias in the performance? Normally, I would expect to normalize the train set and use those normalization factors to normalize the test or validation set before prediction. Maybe I did not understand you properly; how do you do normalization prior to the grid search?

            Thanks

          • Avatar
            Jason Brownlee October 6, 2017 at 11:07 am #

            Yes, it’s a struggle or trade-off.

            Perhaps you can see if a Pipeline will work in the grid search, it may, but I expect it will error.

            Perhaps the bias is minor and you can ignore it.

            Perhaps you can implement your own grid search loop to only use training data to calculate data scaling coefficients.
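            For example, a rough sketch of the Pipeline idea, so the scaler is fit on the training folds only (assuming a create_model() function as in the post; whether this runs cleanly may depend on your Keras/scikit-learn versions):

            from sklearn.pipeline import Pipeline
            from sklearn.preprocessing import StandardScaler
            from sklearn.model_selection import GridSearchCV
            from keras.wrappers.scikit_learn import KerasClassifier

            mlp = KerasClassifier(build_fn=create_model, verbose=0)
            pipeline = Pipeline([('scaler', StandardScaler()), ('mlp', mlp)])
            # pipeline parameters are addressed as <step name>__<parameter>
            param_grid = {'mlp__epochs': [50, 100], 'mlp__batch_size': [10, 40]}
            grid = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=3)
            grid_result = grid.fit(X, y)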

          • Avatar
            TMN October 6, 2017 at 6:44 pm #

            I started looking at the Pipeline (http://scikit-learn.org/stable/modules/pipeline.html) and how it has been used for SVMs, let's see. I would expect the Pipeline to work for Keras as well, as this is a classical problem in machine learning. Why do you expect an error here? I wanted to take full advantage of the automatic grid search. Well, the final option will be to implement my own grid search.

            The bias is really significant in 5-repeated 10-fold CV. Thanks

            Without normalization:
            Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}

            With normalization:
            Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}

          • Avatar
            Jason Brownlee October 7, 2017 at 5:51 am #

            If it works, that is great. I have seen cases where when grid search + keras gets fancy it causes errors.

            I have a tutorial on Pipeline here that might help:
            https://machinelearningmastery.com/automate-machine-learning-workflows-pipelines-python-scikit-learn/

  61. Avatar
    HWU September 22, 2017 at 6:52 am #

    This is such a great, thorough tutorial. Thanks for keeping your tutorials up to date! It’s so nice finding a resource with examples that you know will work because they’ve been tested on recent versions of required packages.

  62. Avatar
    Marjan September 29, 2017 at 1:08 pm #

    Thank you for your great tutorial. I tried to use it for my model with multiple inputs, but it didn't work. I found that the scikit-learn wrapper does not work for multiple inputs; it gives me an error for grid.fit([input1, input2], y).
    Do you have any suggestion to handle it?
    Thanks,

    • Avatar
      Jason Brownlee September 30, 2017 at 7:34 am #

      Sorry I do not. Perhaps run the grid search manually (e.g. your own for loop)?

  63. Avatar
    Buz Fifer October 5, 2017 at 7:06 am #

    When I run your code to tune the dropout_rate, I get the following error:
    ValueError: dropout_rate is not a legal parameter

    In fact, I get this error for all labels except epochs and batch_size. Both of these were recognized and ran fine. I could not find a reference to valid labels anywhere, even in API docs. Any suggestions?

    • Avatar
      Jason Brownlee October 5, 2017 at 5:16 pm #

      What do you mean by valid labels exactly?

      • Avatar
        Buz Fifer October 6, 2017 at 3:02 am #

        Sorry, I should have included the code in the first place. I have added comments in the code to show exactly what I tried for each parameter.

        # ---- Define Keras Classifier Wrapper
        model1 = KerasClassifier(build_fn=kerasModel1, epochs=5, batch_size=10, verbose=0)

        # ---- define the grid search parameters
        mybatchs = [10, 20, 128]
        myepochs = [5, 10, 20, 50, 60, 80, 100]
        mylearn = [0.001, 0.002, 0.0025, 0.003]
        myopts = ['Adam', 'Nadam', 'RMSprop']
        myinits = ['uniform', 'normal', 'lecun_uniform', 'lecun_normal', 'glorot_uniform', 'glorot_normal']
        mydrop = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80]

        # ---- Not Recognized
        #param_grid = dict(optimizer=myopts)
        #param_grid = dict(learn_rate=mylearn)
        #param_grid = dict(learning_rate=mylearn)
        #param_grid = dict(init=myinits)
        #param_grid = dict(init_mode=myinits)
        #param_grid = dict(dropout_rate=mydrop)

        # ---- Recognized
        #param_grid = dict(epochs=myepochs)       # OK
        #param_grid = dict(batch_size=mybatchs)   # OK

        I removed comment # and ran each one separately. For example, running the first param_grid values resulted in: Error – optimizer is not a valid parameter. They all got the same rejection notice except for epochs and batch_size.
        I hope that helps.

  64. Avatar
    Buz Fifer October 6, 2017 at 3:09 am #

    Just to be clearer, each parameter had its own name in the error message, as follows:

    Error – optimizer is not a valid parameter
    Error – learn_rate is not a valid parameter
    Error – learning_rate is not a valid parameter
    Error – init is not a valid parameter
    Error – init_mode is not a valid parameter
    Error – dropout_rate is not a valid parameter

    • Avatar
      Jason Brownlee October 6, 2017 at 5:39 am #

      That is odd, I don’t have any good ideas, other than continue to debug and try different variations to see if you can expose the cause of the issue.

      Double check all of your python libraries are up to date.

  65. Avatar
    ritika October 6, 2017 at 11:49 pm #

    Hi Jason, Very nice tutorial..very well explained

  66. Avatar
    TC October 17, 2017 at 10:27 am #

    Hi Jason thanks for the great post.

    Let’s say I’m using 5 fold CV on a relatively small dataset (not necessarily for a deep learning model). In this case, the variance of the performance metric might be quite high, and just by chance, a point on the grid that is in reality far from optimal, might be selected as the “best”.

    So are there any approaches to smooth out the response surface of the grid search, to deal with “spikes” in performance due to variance?

    • Avatar
      Jason Brownlee October 17, 2017 at 4:05 pm #

      Wonderful question.

      Yes, we can approach this problem by increasing the number of repeats (not folds) of each param combination.

      • Avatar
        TC October 20, 2017 at 8:40 am #

        Hi Jason, by “number of repeats” do you mean to just repeat the process many times, with perhaps a different random seed?

  67. Avatar
    Lea October 20, 2017 at 10:09 pm #

    Thank you for this great tutorial! I tried to adapt the code for a CNN, but I constantly run into the same error. Can anyone help?

    That is the code:

    def create_model(nb_filters=3, nb_conv=2, pool=20):
        model = Sequential()
        model.add(Convolution1D(nb_filters, nb_conv, activation='relu',
                                input_shape=(X.shape[1], X.shape[2]), padding="same"))
        model.add(MaxPooling1D(pool))
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.summary()
        return model

    model = KerasClassifier(build_fn=create_model(), verbose=0)

    nb_conv = [2, 4, 6, 8, 10]
    pool = [10, 20, 30, 50]
    param_grid = dict(nb_conv=nb_conv, epochs=pool)
    grid = GridSearchCV(estimator=model, param_grid=param_grid)
    grid_result = grid.fit(X, y)

    And the error I am getting is “nb_conv is not a legal parameter”. Unfortunately, I do not understand why.

  68. Avatar
    went October 22, 2017 at 2:55 am #

    Hi Jason,

    Great post and Thank you.

    What do you think is the best sequence when tuning all those hyperparameters? I think a different sequence will lead to different final hyperparameters.

  69. Avatar
    Bgie October 23, 2017 at 6:28 am #

    Hi Jason,

    What a great blog, I very much appreciate you sharing some of your expertise!

    I want to grid search the hyperparams from my CNN, but I’m using data augmentation with ImageDataGenerator. So I’m not calling model.fit but model.fit_generator for the actual training.
    This does not seem to be supported through the grid search..
    Am I forced to write my own KerasClassifier implementation?

    Would you advise to just fall back to using (nested) for loops instead, or would I be missing some ‘magic’ from the existing scikit gridsearch?

    • Avatar
      Jason Brownlee October 23, 2017 at 4:10 pm #

      I would recommend writing your own for loops to grid search instead.

  70. Avatar
    Shubham Kumar October 26, 2017 at 3:58 am #

    Hey Jason!

    Needed help with model improvement!
    Can you help me understand how to tell whether your model is suffering from
    bad local minima or the vanishing/exploding gradient problem?

    • Avatar
      Jason Brownlee October 26, 2017 at 5:34 am #

      If you have exploding or vanishing gradients, then you will have NaN outputs.

      This post will give you ideas on how to lift skill:
      https://machinelearningmastery.com/improve-deep-learning-performance/

      This post will give you advice on how to effectively evaluate your model:
      https://machinelearningmastery.com/evaluate-skill-deep-learning-models/

      • Avatar
        Shubham Kumar November 6, 2017 at 7:05 am #

        NaN outputs as in my predictions?
        Or the weights?
        If the gradient explodes, then the weights will be very large (probably NaN), hence the output would also be NaN.
        But how will this logic be used for vanishing gradients? In that case the weights basically stop changing, right?

        • Avatar
          Shubham Kumar November 6, 2017 at 7:07 am #

          Should I use some kind of code that checks by how much the weights at each layer are changing, and if after a certain point they haven't changed by a certain amount, declare a vanishing gradient?

        • Avatar
          Jason Brownlee November 7, 2017 at 9:44 am #

          Try gradient clipping on the optimization algorithm.

  71. Avatar
    Mustafa Murat ARAT October 28, 2017 at 12:45 am #

    I have a question for you, Jason, and for the general audience. I tried to find the optimal number of neurons for one of the hidden layers. I looped over my function which contains my deep learning model. It is fast enough for the values I define, and I get a result based on accuracy. However, when I use your code, it is extremely slow and never reaches an end. How long does it take on your computer?

    • Avatar
      Jason Brownlee October 28, 2017 at 5:14 am #

      You could try to test fewer parameters or try to search on a smaller dataset?

      • Avatar
        Mustafa Murat ARAT October 29, 2017 at 4:12 am #

        Hey Jason,

        Thank you for your quick reply. I am trying grid search for the number of neurons on the Iris data set for the purpose of learning. I scale the data first and then transform and encode the dependent variable. However, first of all, even though I use a small data set and few parameters, it is slow; second of all, when I get the results, they are all zero. This is a very basic example and I am pretty sure that my code is correct, but I guess I am missing something.

        Best: 0.000000 using {‘neurons’: 3}
        0.000000 (0.000000) with: {‘neurons’: 3}
        0.000000 (0.000000) with: {‘neurons’: 5}

        THE CODE:

        from pandas import read_csv
        import numpy
        from sklearn.preprocessing import LabelEncoder
        from sklearn.preprocessing import StandardScaler
        from keras.wrappers.scikit_learn import KerasClassifier
        from keras.models import Sequential
        from keras.layers import Dense
        from keras.utils import np_utils
        from sklearn.model_selection import GridSearchCV

        dataframe = read_csv("iris.csv", header=None)
        dataset = dataframe.values
        X = dataset[:, 0:4].astype(float)
        Y = dataset[:, 4]

        seed = 7
        numpy.random.seed(seed)

        # encode class values as integers
        encoder = LabelEncoder()
        encoder.fit(Y)
        encoded_Y = encoder.transform(Y)
        # one-hot encoding
        dummy_y = np_utils.to_categorical(encoded_Y)

        scaler = StandardScaler()
        X = scaler.fit_transform(X)

        def create_model(n_neurons):
            model = Sequential()
            model.add(Dense(n_neurons, input_dim=X.shape[1], activation='relu'))  # hidden layer
            model.add(Dense(3, activation='softmax'))  # output layer
            model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
            return model

        model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, initial_epoch=0, verbose=0)
        # define the grid search parameters
        neurons = [3, 5]

        # this does 3-fold cross-validation. One can change k.
        param_grid = dict(n_neurons=neurons)
        grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
        grid_result = grid.fit(X, dummy_y)
        # summarize results
        print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))

        • Avatar
          Jason Brownlee October 29, 2017 at 5:59 am #

          Sorry, I cannot debug your code/problem for you.

          • Avatar
            Mustafa Murat ARAT October 30, 2017 at 8:29 am #

            I totally understand you. Thank you so much, though. I figured out my mistake. Iris dataset is very well balanced so I need to shuffle the data because GridSearchCV is using 3-Fold Cross Validation.

          • Avatar
            Jason Brownlee October 30, 2017 at 3:49 pm #

            Glad to hear it.

  72. Avatar
    jenny November 8, 2017 at 4:12 am #

    Thanks for sharing such a wonderful tutorial. Learnt many new things.

    How can I save all the models that the grid search generates, with identifiers for each model?

    I am an R user. This is how I do it in R, passing the parameter values into the saved model names.

    xgb.object <- paste0('/path/xgb_disc20_new_',
        sample.sizes[i], '_', s, '_', nrounds[j], '_', max.depth[k], '_', eta[l], '.RData')

    write.table(cbind(sample.sizes[i], s, nrounds[j], max.depth[k], eta[l], tpr, tnr, acc, roc.area,
        concordance), paste0('/path/xgb_disc20_new_', min.sample.size, '_', max.sample.size,
        '.csv'), append=TRUE, sep=",", row.names=FALSE, col.names=FALSE)

    How can this be achieved in Python for Keras (neural networks) and for models from other libraries?

    • Avatar
      Jason Brownlee November 8, 2017 at 9:29 am #

      I would recommend using grid search to find the parameters for a well performing model then train a new standalone model with those parameters that you can then save.
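      For example, a minimal sketch of that workflow (assuming the grid searched epochs, batch_size and an optimizer argument of create_model(); the file name is only illustrative):

      best = grid_result.best_params_
      model = create_model(optimizer=best['optimizer'])  # pass through any tuned model-building arguments
      model.fit(X, y, epochs=best['epochs'], batch_size=best['batch_size'], verbose=0)
      model.save('final_model.h5')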

  73. Avatar
    jenny November 8, 2017 at 9:34 pm #

    Thank you Jason for your quick reply. I will try it that way.

  74. Avatar
    Wassim November 16, 2017 at 11:26 pm #

    Hi Jason,
    Thank you for the great tutorial. I just have an issue when using exactly your code: when I try to parallelize the grid search with n_jobs=-1, I end up with the error “AttributeError: Can’t get attribute ‘create_model’ on ” while it works well without parallelization. Any idea where the issue comes from?
    Thank you,
    Wassim

    • Avatar
      Jason Brownlee November 17, 2017 at 9:25 am #

      I’m not sure, perhaps you cannot parallelize the grid search with Keras models.

  75. Avatar
    Sangwon Chae November 28, 2017 at 9:51 pm #

    Hi Jason,

    The example code calculates the best score for accuracy to obtain the hyperparameter.

    In my problem, I want to find RMSE rather than accuracy because it is a regression problem (numerical prediction).

    However, grid_result.cv_results_ only provides 'fit_time' and 'score', so it cannot calculate RMSE.

    What should I do?

    Thank you.

  76. Avatar
    Estelle December 5, 2017 at 7:37 am #

    Hi Jason,

    Thank you for this post.

    Is there anything that prevents me to use Grid Search with train_on_batch() instead of fit()?

    Thank you for letting me know.

    All the best,

    Estelle

    • Avatar
      Jason Brownlee December 5, 2017 at 10:26 am #

      I think the wrapper is quite limited and does not offer this facility via sklearn.

      • Avatar
        Estelle December 6, 2017 at 8:15 am #

        Thanks for your quick answer.

        All the best,

        Estelle

  77. Avatar
    Peter December 8, 2017 at 1:56 pm #

    Thanks very much for the tutorial. It is extremely helpful for my work. I came across a problem with grid search with Keras (TensorFlow backend). I want to run the same grid search on different datasets. Everything works fine on the first dataset, but when I fit the grid search to the second dataset, the program gets stuck. I run the grid search with n_jobs=-1 and put keras.backend.clear_session() between the two fits. You can replicate this issue by fitting the data twice in your examples. Could you please kindly help me with this issue?

    • Avatar
      Jason Brownlee December 8, 2017 at 2:30 pm #

      I’m sorry to hear that, perhaps change n_jobs to 1?

      • Avatar
        Peter December 8, 2017 at 2:40 pm #

        Thanks for the quick reply. It works when n_jobs=1, but I do need parallel threads for speed.

        • Avatar
          Jason Brownlee December 9, 2017 at 5:34 am #

          The neural network will be using all the cores, so running multiple threads may not offer any benefit.

          • Avatar
            Peter December 11, 2017 at 9:53 am #

            I got it to work by fitting just one dataset in the Python script and looping the Python script over multiple datasets in a bash script. I am still not clear why the second fitting fails in Python, but this is a not-so-beautiful workaround.

          • Avatar
            Jason Brownlee December 11, 2017 at 4:52 pm #

            Glad to hear that you made some progress.

  78. Avatar
    Daniel Pamplona December 13, 2017 at 1:37 am #

    Hi Jason

    Thank you so much for sharing your knowledge.
    I am trying to optimize the number of hidden layers.
    I can't figure out how to do it with Keras (actually, I am wondering how to set up the create_model function so that the number of hidden layers can be varied).
    Could you please help me?
    Thank you

    • Avatar
      Jason Brownlee December 13, 2017 at 5:42 am #

      Perhaps the number of layers could be a parameter to your function.
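      For example, a minimal sketch of making the depth a tunable argument (the argument names and layer sizes are only for illustration):

      from keras.models import Sequential
      from keras.layers import Dense

      def create_model(n_layers=1, n_neurons=10):
          model = Sequential()
          model.add(Dense(n_neurons, input_dim=8, activation='relu'))  # adjust input_dim to your data
          for _ in range(n_layers - 1):
              model.add(Dense(n_neurons, activation='relu'))  # extra hidden layers
          model.add(Dense(1, activation='sigmoid'))
          model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
          return model

      # param_grid = dict(n_layers=[1, 2, 3], n_neurons=[5, 10, 20])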

  79. Avatar
    Sean December 15, 2017 at 1:43 am #

    Hi Jason,

    Thanks for this insightful and useful tutorial as always

    No doubt your blog posts are arguably the best in the field of data sciences

    Best wishes

  80. Avatar
    Sean December 16, 2017 at 12:06 am #

    Hello Jason,
    I decided to try the code on textual data of about 3,000 tweets, with a binary classification target (Y) and the text corpus as input (X). I started off with tuning the batch size and number of epochs

    but got the following error:

    Here’s the modified code below:

    Thanks

    • Avatar
      Jason Brownlee December 16, 2017 at 5:29 am #

      Sorry to hear that, it’s not clear to me. Perhaps post to stackoverflow to get help debugging your code?

  81. Avatar
    Olivier Blais December 16, 2017 at 3:24 am #

    Hi Jason, first thanks for your articles! Super useful!

    I tried to execute the grid search but came up against parallelism issues. I have a Windows OS, and I get this error when I try to run the script on multiple CPUs:

    ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using “if __name__ == ‘__main__'”. Please see the joblib documentation on Parallel for more information.

    Do you know how I should address that?

    Thanks in advance

    • Avatar
      Jason Brownlee December 16, 2017 at 5:35 am #

      Perhaps try setting the number of jobs to 1?

      • Avatar
        Olivier Blais December 28, 2017 at 5:30 am #

        Hi Jason! Yes this works but it is very slow as this is not parallel. Do you understand why it cannot run in parallel and how to fix that?

        Thanks again !
        Olivier

        • Avatar
          Jason Brownlee December 28, 2017 at 2:10 pm #

          The backend is parallelized and the two levels of parallelization are in conflict.

  82. Avatar
    Shabnam December 18, 2017 at 2:21 pm #

    Thanks a lot for such a wonderful post. Overall, there are a lot of parameters that need to be tuned. I was thinking of using RandomizedSearchCV instead of GridSearchCV. Still, it will be time consuming for a lot of simulations. Do you have any suggestions for fast parameter tuning? For example, can we say that specific parameters have more effect on scores, so let's try to grid/randomized search them first?

  83. Avatar
    Henry December 20, 2017 at 10:51 pm #

    Dear Jason,

    Fantastic post, thank you for this wonderful tutorial.

    I was wondering if it would be more appropriate to tune all the hyperparameters at one go instead of breaking it up into various parts as shown above – you may be doing it for the sake of visibility of how each component is tuned but would it be better to tune everything together since there might be “interactions between the hyperparameters” which would not be captured if they were tuned separately?

  84. Avatar
    Hao January 3, 2018 at 2:26 am #

    Hi Jason,

    Many thanks for a series of excellent posts!

    I have an extremely imbalanced data set to study, in which the ratio of negatives to positives is about 100:1. When I built the first model, I performed 10-fold validation, and in each validation round I used oversampling to add positive samples to the training data, but not to the testing data. Now my question is: if I want to perform a hyperparameter search, how do I tell GridSearchCV() to do oversampling for each round of cross-validation?

    Many thanks

    • Avatar
      Jason Brownlee January 3, 2018 at 5:40 am #

      Good question, you might need to use a Pipeline and have data prep happen within it.
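      For example, a rough sketch assuming the imbalanced-learn package, whose Pipeline applies the resampling to the training folds only (the names are illustrative):

      from imblearn.pipeline import Pipeline
      from imblearn.over_sampling import RandomOverSampler
      from sklearn.model_selection import GridSearchCV
      from keras.wrappers.scikit_learn import KerasClassifier

      mlp = KerasClassifier(build_fn=create_model, verbose=0)
      pipeline = Pipeline([('oversample', RandomOverSampler()), ('mlp', mlp)])
      param_grid = {'mlp__epochs': [50, 100], 'mlp__batch_size': [10, 40]}
      grid = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10)
      grid_result = grid.fit(X, y)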

  85. Avatar
    Justin Solms January 7, 2018 at 11:24 pm #

    Hello Jason

    A good 2018 to you. I have a question about how Keras early stopping callbacks might be able to use the GridSearchCV k-fold generated validation data set for their val_loss or val_acc. I posted the question on StackOverflow, but I wanted to call your attention to it, should you wish.

    https://stackoverflow.com/questions/48127550/how-do-i-implement-early-stopping-with-keras-and-the-sklearn-gridsearchcv-cross

    Kind regards,
    Justin

    • Avatar
      Jason Brownlee January 8, 2018 at 5:43 am #

      I would suggest not combining CV and early stopping.

      • Avatar
        James March 11, 2018 at 6:16 am #

        Could early stopping be used as a substitute for grid searching epoch size?

        • Avatar
          Jason Brownlee March 11, 2018 at 6:30 am #

          Yes, but you might need to code it up yourself. sklearn might blow up.
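          For example, a rough sketch outside of scikit-learn (assuming a create_model() function; the settings are only illustrative):

          from keras.callbacks import EarlyStopping

          model = create_model()
          es = EarlyStopping(monitor='val_loss', patience=10)
          history = model.fit(X, y, validation_split=0.2, epochs=1000, batch_size=10, callbacks=[es], verbose=0)
          print('stopped after', len(history.history['loss']), 'epochs')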

  86. Avatar
    shwetabh shekhar January 19, 2018 at 12:03 am #

    Hello sir
    If I have a large dataset, can we still do this hyperparameter tuning? For example, with 70 to 80 feature columns and about 50,000 rows, can we apply this tuning?

    • Avatar
      Jason Brownlee January 19, 2018 at 6:31 am #

      Sure, you might need a large computer or to split the work up across many computers.

      Perhaps you can work with a sample of your data.

  87. Avatar
    shwetabh shekhar January 19, 2018 at 1:13 am #

    How do I select the hidden layers if I have a large dataset as mentioned above?

  88. Avatar
    Kafeel Basha January 29, 2018 at 5:48 pm #

    Very good post.

    Hyperparameter tuning: how can I do a grid search over the number of neurons, epochs, or batch size using the Keras interface in R?

  89. Avatar
    neha February 2, 2018 at 6:34 am #

    Hi, I am facing a basic query where I have a training and test set. I built an LSTM on the training set and used history = model.fit(trainX, trainY, epochs=100, batch_size=50,
    validation_data=(testX, testY), verbose=0, shuffle=False) to fit my model.
    After this I tried model.predict(testX) to get the predicted Y values. Now that was the basic code. I am now trying to apply grid search. What variation do I have to make to the history statement to apply grid =
    GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(testX, testY, verbose=0, shuffle=False)

  90. Avatar
    neha February 2, 2018 at 6:50 am #

    can gridsearchcv work for time series as well?

    • Avatar
      Jason Brownlee February 2, 2018 at 8:25 am #

      Not really. You will have to write your own for loops and perform walk forward validation.
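      For example, a rough sketch of a manual walk-forward loop for one parameter (assuming X and y are in time order and a create_model() function exists; an illustration, not a definitive implementation):

      import numpy as np

      def walk_forward_rmse(X, y, n_test, epochs):
          errors = []
          for i in range(len(X) - n_test, len(X)):
              model = create_model()
              model.fit(X[:i], y[:i], epochs=epochs, batch_size=10, verbose=0)  # train on all data before step i
              yhat = model.predict(X[i:i+1], verbose=0)
              errors.append((y[i] - yhat[0, 0]) ** 2)  # assumes a single output value
          return np.sqrt(np.mean(errors))

      for epochs in [50, 100, 200]:
          print(epochs, walk_forward_rmse(X, y, n_test=12, epochs=epochs))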

  91. Avatar
    Jack February 2, 2018 at 8:01 pm #

    Hi Jason, thank you for your great tutorial! My question here is about ‘grid_result.best_score’. In this article the best score seems to be the best mean score, but in a regression problem, the mean score is irrelevant, so I have to look for the best std score. Is that correct?

    • Avatar
      Jason Brownlee February 3, 2018 at 8:35 am #

      Mean score in regression will be mean error. Not irrelevant.

      • Avatar
        Jack February 3, 2018 at 8:15 pm #

        I see. But when I run the code, the ‘grid_result.best_score’ printed out the biggest score. I don’t think that’s right, cause in a regression problem I should look for the smallest mean error. Am I understanding this right?
        Below are the results:
        Best: 0.062234 using {‘optimizer’: ‘Nadam’}
        0.059561 (0.017101) with: {‘optimizer’: ‘SGD’}
        0.056818 (0.013662) with: {‘optimizer’: ‘RMSprop’}
        0.059617 (0.014734) with: {‘optimizer’: ‘Adagrad’}
        0.061506 (0.014503) with: {‘optimizer’: ‘Adadelta’}
        0.059331 (0.014835) with: {‘optimizer’: ‘Adam’}
        0.057696 (0.014828) with: {‘optimizer’: ‘Adamax’}
        0.062234 (0.010834) with: {‘optimizer’: ‘Nadam’}

  92. Avatar
    Mohamed Abd-Allah February 4, 2018 at 9:09 am #

    Very good tutorial, but I have a small question: can I tune all these hyperparameters together, or should I take a part of the dataset and tune them separately like the examples you mentioned?

    • Avatar
      Jason Brownlee February 5, 2018 at 7:43 am #

      Ideally, you would tune them all together, but this is often to computationally expensive.

  93. Avatar
    Vidar February 8, 2018 at 7:40 am #

    Is there a way to do similar things in R using the Caret package? Or other package that can help you with hyperparameter grid search when using Keras in R?

    • Avatar
      Jason Brownlee February 8, 2018 at 8:33 am #

      I don’t know if Keras and caret are compatible, sorry.

  94. Avatar
    joseph February 8, 2018 at 3:17 pm #

    hi Jason,

    Do I need to split the training data for cross-validation, or only perform splitting on the input data?

    • Avatar
      Jason Brownlee February 9, 2018 at 8:59 am #

      Why do you want to split exactly? Your goals will help me answer your question.

      • Avatar
        joseph February 9, 2018 at 10:36 am #

        Thanks Jason for the quick reply, I will figure that out. Just another minor question: is there any way to perform data preprocessing on 3D input (due to the input shape for the LSTM)?

        • Avatar
          Jason Brownlee February 10, 2018 at 8:49 am #

          Sure, but it might be easier (or make more sense) to perform data prep prior to shaping data for the LSTM.

          • Avatar
            joseph February 10, 2018 at 12:12 pm #

            Thanks Jason, I will try that out. Is it a good idea to tune the hyperparameters using the Keras wrapper, then apply those tuned parameters to the LSTM model? Hope to get some comments on it. Thank you.

          • Avatar
            Jason Brownlee February 11, 2018 at 7:51 am #

            You can. Or you can write your own for loop and tune the model directly.

          • Avatar
            joseph February 12, 2018 at 1:10 pm #

            Thanks a lot Jason.. i will definitely try that one out..

  95. Avatar
    Boris Branson February 15, 2018 at 7:27 am #

    Hi Jason, wonderful post. I love your books – amazing.

    I wish to include callbacks in the Grid Search (one for TensorBoard and one for logging losses on every combination over the params).

    I have something like:

    loggerCB = keras.callbacks.TensorBoard(log_dir='logs', histogram_freq=0, write_graph=True)

    class LossHistory(keras.callbacks.Callback):
        def on_train_begin(self, logs={}):
            self.losses = []
        def on_batch_end(self, batch, logs={}):
            self.losses.append(logs.get('loss'))

    historyCB = LossHistory()

    grid_search = GridSearchCV(estimator=model,
                               param_grid=fit_params,
                               scoring='accuracy',
                               cv=10)
    grid_search = grid_search.fit(X_train, y_train, fit_params={'callbacks': [loggerCB, historyCB]})

    BUT I got this error:
    TypeError: Unrecognized keyword arguments: {'fit_params': {'callbacks': [, ]}}

    How can I pass callbacks using Grid Search?

    Thanks,
    Boris Branson

    • Avatar
      Jason Brownlee February 15, 2018 at 8:52 am #

      Sorry, I have not used callbacks with a grid search. You might need to write your own for-loops for the search.

  96. Avatar
    Alessandro February 17, 2018 at 4:02 am #

    Hello Jason,
    let me congratulate for the good post.

    I am curious about the use of CV. Each time you call
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    you are compiling a new Keras model with the new set of parameters.

    Are these different Keras models, compiled one after another, accumulating in memory? Would this imply a memory usage problem in the case of an extensive grid search with bigger models? Any tips?

    Best,
    Alessandro

    • Avatar
      Jason Brownlee February 17, 2018 at 8:50 am #

      Yes, each model is evaluated and discarded.

      For larger models, you could run each fold on a different machine (e.g. run the eval manually).

  97. Avatar
    Boris Branson February 23, 2018 at 9:26 pm #

    Hello Jason,

    I see you have used only SGD in the example of learning rate parameterization. Is it possible to combine different values for the learning rate with different optimizers (not only SGD) in one grid search, or would I need a for loop?

    • Avatar
      Jason Brownlee February 24, 2018 at 9:11 am #

      Yes, but the more parameters you grid search at once, the slower the search.

  98. Avatar
    Priyansh February 25, 2018 at 7:39 pm #

    Hi Jason, your article is super useful, but I am having a problem using it for the MNIST dataset, which is three-dimensional. When I try to fit, it gives me a dimension error. Can you do one for the MNIST dataset? Thanks a lot.

  99. Avatar
    TonyWang February 27, 2018 at 10:20 pm #

    Hi Jason, great tutorial, I always learn a lot from your posts. I have a question: is it possible to combine all the parameters in one grid search? That seems to be thousands of combinations, and for some models it will cost a few days or weeks. Is there any better solution for this, random search or something else? Thanks again!

    • Avatar
      Jason Brownlee February 28, 2018 at 6:04 am #

      Yes, but as you say, you will need a lot of time or a lot of parallel compute resources to get a result.

      Random search is often preferred because you can uniformly sample the domain and get good enough results quickly.

      • Avatar
        TonyWang March 1, 2018 at 3:18 am #

        Thanks for your reply. I Googled a lot but didn't find any method to search optimizers together with their params, say different optimizers such as Adam and its learning rates. Are there any suggestions? Thanks!

        • Avatar
          Jason Brownlee March 1, 2018 at 6:16 am #

          Yes, just start searching for viable params on your model/data. No need to find confirmation.

  100. Avatar
    Johnson Muthii March 2, 2018 at 8:56 am #

    Hello Jason,

    Thanks for this awesome tutorial. I'm very fresh in machine learning and your tutorials are so simplified and easy to follow.

    I'm encountering an error when I run the epochs and batch size tuning code. Kindly help.

    This the code part bringing the error…

    # create model
    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs= 1)
    grid_result = grid.fit(X_train, y_train)

    TypeError: __call__() missing 1 required positional argument: ‘inputs’

    • Avatar
      Jason Brownlee March 2, 2018 at 3:20 pm #

      Sorry, I have not seen this error. Are you able to confirm that you have copied all of the code and that your development environment is up to date?

    • Avatar
      Nathan Rasch August 27, 2018 at 5:52 am #

      I ran into this over the weekend, and hopefully this will save someone else some pain down the road:

      I kept getting the following error when working the prediction section of my code, which frankly was driving me nuts:

      TypeError: call() missing 1 required positional argument: ‘inputs’

      After researching the error message I came upon this comment, which led me to the resolution:

      _The thing here is that KerasRegressor expects a callable that builds a model, rather than the model itself. By wrapping your function in this way you can return the build function (without calling it)._ [Source](https://stackoverflow.com/questions/47944463/specify-input-argument-with-kerasregressor)

      Solution: I needed to **wrap** my buildModel() function! 🙁

      Once I 'wrapped' the buildModel() function the prediction code blocks finally started working. Give it a try, and it should resolve your issue. The link I provided above should give you a working code example. If not let me know, and I'll post my working example for you.

      Thanks!

      • Avatar
        Jason Brownlee August 27, 2018 at 6:15 am #

        It might be easier to write your own for loops to grid search Keras models.
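        A rough sketch of that for-loop approach (not from the original reply; it assumes the create_model, X, and Y names from the tutorial and a model compiled with metrics=["accuracy"]): each configuration is built, trained, and evaluated directly with the Keras API, so callbacks, custom scoring, and memory can be controlled per run.

        import itertools
        from tensorflow.keras import backend as K

        results = []
        for batch_size, epochs in itertools.product([10, 20, 40], [10, 50, 100]):
            model = create_model()  # the same build function used with the wrapper
            history = model.fit(X, Y, batch_size=batch_size, epochs=epochs,
                                validation_split=0.2, verbose=0)
            score = history.history["val_accuracy"][-1]  # last-epoch validation accuracy
            results.append(((batch_size, epochs), score))
            K.clear_session()  # release the model between runs

        best = max(results, key=lambda r: r[1])
        print("Best: %.3f with batch_size=%d, epochs=%d" % (best[1], best[0][0], best[0][1]))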

  101. Avatar
    sonia March 7, 2018 at 2:05 am #

    dear jason
    how much time this program run while tunning ?like tuning epoch and batch size?

    • Avatar
      Jason Brownlee March 7, 2018 at 6:16 am #

      It depends on the size of the dataset, the size of the model and the speed of your system.

  102. Avatar
    Yumlembam Rahul March 12, 2018 at 1:59 pm #

    Hi,

    As you mention in your blog, "As we proceed through the examples in this post, we will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes." Does this mean we should do the hyperparameter search in one grid instead of dividing it up?

    regrads,

    Yumlembam Rahul

  103. Avatar
    jessy March 15, 2018 at 8:51 pm #

    sir,

    I have tried the above code. It is executing, but not displaying results. I don't know the reason.

    • Avatar
      Jason Brownlee March 16, 2018 at 6:17 am #

      Perhaps try from the command line, then be patient.

      Perhaps try to reduce the data set size or use fewer combinations?

  104. Avatar
    Yumlembam Rahul March 16, 2018 at 5:25 pm #

    Hi, in your example the optimizer parameters are not specified while doing the grid search. Do they assume default values if not specified?

    For reproducibility of results I added the following code and have been able to get the same result:

    import os
    import random as rn
    import numpy as np
    import tensorflow as tf

    os.environ['PYTHONHASHSEED'] = '0'
    np.random.seed(42)
    rn.seed(12345)
    session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    from keras import backend as K
    # The below tf.set_random_seed() will make random number generation
    # in the TensorFlow backend have a well-defined initial state.
    # For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
    tf.set_random_seed(1234)
    sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
    K.set_session(sess)

  105. Avatar
    jessy March 16, 2018 at 6:05 pm #

    sir,

    I have a doubt: can the LSTM concept be used for prediction on the diabetes dataset (Pima Indians dataset)? I don't know how an LSTM learns from the dataset. Is it possible to show a hands-on calculation?

  106. Avatar
    jessy March 19, 2018 at 7:32 pm #

    Is it possible to show a hands-on calculation, particularly for the hidden layers and LSTM layers? Is it possible to show a manual calculation of the weights (how a weight is transferred from one layer to another)?

    • Avatar
      Jason Brownlee March 20, 2018 at 6:15 am #

      Sure, but you will need to code these as extensions to the Keras library.

  107. Avatar
    jessy March 23, 2018 at 9:26 pm #

    sir ,
    I have tried the above code without the n_jobs=-1 parameter and it is working. I have a doubt: can the above code be run using an LSTM model? Is that possible?

    • Avatar
      Jason Brownlee March 24, 2018 at 6:26 am #

      Perhaps set it to 1 thread and let Keras have all of the cores?

  108. Avatar
    Max March 25, 2018 at 1:12 am #

    Hi Jason,

    I’m sure it’s possible – but I can’t figure it out.
    The above code gives me as a result the best hyper-parameters as measured on the cross-validation.
    Now which adjustments to the code would be necessary to additionally calculate the optimum hyper-parameters on a test set?
    The optimum hyper-parameters seem to lead to significantly different results when applied to my model that I use to predict values.

    Thanks
    Max

  109. Avatar
    jessy March 28, 2018 at 7:12 pm #

    sir ,
    I have a doubt: can multivariate time series data be used for classification or prediction? Can we use that data for prediction, classification, or both?

  110. Avatar
    jessy March 28, 2018 at 7:18 pm #

    sir,
    In the LSTM model you are using only the RMSE loss function. Why have you not used other loss functions? In particular, for sequence prediction problems (forecasting) you used only the RMSE loss function. Why, sir?

    • Avatar
      Jason Brownlee March 29, 2018 at 6:33 am #

      I use MSE not RMSE. You can try other loss functions if you prefer. I find MSE loss function works well for most problems.

  111. Avatar
    hamidi March 29, 2018 at 4:57 am #

    Hi
    Thanks for your nice post.

    Could you please let me know how to incorporate class_weight and tune it?

  112. Avatar
    Prabha April 13, 2018 at 8:58 pm #

    Hello, great post as always!
    I had a query regarding this. So I have a training set and a test set, and I am using a stacking ensemble for predictions.
    So when I run GridSearchCV on this, should I fit just the training set on this and print CV score on the training set ONLY? And not touch the test set at all?
    Also should I fit the new grid classifier on the set before printing the CV score or after?

  113. Avatar
    Aditya Jain April 17, 2018 at 1:53 am #

    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20]
    epochs = [10, 20, 30]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(x_train, y_train)

    When I am running this code snippet I am getting error as
    AttributeError: ‘NoneType’ object has no attribute ‘loss’

    Can you please help me on that ?

  114. Avatar
    Marshall April 28, 2018 at 5:28 am #

    Hi Jason,

    First and foremost, this is an incredible writeup – very informative.

    I’m getting an error that reads “can’t pickle _thread.RLock objects”

    When I use the following code:

    ————————————————————————–

    def build_neural_network(n_predictors, hidden_layer_neurons):
        """
        Builds a Multi-Layer-Perceptron utilizing Keras.

        Parameters:
        x_train: (2D numpy array) A n x p matrix, with n observations
            and p features
        y_train: (1D numpy array) A numpy array of length n with the
            target training values.
        hidden_layer_neurons: (list) List of ints for the number of
            neurons in each hidden layer.

        Returns:
        model: A MLP with 2 hidden layers
        """
        model = Sequential()
        input_layer_neurons = n_predictors

        model.add(Dense(units=hidden_layer_neurons[0],
                        input_dim=input_layer_neurons,
                        kernel_initializer='uniform',
                        activation='relu'))

        model.add(Dense(units=hidden_layer_neurons[1],
                        kernel_initializer='uniform',
                        activation='relu'))

        model.add(Dense(units=1))

        model.compile(optimizer='rmsprop',
                      loss='mse')

        return model

    # columns variable defined elsewhere, works as expected

    mlp = build_neural_network(len(columns), [8, 12])

    model = KerasRegressor(build_fn=mlp)

    # create parameter lists for GridSearchCV
    batch_size = list(np.arange(10, 250, 10))
    epochs = list(np.arange(5, 20, 5))

    neural_net_grid_dict = {'batch_size': batch_size,
                            'epochs': epochs}

    neural_net_grid = GridSearchCV(estimator=model,
                                   param_grid=neural_net_grid_dict,
                                   scoring='neg_mean_squared_error',
                                   verbose=1,
                                   n_jobs=-1)

    mask = df['Date'] == '2006-11-06'
    X, y = create_X_y(df[mask], columns)

    grid_result = neural_net_grid.fit(X, y)

    ——————————————————–

    Any idea what might be going on?

    • Avatar
      Jason Brownlee April 29, 2018 at 6:21 am #

      Sorry, I have not seen this error. Perhaps try posting to stackoverflow?

  115. Avatar
    Cristiana April 29, 2018 at 4:52 am #

    Thanks so much ! This post helped me a lot !

    • Avatar
      Jason Brownlee April 29, 2018 at 6:28 am #

      I’m glad to hear that.

    • Avatar
      Sandra July 26, 2018 at 4:56 am #

      I am experiencing the same error “can’t pickle _thread.RLock objects”, may I know how you solved it?

  116. Avatar
    Juan May 9, 2018 at 4:16 am #

    Hi Jason,

    how can I tune your model to find hyperparameters (learning rate, epochs, and output dim of the hidden layer) using RandomizedSearchCV?

    Thanks !!
    Regards

    Juan

    • Avatar
      Jason Brownlee May 9, 2018 at 6:28 am #

      Specify ranges and search. What is the problem exactly?

  117. Avatar
    June May 12, 2018 at 6:18 am #

    Hi Jason, I got a help from this blog post. Thank you very much!

    I have one question though. What if I want to test with optimizers that have customized parameters rather than default parameters? From your example, it's just an array of strings of optimizer names.

    Do you know how I can do this?

    Best,
    June

    • Avatar
      Jason Brownlee May 12, 2018 at 6:52 am #

      You can provide lists of strings with optimizer names if you wish.

      • Avatar
        June May 12, 2018 at 7:14 am #

        Yes. Isn’t this what’s provided in the example code?
        optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']

        What I meant was not with the default ones, but like when I have my own optimizers defined as follows:

        sgd_custom = SGD(lr=0.7)
        adam_custom = Adam(decay=0.005)

        How can I give optimizer list for this setting? optimizer=[sgd_custom, adam_custom]?

        • Avatar
          Jason Brownlee May 13, 2018 at 6:02 am #

          Good question.

          Yes, you could provide a list of pre-configured objects to use instead of strings.
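          A minimal sketch of what that could look like (not from the original reply; model__optimizer assumes create_model accepts an optimizer argument, as in the optimizer-tuning example, and the learning rates are placeholders). Optimizer instances carry state, so rebuilding them inside the build function is often safer, but a list of pre-configured objects is accepted:

          from tensorflow.keras.optimizers import SGD, Adam
          from sklearn.model_selection import GridSearchCV

          sgd_custom = SGD(learning_rate=0.7)
          adam_custom = Adam(learning_rate=0.002)
          param_grid = dict(model__optimizer=[sgd_custom, adam_custom])
          grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)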

  118. Avatar
    Philipp May 15, 2018 at 3:52 pm #

    Hi Jason,

    Your posts are really helpful – thanks a lot!

    1. I’m using grid search on my own Keras CNN and everything is working. One thing that keep’s confusing me though: The F1 measures reported by grid search are always a bit (3-4%) higher than when running the same network configurations in Keras directly. I know that Keras isn’t using CV, but this shouldn’t lead to systematic deviations in one direction but to deviations in both directions I think.

    2. Also I found that my network is always performing slightly better (accuracy) when using the TF-Layers API instead of Keras, even though the network configurations are exactly the same (as far as I can control this in Keras).

    Any ideas why Keras seems to perform poorer? Have others experienced the same issues with Keras? I just can’t figure it out…

    Cheers,
    Philipp

    • Avatar
      Jason Brownlee May 16, 2018 at 5:58 am #

      No good idea sorry. It might be statistical chance, or it might be real. See if you can tease this out with some hypothesis tests on the results.

  119. Avatar
    Philipp May 19, 2018 at 4:33 am #

    Thanks, Jason.

    Just to let you know: Apparently it has something to do with the F1 score. Accuracy scores reported by grid search are pretty much the same as my results in Keras.

  120. Avatar
    Ng Minh Hieu May 28, 2018 at 3:43 am #

    Hi Jason, thank you for the very detailed and interesting tutorial.
    1. I tried to grid search the epochs and batch size hyperparameters as in your code. No result appeared and no error message either. After that, I changed n_jobs to 1 and Python gave me the result. I do not understand why n_jobs = -1 prevented the calculation from completing.

    2. If I have a more complicated network (with two layers, for example), could you tell me how the grid can be implemented for the number of epochs and batch size?

    Thank you a lot!

    • Avatar
      Jason Brownlee May 28, 2018 at 6:03 am #

      Might have caused a deadlock internally.

      I don’t understand your second question sorry, perhaps you can rephrase it?

  121. Avatar
    Sumit May 28, 2018 at 7:21 pm #

    Hi Jason, excellent post; it helped a lot in improving my predictive model.

    I have one question: is there any way I can optimize the number of layers in the network?

    • Avatar
      Jason Brownlee May 29, 2018 at 6:25 am #

      Yes, use a grid search and choose the configuration with the lowest loss.
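      A rough sketch of how the number of layers could be exposed to the search (not from the original reply; the model__ prefix and the Pima Indians input shape are assumptions for illustration): make the layer count an argument of the build function.

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      def create_model(n_layers=1, neurons=12):
          model = Sequential()
          model.add(Dense(neurons, input_shape=(8,), activation="relu"))
          # add any extra hidden layers requested by the grid
          for _ in range(n_layers - 1):
              model.add(Dense(neurons, activation="relu"))
          model.add(Dense(1, activation="sigmoid"))
          model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
          return model

      param_grid = dict(model__n_layers=[1, 2, 3], model__neurons=[5, 10, 20])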

  122. Avatar
    John May 30, 2018 at 2:20 pm #

    I tried the gird search but got this error

    ipython-input-49-ea7e264ec276> in ()
    3 param_grid = dict(batch_size=batch_size, epochs=epochs)
    4 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    —-> 5 grid_result = grid.fit(xs, testY)
    6 # summarize results
    7 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    612 refit_metric = ‘score’
    613
    –> 614 X, y, groups = indexable(X, y, groups)
    615 n_splits = cv.get_n_splits(X, y, groups)
    616 # Regenerate parameter iterable for each fit

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    196 else:
    197 result.append(np.array(X))
    –> 198 check_consistent_length(*result)
    199 return result
    200

    ~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    171 if len(uniques) > 1:
    172 raise ValueError(“Found input variables with inconsistent numbers of”
    –> 173 ” samples: %r” % [int(l) for l in lengths])
    174
    175

    ValueError: Found input variables with inconsistent numbers of samples: [17, 1]

  123. Avatar
    amina May 31, 2018 at 1:38 am #

    Hey,
    What does the 8 in the input dim refer to? I have a time series problem, a dataset with 41 observations. How could I deal with this?

    • Avatar
      Jason Brownlee May 31, 2018 at 6:20 am #

      It refers to 8 input variables.

      You could define a window of lag obs as input features. Perhaps experiment with different window sizes.

  124. Avatar
    lara May 31, 2018 at 2:08 am #

    Could we use only one hidden layer that contains an LSTM block? I want to grid search hyperparameters for my LSTM architecture. How could I specify this in code?

    • Avatar
      Jason Brownlee May 31, 2018 at 6:23 am #

      Yes, you could adapt the above examples to search layers/nodes in an LSTM.

  125. Avatar
    Angelo June 18, 2018 at 5:06 am #

    Astounding post, thank you! I wonder how I could evaluate the loss and accuracy evolution of the KerasClassifier according to epoch. Is there something like the history class returned from the model.fit method from SciKitLearn?

    • Avatar
      Jason Brownlee June 18, 2018 at 6:45 am #

      Not that I am aware, I believe you would need to use the Keras API directly and collect history objects from each run.

  126. Avatar
    Babu July 2, 2018 at 6:49 pm #

    Dear Jason,

    I found this article as very useful for my research. Thank you very much.

    Is it possible to find the best CNN architecture (No.of layers, Kernel size, Kernel initialization, Pooling Technique etc) for a given dataset by using GridSearch or RandomSearch?

    • Avatar
      Jason Brownlee July 3, 2018 at 6:23 am #

      There is no “best”, just good enough based on the time and resources we have available.

      • Avatar
        prateek bhadauria July 13, 2018 at 8:50 pm #

        Hello Jason Sir, I want to know how I could apply the CNN concept to non-image data which contains large datasets in the form of rows and columns, and how I could apply padding to 50,000 rows and 20 columns. Kindly suggest an approach.

        • Avatar
          Jason Brownlee July 14, 2018 at 6:17 am #

          CNN is not appropriate unless there is some spatial relationship between the observations, e.g. time or space.

          • Avatar
            maxv April 29, 2019 at 3:01 am #

            Hi

            thanks for this post and the replies to questions.

            I have a question on the properties of the cnn, if you have a dataset like the pollution dataset.

            If we have one binary variable as target in a classification with 10 exogenous variables and it is a daily forecast.
            Let us say we have 500 days of data.

            I can create a multivariate timeseries forecast and have 5 timesteps in my window so that my train shape will be (500,5,10)

            If I apply Conv1D, it should extract features out of all the 10 variables right ?
            or does it apply a Conv1D on each exogenous variable separately.

            What I try to understand is : does it capture interactions of exogenous variables ?

            Does the Conv2D only work for images or for times series too ?

            For each window of 5 timesteps, we have 5 timesteps and 10 exogenous variables so we could think this is 2D.

            Thanks J

          • Avatar
            maxv May 1, 2019 at 6:52 am #

            Hi
            I think you are pointing me again to the same tutorial but my questions come from this one.

            Questions see above.

            Question 1 :
            If I apply Conv1D, it should extract features out of all the 10 variables right ?
            or does it apply a Conv1D on each exogenous variable separately.

            Question 2 : does it capture interactions of exogenous variables ?

            Question 3 :

            Does the Conv2D only work for images or for times series too ?

          • Avatar
            Jason Brownlee May 1, 2019 at 7:11 am #

            If you have multiple parallel time series, you can use a separate Conv1D layer for each and merge them into the model, OR one Conv1D layer that treats each time series as a separate channel.

            Test both, but I recommend the latter.

            In both cases, the model will capture interactions.

            No, Conv2D can work for any data that has a temporal or spatial relationship in two dimensions.
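            A minimal sketch of the channels approach (not from the original reply; the filter count, kernel size, and binary output are assumptions for illustration), given windows of 5 timesteps over 10 parallel series:

            from tensorflow.keras.models import Sequential
            from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense

            model = Sequential()
            model.add(Conv1D(filters=32, kernel_size=3, activation="relu", input_shape=(5, 10)))
            model.add(GlobalMaxPooling1D())
            model.add(Dense(1, activation="sigmoid"))
            model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])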

  127. Avatar
    James July 10, 2018 at 12:34 am #

    Thanks for the tutorial Jason, very informative. I wonder if you know of a relatively un-intrusive way of reducing the memory footprint of Grid (or equivalently Random) SearchCV, since they seem to store every model produced during the search in memory, instead of e.g just the best. I’m handling 3d data and trying 3d cnns, so the models quickly get too big to have e.g 25 in memory at once.

    Wondered about hacky divide and conquer strategies on a higher level, e.g if the full space for a parameter is

    [1,5,10,15,20,25],

    do a grid search of [1,5,10], keep best model (m1) and discard the rest, search [15,20,25], keep best (m2), then keep best of [m1,m2], but this would still be fiddly/somewhat arbitrary to get correct for a given amount of memory and parameter space. I’d rather not have to implement my own parameter search, but if I go too far down this route I may as well end up doing so

    Thanks

    • Avatar
      Jason Brownlee July 10, 2018 at 6:49 am #

      Split the search across multiple scripts and machines or implement the for-loops of the search yourself (preferred).

  128. Avatar
    Kemas Farosi July 11, 2018 at 8:50 pm #

    Hi Jason,

    Great tutorial, I have a question, is it possible to find how many hidden layers in my deep neural networks by grid search ? because i want to find the best layer numbers in my DNN.

    thanks

  129. Avatar
    Vugar Bayramov July 19, 2018 at 11:33 pm #

    Hi Jason!!

    Awesome content. Thanks very much for your effort.

    I have a question regarding a model with multidimensional output. What I mean is my y_train is an array with [value1, value2, value3] which I am trying to predict. While using the example above to select the best activation function for my problem I got the error below:

    ValueError: y_true and y_pred have different number of output

    How can i solve this issue?

    Regards

    Vugar

  130. Avatar
    Nick July 27, 2018 at 9:37 pm #

    While doing the grid search some combinations lead to a:
    ValueError: Input contains NaN, infinity or a value too large for dtype(‘float32’).

    so the grid search stops. Do you know if its possible just to skip these combinations to prevent the search from stopping or why this happens with some NN hyperparameters?

    Regards

    • Avatar
      Jason Brownlee July 28, 2018 at 6:35 am #

      Perhaps. It might be easier to run the grid search yourself with some for-loops.

    • Avatar
      Pramod Hankare May 20, 2020 at 2:45 pm #

      Hi Nick, did you eventually find a solution for this?

  131. Avatar
    billa July 30, 2018 at 10:29 pm #

    Is it possible to tune the neurons inside the convolution layer for image classification?

  132. Avatar
    Zenon Uchida July 31, 2018 at 8:56 pm #

    Do filters (in the code below) denote the number of neurons?
    conv = Conv1D(filters=64, kernel_size=5, activation='relu')(embedding)
    If not, should filters also be tuned?
    I'm pretty sure kernel_size should be tuned.

    • Avatar
      Jason Brownlee August 1, 2018 at 7:43 am #

      No, they are the number of filters.

      Yes, the number of filters and the kernel size can and should be tuned.

  133. Avatar
    Khaw August 7, 2018 at 11:53 pm #

    Thank you for your awesome explanation.

    Is it possible to do the same grid search for hyperparameters in the R package for Keras? I cannot find the equivalent of the GridSearchCV function.

  134. Avatar
    Beatriz August 15, 2018 at 10:05 am #

    Hi Jason,

    I’m trying to do a grid search in my Seq2Seq model.

    I’m not sure if I understand the values X,Y I should put inside the grid.fit() function.

    In my case, I tried two numpy arrays with three dimensions (samples, max length of words, number of characters)

    Anyway, I’m not sure if that is the reason it is not working for me. I get the following error:

    TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

    What do you think is going wrong?

    • Avatar
      Jason Brownlee August 15, 2018 at 1:53 pm #

      You might need to implement the for-loops of your grid search manually in order to have more control over the process.

  135. Avatar
    ammara August 15, 2018 at 9:16 pm #

    Thanks for such great content!!
    I have a query: what is the "random_state" used in deep models? Is it a hyperparameter? If so, how important is it for model training? Kindly guide me.
    Thanks in advance.

  136. Avatar
    Hoo Yu Heng August 21, 2018 at 4:17 am #

    For those who face the error of ‘cannot pickle object class’, make sure u use create_model and not create_model() in the KerasClassifier constructor:

    model = KerasClassifier(build_fn=create_model, verbose=0, epochs=100)

    not
    model = KerasClassifier(build_fn=create_model(), verbose=0, epochs=100)

  137. Avatar
    Natanos August 21, 2018 at 8:31 pm #

    Sorry, but when I run this program, it stops at "Using TensorFlow backend" and has not finished in almost 3 hours.

    Is this normal? if not, what should I do? thanks

    • Avatar
      Jason Brownlee August 22, 2018 at 6:11 am #

      Perhaps try searching fewer parameters?

    • Avatar
      clemm September 19, 2018 at 7:00 pm #

      Hello,

      Same problem here with a gridsearch reduced to one epoch and one batch_size : the fit function never ends (keras version : 2.2.2). But the same code worked with an other computer (keras version : 2.0.5).

      • Avatar
        Jason Brownlee September 20, 2018 at 7:56 am #

        Perhaps run the grid search manually? Just some for-loops.

  138. Avatar
    Nathan Rasch August 27, 2018 at 10:28 am #

    Has anyone had a chance to combine RandomizedSearchCV with SelectKBest?

    I have a "FeatureUnion" that includes "SelectKBest", but then the "model.add(Dense…" call in the model build function complains about the "input_dim" being incorrect. I'm not sure how to access the value "SelectKBest" is currently considering as part of the random search, so that I can feed it to the build model function as a param for "input_dim".

    Ex:
    features = []
    features.append(('Scaler', StandardScaler()))
    features.append(('SelectKBest', SelectKBest(k=5)))
    featureUnion = FeatureUnion(features)

    def buildModel(optimizer='Adam', lr=0.001, decay=0.0, epsilon=None):
        opt = None
        model = Sequential()
        model.add(Dense(20, input_dim = ???? ...)

    We get a nice, juicy error about the input dim when running this. 🙁

    If anyone has a working example or link to some one who does I’d be very grateful.

    Thanks!
    Nathan

    • Avatar
      Nathan Rasch August 27, 2018 at 11:36 am #

      OK, solved my own issue:

      The key is just to remove the "input_dim" param from the "model.add" method call. Then you can pass whatever values you want to test with as part of the params dict.

      Ex:

      # Notice we don't have an "input_dim" param on the model.add call anymore
      def buildModel():
          model = Sequential()
          model.add(Dense(20, kernel_initializer='normal', activation='relu'))

      # We add the SelectKBest__k values we want to test to the "params" dict:
      params = {
          'housingModel__epochs': [1, 2],
          'housingModel__batch_size': [15, 30, 65],
          'FeatureUnion__SelectKBest__k': [5, 6, 7, 8, 9, 10]
      }

      # And create the FeatureUnion
      features = []
      features.append(('Scaler', StandardScaler()))
      features.append(('SelectKBest', SelectKBest()))
      featureUnion = FeatureUnion(features)

      And that’s that. 🙂

      Thanks!

    • Avatar
      Jason Brownlee August 27, 2018 at 1:56 pm #

      Perhaps write your own for-loop or use regularization to let the model ignore irrelevant features?

  139. Avatar
    Piyush September 11, 2018 at 2:15 am #

    @Jason Brownlee

    Great tutorial, though I suggest to combine all chunks of code and give a one final code which tunes all hyper parameters at once, e.g., define a grid with all hyper parameters rather than focusing on them one by one.

    Also, once the tuned hyper parameters are found, provide a code with predictive model with tuned hyper parameters which can be used in actual problem to predict class labels.

  140. Avatar
    Michael Pappas September 29, 2018 at 7:23 am #

    Does anyone else have two problems with the first example? I'm using Theano as the backend and I run into two errors:

    1) RuntimeError: You can’t initialize the GPU in a subprocess if the parent process already did it (goes away when I change .theanorc to cpu instead of cuda0)

    2) sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

    Any ideas?

    • Avatar
      Jason Brownlee September 30, 2018 at 5:59 am #

      Perhaps try running on the CPU as a first step?

      • Avatar
        Michael Pappas October 2, 2018 at 1:04 am #

        Then I get the second error as mentioned above.

        • Avatar
          FERNANDO FREGAPANE SCALIA October 6, 2018 at 2:52 am #

          I have the same error with all libraries updated.

          Any ideas, please?

  141. Avatar
    Vasileios Papanikolaou October 13, 2018 at 11:23 am #

    Hey Jason, thank you for this excellent post and your whole contribution to the ML/DL community! It really means a lot. I have a quick question: let's say you define the model architecture and perform your first grid search over, say, one hyperparameter. How can you redefine the model using the optimal hyperparameter, without rewriting the 'create_model' function? Thanks a lot in advance.

    • Avatar
      Jason Brownlee October 14, 2018 at 5:59 am #

      You can create the model directly, using the hyperparameters found via the search.

      Perhaps I’m missing something in your question?

  142. Avatar
    Janosh Riebesell November 4, 2018 at 9:04 pm #

    Slight correction:

    > We can see that the dropout rate of 0.2% and the maxnorm weight constraint of 4 resulted in the best accuracy of about 72%.

    Should be either 0.2 or 20 %.

  143. Avatar
    Robert Guenther November 6, 2018 at 5:51 am #

    Jason,

    Ditto all the good things said above. You definitely are fulfilling your mission of making us (data scientist) better at machine learning.

    Thank you,
    Robert

  144. Avatar
    sukhpal November 15, 2018 at 12:43 am #

    when i run the above code i got this message

    model = Sequential()
    ^
    IndentationError: expected an indented block
    kindly help me to remove this error

  145. Avatar
    Long November 15, 2018 at 2:03 pm #

    Great tutorial as always,

    I also had 1 experience with Keras & scikit-learn wrapper when doing the train-test split. It turned out that I should not use params like validation_split/validation_data in Keras because cross validation from GridSearchCV already takes care of that.

    I would like to ask, should I use scoring metrics from Keras itself or should I use the metrics provided by GridSearchCV?
    The docs here are not really clear: https://keras.io/scikit-learn-api

    And how about other parameters (if available) that appear to be overridden by the scikit-learn wrapper? Which ones should I pick, Keras or scikit-learn?

    Thank you so much Jason.

    • Avatar
      Jason Brownlee November 16, 2018 at 6:11 am #

      Probably use sklearn’s metrics.

      What other parameters exactly?

  146. Avatar
    sukhpal November 16, 2018 at 9:12 pm #

    when i run the code i receive this message instead of output.kindly help me

    runfile(‘C:/Users/sukhpal/untitled9.py’, wdir=’C:/Users/sukhpal’)
    Using Theano backend.
    C:\Users\sukhpal\Anaconda2\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
    “This module will be removed in 0.20.”, DeprecationWarning)

  147. Avatar
    sukhpal November 17, 2018 at 1:34 pm #

    but sir no output is displayed on screen

  148. Avatar
    kamal November 17, 2018 at 4:44 pm #

    sir as my program also gives no error but no output is displayed on screen

    • Avatar
      Jason Brownlee November 18, 2018 at 6:38 am #

      Ensure you are running from the command line and wait a few minutes.

  149. Avatar
    kamal November 19, 2018 at 12:57 am #

    sir when i run the code from command prompt it gives me this error
    Traceback (most recent call last):
    File “C:/Python27/oop1.py”, line 3, in
    from sklearn.model_selection import GridSearchCV
    File “C:\Python27\lib\site-packages\sklearn\__init__.py”, line 134, in
    from .base import clone
    File “C:\Python27\lib\site-packages\sklearn\base.py”, line 11, in
    from scipy import sparse
    ImportError: No module named scipy

  150. Avatar
    Kareem JEIROUDI November 20, 2018 at 12:08 am #

    Hey Jason,

    A very helpful post, thanks for your efforts. However, I’m still wondering if you can put these optimizations together, do you think that’s possible?? And if so, how?
    The problem is that in your examples, you could configure the learning rate and momentum only as you used SGD but not any other optimizer.
    I’ll try to write a function such that one can specify all these parameters before grid-searching, plus I’d like to modify the number of layers in a network.
    Let me know what you think about all this.
    Thanks again for this awesome post!

    • Avatar
      Jason Brownlee November 20, 2018 at 6:36 am #

      I’m not sure I understand, sorry. Perhaps you can elaborate?

  151. Avatar
    kamal November 20, 2018 at 2:15 am #

    Sir, since I installed Theano and Keras, the Run option in the Python editor window has disappeared. How do I run my program when there is no direct Run option?

  152. Avatar
    kamal November 20, 2018 at 12:09 pm #

    Sir, my program gives this error now... please help me.
    ======================== RESTART: C:\Python27\oop1.py ========================
    Using Theano backend.

    You can find the C code in this temporary file: c:\users\sukhpal\appdata\local\temp\theano_compilation_error_ei4ugz

    Traceback (most recent call last):
    File “C:\Python27\oop1.py”, line 4, in
    from keras.models import Sequential
    File “C:\Python27\lib\site-packages\keras\__init__.py”, line 3, in
    from . import utils
    File “C:\Python27\lib\site-packages\keras\utils\__init__.py”, line 6, in
    from . import conv_utils
    File “C:\Python27\lib\site-packages\keras\utils\conv_utils.py”, line 9, in
    from .. import backend as K
    File “C:\Python27\lib\site-packages\keras\backend\__init__.py”, line 86, in
    from .theano_backend import *
    File “C:\Python27\lib\site-packages\keras\backend\theano_backend.py”, line 7, in
    import theano
    File “C:\Python27\lib\site-packages\theano\__init__.py”, line 110, in
    from theano.compile import (
    File “C:\Python27\lib\site-packages\theano\compile\__init__.py”, line 12, in
    from theano.compile.mode import *
    File “C:\Python27\lib\site-packages\theano\compile\mode.py”, line 11, in
    import theano.gof.vm
    File “C:\Python27\lib\site-packages\theano\gof\vm.py”, line 674, in
    from . import lazylinker_c
    File “C:\Python27\lib\site-packages\theano\gof\lazylinker_c.py”, line 140, in
    preargs=args)
    File “C:\Python27\lib\site-packages\theano\gof\cmodule.py”, line 2388, in compile_str
    (status, compile_stderr.replace(‘\n’, ‘. ‘)))
    Exception: Compilation failed (return status=1): The system cannot find the path specified.

  153. Avatar
    kamal November 22, 2018 at 7:28 pm #

    Sir, when I run the program in the Python editor window I encounter this problem:
    Using Theano backend.
    WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain

    Warning (from warnings module):
    File “C:\Python27\lib\site-packages\theano\configdefaults.py”, line 560
    warnings.warn(“DeprecationWarning: there is no c++ compiler.”
    UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
    WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
    WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

  154. Avatar
    Steven Veenma November 26, 2018 at 1:50 am #

    Thanks for this excellent tutorial. It helped me get a feeling for using different parameters. I use Keras/TensorFlow/GPU and with smaller grids this works fine. But when I search over a larger grid I run into errors. The GPU seems to keep old models in memory. clear_session() apparently has not been implemented in keras.wrappers.scikit_learn.KerasClassifier. I raised an issue about this at https://github.com/keras-team/keras/issues/11693 where you can find more details. If this can't be solved I have two options:
    1. Try it with Theano
    2. Program a function myself that does the job and call clear_session() in it
    Or do you have other advice?

    • Avatar
      Jason Brownlee November 26, 2018 at 6:20 am #

      Nice discovery.

      For larger searches, I recommend writing a custom for-loop, output results to file and even spread the search across multiple machines (sub-grids).

  155. Avatar
    Nate Star November 29, 2018 at 5:37 am #

    Thanks for the great post! However, when we do hyper-parameter tuning, shouldn’t we be utilizing cross-fold validation and optimizing for the average validation error across folds? In this article we are optimizing for training accuracy which would bias our model towards the training data and may lead to parameters that do not generalize well.

    • Avatar
      Jason Brownlee November 29, 2018 at 7:48 am #

      Yes, we are using cross-validation for tuning in this tutorial.

  156. Avatar
    MK December 24, 2018 at 6:24 am #

    Hi,

    First of all thank you so much for this great post! I have one question:

    Is it possible to optimize both the learning rate and the optimizer type together? I am getting an error when I try; the error tells me that "learning_rate" is not a legal parameter. Could you please give me a hint about it?

    Thank you in advance

    • Avatar
      Jason Brownlee December 25, 2018 at 7:15 am #

      Not really, pick an optimizer (e.g. SGD), then tune the learning rate.

  157. Avatar
    Jitendra December 25, 2018 at 8:49 pm #

    Hello Jason, I am building a stateful model and have initiated batch_size as 20. This works well while I am fitting the model after passing batch_size=batch_size in mode.fit.

    However, batch_size=batch_size doesn’t seem to work while I am predicting on test set.

    Is there a rule or something which I am missing which states that we have to use different batch sizes for train and test. One that I am aware of explains that train and test lengths have to be multiples of batch size. Request your help please.

    • Avatar
      Jason Brownlee December 26, 2018 at 6:43 am #

      Batch size really only matters during training, it is part of SGD. It is only used for memory efficiency at test time – no effect on model skill.

  158. Avatar
    kamal December 26, 2018 at 4:08 pm #

    Sir, can we combine all the codes into one to produce a single optimal model?

    • Avatar
      Jason Brownlee December 27, 2018 at 5:40 am #

      The code finds a set of hyperparameters for configuring a model for your problem.

  159. Avatar
    Paul January 3, 2019 at 11:58 am #

    Would it be beneficial to do nested cross validation instead? So first doing a gridsearch, and then cross validating the gridsearch results.
    Thanks!

    • Avatar
      Jason Brownlee January 4, 2019 at 6:24 am #

      It might be, it depends on how much data you have. Not enough and the results may be optimistic.
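      A minimal sketch of nested cross-validation (not from the original reply; it assumes the model wrapper, param_grid, X, and Y names from the tutorial): the grid search (inner CV) is itself scored with an outer cross-validation.

      from sklearn.model_selection import GridSearchCV, cross_val_score

      inner = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
      outer_scores = cross_val_score(inner, X, Y, cv=5)
      print("Nested CV accuracy: %.3f (%.3f)" % (outer_scores.mean(), outer_scores.std()))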

  160. Avatar
    Martin January 21, 2019 at 5:17 am #

    In those examples there are no hidden layers. The first layer, i.e. the input layer, isn't a hidden layer. Is that right?

  161. Avatar
    Martin January 21, 2019 at 5:53 am #

    Thanks Jason. In keras the first input layer is the first hidden layer!

    • Avatar
      Jason Brownlee January 21, 2019 at 11:57 am #

      No, the input layer is defined via an argument on the first hidden layer. A hidden layer is not an input layer.

  162. Avatar
    Alex January 31, 2019 at 11:36 pm #

    Jason, thank you for the excellent post. This was exactly what I needed. I have a feed-forward MLP network that I use to predict dam water inflow (for energy generation in Brazil) from past rainfall. By combining Keras and GridSearchCV I managed to find the best set of hyperparameters for my task.

  163. Avatar
    jessy February 1, 2019 at 9:47 am #

    sir,
    I have tried the above in the Anaconda prompt. It is taking a lot of time to execute. Please tell me another way to execute the same code in Anaconda.

    • Avatar
      Jason Brownlee February 1, 2019 at 11:05 am #

      I have some suggestions:

      Perhaps try testing fewer hyperparameters?
      Perhaps try running on less data?
      Perhaps try running on a faster computer?

  164. Avatar
    jagon February 1, 2019 at 10:08 am #

    Give me an idea of how to execute the same code in different ways. Please tell me the steps.

  165. Avatar
    jagon February 8, 2019 at 9:54 am #

    Sir, I have executed the above code in the Anaconda prompt. Is there another way of executing the same code? Please tell me the steps.

  166. Avatar
    Wonbin February 11, 2019 at 9:18 pm #

    Thank you so much for this really helpful post! I’ve been learning all about ML here in your posts since the beginning. Thank you 🙂

    I have still one question now even though I read all your comments and links about the question that people have asked.
    When it comes to regression tasks, we might configure in

    ———————————————-
    scoring=’neg_mean_squared_error’
    ———————————————-

    and the result will return ‘the negated value of the metric’ (this maybe means negative value?) like below.

    ——————————————————————————————
    Best: -19222385.393424 using {‘optimizer’: ‘Adamax’}
    -3704635991.649002 (2334839512.648289) with: {‘optimizer’: ‘SGD’}
    -21009285.029564 (9977061.839532) with: {‘optimizer’: ‘RMSprop’}
    -19966418.799906 (9785063.647908) with: {‘optimizer’: ‘Adagrad’}
    -21064977.853754 (9371950.402550) with: {‘optimizer’: ‘Adadelta’}
    -19659670.962081 (9634316.972027) with: {‘optimizer’: ‘Adam’}
    -19222385.393424 (9437755.930065) with: {‘optimizer’: ‘Adamax’}
    -19785109.598847 (9571852.777559) with: {‘optimizer’: ‘Nadam’}
    ——————————————————————————————

    So should I just use 19222385.393424 as MSE instead of minus 19222385.393424?
    Is the value (after deleting the minus) MSE of the model?
    I couldn’t really get what the minus is meaning…

    I look forward to your reply. Thank you for your help!

    • Avatar
      Jason Brownlee February 12, 2019 at 8:01 am #

      Yes, the scikit-learn will invert the metric and make it negative. More here:
      https://machinelearningmastery.com/faq/single-faq/why-are-some-scores-like-mse-negative-in-scikit-learn

      That is a very large loss, perhaps the model can be improved?

      • Avatar
        Wonbin February 12, 2019 at 2:36 pm #

        Cheers mate! I checked your Frequently Asked Questions section and it looked the minus can be just ignored. But another reason I was confused with the number (‘Best: -19222385.393424′ in here) is like below, please see the output of my code.

        In:
        ——————————————————————————————
        def create_model(optimizer='adam'):

            # Compile model
            model.compile(loss='mae', optimizer=optimizer, metrics=['mse'])

        # create model
        model = KerasRegressor(build_fn=create_model, verbose=2, epochs=300, batch_size=256)
        ——————————————————————————————

        Out:
        ——————————————————————————————
        Epoch 1/1
        – 41s – loss: 2958.7100 – mean_squared_error: 45877955.0626
        ——————————————————————————————
        (I just set the eopch to 1)

        In:
        ——————————————————————————————
        # summarize results

        ——————————————————————————————

        Out:
        ——————————————————————————————
        Best: -19222385.393424 using {‘optimizer’: ‘Adamax’}

        -3704635991.649002 (2334839512.648289) with: {‘optimizer’: ‘SGD’}
        -21009285.029564 (9977061.839532) with: {‘optimizer’: ‘RMSprop’}
        -19966418.799906 (9785063.647908) with: {‘optimizer’: ‘Adagrad’}
        -21064977.853754 (9371950.402550) with: {‘optimizer’: ‘Adadelta’}
        -19659670.962081 (9634316.972027) with: {‘optimizer’: ‘Adam’}
        -19222385.393424 (9437755.930065) with: {‘optimizer’: ‘Adamax’}
        -19785109.598847 (9571852.777559) with: {‘optimizer’: ‘Nadam’}
        ——————————————————————————————

        So, my question is: why is the best score (19222385.393424) quite different from the mean_squared_error (45877955.0626) which was the output of the first code?

        About your comment “That is a very large loss, perhaps the model can be improved?”,
        I didn’t transform the target variable (like log-transformation). Is transforming the target variable necessary in neural networks?

  167. Avatar
    Wonbin February 15, 2019 at 2:34 am #

    Thanks for your help always!
    I spent about 2 weeks tuning hyperparameters, but very unfortunately I got in serious troubles and may have to redo all the things because of the two reasons…
    I want to ask you two questions..
    ————————————————-
    Q1) Is there a good guideline on the sequence of hyperparameters to be tuned?
    e.g. What parameters should I last dive into especially to prevent from wasting time?

    Q2) Between loss and metrics(like ‘mse, ‘mape’ and so on), which one should I see when choosing the best parameters?
    ————————————————-
    Could you please give me some advice on the above questions….?

    (p.s. I’m doing on a regression task)

  168. Avatar
    Edu February 22, 2019 at 10:21 am #

    Hi, Jason. Congratulations on the content. If possible, I would like to ask a question. I read and reread the part "How to Tune the Number of Neurons in the Hidden Layer" and I could not understand what would change in the code if the output was a value between 0 and 100 and not just 0 or 1. I need to do a grid search on a time series using a univariate LSTM. Sorry if my question is too simple. Thank you.

  169. Avatar
    Adrian February 22, 2019 at 10:51 pm #

    Hi, I am trying to use the dropout selection in my functional CNN and I get this error when I execute:
    Cannot clone object '' (type ): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

    I have no idea how to solve it.
    Thanks

  170. Avatar
    sukhpal February 23, 2019 at 11:32 pm #

    Sir, please provide me the Python code for plots comparing various algorithms like Adam, Adagrad, Adadelta, and RMSprop over different numbers of epochs.

  171. Avatar
    daifeng March 4, 2019 at 7:15 pm #

    Hi, recently I have been training with Keras on a large dataset, so only the "fit_generator" function in Keras can be used. I am wondering how to use grid search with that function, since only "fit" is offered by the GridSearchCV class.

  172. Avatar
    ismetb March 13, 2019 at 9:03 pm #

    Excellent post again Jason, thank you. My two questions are:

    1) How can I search optimizer and learning rate together? When I write optimizer=optimizer(lr=lr), the code does not run.

    2) Is there an importance order of parameters? I would like to grid search for several parameters and I want to group the most important ones

    Regards

  173. Avatar
    Teresa Lisanti March 15, 2019 at 9:12 am #

    Hi Jason, I am using a LeNet and I would like to know how to use the grid search. Is it necessary to write a function that returns a model? I have a file named lenet.py where I wrote my neural network with the class LeNet; can I use this file instead of writing a new function? Thank you for your answer.

    • Avatar
      Jason Brownlee March 15, 2019 at 2:26 pm #

      You can develop the grid search any way you wish.

  174. Avatar
    Neel March 16, 2019 at 7:44 pm #

    Hi Jason, super helpful tutorial! Thanks for investing the time in writing this.

    I am working on a multiclass classification problem using Keras.

    Grid search provides the best parameters for the metrics defined in model.compile. I tried disabling that and adding scoring= to the grid.fit command line. However, as others have also pointed out, grid search only accepts "accuracy" as a valid score for optimisation when providing the best hyperparameters. For me the optimum hyperparameters would be the ones providing high accuracy, precision, and recall on my unseen test data.

    Is there a way I can save all models trained using grid search, or, as grid search iterates through a set of hyperparameters, can I extract each model and run a classification_report for it? In the end, I want the model which gives the best results in the classification_report.

  175. Avatar
    Neel March 16, 2019 at 7:53 pm #

    Part 2 of the question:

    I am currently splitting my data using train_test_split (Note1) and passing the testX and testY as a cross validation in grid_result. Purpose is to define a cross validation set to a model.

    Note 1:

    (trainX, testX, trainY, testY) = train_test_split(datatraincv, labeltraincv, test_size=0.40, random_state=42, shuffle=True)

    grid_result = grid.fit(trainX, trainY, validation_data=(testX, testY), callbacks=[early_stopping_monitor])

    *Question*: Is this required or sklearn automatically splits data in Train/Cross Val when we use grid.fit?

    __

    Additionally I have “datatest” and “labeltest” data which I use to predict and get the actual results on unseen data (Note 2)

    Note 2:
    predictions = grid_result.best_estimator_.model.predict(datatest, batch_size=32)
    print(classification_report(labeltest.argmax(axis=1), predictions.argmax(axis=1), target_names=target_names))

    *Question*: Is this required, or is the cross-val data that I feed in also unseen by the Keras algorithm? I used to code this in Matlab, where the CV data was used for theta selection, such that the chosen theta was the one which got the highest accuracy on the cross-val data.

    • Avatar
      Jason Brownlee March 17, 2019 at 6:20 am #

      The grid search will use k-fold cross validation to split the data.

      Typically a new model is refit using the best parameters after the tuning process.
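      A minimal sketch (not from the original reply): with refit=True, which is the GridSearchCV default, the best configuration is already refit on the full training data, so the fitted estimator can be used directly on held-out data; datatest here is assumed to be the commenter's unseen test array.

      best_model = grid_result.best_estimator_   # refit on all training data with the best params
      predictions = best_model.predict(datatest)
      print(grid_result.best_params_)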

  176. Avatar
    Arian March 19, 2019 at 3:04 am #

    Hey Jason,

    thanks for the nice tutorial, i really enjoyed it!
    I would like to parallelize the process by setting n_jobs to -1 or something else than 1, but when i try to run fit on the grid i get this error:
    “_pickle.PicklingError: Could not pickle the task to send it to the workers.”

    I did some research and found out that this has something to do with Keras objects not being pickle compatible.

    Do you know a solution for my problem or a different method to parallelize grid search for Keras on CPUs?

    Thank you very much!

    • Avatar
      Jason Brownlee March 19, 2019 at 8:59 am #

      You might have to run the grid search manually with for-loops.

  177. Avatar
    Patrick March 26, 2019 at 8:55 am #

    Thanks for the detailed explanation! It’s very helpful. I have two questions for you:

    In what order would you suggest to tune the parameters?
    Which parameters should be tuned together?

  178. Avatar
    Jaime April 12, 2019 at 11:59 pm #

    Hello Jason,

    Is it possible to change the metric from accuracy to MSE? I am using LSTMs for trajectory forecasting, so the values are continuous. I am having the error "ValueError: The model is not configured to compute accuracy. You should pass metrics=["accuracy"] to the model.compile() method."

    • Avatar
      Jason Brownlee April 13, 2019 at 6:32 am #

      You must remove accuracy for regression problems.

  179. Avatar
    Jaime April 13, 2019 at 2:00 am #

    Hello Jason,

    Sorry for the foolish question. I found out how to do it.

    • Avatar
      Jason Brownlee April 13, 2019 at 6:37 am #

      No problem.

    • Avatar
      LIFEN HUANG June 4, 2019 at 1:20 pm #

      Hello, do you know how to do it? Could you share it with me, because I also face this problem of "The model is not configured to compute accuracy." Thanks a lot!

  180. Avatar
    abhijit April 20, 2019 at 5:10 pm #

    Hello Jason,

    I tried a custom AUC metric with GridSearchCV in Keras. Can you tell me where I am going wrong? I am getting an error for this.

  181. Avatar
    Bagus April 25, 2019 at 10:23 pm #

    Hi Jason,

    I don’t see splitting the dataset for test/train there. Is this (data splitting) done within cross-validation? If so, what is the percentage of division between train/test/validate inside the gridSearchCV?

    • Avatar
      Jason Brownlee April 26, 2019 at 8:34 am #

      In the above tutorial, the dataset is split using k-fold cross validation as part of the grid search.

  182. Avatar
    thiagu May 3, 2019 at 10:48 am #

    Hi Jason,

    In the above example can we use an LSTM model? Is that possible?

  183. Avatar
    krs reddy May 20, 2019 at 7:33 pm #

    jason,

    how do I optimize the number of neurons in different layers? When the model has 2 or more hidden layers and the task is to get the optimal number of neurons in each layer, how do I proceed?

    • Avatar
      Jason Brownlee May 21, 2019 at 6:33 am #

      We never get an optimal model, we get a good enough model.

      Try a suite of configurations and see what works well.

  184. Avatar
    krs reddy May 20, 2019 at 7:37 pm #

    jason,

    how do I tune the optimization algorithm, learning rate, and momentum in a single go?

    Here the optimization algorithm is a hyperparameter of the model, while the learning rate and momentum are hyperparameters of the optimization function.

  185. Avatar
    Prem Alphonse May 21, 2019 at 2:31 pm #

    Hi Jason,
    As you have shown tuning of each parameter one by one, can it be done the same way and then the best values combined to build the final model? Or might each parameter depend on the others, so that we have to use nested loops to find the best set of parameters?

    • Avatar
      Jason Brownlee May 21, 2019 at 2:47 pm #

      Yes, you can use nested loops across each hyperparameter if you wish.

  186. Avatar
    Marco Sabatini May 31, 2019 at 5:09 am #

    Set cv=5 (or 3) instead of n_jobs=-1.

  187. Avatar
    Niez Ghabi June 12, 2019 at 10:21 pm #

    Hello,

    You didn't mention how we could tune many parameters together; it's as if each parameter is tuned on its own. Moreover, you haven't given an example of tuning the number of hidden layers or the number of neurons in more than one hidden layer. Can you please give an example?

    Thank you

    • Avatar
      Jason Brownlee June 13, 2019 at 6:17 am #

      It is easier to tune one parameter at a time; you can tune more if you like, but it will require more time/compute.

      You can tune the number of layers if you wish, I left that example out.

      • Avatar
        Niez Ghabi June 13, 2019 at 11:33 pm #

        When I tune the number of layers, do I have to choose a particular number of layers in the current model/estimator ? Can this parameter be also tuned along with others ?

        What combination would be better ? Tuning every parameter on it’s own or tuning all parameters together ?

        Thank you.

        • Avatar
          Jason Brownlee June 14, 2019 at 6:45 am #

          I often recommend using a large model with a big capacity and use regularization, like weight decay to reduce overfitting.

          Nevertheless, you can tune the number of layers and nodes at the same time if you wish.

          • Avatar
            Niez Ghabi June 14, 2019 at 5:31 pm #

            Okay, but since tuning many parameters at the same time will require more time to compute, I was wondering whether tuning every parameter on its own would provide the same result as tuning them all together.

            Thank you.

          • Avatar
            Jason Brownlee June 15, 2019 at 6:27 am #

            No, tuning parameters one at a time is an approximation of tuning all parameters at once.

            We typically avoid tuning all parameters at once because of the computational cost.

  188. Avatar
    Niez Ghabi June 18, 2019 at 1:54 am #

    Thank you very much, this helped a lot. I will try tuning one parameter at a time then.

  189. Avatar
    Guirado June 27, 2019 at 11:35 pm #

    Hello Jason! Thank you very much for your posts, they are the best teaching source I have found.

    Do you know how could I use Grid Search without defining my model as you explain at the beginning of the post? This is because I did transfer learning to MobileNet freezing the weights of all the layers except the last ones.

    Then, if you could also explain how to adapt the code for images to use it for my CNN MobileNet it would be really helpful.

    Thank you.

  190. Avatar
    DJ June 28, 2019 at 12:58 am #

    Hi Jason

    First of all thank you for providing this post.
    It really helped me a lot. I adjusted the many parameters through gridsearch as you did and got a better model.

    As shown in this post, when I train with train data, the accuracy is more than 70%. Then I tried to predict the test data with the best model I got. I then compared the actual label of the test data with the predicted value from the model through the test data.

    By the way! Accuracy goes up and down by nearly 20-30%. Why is this accuracy so low? Both train data and test data are of the same type (pima-indians-diabetes). I even used dropout to prevent overfitting. I'm so embarrassed.

    • Avatar
      Jason Brownlee June 28, 2019 at 6:05 am #

      Perhaps the model is a little unstable based on the small sample size.

      Perhaps try some weight decay?

      • Avatar
        DJ July 2, 2019 at 6:05 pm #

        Thank you, Jason.

        First, the data is the same pima-indians-diabetes data as the example in this post.
        The training data was divided into 461 samples and the test data was divided into 307 samples.

        Then I created the KerasClassifier model in the following way.

        I used L2 regularization (Weight Decay) as you told me.

        Then I did the Gridsearch with the following parameters and found the optimal parameters.

        Then I predicted the test data with that model and got the results.

        However, when I predict the test data, the accuracy is still only about 20%. It's the opposite of the 70-80% I get when I predict the train data.

        Even when I was trying to prevent overfitting using both weight decay and dropout.

        Is it because the amount of data is inevitably too small? I still think there is a problem if the accuracy is only 20%.

        Or is there a problem with my model and learning process? I do not know. I even did cross validation …

        Thank you, Jason. Can you tell me what the problem is?

  191. Avatar
    SOA July 4, 2019 at 2:20 am #

    Hello Doctor Jason. I have sequential data. Can I tune the batch size and the epochs using the KerasClassifier or the KerasRegressor? I tried the KerasClassifier but it did not work.

    The error was in grid_result = grid.fit(X, Y)

    ValueError: Found input variables with inconsistent numbers of samples: [10584, 30246]

    My ultimate goal is to predict the next time step or sequence using previous time steps. Thanks.

  192. Avatar
    zeinab July 20, 2019 at 3:30 pm #

    Hi Jason,
    Can I use grid search for selecting the best network (cnn, lstm, gru … )?

    • Avatar
      Jason Brownlee July 21, 2019 at 6:26 am #

      If you have the resources.

      • Avatar
        zeinab July 22, 2019 at 3:32 am #

        sorry, What do you mean by the resources? Can you give me an example?

        • Avatar
          Jason Brownlee July 22, 2019 at 8:27 am #

          Compute resources – e.g. time and access to big machines.

          • Avatar
            Chris Connolly October 23, 2019 at 1:28 am #

            Just a note, it’s easy enough to signup to Google Colab which allows you to run your jupyter code on some pretty powerful machines. There’s a little config that allows you to take advantage of using the GPU. Very useful for deep learning when you don’t have the compute resources available.

          • Avatar
            Jason Brownlee October 23, 2019 at 6:53 am #

            Thanks for the tip.

            Not a fan.

  193. Avatar
    Charlotte Vereecke August 10, 2019 at 12:23 am #

    Hello,
    I would like to use this for choosing the best parameters to predict stock market prices.

    But now I was wondering if the 'accuracy' metric is suited to my case.
    I would rather use MSE, but apparently that doesn't work.
    I did some research and found that I can use negative MSE, but I don't understand the outcome then.
    I get Best: -0.211220 using {'batch_size': 10}

    • Avatar
      Charlotte August 10, 2019 at 1:04 am #

      this is the code i used:

      def create_model():
          model = Sequential()
          model.add(Dense(100, input_dim=1, activation='relu'))
          model.add(Dense(1, activation='sigmoid'))
          model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
          return model

      seed = 7
      np.random.seed(seed)

      df = pd.read_csv("ACKB.BR_LONG", parse_dates=True, index_col=0)
      print(df.head())

      data = df.values

      data_train, data_test = train_test_split(data, train_size=0.8, test_size=0.2, random_state=1)

      scaler_X_Test = MinMaxScaler()
      scaler_X_Train = MinMaxScaler()
      scaler_Y_Test = MinMaxScaler()
      scaler_Y_Train = MinMaxScaler()

      X_train = data_train[:, 3]
      X_train = X_train.reshape(-1, 1)
      X_train = scaler_X_Train.fit_transform(X_train)

      X_test = data_test[:, 3]
      X_test = X_test.reshape(-1, 1)
      X_test = scaler_X_Test.fit_transform(X_test)

      Y_train = data_train[:, 6]
      Y_train = Y_train.reshape(-1, 1)
      Y_train = scaler_Y_Train.fit_transform(Y_train)

      Y_test = data_test[:, 6]
      Y_test = Y_test.reshape(-1, 1)
      Y_test = scaler_Y_Test.fit_transform(Y_test)

      model = KerasClassifier(build_fn=create_model, epochs=10)

      batch_size = [10, 20, 40, 60, 80, 100]
      epochs = [10, 50, 100]

      param_grid = dict(batch_size=batch_size)
      grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, scoring='neg_mean_squared_error')
      grid_result = grid.fit(X_train, Y_train)

      print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

    • Avatar
      Jason Brownlee August 10, 2019 at 7:21 am #

      Generally you cannot predict stock prices:
      https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market

      Additionally, classification is not appropriate for regression problems, instead you must calculate error:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression

      I hope that helps.

      • Avatar
        Charlotte August 10, 2019 at 10:40 pm #

        So you mean I can not use gridsearch at all?

        • Avatar
          Jason Brownlee August 11, 2019 at 5:57 am #

          You can grid search regression models. Perhaps re-read my previous comment, it seems you missed my point 🙂

        • Avatar
          Charlotte August 11, 2019 at 6:36 am #

          Hello

          I changed my code to this

          model = KerasRegressor(build_fn=create_model, epochs=10)

          batch_size = [10, 20, 40, 60, 80, 100]
          epochs = [10, 50, 100]

          param_grid = dict(batch_size=batch_size, epochs=epochs)
          grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, scoring='neg_mean_squared_error')
          grid_result = grid.fit(X_train, Y_train)

          but I still get a negative score.
          I don't understand why, because when I fit one model without using grid search, I get a positive MSE.

          Can you please help me?

  194. Avatar
    Prem August 21, 2019 at 9:35 am #

    Hi Jason,
    May I know which way is better,

    -Tune each hyperparameter individually and find optimum as you explained, then put together to build the final model

    – Tune all parameters together to build the model

    The second takes a longer time than the first.

    Thanks
    Prem

    • Avatar
      Jason Brownlee August 21, 2019 at 2:06 pm #

      Typically all parameters together is preferred, but if the search space is large, we can sacrifice some purity and test subsets of params, or even just the most important parameter, such as learning rate, then other parameters.

      • Avatar
        Prem August 21, 2019 at 3:54 pm #

        Thanks Jason

  195. Avatar
    ABHIJEET NAYAK August 24, 2019 at 3:27 am #

    Hey Jason,

    First of all, thanks for sharing such a detailed and balanced post. I think almost 90% of the time your posts are recommended by Google when searching anything related to "Deep Learning".

    I have one query though, like you have said in this post:
    “This is not the best way to grid search because parameters can interact, but it is good
    for demonstration purposes”.

    So, I tried to let most of the parameters interact and ran a grid search, but it's taking a hell of a time to run. It's not showing errors either. Can you please take a look at the code and let me know if I have made some errors? Or is there any way to make it run faster?

    Thanks Again!!

    from keras.layers import Dropout

    def create_model(learn_rate=0.01, momentum=0, dropout_rate=0.0, neurons=1):
        # create model
        model = Sequential()
        model.add(Dense(neurons, input_dim=18, kernel_initializer="he_normal", activation='relu'))
        model.add(Dense(neurons, kernel_initializer="he_normal", activation='relu'))
        model.add(Dropout(dropout_rate))
        model.add(Dense(neurons, kernel_initializer="he_normal", activation='relu'))
        model.add(Dropout(dropout_rate))
        model.add(Dense(4, kernel_initializer="he_normal", activation='softmax'))
        # Compile model
        model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    np.random.seed(seed)

    X = x_train2
    Y = y_train1
    # create model
    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
    momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [2, 5, 10]
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    neurons = [1, 5, 10, 15, 20, 25, 30]

    # Grid_Search
    param_grid = dict(batch_size=batch_size, epochs=epochs, learn_rate=learn_rate, momentum=momentum, dropout_rate=dropout_rate, neurons=neurons)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)

    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

  196. Avatar
    Suraj Pawar September 1, 2019 at 3:03 am #

    What is the default scoring metrics that KerasClassifier uses? is it accuracy?

    • Avatar
      Jason Brownlee September 1, 2019 at 5:45 am #

      It might be accuracy.

      It’s probably a good idea to always specify a metric.

  197. Avatar
    John White September 3, 2019 at 2:11 pm #

    Hi Jason, awesome tutorial! I have a conceptual question:

    Even if we do find the best model after tuning, the weights will be different, yielding different models and results. So the best model for this maybe wouldn’t be the best if we compiled and ran it again with the “best parameters”. If we seed the weights with the parameters for reproducibility, we don’t know if those would be the best weights. On the other hand, if we tune the weights, then the “best parameters” won’t be best parameters anymore? I am stuck in a loop. Is there a general guideline on what parameters to tune first as opposed to others?

    Or is this whole logic flawed somewhere and I am way overthinking? Thanks for your time!

    • Avatar
      Jason Brownlee September 4, 2019 at 5:55 am #

      Thanks.

      Yes, this is why we try to find the best model on average, over multiple CV runs.

      You can also use techniques to reduce the variance of the final model, e.g. ensembles.

  198. Avatar
    Jack September 4, 2019 at 12:09 pm #

    Hi Jason,
    I would like to ask you a question. I used Keras + GridSearchCV to tune the parameters of a ConvLSTM network. Before, my input was [1000,33,1,11,8] and the output was [1000,11,33], and the grid search ran normally. But when I changed the input to [1000*33,1,1,11,8] and the output to [1000*33,11,1], after training of the first parameter combination the error 'Found array with dim 3. Estimator expected <= 2' appeared. I want to know how to solve this. Thanks.

    • Avatar
      Jason Brownlee September 4, 2019 at 1:44 pm #

      Perhaps try a manual grid search instead? I’m not sure sklearn supports 3d input data.

      • Avatar
        Jack September 4, 2019 at 7:40 pm #

        Oddly enough, I removed the scoring parameter from the GridSearchCV, allowing the grid search to directly inherit the 'rmse' in model.compile, which seems to solve the problem. I don't know if my approach is correct.

        • Avatar
          Jack September 4, 2019 at 7:41 pm #

          Correction: it's 'mse' in model.compile.

        • Avatar
          Jason Brownlee September 5, 2019 at 6:51 am #

          Interesting. Thanks for sharing.

  199. Avatar
    Anirban September 6, 2019 at 7:32 pm #

    Hi Jason,
    Thanks for this really helpful tutorial.
    While running the following code in google colab I am getting this error

    “/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.
    warnings.warn(CV_WARNING, FutureWarning)”

    here is the code

    # ---------------------------------------------
    # define the model
    # ---------------------------------------------
    def create_model(learn_rate=0.01, momentum=0, dropout_rate=0.0, weight_constraint=0, epochs=10, verbose=2):
        model = Sequential()
        model.add(LSTM(50, input_shape=(1000, 6), return_sequences=True))
        model.add(LSTM(50, return_sequences=True))
        model.add(LSTM(50, return_sequences=True))
        model.add(Dense(1))
        # Compile model
        adam = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
        model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    np.random.seed(seed)

    # ------------------------
    # create model
    # ------------------------
    model = KerasRegressor(build_fn=create_model, verbose=0)
    # define the grid search parameters
    #batch_size = [10, 30]
    #epochs = [10, 20]
    learn_rate = [0.001, 0.01]
    dropout_rate = [0.0, 0.2]
    # ----------------------------------------
    # grid search
    # ----------------------------------------
    param_grid = dict(batch_size=batch_size, epochs=epochs, learn_rate=learn_rate, dropout_rate=dropout_rate)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(input_matrix, output_matrix)

    model.summary()
    plt.figure(figsize=(12, 6))
    plt.plot(grid_result.history['loss'], label='train')
    plt.legend()
    plt.show()

    # ------------------------------------
    # summarize results
    # ------------------------------------
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

    if you can help me out with this, it would be greatly appreciated.
    Thanks
    Regards
    Anirban

  200. Avatar
    mustafa mohammed September 8, 2019 at 6:48 am #

    hello Jason Brownlee
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier

    def create_model():
        model = Sequential()
        model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))  # kernel_initializer='uniform' kernel_constraint=min_max_norm(min_value=s.all(), max_value=d.all())
        model.add(Dropout(0.2))
        model.add(Dense(1))  # , activation='sigmoid'))
        #model.add(Activation('sigmoid'))
        #layer.get_weights()
        #weight = model.get_weights()
        #np.savetxt('f:\\weight.csv', weight, fmt='%s', delimiter=',')
        #model.get_layer(0).set_weights(y, r)
        model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])  # mean_squared_error
        # create model
        return model

    # Fit the model
    #history = model.fit(train_X, train_y, epochs=150, validation_data=(test_X, test_y), batch_size=24, verbose=2, shuffle=False)

    #pyplot.plot(history.history['loss'], label='train')
    #pyplot.plot(history.history['val_loss'], label='test')
    #pyplot.legend()
    #pyplot.show()

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset

    # split into input (X) and output (Y) variables
    X = train_X  # dataset.iloc[:,0:6]

    Y = train[:, -1]  # dataset.iloc[:,]
    # create model
    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [24, 48, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    print("dhdh", grid_result.best_score_)
    print("dhbbbbbbbbdh", grid_result.best_params_)

    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))
    history = model.fit(train_X, train_y, epochs=150, validation_data=(test_X, test_y), batch_size=batch_size, verbose=2, shuffle=False)

    --> 56 history = model.fit(train_X, train_y, epochs=150, validation_data=(test_X, test_y), batch_size=batch_size, verbose=2, shuffle=False)

    TypeError: unsupported operand type(s) for +: 'int' and 'list'

    What caused this error?

  201. Avatar
    mustafa mohammed September 9, 2019 at 7:44 pm #

    I am very tired of searching for the solution.
    How do I add initial weights from a CSV file and biases from a CSV file to an LSTM network for regression? Note that the input consists of 6 nodes, one hidden layer of 100 nodes, and one output node.
    I hope you can help me.

    • Avatar
      Jason Brownlee September 10, 2019 at 5:44 am #

      No problem:

      1. You can load your csv file as a numpy array.
      2. Then define your model in keras.
      3. Then reshape the weights into the required format for each layer.
      4. Then call set_weights() on each layer with your weights.

      To discover the expected shape for weight arrays in each layer, use layer.get_weights() and check the size attribute.

      If this is a challenge, perhaps post to stackoverflow or hire a freelancer?
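      A minimal sketch of those steps, shown with a Dense layer for simplicity (the file names and shapes are hypothetical; an LSTM layer instead expects three arrays: kernel, recurrent kernel and bias, which you can inspect via layer.get_weights()):

      import numpy as np
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      model = Sequential()
      model.add(Dense(100, input_dim=6, activation='relu'))
      model.add(Dense(1))

      layer = model.layers[0]
      print([w.shape for w in layer.get_weights()])  # e.g. [(6, 100), (100,)]

      # hypothetical CSV files holding the initial weights and biases
      weights = np.loadtxt('weights.csv', delimiter=',').reshape(6, 100)
      bias = np.loadtxt('bias.csv', delimiter=',').reshape(100,)
      layer.set_weights([weights, bias])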

      • Avatar
        mustafa mohammed September 10, 2019 at 9:26 pm #

        Can you give me the code?
        Note that the shape of the input layer is (samples, timesteps, features).

        • Avatar
          Jason Brownlee September 11, 2019 at 5:35 am #

          Sorry, I don’t have the capacity to prepare custom code for you.

          Perhaps hire a freelance programmer?

  202. Avatar
    kadar September 13, 2019 at 8:57 pm #

    Hi, it's a very good explanation and code I found. But when I tried it, I got errors for every parameter tuning saying that it's not a legal parameter. Can I know why this is happening?

    1.
    neurons = [1, 5, 10, 15, 20, 25, 30]
    param_grid = dict(neurons=neurons)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(x, y)

    Error: neurons is not a legal parameter.

    2.
    learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
    momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
    param_grid = dict(learn_rate=learn_rate, momentum=momentum)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(x, y)

    Error: learn_rate is not a legal parameter.

    The same happened with every parameter. Can you please help with this?

    • Avatar
      Jason Brownlee September 14, 2019 at 6:17 am #

      Sorry to hear that, did you try copying the complete example?

  203. Avatar
    Sahil September 15, 2019 at 1:37 am #

    Hi Jason,

    I get the below error at grid.fit() while running the code for a project on google colab:

    PicklingError: Could not pickle the task to send it to the workers.

    What could be the reason for this?

    • Avatar
      Jason Brownlee September 15, 2019 at 6:23 am #

      Perhaps try running the example on your workstation from the command line?

  204. Avatar
    Naresh Kumar September 20, 2019 at 9:17 pm #

    Thanks Jason for the amazing blog. But I have couple of questions

    1. How do I interpret the mean and standard deviation of my results? Let's say I hyperparameter-tune the network for the activation function and get the best activation, like 'relu', together with a mean and standard deviation.

    2. Should I hyperparameter-tune the autoencoder as well, as I don't know the best parameters for that network?

  205. Avatar
    Pooria October 2, 2019 at 8:44 pm #

    Dear Jason thanks a lot for your wonderful blog I learned a lot of things here.
    Unfortunately, I have a small problem when I try to use grid search, and I would be grateful if you could help me.
    The problem is that when I use the grid search, for instance for optimizer tuning, I get
    0.165123 (0.233519) with: {'optimizer': 'SGD'}
    0.165123 (0.233519) with: {'optimizer': 'RMSprop'}
    0.165123 (0.233519) with: {'optimizer': 'Adagrad'}
    0.165123 (0.233519) with: {'optimizer': 'Adadelta'}
    0.165123 (0.233519) with: {'optimizer': 'Adam'}
    0.165123 (0.233519) with: {'optimizer': 'Adamax'}
    0.165123 (0.233519) with: {'optimizer': 'Nadam'}

    The same numbers for every optimizer. But when I try them in a network with the same structure (same number of layers and neurons, changing the optimizer manually) and compare the results, I get different results; for instance, Adam is way better than all the others. Yet according to the grid search, all of the optimizers produce the same error, and I don't know what is happening here!
    One point to add is that I have already checked the training and testing datasets and they are correct.

    • Avatar
      Jason Brownlee October 3, 2019 at 6:46 am #

      The results suggest tuning the optimizer might not be useful, perhaps try the learning rate or model capacity.

  206. Avatar
    Prem October 4, 2019 at 10:04 am #

    Hi Jason,

    Can you publish a paper for tensorflow keras with Scikit-Learn in Python along with Grid Search Hyperparameters for similar diabetes dataset please.

    Thanks

  207. Avatar
    shahad October 20, 2019 at 11:15 am #

    *How to Tune Batch Size and Number of Epochs*

    on this step “grid_result = grid.fit(X, Y)” I get this error

    BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

    how can I fix it 🙁 🙁

  208. Avatar
    poya October 25, 2019 at 3:37 am #

    Hi Jason, thanks for your awesome post.
    I was wondering if you could help me choose loss instead of accuracy to be optimized in the grid search. I mean, which section should I change?

    Thanks.

    • Avatar
      Jason Brownlee October 25, 2019 at 6:49 am #

      Great question!

      On the GridSearchCV set the “scoring” argument to one of these:
      https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

      Or run the grid search without sklearn – manually and tune the keras result directly.
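      For example, a minimal sketch using a regression scoring string (note that sklearn reports error-based scores as negative numbers so that larger is always better):

      from sklearn.model_selection import GridSearchCV
      grid = GridSearchCV(estimator=model, param_grid=param_grid,
                          scoring='neg_mean_squared_error', cv=3)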

      • Avatar
        amir November 5, 2019 at 1:19 am #

        Hi, I have the same question.

        So I did what you suggested above, but when I change the scoring to 'neg_mean_squared_error' from the link that you suggested, I expect to get values which are close to my loss value, but they are not!
        loss: 4.2473e-05
        Best: -0.039595 using {'neurons': 10}
        Am I doing something wrong here?
        And also, for the second suggestion, can you explain a bit more?

        thanks

        • Avatar
          Jason Brownlee November 5, 2019 at 6:56 am #

          I meant that you can write a for loop and fit and evaluate a model manually for each config.

          • Avatar
            amir November 9, 2019 at 2:46 am #

            Actually, I am relatively new to Python, so I was wondering if you could again point me in the right direction.

            I tried this, and it kind of works but gives me wrong results. Am I doing it the right way (I mean the for loops)?
            Thanks

          • Avatar
            Jason Brownlee November 9, 2019 at 6:17 am #

            I believe I have examples on the blog you can use as a starting point, for example:
            https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/

  209. Avatar
    SUBHANKAR BHATTACHARYA October 30, 2019 at 3:39 am #

    Hi Jason,

    I see in cross_val_score or in GridSearchCV you have used n_jobs=-1, which means all cores of the CPU have to be used.

    However, if I intend to use this on a GPU, this parameter has to be set to the default (None or 1).
    I find the GPU training to be awfully slower than what is happening with the CPU.

    I have successfully installed tensorflow-gpu with all the needed CUDA and cuDNN libraries, and I am quite confused by the performance.

  210. Avatar
    pooria October 30, 2019 at 8:52 am #

    Hi Jason, thanks for answering all comments/questions.

    I have a major problem here that I don’t understand at all:

    I used the Grid search using the following structure for instance just for tuning the epochs:

    def create_model(neurons=50):
        # create model
        model = Sequential()

        model.add(Dense(neurons, activation='relu', input_shape=(n_cols,), kernel_initializer='uniform'))
        model.add(Dense(neurons, activation='relu', kernel_initializer='uniform'))
        model.add(Dense(neurons, activation='relu', kernel_initializer='uniform'))
        model.add(Dense(neurons, activation='relu', kernel_initializer='uniform'))
        model.add(Dense(1, activation='relu', kernel_initializer='uniform'))
        # Compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    model = KerasClassifier(build_fn=create_model, epochs=50, batch_size=15, verbose=2)
    neurons = [10, 90]
    param_grid = dict(neurons=neurons)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

    and I am getting really ridiculous results such as:
    loss: 892786.6804
    which doesn't make sense because I am just trying to tune epochs (that is the smallest loss; for some folds I get even bigger errors),
    and then when I just try k-fold with the same structure I get a reasonable loss: 3.9826e-06.
    I am using the following structure:
    def test_model():

        model = Sequential()

        model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

        model.add(Dense(50, activation='relu'))
        model.add(Dense(50, activation='relu'))
        model.add(Dense(50, activation='relu'))

        model.add(Dense(1, activation='relu'))

        model.compile(optimizer='adam', loss='mean_squared_error')

        return model

    estimators = []
    estimators.append(('standardize', StandardScaler()))
    estimators.append(('mlp', KerasRegressor(build_fn=test_model, epochs=50, batch_size=15, verbose=2)))
    pipeline = Pipeline(estimators)
    kfold = KFold(n_splits=10)
    results = cross_val_score(pipeline, X, Y, cv=kfold)

    It is driving me crazy. Can you help me out please? I don't understand what's wrong; I used both samples from your blog and just changed the names of the variables to my inputs.

    Thanks in advance.

  211. Avatar
    Tom November 3, 2019 at 9:53 pm #

    Hi,
    Thanks for the great post, good to see the comments are still active 🙂

    I have a question regarding the score during training.

    When I’m training my CNN without grid search, I train it for 10 epochs with a validation set and use a ModelCheckpoint callback to save the weights with the best validation accuracy/loss (which usually occurs at an earlier epoch than 10, but not always the same one).

    I would like to do something similar with grid search: to keep the number of epochs set to 10, and perform grid search on other hyper-parameters, with the score used for the search being the best validation accuracy/loss during the training (and not the score after the last epoch).

    I couldn’t find a straightforward way to do this, do you know of one?

    Thanks!

    • Avatar
      Jason Brownlee November 4, 2019 at 6:41 am #

      You might want to grid search the model manually rather than use the sklearn grid search functionality. Just to give you more control over things like the checkpoint.
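      A minimal sketch of that kind of manual loop, assuming build_model(**params) is your own function returning a compiled Keras model and (X_train, y_train, X_val, y_val) are your splits; EarlyStopping with restore_best_weights keeps the best-epoch weights rather than the last-epoch weights:

      from tensorflow.keras.callbacks import EarlyStopping

      best_score, best_params = None, None
      for params in [{'dropout_rate': 0.2}, {'dropout_rate': 0.5}]:   # your own grid of configs
          model = build_model(**params)   # hypothetical model-building function
          es = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
          history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                              epochs=10, callbacks=[es], verbose=0)
          score = min(history.history['val_loss'])   # best epoch's validation loss
          if best_score is None or score < best_score:
              best_score, best_params = score, params
      print(best_score, best_params)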

  212. Avatar
    Osman November 4, 2019 at 7:29 am #

    Hello,
    thanks for the great post. I could learn a lot about deep learning from this post.

    But I have an issue with my model. It works fine when n_jobs=1, although it takes forever to finish. When set to n_jobs=-1 it computes really fast, but I get this error [BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.] when I run this command

    grid_result = grid.fit(ip_train1, op_train1)

    I’m working on windows os and using gpus cuda cores.

    this is my model

    def create_model():
        model = Sequential()
        model.add(Dense(27, input_dim=54, activation='relu'))
        model.add(Dense(14, activation='relu'))
        model.add(Dense(7, activation='softmax'))

        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=100)
    grid_result = grid.fit(ip_train1, op_train1)

    and I get this error

    Fitting 3 folds for each of 18 candidates, totalling 54 fits
    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
    Pickling array (shape=(54,), dtype=object).
    Memmapping (shape=(54, 95564), dtype=int32) to new file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
    Pickling array (shape=(54,), dtype=object).
    Memmapping (shape=(95564, 7), dtype=float32) to new file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
    Pickling array (shape=(63709,), dtype=int32).
    Pickling array (shape=(31855,), dtype=int32).
    [the same Pickling/Memmapping messages repeat for each remaining CV split]
    [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 2.8s
    [identical progress lines continue up to]
    [Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 2.8s
    Traceback (most recent call last):

    File “”, line 1, in
    grid_result = grid.fit(ip_train1, op_train1)

    File “C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py”, line 688, in fit
    self._run_search(evaluate_candidates)

    File “C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py”, line 1149, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))

    File “C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py”, line 667, in evaluate_candidates
    cv.split(X, y, groups)))

    File “C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py”, line 934, in __call__
    self.retrieve()

    File “C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py”, line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))

    File “C:\ProgramData\Anaconda3\lib\site-packages\joblib\_parallel_backends.py”, line 521, in wrap_future_result
    return future.result(timeout=timeout)

    File “C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py”, line 432, in result
    return self.__get_result()

    File “C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py”, line 384, in __get_result
    raise self._exception

    BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

    how to resolve this problem?

    • Avatar
      Jason Brownlee November 4, 2019 at 1:29 pm #

      Perhaps set the number of jobs to 1, and let Keras/TensorFlow have access to all other cores?

  213. Avatar
    sukhpal December 16, 2019 at 10:52 am #

    Optimising epochs: maybe the number of epochs does not need to be optimised, as model training could be stopped when the validation loss plateaus, which may happen at a different epoch depending on the optimiser and/or the dataset used.
    The above may explain why the SGD optimiser might not have converged at 200 epochs, which was selected by the grid search optimisation. Please explain why we should optimise the number of epochs.

  214. Avatar
    sukhpal December 16, 2019 at 10:59 am #

    "We will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5." Sir, please clarify what you mean here by the resulting values ranging from 1 to 30?

  215. Avatar
    Sukhpal December 17, 2019 at 1:50 am #

    Why is grid search optimization done in stages?

    • Avatar
      Jason Brownlee December 17, 2019 at 6:36 am #

      You can do all at once if you like. I break it down to make it easier for beginners to understand what is going on.

  216. Avatar
    kanda December 17, 2019 at 4:36 am #

    Hi Mr. Brownlee, and thank you for all your tutorials!

    I'm running GridSearchCV from sklearn to try to find the best model parameters following this tutorial. The grid search best score was 0.8404 (r2_score). However, I can't reach this R2 accuracy again at all: predicting on the test set I got 0.4696, and on the train data I get 0.7521, using:

    p = gridCNN.best_estimator_.predict(xtest)
    r2_score(np.asarray(ytest).ravel(), p)

    Also, I tried to rebuild the model using the best parameters and got 0.6764 accuracy.

    So how can I reach the grid search accuracy (0.8404) again? Noting that I eliminated the cross validation during the grid search by using the following, I should get the same accuracy!:

    cv = ShuffleSplit(1, test_size=0.2, random_state=584)

    and thank you in advance

    • Avatar
      kanda December 17, 2019 at 4:37 am #

      here is the problem detailed : https://stackoverflow.com/questions/59349364/getting-low-accruacy-than-the-gridsearchcv

    • Avatar
      Jason Brownlee December 17, 2019 at 6:39 am #

      You’re welcome.

      It is possible that the grid search evaluation was optimistic. Perhaps change the grid search cv config to make it more robust?

      No, recall that we are estimating the performance of the model on unseen data, there will be noise in that estimate. More repeats/folds will give a more robust mean estimate of performance.

      • Avatar
        kanda December 17, 2019 at 11:44 am #

        thank you for your fast reply !!! I didn’t really get what you want to say !! What should I do please !!

        • Avatar
          Jason Brownlee December 17, 2019 at 1:37 pm #

          Increase the repeats (and maybe folds) of your cross-validation process to better estimate the mean performance.
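          For example, a minimal sketch (values illustrative) of passing a repeated k-fold splitter to the grid search:

          from sklearn.model_selection import GridSearchCV, RepeatedKFold

          cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=7)
          grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=cv)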

  217. Avatar
    sukhpal December 22, 2019 at 12:52 pm #

    Loss minimisation is important in determining how well a deep learning model will perform. You mentioned selecting the best parameters by recording accuracies. Could you please justify why you use accuracies instead of the loss to determine the best model parameters?
    Elsewhere you suggest observing the validation loss and terminating training to avoid overfitting. Can you please clarify these seemingly contradicting statements?

    • Avatar
      Jason Brownlee December 23, 2019 at 6:44 am #

      It’s just an example of how to use the API, you can use any metric you like.

  218. Avatar
    sukhpal December 28, 2019 at 7:11 pm #

    Sir, is it possible to apply grid search to my input dataset and then validate the optimized model on another dataset after the grid search?

  219. Avatar
    kamal January 1, 2020 at 10:33 pm #

    This sentence is not clear: "The resulting values ranging from 1 to 30 can be utilized as a number of neurons in steps of 5." Please clarify what you mean here by the resulting values ranging from 1 to 30? Sir, please help me to clarify this.

  220. Avatar
    Sweta January 10, 2020 at 11:52 pm #

    Hi Jason,
    A very helpful tutorial. Thanks a lot!

    I need to know, now that we have done hyperparameter tuning as above, how does one incorporate them into the actual training code? Do you have an example tutorial for it? I mean where exactly we add it in our actual code below?

    For eg,
    # Layer 1
    model.add(Dense(128, input_dim=806, activation =’relu’))
    model.add(Dropout(0.6))

    #Layer-2
    model.add(Dense(64, activation=’relu’))
    model.add(Dropout(0.6))

    Question 2 is, do we need to add the Dropout layer after every layer, or just once? Similarly for other hyperparameters: do we need to add the hyperparameters all the time in all the layers?

    Thanks!

    • Avatar
      Jason Brownlee January 11, 2020 at 7:26 am #

      Thanks.

      Once you find a config that works well, you can fit a standalone model with that config.

      Test different uses of dropout and use what works best. Typically it is used after each hidden layer.
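      A minimal sketch of fitting a standalone model with a found config (the specific values are placeholders; in practice you plug in whatever grid_result.best_params_ reported):

      # suppose the search reported best_params_ = {'batch_size': 20, 'epochs': 50, 'dropout_rate': 0.2}
      final_model = create_model(dropout_rate=0.2)   # your create_model() from the search
      final_model.fit(X, Y, epochs=50, batch_size=20, verbose=0)
      predictions = final_model.predict(X_new)       # X_new: new data to predict on (hypothetical)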

      • Avatar
        sweta January 11, 2020 at 6:16 pm #

        Thanks for the response.

        I have already found the best config with the above examples. But I need to know how to fit a standalone model with that config. Do you have an example tutorial for it? Can you help with this?

  221. Avatar
    George January 13, 2020 at 2:08 pm #

    Hi Jason,
    What's the difference between using 'softmax' with 'categorical_crossentropy' versus 'sigmoid' with 'binary_crossentropy'?
    Which accuracy matches the accuracy from a confusion matrix?
    When should each combination be used?

    • Avatar
      Jason Brownlee January 14, 2020 at 7:13 am #

      Cross entropy is a loss function that can be used for binary or multi-class classification.

      Sigmoid and Softmax are activation functions. Sigmoid is for a binomial probability distribution, softmax is for a multi-class classification.

      Sorry, I don't follow your final questions, perhaps you can elaborate?
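      A minimal sketch of the two pairings (layer sizes, n_features and n_classes are placeholders):

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      n_features, n_classes = 8, 3   # placeholders

      # binary classification: 1 sigmoid output unit + binary cross entropy
      binary_model = Sequential([Dense(12, input_dim=n_features, activation='relu'),
                                 Dense(1, activation='sigmoid')])
      binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

      # multi-class classification: one softmax unit per class + categorical cross entropy
      multi_model = Sequential([Dense(12, input_dim=n_features, activation='relu'),
                                Dense(n_classes, activation='softmax')])
      multi_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])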

  222. Avatar
    George January 14, 2020 at 10:55 am #

    Got it, Thanks Jason

  223. Avatar
    Surajit Chakraborty January 15, 2020 at 10:24 pm #

    Hi,

    Please find below my code that performs GridSearch along with Cross Validation using sklearn.model_selection.GridSearchCV for the mnist dataset that works perfectly fine.

    x———————–Code Start ———————————–x——————————————————-x

    # Build Function to create model, required by KerasClassifier

    def create_model(optimizer_val='RMSprop', hidden_layer_size=16, activation_fn='relu', dropout_rate=0.1, regularization_fn=tf.keras.regularizers.l1(0.001), kernel_initializer_fn=tf.keras.initializers.glorot_uniform, bias_initializer_fn=tf.keras.initializers.zeros):

        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(units=hidden_layer_size, activation=activation_fn, kernel_regularizer=regularization_fn, kernel_initializer=kernel_initializer_fn, bias_initializer=bias_initializer_fn),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(units=hidden_layer_size, activation='softmax', kernel_regularizer=regularization_fn, kernel_initializer=kernel_initializer_fn, bias_initializer=bias_initializer_fn)
        ])

        optimizer_val_final = optimizer_val
        model.compile(optimizer=optimizer_val, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        return model

    # Create the model with the wrapper
    model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=2)

    # Initialize the parameter grid
    nn_param_grid = {
        'epochs': [10],
        'batch_size': [128],
        'optimizer_val': ['Adam', 'SGD'],
        'hidden_layer_size': [128],
        'activation_fn': ['relu'],
        'dropout_rate': [0.2],
        'regularization_fn': ['l1', 'l2', 'L1L2'],
        'kernel_initializer_fn': ['glorot_normal', 'glorot_uniform'],
        'bias_initializer_fn': [tf.keras.initializers.zeros]
    }

    # Perform GridSearchCV
    grid = GridSearchCV(estimator=model, param_grid=nn_param_grid, verbose=2, cv=3, scoring=precision_custom, return_train_score=False, n_jobs=-1)
    grid_result = grid.fit(x_train, y_train)

    x———————–Code End ———————————–x——————————————————-x

    My idea is to pass different optimizers with different learning rates , say Adam for learning rates 0.1,0.01 and 0.001. I also want to try out SGD with different learning rates and momentum values.

    In that case , when I pass ‘optimizer_val’: [tf.keras.optimizers.Adam(0.1)], I get the error as given below:

    Cannot clone object , as the constructor either does not set or modifies parameter optimizer_val

    Please advise as to how can I rectify this error.

    Thanks

    Surajit

    • Avatar
      Jason Brownlee January 16, 2020 at 6:16 am #

      I’m happy to answer questions, but I don’t have the capacity to review/debug your code.

      • Avatar
        Surajit Chakraborty January 16, 2020 at 7:01 am #

        Hi,
        Thanks for your reply. I just gave the code for context. What I am struggling with is how to do the following with sklearn GridSearchCV using only one parameter grid.

        My idea is to pass different optimizers with different learning rates , say Adam for learning rates 0.1,0.01 and 0.001. I also want to try out SGD with different learning rates and momentum values.

        Thanks
        Surajit

  224. Avatar
    Surajit Chakraborty January 18, 2020 at 11:42 am #

    Hi,

    Thanks for your reply. Just a quick question. Can you help me with use cases as when to prefer adaptive learning rate optimizers like Adam and when to opt for SGD ?

    Thanks
    Surajit

    • Avatar
      Jason Brownlee January 19, 2020 at 7:08 am #

      When you reach the limit of SGD try adaptive methods and see if they can do better.

      Or, use adaptive methods first to get a good result fast, then see if you can do better manually.

  225. Avatar
    abbas January 23, 2020 at 3:49 am #

    Thanks Jason for always posting the best things ever.
    I want to know, can I run all of these at once?

    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
    learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
    momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
    activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    neurons = [1, 5, 10, 15, 20, 25, 30]
    init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

    • Avatar
      Jason Brownlee January 23, 2020 at 6:41 am #

      Perhaps, but it will be slow and most of the configs would be a waste of time to test.

  226. Avatar
    Raman January 27, 2020 at 12:01 pm #

    Thanks Jason, this is a really good tutorial.
    Could you please help me? I am working on a regression problem using deep learning; however, I am getting the below error. Essentially I am trying grid search for a regression problem.

    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, scoring=’neg_mean_absolute_error’)

    TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

    • Avatar
      Jason Brownlee January 27, 2020 at 2:33 pm #

      It might suggest you are trying to grid search a non-sklearn model.
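      A minimal sketch of the wrapping step that this error usually points to (a bare Sequential model has no get_params(); the scikit-learn wrapper provides it; input_dim and the grid values here are placeholders):

      from keras.models import Sequential
      from keras.layers import Dense
      from keras.wrappers.scikit_learn import KerasRegressor
      from sklearn.model_selection import GridSearchCV

      def create_model():
          model = Sequential()
          model.add(Dense(128, activation='relu', input_dim=10))   # input_dim: placeholder
          model.add(Dense(1))
          model.compile(optimizer='adam', loss='mse')
          return model

      model = KerasRegressor(build_fn=create_model, verbose=0)      # wrapped, not the raw Sequential
      param_grid = dict(batch_size=[10, 20], epochs=[10, 50])
      grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3,
                          scoring='neg_mean_absolute_error')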

  227. Avatar
    Jerry February 9, 2020 at 11:46 am #

    Hey Jason, great tutorial!
    Can you make another one using randomized search, as the literature states that it is more efficient?

  228. Avatar
    Randa February 16, 2020 at 11:03 pm #

    Hello,
    what is the solution with this error?
    TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

    Thank you

    • Avatar
      Jason Brownlee February 17, 2020 at 7:48 am #

      You need to specify the “scoring” argument to the grid search.

  229. Avatar
    aggelos papoutsis February 25, 2020 at 5:23 pm #

    Hi jason,

    Can I use a train/test split with your examples above?

    So instead of having grid_result = grid.fit(X, Y)

    change it to grid_result = grid.fit(x_train, y_train)

    and then test with x_test, y_test.

    • Avatar
      Jason Brownlee February 26, 2020 at 8:15 am #

      The sklearn grid search only uses cross-validation, not train-test splits.

      You will have to grid search manually, I believe.
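      A minimal sketch of such a manual search over a held-out test set, assuming create_model() and a train/test split already exist and the model is compiled with an accuracy metric (the configurations shown are placeholders):

      results = {}
      for batch_size in [10, 20, 40]:
          for epochs in [10, 50]:
              model = create_model()   # your model-building function
              model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
              _, acc = model.evaluate(x_test, y_test, verbose=0)
              results[(batch_size, epochs)] = acc
      best = max(results, key=results.get)
      print(best, results[best])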

      • Avatar
        AGGELOS PAPOUTSIS February 26, 2020 at 4:59 pm #

        OK, thank you, I see. So grid search only allows estimating the best parameters, and then you can run another experiment and use all the other things like a confusion matrix etc.

  230. Avatar
    debmalya March 4, 2020 at 1:04 am #

    model = Sequential()
    model.add(Dense(128, activation='relu', input_dim=n_input_1))
    #model.add(Dense(50, activation='relu'))
    #model.add(Dense(25, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    seed = 7
    np.random.seed(seed)

    # define the grid search parameters
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, epochs=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    grid_result = grid.fit(scaled_X, y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

    this is giving the error-

    TypeError Traceback (most recent call last)
    in
    11 param_grid = dict(batch_size=batch_size, epochs=epochs)
    12 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    —> 13 grid_result = grid.fit(scaled_X, y)
    14 # summarize results
    15 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    607
    608 scorers, self.multimetric_ = _check_multimetric_scoring(
    –> 609 self.estimator, scoring=self.scoring)
    610
    611 if self.multimetric_:

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\scorer.py in _check_multimetric_scoring(estimator, scoring)
    340 if callable(scoring) or scoring is None or isinstance(scoring,
    341 str):
    –> 342 scorers = {“score”: check_scoring(estimator, scoring=scoring)}
    343 return scorers, False
    344 else:

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\scorer.py in check_scoring(estimator, scoring, allow_none)
    293 “If no scoring is specified, the estimator passed should ”
    294 “have a ‘score’ method. The estimator %r does not.”
    –> 295 % estimator)
    296 elif isinstance(scoring, Iterable):
    297 raise ValueError(“For evaluating multiple scores, use ”

    TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

  231. Avatar
    debmalya March 12, 2020 at 3:20 pm #

    Hi

    I used this model-

    model = Sequential()
    model.add(Dense(128, activation=’relu’, input_dim=n_input))
    model.add(Dense(1))
    model.compile(optimizer=’adam’, loss=’mse’,metrics=[‘mse’])

    This is a MLP for time series forecasting. Now when I am doing hyper parameter tuning I am getting this error-

    TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

    on this line-

    grid_result = grid.fit(scaled_X, y)

    Thanks in advance for your reply!

  232. Avatar
    ahmed March 12, 2020 at 9:03 pm #

    Thanks for this post.

Can you please tell me how we can do a grid search if we have a dataset in the form of images?

    • Avatar
      Jason Brownlee March 13, 2020 at 8:14 am #

      Manually, with for-loops over configs you want to test.
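A minimal sketch of such a manual loop, assuming a hypothetical build_model() helper that returns a compiled Keras model (with an accuracy metric) and image arrays X_train, y_train, X_val, y_val:

results = {}
for batch_size in [16, 32]:
    for learning_rate in [1e-3, 1e-4]:
        model = build_model(learning_rate=learning_rate)
        model.fit(X_train, y_train, epochs=20, batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(X_val, y_val, verbose=0)
        results[(batch_size, learning_rate)] = acc
best = max(results, key=results.get)
print('Best (batch_size, learning_rate):', best, 'accuracy:', results[best])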

  233. Avatar
    debmalya March 14, 2020 at 1:08 am #

    Hi

    I am using MLP for time series forecasting and I wanted to do grid search for hyper parameter tuning-

from sklearn.model_selection import GridSearchCV
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, scoring='neg_mean_absolute_error')

grid_result = grid.fit(scaled_train, y_train_c)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

    and I got this error-

    TypeError Traceback (most recent call last)
    in
    9 param_grid = dict(batch_size=batch_size, epochs=epochs)
    10 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3,scoring=’neg_mean_absolute_error’)
    —> 11 grid_result = grid.fit(scaled_train,y_train_c)
    12 # summarize results
    13 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    631 n_splits = cv.get_n_splits(X, y, groups)
    632
    –> 633 base_estimator = clone(self.estimator)
    634
    635 parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py in clone(estimator, safe)
    58 “it does not seem to be a scikit-learn estimator ”
    59 “as it does not implement a ‘get_params’ methods.”
    —> 60 % (repr(estimator), type(estimator)))
    61 klass = estimator.__class__
    62 new_object_params = estimator.get_params(deep=False)

    TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

Please help me; I couldn't find anything related to this. Thanks in advance!

  234. Avatar
    Bianca March 27, 2020 at 9:46 am #

    Hi Jason

    Thank you for this post. I really enjoy all of them actually, as you manage to explain everything at a beginners level. However, am I right that this approach cannot be used for multi-classification problems? Keras requires one hot encoding and this confuses sklearn 🙁 It’s throwing an error, because my labels are thus one hot encoded.

    I found the Keras Tuner (https://github.com/keras-team/keras-tuner), would you recommend this package?

    Thanks in advance 🙂

  235. Avatar
    ElHanzo March 27, 2020 at 11:32 pm #

    hello,
    i have a question of understanding , when using keras and gridsearchcv from scikit-learn.
    as far as i understand, you are passing a keras model to the gridsearchcv function. the keras model then evaluates each set of hyperparameters the gridsearchcv function is passing to the model.(so the gridsearcher is just some kind of nesting of for loops that automatically saves the best combination of hyperparams)
    when i.e. using 3 folds for cross validation, the model evaluates all three subsets (i.e. fold one and fold two for training and fold three for testing and so on) and calculates the mean/average of i.e. the accuracy score for all evaluations. and then you can see what the best hyperparam combination is and so on.
    now, if i want to display the accuracy (or loss) for each epoch (when using this params) i normally what do something like (pseudo code):
    -define a model
    -grid_result = grid_searcher.fit this model with trainingset
    -best_model = best_estimator_.model.model
    -predicting model on testset

    so when getting the history with:
    -history = best_model.history.history

    i could now get the [“acc”, “loss”, etc] in dependancy of the epochs and display them in a plot (epochs on x axis, accuracy on y axis ).
    my first question is now, which acc value is saved in history? because when using 3 folds, there should be three accuracies for each epoch. or is it the mean/average for each epoch that is saved?

    and if so, is there a way to get the accuracy for each epoch for mean/average of the test(=validation and not the testset i later want to predict) per epoch?

    because i want to compare the accuracy for the training and test sets for each epoch.

    and another question: you could also use “validation_split” for the model itself. if using this option with crossvalidation, would this mean, that the gridsearcher function is splitting into training and testing sets,
    and then the model itself also is splitting into a training and test set and evaluating with respect to this test set? would this automatically cause some kind of overfitting, because the validation set is part of the training?

    thanks in advance and kind regards

  236. Avatar
    Nick Yang April 5, 2020 at 3:59 am #

    Hi Jason, in the example you used a KerasClassifier wrapper, would a KerasRegressor wrapper work the same? In grid search, what is the scoring function for the regression? I’m trying to make sense of the score.

    Thanks

  237. Avatar
    Joel April 23, 2020 at 5:03 am #

    Hi Jason, I’m running into an issue that the sklearn scoring metrics need 2d-array, whereas my training samples are 3d (for CNN). How do i make use of the metrics in this case?

    • Avatar
      Joel April 23, 2020 at 5:58 am #

I guess the problem is that I'm developing an autoencoder, where the input shape == output shape. So I guess I need to reshape the scoring metric input into a 2D array first.

    • Avatar
      Jason Brownlee April 23, 2020 at 6:12 am #

      Perhaps reshape your data to meet the expectations of the metrics?

  238. Avatar
    Zhang April 30, 2020 at 2:31 am #

    Hi Jason,
    Thank you very much for sharing your knowledge !
    I have one question when tuning the epoch and batch_size. Even though I put the numpy seed, every time I get different results in terms of best_parameters and best_accuracy.
    I understand that if we don’t avoid randomness, the parameter tuning will have no sense, because with each pair of epoch and batch_size, the test runs on a different model (random initial weight or else).
    Could you confirm my thought?
    Do you have a way to avoid randomness in the grid search?
    Best regards,

    • Avatar
      Jason Brownlee April 30, 2020 at 6:51 am #

      Go the other way and control for the random nature of learning/searching – use repeated stratified k-fold cv and calculate the average.
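A minimal sketch of that idea with scikit-learn's RepeatedStratifiedKFold passed to the cv argument (assuming model and param_grid are defined as in the tutorial):

from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=7)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv)
grid_result = grid.fit(X, Y)  # mean_test_score now averages 3 x 3 = 9 evaluations per config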

  239. Avatar
    manar May 2, 2020 at 2:35 pm #

    Hi Jason,
I'm working on multi-task learning. What about grid.fit(X, y) when I have two outputs?
And are (X, y) the training data or all of the data?

  240. Avatar
    Soothy May 2, 2020 at 11:38 pm #

    Hi Jason, Can you maybe make a blog on how we can visualize the results of the hyperparameter tuning through interactive graphs?
    Thanks!

    • Avatar
      Jason Brownlee May 3, 2020 at 6:13 am #

      Thanks for the suggestion.

      I’m not a big fan of interactive graphs though, sorry.

  241. Avatar
    Richard May 16, 2020 at 6:07 am #

    Hi Jason. Thank you so much for all this information, it is literally a lifesaver for my ML class!

    I am a bit confused though and hoping you could clarify. I am currently setting up a NN to predict house prices. I have 1800 observations and 11 input parameters.

Does it make sense to just start straight away with the hyperparameter optimization using GridSearchCV? I am optimising neurons, epochs, hidden_layers and dropout_rate at the same time to find the best model to use based on MSE. Is that sensible?

    Then after selecting these hyperparameters above I use gridsearchcv again to select the best learning rate and momentum for the GD by selecting the model with the lowest MSE.

    Am I skipping any necessary steps?

  242. Avatar
    ll May 20, 2020 at 2:45 pm #

    Great post!

  243. Avatar
    Adrian Garcia Badaracco May 20, 2020 at 6:25 pm #

    Hi Jason,

    First off, thank you for this article. It was very helpful when I was first learning how to integrate Keras and Scikit-Learn.

    I actually ended up submitting a PR to the tensorflow team to fix a lot of the issues with these wrappers, some of which are reported in other comments in this very post (ex: “the can’t pickle _thread.RLock objects” issues).

    That PR ended up turning into an entire package, that now fixes dozens of open issues in tf. If you can, I’d appreciate it if you took a look, and maybe updated the article to use it, if you think that is appropriate. Any feedback is welcome!

    https://github.com/adriangb/scikeras
    https://pypi.org/project/scikeras/

    Thanks!

  244. Avatar
    sukhpal May 21, 2020 at 10:21 pm #

Sir, I applied grid search optimization with deep learning to produce an optimal model. How can I further improve the performance of the optimized model?
Are there any other machine learning techniques to apply to the optimized deep model after grid search optimization?

  245. Avatar
    Mohammad May 26, 2020 at 8:09 am #

    Hello,

Why is the MSE in the result different from the scoring value?
I used a grid search for the regression model. To see how the model works, I made all the hyperparameters have only one value, like the following:
    ———
    # Create hyperparameter space
    epochs = [80]
    learning_rates = [0.001]
    n_filters= [32]
    ———

    then I run the grid search as the following:

    ———
    # Create grid search
    grid = GridSearchCV(estimator=my_network, scoring=’neg_mean_squared_error’, refit=’neg_mean_squared_error’,
    cv=5, param_grid=hyperparameters)

    # Fit grid search
    grid_result = grid.fit(X_train, y_train, batch_size= batch_size)
    ———-

    The MSE on the results is different than the scoring of each k-fold. for example, the last MSE is 0.007 and the last score is (split4_test_score’: array([-0.27008066]),)

    why it is different ?

    thanks

    • Avatar
      Jason Brownlee May 26, 2020 at 1:20 pm #

      We are minimizing cross entropy and evaluating using accuracy.

      • Avatar
        Mohammad May 27, 2020 at 12:25 am #

Sorry, I don't get you. How can I make it MSE, not accuracy, then? Because, as far as I know, regression models should not be evaluated with accuracy.

        Thanks

        • Avatar
          Jason Brownlee May 27, 2020 at 7:56 am #

          I thought you were referring to the above tutorial where all examples are classification.

          Sorry, I don’t understand your question. Perhaps you can simplify or rephrase it?

          • Avatar
            Mohammad May 27, 2020 at 9:11 am #

            sorry for not being clear.

            – I wrote a CNN regression model, in which MSE is used to evaluate the model
            – used Grid search code as mentioned in my first post, in which neg_mean_squered_error is used
            – during the code running, I can see the MSE value which has a low value (e.g. 0.007).
            – from grid.result, the score is high (e.g. 0.2).

            why the score value is different than the MSE value?

            ——
            I hope I make it clear.
            by the way, I have similar problem to this
            https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/#comment-536437

          • Avatar
            Jason Brownlee May 27, 2020 at 1:27 pm #

A CNN is inappropriate for regression, unless it is a sequence input.

If it is a sequence input, using a grid search in sklearn would be inappropriate, as you will need to manually use walk-forward validation to evaluate the model.

            Finally, to answer the specific question, perhaps you are using CV in the grid search which will average across multiple runs and you are comparing this to one value from one model at one point during training?

          • Avatar
            Martha June 6, 2020 at 1:50 am #

            Hello Jason,

            I am working on my Master thesis about the prediction of the pollution, basically a regression problem, and I need to find the best parameters for differents models.

            I’m surprised about your last answer…Because I have prepared a grid search over a KerasRegressor using TimeSeriesSplit as CV.

            What is the alternative to find the best parameter in a timeseries problem?

            Thanks in advance!

          • Avatar
            Jason Brownlee June 6, 2020 at 7:57 am #

            My preference is a manual grid search where models are evaluated using walk-forward validation.
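A minimal sketch of that manual approach, assuming two hypothetical helpers: build_model(n_units), which returns a compiled Keras model, and walk_forward_validation(model_fn, data), which refits and forecasts step by step and returns an error score:

scores = {}
for n_units in [50, 100, 200]:
    # walk-forward validation: repeatedly refit on history and forecast the next step
    error = walk_forward_validation(lambda: build_model(n_units), data)
    scores[n_units] = error
best_units = min(scores, key=scores.get)
print('Best number of units:', best_units)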

  246. Avatar
    tbob May 26, 2020 at 6:37 pm #

I've been trying to run the code with my own parameters, specifically learning rate and weight initialisation; however, I keep running into an error which states that learn_rate isn't a legal parameter. I understand that it's not a legal parameter in the fit method for grid, but I don't quite understand how you got yours working in the code?

    • Avatar
      Jason Brownlee May 27, 2020 at 7:45 am #

      Perhaps try tuning the parameter manually with your own loop?
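The usual cause of a "learn_rate is not a legal parameter" error is that the parameter is not exposed by the model-building function. A minimal sketch, assuming the SciKeras wrapper (with the older Keras wrapper, the bare name learn_rate in param_grid also worked, provided it appeared in the create_model() signature):

from scikeras.wrappers import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

def create_model(learn_rate=0.01):
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=learn_rate), metrics=['accuracy'])
    return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# SciKeras routes custom model arguments with the model__ prefix
param_grid = {'model__learn_rate': [0.001, 0.01, 0.1]}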

  247. Avatar
    Shadi June 21, 2020 at 11:36 pm #

    Hello Jason! Thank you very much for your posts 🙂
    This is my question:How can I use early stopping in my code?where should I put it?

    # callbacks=[tf.keras.callbacks.EarlyStopping(monitor=’val_loss’, patience=10,mode=”auto”)]

    ################################################################
    ###define the model:

from numpy.random import seed
    seed(1)

    def create_model(optimizer=’rmsprop’):
    model = Sequential()
    model.add(LSTM(50, activation=’relu’, return_sequences=True))
    model.add(LSTM(50, activation=’relu’))
    model.add(Dense(1))

    model.compile(loss=’mse’,optimizer = optimizer)

    return model

    clf = KerasRegressor(build_fn=create_model,epochs = 500,callbacks=[tf.keras.callbacks.EarlyStopping( patience=10)])

    param_grid = {
    ‘clf__optimizer’ : [‘adam’,’rmsprop’],
    ‘clf__batch_size’ : [500,45,77]
    }

    pipeline = Pipeline([
    (‘clf’,clf)
    ])

    from sklearn.model_selection import TimeSeriesSplit, GridSearchCV

    tscv = TimeSeriesSplit(n_splits=5)

    grid = GridSearchCV(pipeline, cv=tscv,param_grid=param_grid,return_train_score=True,verbose=10,
    scoring = ‘neg_mean_squared_error’)

    grid.fit(Xtrain2,ytrain.values)

    grid.cv_results_

    #####################################################################

  248. Avatar
    Abinash June 22, 2020 at 9:29 pm #

    Sir, instead of ‘accuracy’ I want to use ‘AUC’ in the metrics in model.compile because I have highly imbalanced data. But I am unable to do that as it throws an error: ‘Could not pickle the task to send it to the workers’. How to solve this??

    • Avatar
      Jason Brownlee June 23, 2020 at 6:19 am #

      You might need to run the grid search manually with for loops and calculate AUC using the sklearn function manually.
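An alternative sketch that sometimes sidesteps the pickling problem: keep the model.compile() metrics simple and let the grid search itself score with ROC AUC through scikit-learn (assuming a binary classification setup like the tutorial's):

grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='roc_auc',
    n_jobs=1)  # n_jobs=1 avoids sending the Keras model to worker processes
grid_result = grid.fit(X, Y)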

  249. Avatar
    Abinash June 24, 2020 at 2:29 am #

    Sir, is there any way of using any classification metrics like ‘Recall’, ‘Precision’, TruePositives’ etc. in the place of ‘accuracy’ in the metrics of model.compile in the code corresponding to the section ‘How to Tune Batch Size and Number of Epochs’ of this page? I just do not want to use ‘accuracy’ because of the dataset I am using. If ‘accuracy’ can be used then why not ‘Recall’, I just do not get it. Help needed.

    • Avatar
      Jason Brownlee June 24, 2020 at 6:36 am #

      Yes, you can specify them directly as a list of metrics.

      • Avatar
        Ronald May 1, 2021 at 5:20 am #

        Hi Jason,

        If you select multiple metrics, how can you view them when reviewing the results? I tried to look through the cv_results_ dictionary but it always seems to only store the accuracy metric (which is the first metric in my list).

        • Avatar
          Jason Brownlee May 1, 2021 at 6:11 am #

I would expect all metrics to be stored in there; I'm surprised.
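For readers with the same question: GridSearchCV only records the metrics passed to its scoring argument; metrics listed in model.compile() are not collected into cv_results_. A minimal sketch of multi-metric scoring (assuming a binary classification setup):

grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3,
    scoring={'acc': 'accuracy', 'recall': 'recall', 'precision': 'precision'},
    refit='acc')  # refit must name one metric when scoring is a dict
grid_result = grid.fit(X, Y)
print(grid_result.cv_results_['mean_test_recall'])
print(grid_result.cv_results_['mean_test_precision'])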

  250. Avatar
    Marco July 9, 2020 at 12:51 am #

    Hi, many thanks for your post. My question is:

    I am trying to grid search almost all hyper-parameters at once:

    def create_model(optimizer, activation, dropout_rate, neurons, init_mode):
    # create model
    model = Sequential()
    model.add(Dense(neurons=neurons, input_dim=8, kernel_initializer=init_mode, activation=activation)) #neuron activation
    model.add(Dropout(dropout_rate=dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation=’sigmoid’))
    # Compile model
    model.compile(loss=’binary_crossentropy’, optimizer=optimizer, metrics=[‘accuracy’])
    return model

    … load data…
    … create model…

    batch_size = [5, 10]
    epochs = [1, 2]
    optimizer = [‘SGD’, ‘Adam’]
    ##learning_rate = [0.01, 0.1] ## NON MI FA METTERE LR E MOMENTUM QUANDO C’è DROPOUT
    ##momentum = [0.2, 0.4]
    activation = [‘softmax’, ‘relu’]
    dropout_rate = [0.1, 0.2]
    neurons = [5, 10]
    init_mode = [‘uniform’, ‘normal’]

    param_grid = dict(batch_size=batch_size,
    epochs=epochs,
    optimizer=optimizer,
    activation=activation,
    dropout_rate=dropout_rate,
    neurons=neurons,
    init_mode=init_mode)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    grid_result = grid.fit(X, Y)

    But I get this error: TypeError: __init__() missing 1 required positional argument: ‘units’

    Do you know what could be the reason and possible solution? Or if I am doing anything wrong?

    Thank you!
    Marco

  251. Avatar
    Jack July 23, 2020 at 7:34 am #

    Hy Jason, very insightful tutorial. I have customised your Grid Search example to test for almost all hyperparameters and I have fed a dataset of 300000+ data points. The problem is that the process is extremely slow and I am using GPU nvidia on centOS 7 through SSH connetion. I have two questions in regards:

    1. I am wondering if there might be a technique to accelerate the grid search process?

    2. Would it make sense perhaps to decrease the size of my dataset to test for hyperparametrs optimisation? or it would make no sense at all because my model will then implement a different (way larger) dataset compared to the one used for Grid search?

    Thanks Jason,
    Jack

    • Avatar
      Jason Brownlee July 23, 2020 at 2:39 pm #

      Some ideas:
      – Test fewer hyperparameters
      – Use a faster machine
      – Use a smaller dataset

      It’s a trade off of your expected improvement from searching and the time/resources you want to spend.

  252. Avatar
    Srikar July 24, 2020 at 4:27 pm #

    Hey jason,

Thank you very much for the tutorial. I am using a dataset of only 400 points for the optimization of my LSTM model, and when I try to grid fit my data, my error score gets printed out as NaN. What could be the issue? Also, have you scaled your data in the tutorial, or have you fed it in as is?

Also, instead of the binary_crossentropy loss, I am using the mean squared error for my model. Kindly do let me know. I am trying to optimize my number of epochs and batch size first.

  253. Avatar
    Ella July 27, 2020 at 1:11 am #

    Hi Jason, this below is just your code put into two functions.

    I am wondering, why it does not work? it says that activation is not defined! but when I run it as you made it (with create_model function and rest of the code outside) it works amazingly.

    def create_model(activation=’relu’):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer=’uniform’, activation=activation))
    model.add(Dense(1, kernel_initializer=’uniform’, activation=’sigmoid’))
    # Compile model
    model.compile(loss=’binary_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
    return model

def keras_model(model):
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt(“pima-indians-diabetes.csv”, delimiter=”,”)
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
    # define the grid search parameters
    activation = [‘softmax’, ‘softplus’, ‘softsign’, ‘relu’, ‘tanh’, ‘sigmoid’, ‘hard_sigmoid’, ‘linear’]
    param_grid = dict(activation=activation)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    grid_result = grid.fit(X, Y)
    # summarize results
    print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_[‘mean_test_score’]
    stds = grid_result.cv_results_[‘std_test_score’]
    params = grid_result.cv_results_[‘params’]
    for mean, stdev, param in zip(means, stds, params):
    print(“%f (%f) with: %r” % (mean, stdev, param))

  254. Avatar
    Isaac August 10, 2020 at 5:16 am #

    Hi Jason, thank you for this amazing tutorial, it has been super useful!

    I have a question: I am trying to tune GaussianNoise(gn) and LeakyRelu(alpha) in my lstm model, but I I get the errors that both gn and alpha are not legal parameters.

    Do you know if they are supported by GridSearchCV? Because for all other hyperparameters it works perfectly!

    Thank you!
    Isaac

    • Avatar
      Jason Brownlee August 10, 2020 at 5:58 am #

      Sorry to hear you’re having trouble. I would expect that grid search is not concerned about the specific layers used in your model.

  255. Avatar
    Michael August 19, 2020 at 1:37 am #

    Thank you for this great post. This will certainly help me make better deep learning models.

    Is it possible to use this same procedure with a TimeSeriesGenerator?

    In particular, when I try to fit the model using fit_generator(), I get an error saying that gridsearchcv object has no attribute ‘fit_generator’.

    And I’m not sure what my x and y variables would be if I were to use .fit() because my dataset is a one-variable time series.

  256. Avatar
    GRIGORIY SOKOLOV August 28, 2020 at 3:22 am #

    Very useful, thank you very much.

  257. Avatar
    Abhi Bhagat September 6, 2020 at 8:08 pm #

    Can you please explain what ” random seed ” actually does ?

  258. Avatar
    Jessy September 20, 2020 at 4:18 pm #

    hi jason,
Can I use a feature selection technique with an LSTM? The dataset contains 50 features. My question is about doing feature selection + LSTM and then performing classification. Does the LSTM do the feature engineering process on its own?

    • Avatar
      Jason Brownlee September 21, 2020 at 8:07 am #

      Yes, you can use a step-wise procedure or RFE-like procedure to see what combination of input time series result in the best performing model.

      Yes, LSTM will perform automatic feature extraction.

  259. Avatar
    George September 22, 2020 at 4:58 pm #

    Hi Jason,
    While fitting with GridSearchCV can we include validation set, which is better, using it or not using it


    grid_result = grid.fit(X_train, y_train,validation_data=(X_valid, y_valid))

    or

    grid_result = grid.fit(X_train, y_train)

    Thanks

    • Avatar
      Jason Brownlee September 23, 2020 at 6:35 am #

      It is better to draw the validation set from the train set in each fold.

      i.e. I think it’s better to write the grid search manually if you want to use a validation set.

  260. Avatar
    George September 23, 2020 at 9:21 am #

    Thanks Jason,
    We can try sklearn.model_selection.PredefinedSplit
    also, this is very useful
    https://machinelearningmastery.com/train-to-the-test-set-in-machine-learning/
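A minimal sketch of that PredefinedSplit idea, assuming X_train/y_train and X_valid/y_valid are already defined: rows marked -1 are always used for training, and rows marked 0 form the single validation fold.

import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit

test_fold = np.concatenate([np.full(len(X_train), -1), np.full(len(X_valid), 0)])
ps = PredefinedSplit(test_fold)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=ps)
grid_result = grid.fit(np.concatenate([X_train, X_valid]), np.concatenate([y_train, y_valid]))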

  261. Avatar
    Jessy October 6, 2020 at 1:50 pm #

    hi jason,
I have a doubt: my research problem is sequence classification (sleep stage and EEG eye state). Can I use a simulated annealing technique to select a subset of features and then pass them into the LSTM (is that correct), or can the LSTM itself select the required features for sequence prediction? I have a little confusion about feature engineering in deep learning.

    • Avatar
      Jason Brownlee October 6, 2020 at 1:59 pm #

      Perhaps try it and compare the results to using all features.

  262. Avatar
    Parijat October 22, 2020 at 8:22 pm #

    Hi Jason,

    Thanks for such an elaborate explanation. I am facing a problem: my epochs and batch_size combination are not changing.

    params = {‘epochs’:[100,150],’batch_size’:[16,32]}

    Problem snap:

    lr=0.01,n1=26,n2=13,p=0.2,activation1=sigmoid,activation2=linear
    epochs=100,batch_size=16
    lr=0.01,n1=26,n2=13,p=0.2,activation1=sigmoid,activation2=linear
    epochs=100,batch_size=16

    ————————————–

    The code:

    def build_regressor(batch_size,epochs,n1=26,n2=13,p=0.2,lr=1e-02,activation1=’sigmoid’,activation2=’linear’):
    print(f’lr={lr},n1={n1},n2={n2},p={p},activation1={activation1},activation2={activation2}’)
    print(f’epochs={epochs},batch_size={batch_size}’)
    model = Sequential([
    Dense(n1, activation=activation1, input_shape=(x_train.shape[1],)),
    Dropout(p),
    Dense(n2, activation=activation1),
    Dropout(p),
    Dense(1, activation=activation2),#elu
    ])

    optimizer=Adam(lr=lr)
    model.compile(optimizer=optimizer,loss=’mse’,metrics=[‘mse’]) #mean_squared_error
    return model

    model=KerasRegressor(build_fn= build_regressor,verbose=-1)
    params = {‘epochs’:[100,150],’batch_size’:[16,32]}
    model=GridSearchCV(estimator=model,param_grid=params,cv=5)
    hist=model.fit(x_train,y_train,verbose=0)

As the params combination is not changing, the program is running in an infinite loop. I am not able to find any syntax error.

  263. Avatar
    George November 13, 2020 at 10:06 am #

    Hi Jason,
Question on tuning the number of neurons in the hidden layer; under that heading you have written:

In this example, we will look at tuning the number of neurons in a single hidden layer.

    Does the single hidden layer means

    Input Layer plus One Hidden Layer
    OR
    Only the Input Layer

    As per the model, it is a tuning for Input Layer, please correct me, Thanks

    model = Sequential()
    model.add(Dense(neurons, input_dim=8, kernel_initializer='uniform', activation='linear', kernel_constraint=maxnorm(4)))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

    • Avatar
      George November 13, 2020 at 10:12 am #

      The Confusion for me is,
      First i tuned for no of neurons as per above model,
      Next i tune for number of hidden layers as

      model.add(Dense(neurons_1L, ....))
      model.add(Dropout(rate=dropout_rate))
      for i in range(int(hidden_layers)):
      model.add(Dense(neurons_1L, ....))
      model.add(Dropout(dropout_rate))
      model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))

      If i get 2 hidden layers, should i put
      Input Layer plus 2 Hidden Layers (which is 3 Layers) or Only 2 Layers
      Thanks

    • Avatar
      Jason Brownlee November 13, 2020 at 10:28 am #

      The first hidden layer and visible layer are defined on one line.

      We are tuning the number of nodes in the first hidden layer, not the visible layer.

  264. Avatar
    So so Slowly November 13, 2020 at 12:44 pm #

    My running code is as follows and although it takes about 8 9, it does not give very slow results, both when I use gpu and when I use cpu. What should I do?

    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import Flatten
    from keras.layers import Dropout
    from keras.layers import LSTM
    from numpy import mean
    from numpy import std
    from keras.utils import to_categorical
    from sklearn.model_selection import GridSearchCV
    from tensorflow import keras
    from tensorflow.keras import layers
    from keras.optimizers import Adam
    from keras.optimizers import SGD
    from keras.layers import Dense
    from keras.layers import Dropout
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.constraints import maxnorm

    def create_model(dropout_rate=0.0):
    verbose, epochs, batch_size = 0, 40, 5
    n_timesteps, n_features,n_outputs = trainX.shape[1],trainX.shape[2], testY.shape[1]
    model = Sequential()
    model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(100, activation=’relu’))
    model.add(Dense(n_outputs, activation=’softmax’))
    #opt = keras.optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    opt = keras.optimizers.Adam(learning_rate= 0.000001)
    model.compile(loss=’categorical_crossentropy’,optimizer = opt,metrics=[‘accuracy’])
    return model

    # run an experiment
    def run_experiment():
    print((trainX).shape,(trainY).shape,(testX).shape, (testY).shape)
    model = KerasClassifier(build_fn=create_model, epochs=40, batch_size=5, verbose=0)
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    param_grid = dict(dropout_rate=dropout_rate)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
    grid_result = grid.fit(trainX, trainY)
    print(“a”)
    print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_[‘mean_test_score’]
    stds = grid_result.cv_results_[‘std_test_score’]
    params = grid_result.cv_results_[‘params’]
    for mean, stdev, param in zip(means, stds, params):
    print(“%f (%f) with: %r” % (mean, stdev, param))

    run_experiment()

  265. Avatar
    Erwan Delh November 14, 2020 at 3:48 am #

    Hello,

    Thank you for this post. I knew the key knowledges about grid-search optimisation, but I didn’t know yet how to implement it.

I've got a few questions regarding a really specific application of ML that I'm doing (I've already done some research, but found few answers):

– Is it possible, instead of making a k-fold for the CV, to give a CV set directly? (I do Human Activity Recognition, so I need to think at a subject scale, not a data scale, in order to be confident in my ability to generalise.) I have the impression that I can do so by assigning a list of indexes indicating whether I want each row to be part of the CV set or not.

    – is it possible to give as input, a tensorflow datasets?

    thank you for your answers

    • Avatar
      Jason Brownlee November 14, 2020 at 6:37 am #

      Sure, you can design the test harness any way you like.

  266. Avatar
    Prajna November 20, 2020 at 9:24 pm #

Thanks Jason for the post. But if we are using only GridSearchCV, can we calculate precision, recall, and F1 score for the CNN model once hyperparameter tuning is done? If yes, please let me know how to do that. I have not used train_test_split; I only used GridSearchCV.

  267. Avatar
    Fatima November 23, 2020 at 1:46 am #

    Hi Jason,
    Thanks a lot for this amazing tutorial, it helps me a lot to start applying deep learning.
    I have a data set that contains 2738 rows, with 26 columns (Colum Number 26 is the target variable), I want to apply Deep Neural Network (DNN) to do the classification (to predict a binary class label)
    My question is how to tune the number of Neurons in each hidden layer, is it dependent on the size of the dataset? Do you have a way that will be helpful to tune the number of Neurons in each hidden layer?

    Thanks for your help!

    • Avatar
      Jason Brownlee November 23, 2020 at 6:18 am #

      You’re welcome.

      Good question, perhaps use a little trial and error and see what works good enough, then try tuning that.

  268. Avatar
    snehal November 25, 2020 at 6:44 pm #

Hello @Jason, can we create a grid search for the number of hidden layers? I tried to create one, but it didn't work.

  269. Avatar
    progammer newbie December 3, 2020 at 12:13 am #

    Hi Jason,

    thanks very much for your precious work. I’ll definitely buy your e-book set to support you and learn more!

    Our Institute offers additional processing power, since we have many programmers here. Do you think that it should be possible to do the optimization with all hyperparameters at once?

    • Avatar
      Jason Brownlee December 3, 2020 at 8:19 am #

      You’re welcome.

      Yes, if you have the resources, tune everything at once.

  270. Avatar
    Abdullah December 16, 2020 at 1:40 am #

Amazing explanation, Jason, thank you so much for that. Just one question: can we also find out how many hidden layers we should use?

    • Avatar
      Jason Brownlee December 16, 2020 at 7:52 am #

      Thanks.

      I recommend testing different numbers of hidden layers and discover what works well/best for your model and dataset.

  271. Avatar
    Carlos Castro December 17, 2020 at 12:12 pm #

Thank you Jason, I have a question. How do you set the proportion for training and test? I need a 75/25 split. Thank you!!

    • Avatar
      Jason Brownlee December 17, 2020 at 1:00 pm #

      You’re welcome!

      Tough question. It depends on the data, you want both train and test to be “representative” of the problem.

      Start with 50/50, evaluate, then try more aggressive splits like 70/30 etc and compare results (variance of repeated evals).

  272. Avatar
    Xerxes December 25, 2020 at 12:37 pm #

    Hello,

    Is it possible to do this using data from ImageDataGenerator in the grid.fit?

  273. Avatar
    Saransh Gupta December 29, 2020 at 3:52 pm #

    Thanks Jason, it is a very helpful tutorial

  274. Avatar
    Greg December 31, 2020 at 8:58 am #

    How to handle multiple inputs with the GridSearch. I can handle one input one output for keras model, but then when I have a model with 2 inputs, things go badly wrong. I get something like:

    AssertionError: Could not compute output Tensor(“dense_9/Sigmoid:0”, shape=(None, 1), dtype=float32)

    • Avatar
      Jason Brownlee December 31, 2020 at 9:27 am #

      You may need to write your own for loop to enumerate the configurations to test.

      • Avatar
        Greg December 31, 2020 at 11:16 pm #

Looks like GridSearch is still like that. I can perform the same functionality with Keras Tuner for multiple inputs, but the reason I was trying to use GridSearchCV directly from scikit-learn is that I want something that plays nicely with dask, and from what I can tell dask does not handle Keras Tuner, but it does handle the normal GridSearchCV from scikit-learn.

  275. Avatar
    Helene January 7, 2021 at 9:12 pm #

    Thanks a lot for this excellent post!

    I am working on a regression problem using keras.wrappers.scikit_learn.KerasRegressor. I don’t completely understand the grid_result.best_score_ in this case: is it a mean gap in % between the prediction and the real solution?

    I also noticed in the source code of GridSearchCV function that version 0.22 changed the cv default value from 3-fold to 5-fold.

    • Avatar
      Jason Brownlee January 8, 2021 at 5:44 am #

      You’re welcome.

The best score is the largest mean score across the CV folds for the corresponding configuration.

  276. Avatar
    Ritesh February 4, 2021 at 6:25 pm #

    Hi Jason,

    Thanks for this amazing post!

    I am stuck here, could you please share any idea to solve the below issues:

1. Facing this when specifying the optimizer:
    optimizers = [‘SGD’, ‘RMSprop’, ‘Adagrad’, ‘Adadelta’, ‘Adam’, ‘Adamax’, ‘Nadam’]
    param_grid = dict(optimizer=optimizers)

    ValueError: optimizer is not a legal parameter

2. Facing this when specifying the learn_rate and momentum:
    learn_rate = [0.001, 0.01, 0.1]
    momentum = [0.0, 0.2, 0.4]
    param_grid = dict(learn_rate=learn_rate, momentum=momentum)

    ValueError: learn_rate is not a legal parameter

  277. Avatar
    sanglok February 5, 2021 at 4:42 pm #

    Thanks for the great post.

    I have a question regarding the GridSearchCV outputs.

    I am using KerasRegressor

    I can see the outputs with ‘loss’ and ‘mse’
However, is this loss the training loss or the validation loss?

    If the loss is the training loss, how can I show the validation loss in the GridSearchCV?

    • Avatar
      Jason Brownlee February 6, 2021 at 5:44 am #

      GridSearchCV won’t return model loss, it will report the cross-validation performance for your model on your chosen metric.

  278. Avatar
    Sonika Jha February 16, 2021 at 1:24 am #

    Say for example we’re searching learning rate using grid search, then in that case each model would be trained till the last epoch(say 200) and if any model happens to overfit till it reaches 200th epoch, then the final validation loss would be low even, though the model might have performed really well early on (lets say during 100th epoch). So I believe early stopping should be mandatory in doing better grid search.But going by the google searches on this it seems it’s very uncommon to use Grid search+early stopping.Please share your thoughts on this.

  279. Avatar
    Marek March 2, 2021 at 9:50 pm #

I am wondering: does GridSearchCV evaluate the best parameter setting on some internally split test set, or does it train and test on the whole dataset that is passed in? If the latter, how is it actually useful, please?

    Thanks!

    • Avatar
      Jason Brownlee March 3, 2021 at 5:35 am #

      Yes, internally it uses k-fold cv, and if you want you can specify the cross-validation procedure via the “cv” argument.

  280. Avatar
    ali March 4, 2021 at 12:01 am #

    Hi Jason, and thank you for this amazing tutorial!

    only have one question. is there any way to use grid search or bayesian optimization for Functional API Models?
    because this method doesn’t work in this case.

    I wonder what is the best way to tune hyperparameters for functional models?

    thanks a lot!

    • Avatar
      Jason Brownlee March 4, 2021 at 5:50 am #

      You’re welcome.

      You would use grid search or bayes optimization, but not both. They are two different solutions to the same problem.

  281. Avatar
    Imran Khan March 16, 2021 at 5:31 pm #

    Hi..
    It was a very good experience in reading this article, I gained a lot.
    Thank you for this wonderful post.
    I was looking for one more thing in this post i.e. How to optimize the number of layers.

    • Avatar
      Jason Brownlee March 17, 2021 at 6:01 am #

      Thanks!

      You can write a for-loop and iterate over different numbers of layers to try in your model.
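One way to sketch that, reusing the tutorial's style of create_model(): make the number of hidden layers an argument of the model-building function so it can be looped over (or grid searched) like any other parameter.

def create_model(hidden_layers=1, neurons=12):
    model = Sequential()
    model.add(Dense(neurons, input_dim=8, activation='relu'))
    for _ in range(hidden_layers - 1):
        model.add(Dense(neurons, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

for hidden_layers in [1, 2, 3]:
    model = create_model(hidden_layers=hidden_layers)
    # fit and evaluate each depth on the same data, then compare the scores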

  282. Avatar
    Imran Khan March 16, 2021 at 5:41 pm #

    Hi.
    I have a few queries regarding tuning the model by RandomizedSearchCV or GridSearchCV:
    1. What is the difference between metrics and scores with respect to RandomizedSearchCV or GridSearchCV.

    2.What metrics and score we should use for regression problems while performing RandomizedSearchCV or GridSearchCV.

3. RandomizedSearchCV saves a lot of time, but how close are the parameters selected by RandomizedSearchCV to the best? I mean, how do we verify they are the best among the lot?

    • Avatar
      Jason Brownlee March 17, 2021 at 6:02 am #

      The difference in scores is due to the difference in the search algorithms.

      You must choose a metric that best captures what is important to you and your project stakeholders about a final model.

We cannot know the best parameters or how long they might take to locate; instead, we only run a search for as long as we have resources available.

  283. Avatar
    JG April 20, 2021 at 7:19 pm #

    Hi Jason,

    An old but very useful tutorial about Sklearn GridSearchCV() module. Thank you.

    Because Keras Wrapper via KerasRegressor and KerasClassifier try to take advantage of all these stuff coming from Sklearn:
    KFold(), cross_val_score (), GridSearchCV(), Pipeline(),…

    My questions are:

    1º) I see that also Sklearn is advancing to neural networks implementation with MLPRegressor and MLPClassifier by its own …so probably you do not need more KerasRegressor and KerasClassifier because direct integration of MLP neural models coming from Sklearn… what do you think?

    2º) do you have any post to see the scope of these sklearn steps ? such as for example trying to include a more advance ANN such as convolutional (CNN), recurrent (LSTM) …

    3º) or these Sklearn steps has not any meaning, because keras or tensorflow libraries can develop o wrapper better this modules relating to GridSearcCV, KFold, cross_val_score, …that currently they do not have as their own?

    regards,

    • Avatar
      Jason Brownlee April 21, 2021 at 5:55 am #

      Perhaps. Keras is still a better lib for custom neural nets I believe.

      No, CNNs and LSTMs don’t play nice with sklearn given the structure of the input for the models.

      sklearn still offers useful data transforms and metrics I believe.

  284. Avatar
    JG April 21, 2021 at 5:52 pm #

    thanks

  285. Avatar
    alex May 22, 2021 at 4:27 pm #

How can you find the best dropout and weights when not comparing the training result with a validation result? I think there is a mistake here, no?

    • Avatar
      Jason Brownlee May 23, 2021 at 5:23 am #

      You can find a good drop out rate with trial and error.

  286. Avatar
    Negin May 26, 2021 at 5:25 am #

    Hello,

    If we have weighted loss and we need to find the suitable weight for each part of the loss function using GridSearchCV, is it possible? how can I do it?

  287. Avatar
    Irfan Tariq May 31, 2021 at 7:46 am #

    KerasClassifier doesn’t support sample_weight.

I want to use the AdaBoost classifier to boost my LSTM model, but I am getting the error above. How can I solve this problem?

  288. Avatar
    sara July 17, 2021 at 7:15 pm #

    Thanks Jason,

    Do you know if this GridSearch is applicable for data with shape like this (samples, steps, features)

    When I tried it out, I got the following error:

    ValueError: Found array with dim 3. Estimator expected <= 2.

Did I miss something?

    • Avatar
      Jason Brownlee July 18, 2021 at 5:21 am #

      No, you may have to write custom code, e.g. some for loops.

  289. Avatar
    Mahdi Arjomandazar July 20, 2021 at 6:39 pm #

    Dear Jason,
    I have adopted your code for my dataset which comprises 8 variables. Since It’s a regression analysis, I have changed the pre-defined model compile hyperparameter loss criteria to ‘mean_squared_error’.
    Nonetheless, it yields lower accuracy than expected (~0.002).
    Therefore, I’ve also changed the metrics to ‘mse, cosine’ which doesn’t tend to be compatible with the KerasClassifier structure.

    Kindly assist me how to tune the code for regression.
    All the best,
    Mahdi

  290. Avatar
    Jack August 23, 2021 at 8:15 am #

    Dear Jason,

    I’ve been trying to crack this one for a while and reading your code and articles have really helped.

I've tried to combine some of the examples above, but they fail very quickly, declaring that "All estimators failed to fit" (I've put my code below).

    Is there an underlying reason why (apart from compute resource) these can’t work together?

    Kind regards,
    Jack

    # Make a scorer
    mse_scorer = make_scorer( mean_squared_error, greater_is_better = False )

    # Walk-forward validation: How many splits
    tss_cv = TimeSeriesSplit(n_splits = 3)

    LSTM_random_grid = {
    ‘lstm_units’: [5, 6]
    , ‘activation’ : [‘relu’,’tanh’]
    , ‘batch_size’: [50, 100]
    , ‘epochs’: [1,2]
    }

    # Function to create and returns a Keras model
    def build_model(activation=’relu’, lstm_units=150):
    # Design network
    model = tf.keras.models.Sequential()
    # Add LSTM hidden layer
    model.add(tf.keras.layers.LSTM(lstm_units, input_shape=(train_X_nn.shape[1],train_X_nn.shape[2])))
    # Add the dense layer
    model.add(tf.keras.layers.Dense(train_y.shape[1], activation=activation))
    # Compile model
    model.compile(loss=’mean_squared_error’, optimizer=’adam’, metrics = [‘mse’])
    return model

    # Create model
    model = KerasRegressor(build_fn=build_model, verbose=10)

    # Prepare the search
    LSTM_random_search = RandomizedSearchCV(
    estimator=model
    , param_distributions = LSTM_random_grid
    , n_iter = 20 # E.g. For n_iter=20, fitting 5 folds for each of 20 candidates, totals 100 fits
    , cv=tss_cv
    , verbose = 10
    , scoring = mse_scorer
    , random_state = seedVal
    , n_jobs = -1 #-1 to use all cores
    , pre_dispatch = 2 # Limit the number of jobs despatched in parallel
    )

    # Best fit
    LSTM_random_result = LSTM_random_search.fit(traintest_X_nn, traintest_y.values.ravel())

    • Avatar
      Adrian Tam August 24, 2021 at 8:18 am #

Because GridSearchCV() expects the scoring function to go up as the model improves. Usually, something failing fast is due to a setup error.

  291. Avatar
    Victor Soeby November 12, 2021 at 10:22 pm #

    Hi.

I have followed this amazing post in my project, where I have a model composed of LSTM layers. Here I've used the KerasRegressor in pretty much the same way you use the classifier.
I currently have a grid search running on 14/16 available cores on my PC, and it has been going for two days straight. I'm interested in getting the best performance, which I've read can be done using the GPU when working with Keras models (I have a 1070 Ti).

Do you have any links or information regarding how one would implement GPU utilization in this grid search example of yours? I am completely new to using GPUs, as this is my first Keras project since I 'graduated' from scikit-learn.

    • Avatar
      Adrian Tam November 14, 2021 at 2:25 pm #

Unfortunately, you probably can't do it with a GPU. You can run ONE model with a GPU on a lot of data (but your machine learning library needs to support the GPU, such as TensorFlow), but a GPU, as a SIMD machine, inherently cannot run two different programs in parallel.

  292. Avatar
    Murilo Souza December 7, 2021 at 9:13 pm #

    I am getting the following error when trying to tune the activation function:

    ValueError: Invalid parameter activation for estimator KerasClassifier.
    This issue can likely be resolved by setting this parameter in the KerasClassifier constructor:
    KerasClassifier(activation=relu)
    Check the list of available parameters with estimator.get_params().keys()

This is happening only when trying to tune activation functions. Whichever activation function I use there, it always throws this error.

    • Avatar
      Adrian Tam December 8, 2021 at 8:06 am #

      What did you set in the parameter?

  293. Avatar
    Adnan Abid February 1, 2022 at 5:21 pm #

    Hi Jason,

    A great tutorial indeed!

    Can you please let me know if we can tune all the parameters collectively
    i.e. we run a single experiment while providing different variations of all the above parameters in [.., .., …]
    All different parameters mean a separate list for
    batch size […..]
    optmization algorithm […..]
    activation function […..]
    learning rate […..]
    no. of neurons [….]
    dropout [….]
    epochs [….]

This can result in a large number of experiments.
Or should we fix a few parameters and then find the optimal values for the others?

  294. Avatar
    Mohamad Darouich February 24, 2022 at 12:25 am #

    Hi Jason,

    A great tutorial indeed, thank you very much!

I tried to use KerasRegressor with GridSearchCV instead of KerasClassifier,
but unfortunately this didn't work; I got the following error:

    FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://aea24a7a-9df6-4443-ac2d-d6011f6fa569/variables/variables
    You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as ‘/job:localhost’.

I'll be thankful if you could tell me how to fix this problem!

    • Avatar
      James Carmichael February 24, 2022 at 12:52 pm #

      Hi Mohamad,

      Thanks for asking.

      I’m eager to help, but I just don’t have the capacity to debug code for you.

      I am happy to make some suggestions:

      Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
      Consider cutting the problem back to just one or a few simple examples.
      Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
      Consider posting your question and code to StackOverflow.

  295. Avatar
    Jessica March 15, 2022 at 2:42 am #

    Dear Jason and team. I have 2 questions:

    1. Is it a good practice to obtain “mean_train_score” and “mean_test_score” to know if the model evaluated in “GridSearchCV” suffers from overfitting or underfitting? Is it good practice to also get the model evaluation metric using “GridSearchCV”?

    2. AFTER performing “GridSearchCV” it is necessary to create a model in the traditional way (example: rfr = RandomForestRegressor.set_params(**optimal_params); rfr.fit(X, y); rfr.predict(X_pred)) considering ALL the data (different to what is done in “GridSearchCV”, where the data was divided into train and test, and therefore the data for train consisted of k-1 folds) as TRAIN data and the optimal parameters obtained with GridSearchCV?

    Thank you so much!!! Your information is fantastic, thank you for making it freely available.

  296. Avatar
    Ayenew March 17, 2022 at 6:20 am #

It's an interesting and helpful tutorial. I think there have been some changes to the KerasClassifier class. First, it's now deprecated and separated from the main Keras library. Also, there are some changes in how to use it.

  297. Avatar
    Tyler July 5, 2022 at 12:51 pm #

    Hi

    Almost the same blocks of code for tuning multiple params. Is that just for educational purposes or a limitation?

    Using D.R.Y, can we not tune all of the parameters at the same time with GridSearch like we can with other ML models?

    If not, what are the params we can technically tune together in a single code-block?

    Thanks for the tutorial.

    • Avatar
      James Carmichael July 6, 2022 at 3:11 am #

      Hi Tyler…For educational purposes.

  298. Avatar
    Tyler July 5, 2022 at 1:37 pm #

    Also, in your GridSearch for neurons you have


    model = Sequential()
    model.add(Dense(neurons, input_shape=(X_train.shape[1],), kernel_initializer='uniform', activation='sigmoid', kernel_constraint=MaxNorm(4)))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

    This is for regression it seems. For my binary classification should I change the first layer to activation='sigmoid'? Why do you have linear in the first layer and sigmoid in the third layer? And should I remove model.add(Dropout(0.2))?

    Thanks

  299. Avatar
    Gian Paolo July 8, 2022 at 3:36 am #

    When migrating from tf.keras.wrappers.scikit_learn to Scikeras as per your latest update, I have installed the SciKeras package in my env.
    My kernel dies after importing Tensorflow in my notebook.
    Can you tell how you installed Scikeras to make it work in your code?

    Thanks

    • Avatar
      James Carmichael July 8, 2022 at 5:49 am #

      Hi Gian…Please clarify exactly what you mean by “kernel dies” so that we may better assist you.

  300. Avatar
    Gian Paolo July 8, 2022 at 12:20 pm #

    It is a pop up message from the jupyter notebook. I run my scripts with that.
– I pip install scikeras and tensorflow
– when I execute the line 'import tensorflow' in my Jupyter notebook, a pop-up message appears saying the kernel has died and will automatically restart, but it does not
– a red box appears in the upper right corner of the Jupyter notebook reading "dead kernel"
    – same behaviour when I shutdown and restart the kernel and repeat the command

    I am using a M1 mac with an Anaconda installation.

  301. Avatar
    Tarun July 12, 2022 at 7:55 pm #

    Dear Jason,

    As always, excellent post. I have a query. Can I use CV if I have a Time Series ? Since in Time Series, CV doesn’t work.

    Please let me know.

    Regards
    Tarun

  302. Avatar
    jackie July 20, 2022 at 3:10 am #

    Hi Jason,

    Thanks for your tutorials, they are very helpful !

I used grid.fit(x, y) to tune the batch size and epochs for my CNN. One strange thing I encountered is that, when I used all my data, the 'mean_test_score' is only 30%.
But if I split the data and only use the train set for grid.fit(x_train, y_train), then my accuracy is over 70%, even if I use 0.001 to split the train/test set (meaning my train set is 99.9%).

    I can’t figure out the reason for this, I have tried multiple times and it always gave me 30% when using all samples, and whenever I use part of data it always gave over 70% accuracy.

    Thanks!

  303. Avatar
    Richard July 24, 2022 at 3:40 am #

    Hi Jason,
    This is an excellent article and resource.
I have a very large data set (900,000 rows, 10 columns).
The grid search method takes all available CPU and memory on my machine.
I've tried using AWS SageMaker instead, but TensorFlow > 2.7 is not supported for SciKeras.
I'd like to use a sequential ANN.
Any tips on how I can get this to work for me?
    Thanks

  304. Avatar
    Larissa Benevides September 30, 2022 at 12:09 pm #

    Hello Jason,

    Great tutorials, very didactic and explanatory. Anyone interested and with little knowledge can understand your tutorials. Very good!

    • Avatar
      James Carmichael October 1, 2022 at 6:54 am #

      You are very welcome Larissa! We greatly appreciate your feedback and support!

  305. Avatar
    Ugur October 2, 2022 at 5:30 am #

    create_model function call need ()

    • Avatar
      James Carmichael October 2, 2022 at 8:11 am #

      Thank you for the feedback Ugur!

  306. Avatar
    christian October 26, 2022 at 12:05 am #

If we use a sample of our dataset for tuning, will the parameters obtained still work well with the whole dataset?
For example, tuning the number of neurons: we know the number of neurons depends on the size of our data, so the number obtained with the sample may not suit the whole dataset well. Please clarify that point.
Thanks

    • Avatar
      James Carmichael October 26, 2022 at 7:10 am #

      Hi christian…The number of neurons in the “input layer” is established by the number of input features in the training data; the “hidden layers”, however, may contain any number of neurons.
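
      For reference, here is a minimal sketch of how the hidden-layer width can be exposed as a hyperparameter and grid searched, following the SciKeras conventions used in this post. The neurons argument, the 8-feature input shape, and the candidate values are illustrative assumptions only:

      from scikeras.wrappers import KerasClassifier
      from sklearn.model_selection import GridSearchCV
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      def create_model(neurons=10):
          model = Sequential()
          model.add(Dense(neurons, input_shape=(8,), activation='relu'))  # hidden layer being tuned
          model.add(Dense(1, activation='sigmoid'))  # output layer for binary classification
          model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
          return model

      model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
      param_grid = {'model__neurons': [1, 5, 10, 15, 20, 25, 30]}
      grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
      # grid_result = grid.fit(X, y)  # X: input features, y: labels from your training data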

  307. Avatar
    christian October 26, 2022 at 12:21 am #

    Suppose we tune the number of neurons in the hidden layers and we get 50. How do we decide whether to use 1, 2, or 3 hidden layers? And if we decide to have 2 hidden layers, will each layer have 50 neurons?

  308. Avatar
    Johnny B November 17, 2022 at 10:13 am #

    Hello –
    This is a great post, thanks for making it. I have a couple of questions I’m hoping you can answer.

    1.) Is there a way to see which data points are used as the training data points for each k-fold?

    2.) Could we just put this: tf.random.set_seed(7), inside of the create_model() function to get more consistent results? Either way, is there an issue/problem with doing so?

    3.) When this code is executed:
    model = KerasClassifier(model=create_model, verbose=0)
    Is there a reason why it is create_model instead of create_model()?

    4.) In the following two lines of code:
    model = KerasClassifier(model=create_model, verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    Is there a reason why we assign the KerasClassifier to the variable ‘model’? Couldn’t that be any variable name? Just wondering, because it gets a little confusing since the ‘model’ argument inside KerasClassifier is asking for a model too, or is this just the convention?

    Thanks!
    Johnny

  309. Avatar
    Johnny B November 18, 2022 at 5:04 am #

    Hi James –
    That link was helpful but didn’t really answer any of my questions, though I might have missed something. Thanks anyway. Still a great post!

  310. Avatar
    Jim November 25, 2022 at 7:19 am #

    Hi, Jason. Thank you for this tutorial. I’m getting a couple of errors that I can’t solve:

    1: I’ve tried installing scikeras with conda install conda-forge scikeras and pip install scikeras[tensorflow] --user but I’m still getting a ModuleNotFoundError: No module named 'scikeras' error.

    2: I also get NameError: name '_read' is not defined when running dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')

    Any ideas on how to resolve these errors?

    • Avatar
      James Carmichael November 25, 2022 at 9:15 am #

      Hi Jim…You may want to try your code in Google Colab until you can resolve the installation issue.

  311. Avatar
    zen November 28, 2022 at 4:17 am #

    Is it better to use grid search, random search, or Bayesian search? I am currently using an RNN GRU architecture model.

    • Avatar
      James Carmichael November 29, 2022 at 9:37 am #

      Hi Zen…My recommendation would be to investigate Bayesian methods for this case.
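
      If it helps, here is a minimal sketch of what a Bayesian search could look like with the SciKeras wrapper used in this post. It assumes the scikit-optimize package is installed (pip install scikit-optimize), which is not used elsewhere in this post, and the search ranges are purely illustrative:

      from skopt import BayesSearchCV
      from skopt.space import Integer

      # 'model' is assumed to be a SciKeras KerasClassifier/KerasRegressor wrapping your GRU-building function
      search_spaces = {'batch_size': Integer(8, 64), 'epochs': Integer(10, 100)}
      search = BayesSearchCV(estimator=model, search_spaces=search_spaces, n_iter=10, cv=3)
      # search_result = search.fit(X, y)  # X, y: your training data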

  312. Avatar
    Sara December 14, 2022 at 11:34 pm #

    Hello, thank you for your nice and informative website.
    I learned that by using a train-test split, and without explicitly creating a validation set, we still do not have any data leakage if we use GridSearchCV for hyperparameter tuning.
    My question is how I can still prevent data leakage with only train and test sets and no explicit validation set if I want to ensemble the classifiers.

  313. Avatar
    AGGELOS PAPOUTSIS January 16, 2023 at 4:20 am #

    Hi Jason,

    Why do you apply the grid fit procedure to X, y rather than doing a train-test split and applying it to X_train, y_train?

    Thanks

  314. Avatar
    Sue October 26, 2023 at 2:29 am #

    Hi, I can’t run this code either in Kaggle or on my laptop with Python 3.11.3. Sorry to ask this, but I am a newbie in Python. May I know whether it is possible for Mr Jason/Mr James to show this in Kaggle?

    • Avatar
      James Carmichael October 26, 2023 at 10:48 am #

      Hi Sue…Please clarify the issue you are having with running the code. This will better enable us to guide you.

  315. Avatar
    Filbe January 4, 2024 at 10:19 pm #

    Hello!
    This post is awesome; it literally covers everything I wanted to learn. However, I get errors when I try to replicate the learning rate optimization.

    This is my test setup:

    optimizer = tf.keras.optimizers.Adam()

    simple_model_wrapper = KerasRegressor(model_OG_trained, optimizer=optimizer, metrics=['mae'], verbose=False)

    # Define hyperparameters
    batch_size = [2, 4, 6, 8, 10]
    learning_rate = scipy.stats.uniform()
    epochs = np.random.randint(10, 51)
    loss = ['mse', dual_loss]  # dual_loss is a custom loss object
    param_distributions = dict(optimizer__learning_rate=learning_rate, batch_size=batch_size, epochs=epochs, loss=loss)

    simple_model_grid = RandomizedSearchCV(estimator=simple_model_wrapper, param_distributions=param_distributions)

    And when calling fit I get this issue:
    > 51 model.optimizer.build(model.trainable_variables)
    52 return model

    AttributeError: 'NoneType' object has no attribute 'build'. Could you recommend a way of fixing this, or let me know what further info you need so I can debug, please?

  316. Avatar
    Edie March 19, 2024 at 2:10 am #

    I am trying to run the first version of the example on how to tune the training optimization algorithm. Everything works fine on Colab (with Python 3.10.2, Keras 2.15, scikit-learn 1.2.2, and TensorFlow 2.15.0). When I try to run the same code on my laptop (MacBook Pro M1 Max with Python 3.11.7, Keras 3.0.5, scikit-learn 1.4.1.post1, and TensorFlow 2.16.1), I get the following error when the grid.fit instruction is executed:
    ValueError: Could not interpret metric identifier: loss

    I have tried many solutions, but apparently nothing works. I would appreciate any help. Thanks in advance.

    • Avatar
      James Carmichael March 19, 2024 at 8:14 am #

      Hi Edie…did you copy and paste the code or type it? Also…keep the following in mind:

      The error ValueError: Could not interpret metric identifier: loss typically occurs when using machine learning libraries like Keras (TensorFlow) while configuring a model for training. It suggests there is an issue with how the loss function or a metric is specified. Here are some common reasons and solutions for this error:

      Common causes and solutions:

      1. Incorrectly specifying the loss function: Ensure that the loss function is correctly specified when compiling the model. For example, use 'binary_crossentropy' for binary classification problems, 'categorical_crossentropy' for multi-class classification problems, etc.

      model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

      2. Incorrectly specifying metrics: If you intended to use ‘loss’ as a metric, ensure that you are using it correctly. Generally, ‘loss’ is not specified directly in the metrics argument, since it is inherently part of the training process. If you want to monitor the loss during training, just ensure that the loss argument is correctly set; the training process automatically tracks the loss.

      3. Custom loss function or metric not defined properly: If you are using a custom loss function or metric, ensure it is correctly defined. A custom loss function should take the true labels and predicted labels as arguments and return a loss value. After defining it, you can pass it directly to the loss or metrics argument by its function name.

      import tensorflow as tf

      def custom_loss_function(y_true, y_pred):
          # Calculate and return the loss, e.g., a mean squared error
          return tf.reduce_mean(tf.square(y_true - y_pred))

      model.compile(optimizer='adam', loss=custom_loss_function, metrics=['accuracy'])

      4. Typographical error: Check for typographical errors when specifying the loss function or metrics, including incorrect spelling or values unsupported by your specific library version.

      5. Library version mismatch: Ensure that the syntax and identifiers you are using are compatible with the version of the library (e.g., TensorFlow or Keras) you are working with. The way loss functions and metrics are specified sometimes changes between versions.

      6. Misplaced argument: Make sure the loss argument is placed within the compile() method of your model and not mistakenly placed within the fit() method or any other method.

      General advice:

      – Consult the documentation: Always consult the official documentation of the library you are using (e.g., TensorFlow, Keras) for the correct syntax and available options for loss functions and metrics.
      – Update libraries: If you suspect a version mismatch, consider updating your machine learning library to the latest version, but ensure your code is compatible with the update.

      If these solutions don’t resolve your issue, it may help to review the exact context in which the error occurs, including how you have defined and compiled your model.
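
      Since the examples in this post go through the SciKeras wrapper, one more thing worth trying is to set the loss explicitly on the wrapper. This is only a sketch under the assumption that create_model is the build function from the optimizer-tuning example and that it leaves compilation to SciKeras; it is not a guaranteed fix for newer Keras versions:

      from scikeras.wrappers import KerasClassifier
      from sklearn.model_selection import GridSearchCV

      # Set the loss explicitly on the wrapper; SciKeras then compiles the model with it.
      model = KerasClassifier(model=create_model, loss='binary_crossentropy', optimizer='adam',
                              epochs=100, batch_size=10, verbose=0)
      param_grid = {'optimizer': ['SGD', 'RMSprop', 'Adam']}
      grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
      # grid_result = grid.fit(X, y)  # X, y: your training features and labels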

Leave a Reply