How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

By Jason Brownlee on August 4, 2022 in Deep Learning

Hyperparameter optimization is a big part of deep learning.

The reason is that neural networks are notoriously difficult to configure, and a lot of parameters need to be set. On top of that, individual models can be very slow to train.

In this post, you will discover how to use the grid search capability from the scikit-learn Python machine learning library to tune the hyperparameters of Keras’s deep learning models.

After reading this post, you will know:

How to wrap Keras models for use in scikit-learn and how to use grid search
How to grid search common neural network parameters, such as learning rate, dropout rate, epochs, and number of neurons
How to define your own hyperparameter tuning experiments on your own projects

Let’s get started.

Aug/2016: First published
Update Nov/2016: Fixed minor issue in displaying grid search results in code examples
Update Oct/2016: Updated examples for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18
Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
Update Sept/2017: Updated example to use Keras 2 “epochs” instead of Keras 1 “nb_epochs”
Update March/2018: Added alternate link to download the dataset
Update Oct/2019: Updated for Keras 2.3.0 API
Update Jul/2022: Updated for TensorFlow/Keras and SciKeras 0.8

Overview

In this post, you will discover how you can use the scikit-learn grid search capability. You will be given a suite of examples that you can copy and paste into your own project as a starting point.

Below is a list of the topics this post will cover:

How to use Keras models in scikit-learn
How to use grid search in scikit-learn
How to tune batch size and training epochs
How to tune optimization algorithms
How to tune learning rate and momentum
How to tune network weight initialization
How to tune activation functions
How to tune dropout regularization
How to tune the number of neurons in the hidden layer

How to Use Keras Models in scikit-learn

Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class from the module SciKeras. You may need to run the command pip install scikeras first to install the module.

To use these wrappers, you must define a function that creates and returns your Keras sequential model, then pass this function to the model argument when constructing the KerasClassifier class.

For example:

def create_model():
	...
	return model

model = KerasClassifier(model=create_model)

The constructor for the KerasClassifier class can take default arguments that are passed on to the calls to model.fit(), such as the number of epochs and the batch size.

For example:

def create_model():
	...
	return model

model = KerasClassifier(model=create_model, epochs=10)

The constructor for the KerasClassifier class can also take new arguments that can be passed to your custom create_model() function. These new arguments must also be defined in the signature of your create_model() function with default parameters.

For example:

def create_model(dropout_rate=0.0):
	...
	return model

model = KerasClassifier(model=create_model, dropout_rate=0.2)

You can learn more about these from the SciKeras documentation.

How to Use Grid Search in scikit-learn

Grid search is a model hyperparameter optimization technique.

In scikit-learn, this technique is provided in the GridSearchCV class.

When constructing this class, you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map of the model parameter name and an array of values to try.

By default, the estimator's own score method is used (accuracy for classifiers), but other scores can be specified in the scoring argument of the GridSearchCV constructor.

By default, the grid search will only use one thread. By setting the n_jobs argument in the GridSearchCV constructor to -1, the process will use all cores on your machine. However, sometimes this may interfere with the main neural network training process.

The GridSearchCV process will then construct and evaluate one model for each combination of parameters. Cross-validation is used to evaluate each individual model. In scikit-learn 0.22 and later, the default is 5-fold cross-validation (earlier releases defaulted to 3-fold); you can override this with the cv argument of the GridSearchCV constructor, and the examples in this post set cv=3 explicitly.
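
To make the cost of that exhaustive search concrete, the following minimal sketch enumerates a grid with the standard library and counts how many training runs it implies. The grid values are illustrative assumptions, and the counting only mirrors the idea behind GridSearchCV, not its implementation.

```python
from itertools import product

# Illustrative grid (assumed values, not taken from a tuned experiment)
param_grid = {"batch_size": [10, 20, 40], "epochs": [10, 50]}
cv_folds = 3  # number of cross-validation folds

# Every combination of values is one candidate configuration
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each candidate is trained once per fold
total_fits = len(candidates) * cv_folds

for candidate in candidates:
    print(candidate)
print("candidates:", len(candidates), "total fits:", total_fits)
```

The count grows multiplicatively with every hyperparameter added to the grid. With the default refit=True, GridSearchCV also trains one final model on the full dataset using the best combination.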

Below is an example of defining a simple grid search:

param_grid = dict(epochs=[10,20,30])
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)

Once completed, you can access the outcome of the grid search in the result object returned from grid.fit(). The best_score_ member provides access to the best score observed during the optimization procedure, and the best_params_ describes the combination of parameters that achieved the best results.
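
These attributes can be explored without training a neural network. The sketch below is a stand-in under stated assumptions: it uses a scikit-learn LogisticRegression on synthetic data from make_classification instead of a Keras model, and it passes scoring="accuracy" explicitly, purely to show where best_score_, best_params_, and cv_results_ live.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the dataset used in this post)
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# Any scikit-learn compatible estimator can be searched the same way
estimator = LogisticRegression(max_iter=1000)
param_grid = dict(C=[0.01, 0.1, 1.0, 10.0])

grid = GridSearchCV(estimator=estimator, param_grid=param_grid, scoring="accuracy", cv=3)
grid_result = grid.fit(X, y)

# Best mean cross-validated score and the parameters that produced it
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

# cv_results_ holds one entry per parameter combination
means = grid_result.cv_results_["mean_test_score"]
stds = grid_result.cv_results_["std_test_score"]
params = grid_result.cv_results_["params"]
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
```

The same attribute names apply when the estimator is a wrapped Keras model, as in the examples that follow.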

You can learn more about the GridSearchCV class in the scikit-learn API documentation.

Problem Description

Now that you know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let’s look at a bunch of examples.

All examples will be demonstrated on a small standard machine learning dataset called the Pima Indians onset of diabetes classification dataset. This is a small dataset with all numerical attributes that is easy to work with.

Download the dataset and place it in your current working directory with the name pima-indians-diabetes.csv (update: download from here).

As you proceed through the examples in this post, you will carry forward the best parameters found so far. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.
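
To see why interactions matter, consider the toy sketch below. The scores are made-up numbers chosen only to illustrate the effect (they do not come from any experiment in this post): tuning one parameter at a time settles on a different configuration than searching both parameters jointly.

```python
# Hypothetical cross-validation scores for four configurations (made-up numbers)
# Keys are (learning_rate, momentum) pairs
scores = {
    (0.01, 0.0): 0.70,
    (0.10, 0.0): 0.65,
    (0.01, 0.9): 0.68,
    (0.10, 0.9): 0.75,
}
learning_rates = [0.01, 0.10]
momentums = [0.0, 0.9]

# One-at-a-time tuning: fix momentum at its starting value, pick the best learning rate...
start_momentum = 0.0
best_lr = max(learning_rates, key=lambda lr: scores[(lr, start_momentum)])
# ...then keep that learning rate and pick the best momentum
best_momentum = max(momentums, key=lambda m: scores[(best_lr, m)])
sequential_choice = (best_lr, best_momentum)

# Joint grid search: evaluate every combination together
joint_choice = max(scores, key=scores.get)

print("one at a time:", sequential_choice, scores[sequential_choice])
print("joint search: ", joint_choice, scores[joint_choice])
```

In this contrived case, the higher learning rate only pays off together with high momentum, so the one-at-a-time procedure never tries the winning pair.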

Note on Parallelizing Grid Search

All examples are configured to use parallelism (n_jobs=-1).

If you get an error like the one below:

INFO (theano.gof.compilelock): Waiting for existing lock by process '55614' (I am process '55613')
INFO (theano.gof.compilelock): To manually release the lock, delete ...

Kill the process and change the code to not perform the grid search in parallel; set n_jobs=1.

How to Tune Batch Size and Number of Epochs

In this first simple example, you will look at tuning the batch size and number of epochs used when fitting the network.

The batch size in iterative gradient descent is the number of patterns shown to the network before the weights are updated. It is also an optimization in the training of the network, defining how many patterns to read at a time and keep in memory.

The number of epochs is the number of times the entire training dataset is shown to the network during training. Some networks are sensitive to the batch size, such as LSTM recurrent neural networks and Convolutional Neural Networks.
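
The two settings together determine how many times the weights are updated. The short sketch below works through the arithmetic, assuming the dataset has 768 rows and that 3-fold cross-validation leaves two thirds of them for training; the function name weight_updates is hypothetical.

```python
import math

n_rows = 768    # assumed size of the Pima Indians diabetes dataset
cv_folds = 3    # each model trains on (cv_folds - 1) / cv_folds of the rows
n_train = n_rows * (cv_folds - 1) // cv_folds

def weight_updates(batch_size, epochs, n_samples=n_train):
    """Total number of weight updates for one training run."""
    # The final, partially filled batch of an epoch still triggers an update
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs

for batch_size in [10, 100]:
    for epochs in [10, 100]:
        print("batch_size=%d epochs=%d -> %d updates"
              % (batch_size, epochs, weight_updates(batch_size, epochs)))
```

With the same number of epochs, a batch size of 10 performs roughly nine times as many updates as a batch size of 100 on data of this size.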

Here you will evaluate a suite of mini-batch sizes from 10 to 100 (10, 20, 40, 60, 80, and 100) together with 10, 50, and 100 training epochs.

The full code listing is provided below:

# Use scikit-learn to grid search the batch size and epochs
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running this example produces the following output:

Best: 0.705729 using {'batch_size': 10, 'epochs': 100}
0.597656 (0.030425) with: {'batch_size': 10, 'epochs': 10}
0.686198 (0.017566) with: {'batch_size': 10, 'epochs': 50}
0.705729 (0.017566) with: {'batch_size': 10, 'epochs': 100}
0.494792 (0.009207) with: {'batch_size': 20, 'epochs': 10}
0.675781 (0.017758) with: {'batch_size': 20, 'epochs': 50}
0.683594 (0.011049) with: {'batch_size': 20, 'epochs': 100}
0.535156 (0.053274) with: {'batch_size': 40, 'epochs': 10}
0.622396 (0.009744) with: {'batch_size': 40, 'epochs': 50}
0.671875 (0.019918) with: {'batch_size': 40, 'epochs': 100}
0.592448 (0.042473) with: {'batch_size': 60, 'epochs': 10}
0.660156 (0.041707) with: {'batch_size': 60, 'epochs': 50}
0.674479 (0.006639) with: {'batch_size': 60, 'epochs': 100}
0.476562 (0.099896) with: {'batch_size': 80, 'epochs': 10}
0.608073 (0.033197) with: {'batch_size': 80, 'epochs': 50}
0.660156 (0.011500) with: {'batch_size': 80, 'epochs': 100}
0.615885 (0.015073) with: {'batch_size': 100, 'epochs': 10}
0.617188 (0.039192) with: {'batch_size': 100, 'epochs': 50}
0.632812 (0.019918) with: {'batch_size': 100, 'epochs': 100}

You can see that the batch size of 10 and 100 epochs achieved the best result of about 70% accuracy.
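
When the grid is larger, it can be easier to read the results sorted by score. The sketch below ranks a dictionary shaped like cv_results_; to keep it self-contained, it is filled by hand with three rows copied from the output above rather than produced by a real search.

```python
# Hand-built stand-in for grid_result.cv_results_ (three rows from the output above)
cv_results = {
    "mean_test_score": [0.683594, 0.705729, 0.674479],
    "std_test_score": [0.011049, 0.017566, 0.006639],
    "params": [
        {"batch_size": 20, "epochs": 100},
        {"batch_size": 10, "epochs": 100},
        {"batch_size": 60, "epochs": 100},
    ],
}

# Pair up the columns and sort by mean score, best first
rows = zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"])
ranked = sorted(rows, key=lambda row: row[0], reverse=True)

for rank, (mean, stdev, param) in enumerate(ranked, start=1):
    print("%d. %f (%f) with: %r" % (rank, mean, stdev, param))
```

With a real search, cv_results_ also contains a rank_test_score entry that gives the same ordering.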

How to Tune the Training Optimization Algorithm

Keras offers a suite of different state-of-the-art optimization algorithms.

In this example, you will tune the optimization algorithm used to train the network, each with default parameters.

This is an odd example because often, you will choose one approach a priori and instead focus on tuning its parameters on your problem (see the next example).

Here, you will evaluate the suite of optimization algorithms supported by the Keras API.

The full code listing is provided below:

# Use scikit-learn to grid search the optimization algorithm
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# return model without compile
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, loss="binary_crossentropy", epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Note that the create_model() function defined above does not return a compiled model, unlike the one in the previous example. This is because the optimizer of a Keras model is set in the compile() call, so it is better to leave compilation to the KerasClassifier wrapper, which lets GridSearchCV vary the optimizer. Also note that you specified loss="binary_crossentropy" in the wrapper, since the loss must also be set in the compile() call.

Running this example produces the following output:

Best: 0.697917 using {'optimizer': 'Adam'}
0.674479 (0.033804) with: {'optimizer': 'SGD'}
0.649740 (0.040386) with: {'optimizer': 'RMSprop'}
0.595052 (0.032734) with: {'optimizer': 'Adagrad'}
0.348958 (0.001841) with: {'optimizer': 'Adadelta'}
0.697917 (0.038051) with: {'optimizer': 'Adam'}
0.652344 (0.019918) with: {'optimizer': 'Adamax'}
0.684896 (0.011201) with: {'optimizer': 'Nadam'}

The KerasClassifier wrapper will not compile your model again if the model is already compiled. Hence, another way to run GridSearchCV is to make the optimizer an argument of the create_model() function, which then returns an appropriately compiled model, like the following:

# Use scikit-learn to grid search the optimization algorithm
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(model__optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Note that in the above, the parameter name in the param_grid dictionary carries the prefix model__. The KerasClassifier in SciKeras requires this prefix to make clear that the parameter should be routed to the create_model() function as an argument, rather than treated as a parameter for compile() or fit(). See also the routed parameters section of the SciKeras documentation.

Running this example produces the following output:

Best: 0.697917 using {'model__optimizer': 'Adam'}
0.636719 (0.019401) with: {'model__optimizer': 'SGD'}
0.683594 (0.020915) with: {'model__optimizer': 'RMSprop'}
0.585938 (0.038670) with: {'model__optimizer': 'Adagrad'}
0.518229 (0.120624) with: {'model__optimizer': 'Adadelta'}
0.697917 (0.049445) with: {'model__optimizer': 'Adam'}
0.652344 (0.027805) with: {'model__optimizer': 'Adamax'}
0.686198 (0.012890) with: {'model__optimizer': 'Nadam'}

The results suggest that the Adam optimization algorithm is the best, with a score of about 70% accuracy.

How to Tune Learning Rate and Momentum

It is common to pre-select an optimization algorithm to train your network and tune its parameters.

By far, the most common optimization algorithm is plain old Stochastic Gradient Descent (SGD) because it is so well understood. In this example, you will look at optimizing the SGD learning rate and momentum parameters.

The learning rate controls how much to update the weight at the end of each batch, and the momentum controls how much to let the previous update influence the current weight update.
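
As a rough illustration of these two parameters, the sketch below applies the classical momentum update (velocity = momentum * velocity - learning_rate * gradient, then weight = weight + velocity) to a toy one-dimensional loss, f(w) = w^2. The loss, the starting point, and the step counts are assumptions chosen for readability; a real optimizer applies the same rule to many weights at once.

```python
def sgd_momentum(learning_rate, momentum, steps, w=1.0):
    """Minimize f(w) = w**2 with SGD plus momentum; returns the final weight."""
    velocity = 0.0
    for _ in range(steps):
        gradient = 2.0 * w  # derivative of w**2
        # The previous update (velocity) is blended into the current one
        velocity = momentum * velocity - learning_rate * gradient
        w = w + velocity
    return w

for momentum in [0.0, 0.5, 0.9]:
    final_w = sgd_momentum(learning_rate=0.1, momentum=momentum, steps=20)
    print("momentum=%.1f -> w after 20 steps: %.6f" % (momentum, final_w))
```

With momentum set to 0.0, each step simply scales the weight by a constant factor; larger momentum values let earlier steps keep pushing the weight, which can speed up progress or cause overshooting depending on the learning rate.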

You will try a suite of small standard learning rates and momentum values from 0.0 to 0.8 in steps of 0.2, as well as 0.9 (because it is a popular value in practice). In Keras, the learning rate and momentum are set as follows:

...
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.2)

In the SciKeras wrapper, you will route the parameters to the optimizer with the prefix optimizer__.
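
The double-underscore prefixes can be thought of as addresses. The sketch below is a simplified illustration of that idea, not the actual SciKeras implementation: the helper split_routed_params is hypothetical, the "wrapper" label for unprefixed names is an assumption, and the function only groups parameter names by the text before the first double underscore.

```python
def split_routed_params(params):
    """Group parameters by destination prefix (hypothetical helper for illustration)."""
    routed = {}
    for name, value in params.items():
        if "__" in name:
            # 'optimizer__momentum' -> destination 'optimizer', key 'momentum'
            destination, key = name.split("__", 1)
        else:
            # Unprefixed names are treated here as belonging to the wrapper itself
            destination, key = "wrapper", name
        routed.setdefault(destination, {})[key] = value
    return routed

params = {
    "epochs": 100,
    "batch_size": 10,
    "optimizer__learning_rate": 0.01,
    "optimizer__momentum": 0.2,
    "model__dropout_rate": 0.2,
}

for destination, kwargs in split_routed_params(params).items():
    print(destination, "->", kwargs)
```

Reading a param_grid this way makes it easier to tell at a glance which values reach the model-building function and which reach the optimizer.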

Generally, it is a good idea to also include the number of epochs in an optimization like this as there is a dependency between the amount of learning per batch (learning rate), the number of updates per epoch (batch size), and the number of epochs.

The full code listing is provided below:

# Use scikit-learn to grid search the learning rate and momentum
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, loss="binary_crossentropy", optimizer="SGD", epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(optimizer__learning_rate=learn_rate, optimizer__momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Running this example produces the following output:

Best: 0.686198 using {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.0}
0.686198 (0.036966) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.0}
0.651042 (0.009744) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.2}
0.652344 (0.038670) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.4}
0.656250 (0.065907) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.6}
0.671875 (0.022326) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.8}
0.661458 (0.015733) with: {'optimizer__learning_rate': 0.001, 'optimizer__momentum': 0.9}
0.665365 (0.021236) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.0}
0.671875 (0.003189) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.2}
0.640625 (0.008438) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.4}
0.648438 (0.003189) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.6}
0.649740 (0.003683) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.8}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.01, 'optimizer__momentum': 0.9}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.0}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.2}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.4}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.6}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.8}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.1, 'optimizer__momentum': 0.9}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.0}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.2}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.4}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.6}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.8}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.2, 'optimizer__momentum': 0.9}
0.652344 (0.003189) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.0}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.2}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.4}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.6}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.8}
0.651042 (0.001841) with: {'optimizer__learning_rate': 0.3, 'optimizer__momentum': 0.9}

You can see that SGD is not very good on this problem; nevertheless, the best results were achieved using a learning rate of 0.001 and a momentum of 0.0, with an accuracy of about 69%.

How to Tune Network Weight Initialization

Neural network weight initialization used to be simple: use small random values.

Now there is a suite of different techniques to choose from. Keras provides a laundry list.

In this example, you will look at tuning the selection of network weight initialization by evaluating all the available techniques.

You will use the same weight initialization method on each layer. Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. In the example below, you will use a rectifier for the hidden layer and a sigmoid for the output layer because the predictions are binary. The weight initialization is now an argument to the create_model() function, so you need to use the model__ prefix to ask KerasClassifier to route the parameter to the model creation function.
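
To make the difference between some of these schemes more concrete, the sketch below computes the sampling bounds that three of the uniform variants would use for the hidden layer in this example (8 inputs feeding 12 neurons). The formulas follow the Keras initializer documentation; the helper name uniform_limits is made up for this illustration and is not part of Keras.

```python
import math

def uniform_limits(fan_in, fan_out):
    """Return the +/- sampling limit for three Keras uniform initializers."""
    return {
        'lecun_uniform': math.sqrt(3.0 / fan_in),
        'glorot_uniform': math.sqrt(6.0 / (fan_in + fan_out)),
        'he_uniform': math.sqrt(6.0 / fan_in),
    }

# hidden layer from the example: 8 inputs feeding 12 neurons
limits = uniform_limits(fan_in=8, fan_out=12)
for name, limit in limits.items():
    print("%s: weights drawn from [-%.3f, %.3f]" % (name, limit, limit))
```

The schemes differ mainly in how wide the initial weights are spread, which is one reason the best choice can depend on the activation function that follows.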

The full code listing is provided below:

# Use scikit-learn to grid search the weight initialization
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(init_mode='uniform'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), kernel_initializer=init_mode, activation='relu'))
	model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(model__init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Running this example produces the following output:

Best: 0.716146 using {'model__init_mode': 'uniform'}
0.716146 (0.034987) with: {'model__init_mode': 'uniform'}
0.678385 (0.029635) with: {'model__init_mode': 'lecun_uniform'}
0.716146 (0.030647) with: {'model__init_mode': 'normal'}
0.651042 (0.001841) with: {'model__init_mode': 'zero'}
0.695312 (0.027805) with: {'model__init_mode': 'glorot_normal'}
0.690104 (0.023939) with: {'model__init_mode': 'glorot_uniform'}
0.647135 (0.057880) with: {'model__init_mode': 'he_normal'}
0.665365 (0.026557) with: {'model__init_mode': 'he_uniform'}

We can see that the best results were achieved with a uniform weight initialization scheme, with an accuracy of about 72%. The normal scheme reached the same mean score.
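
One way to read the 'zero' row in the output: its score of 0.651042 is what a model that always predicts the majority class would get. The quick check below assumes the commonly cited class counts for this dataset (500 negative and 268 positive cases); those counts are an assumption and are not stated in this post.

```python
# assumed class counts for the Pima Indians diabetes dataset
negatives = 500
positives = 268

# accuracy of always predicting the majority class
baseline = max(negatives, positives) / (negatives + positives)
print("Majority-class baseline: %f" % baseline)
```

Under that assumption, any configuration in a grid that scores 0.651042 has probably collapsed to predicting a single class, so it is better read as "did not learn" than as a weak but working model.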

How to Tune the Neuron Activation Function

The activation function controls the non-linearity of individual neurons and when to fire.

Generally, the rectifier activation function is the most popular. The sigmoid and tanh functions used to be the default choices, and they may still be more suitable for some problems.

In this example, you will evaluate the suite of different activation functions available in Keras. You will only use these functions in the hidden layer, as a sigmoid activation function is required in the output for the binary classification problem. Similar to the previous example, this is an argument to the create_model() function, and you will use the model__ prefix for the GridSearchCV parameter grid.

Generally, it is a good idea to prepare data to the range of the different transfer functions, which you will not do in this case.
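
As a rough illustration of what preparing the data could look like, the sketch below rescales each input column to the range 0 to 1 using plain Python. The function name min_max_scale and the sample rows are made up for this example.

```python
def min_max_scale(rows):
    """Rescale each column of a list of rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # a constant column has no spread, so map it to 0.0
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# three made-up rows with two features on very different scales
rows = [[6.0, 148.0], [1.0, 85.0], [8.0, 183.0]]
scaled = min_max_scale(rows)
for row in scaled:
    print(row)
```

In a real grid search, the scaling statistics should come from the training folds only. One common way to arrange that is to put a scikit-learn MinMaxScaler and the KerasClassifier together in a Pipeline.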

The full code listing is provided below:

# Use scikit-learn to grid search the activation function
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), kernel_initializer='uniform', activation=activation))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(model__activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Running this example produces the following output:

Best: 0.710938 using {'model__activation': 'linear'}
0.651042 (0.001841) with: {'model__activation': 'softmax'}
0.703125 (0.012758) with: {'model__activation': 'softplus'}
0.671875 (0.009568) with: {'model__activation': 'softsign'}
0.710938 (0.024080) with: {'model__activation': 'relu'}
0.669271 (0.019225) with: {'model__activation': 'tanh'}
0.675781 (0.011049) with: {'model__activation': 'sigmoid'}
0.677083 (0.004872) with: {'model__activation': 'hard_sigmoid'}
0.710938 (0.034499) with: {'model__activation': 'linear'}

Surprisingly (to me at least), the “linear” activation function achieved the best results with an accuracy of about 71%, although relu reached the same mean score to six decimal places.
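
One possible reading of this result: with a linear hidden activation, the hidden layer adds no non-linearity, so the network reduces to a single linear map followed by the sigmoid, much like logistic regression. The sketch below checks this with made-up weights and, for brevity, no biases.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]

# made-up weights: 2 inputs -> 3 linear hidden units -> 1 output (before the sigmoid)
W1 = [[0.5, -1.0], [2.0, 0.25], [-0.5, 1.5]]
W2 = [[1.0, -2.0, 0.5]]
x = [3.0, 4.0]

two_layer = matvec(W2, matvec(W1, x))   # hidden layer, then output layer
one_layer = matvec(matmul(W2, W1), x)   # one combined weight matrix
print(two_layer, one_layer)
```

Both paths give the same pre-sigmoid value, which suggests that a simple linear decision boundary already does reasonably well on this problem.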

How to Tune Dropout Regularization

In this example, you will look at tuning the dropout rate for regularization in an effort to limit overfitting and improve the model’s ability to generalize.

For the best results, dropout should be combined with a weight constraint such as the max norm constraint.
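
For intuition, a max norm constraint caps the length of the weight vector feeding into each neuron. The following is a simplified sketch of that idea for a single neuron, not the Keras implementation; the function name apply_max_norm and the weights are made up.

```python
import math

def apply_max_norm(weights, max_value):
    """Rescale one neuron's incoming weights so their L2 norm is at most max_value."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)   # already within the limit, leave unchanged
    scale = max_value / norm
    return [w * scale for w in weights]

incoming = [3.0, 4.0]          # made-up weights with L2 norm 5.0
constrained = apply_max_norm(incoming, max_value=3.0)
print(constrained)
```

The direction of the weight vector is preserved and only its length is reduced, which keeps weights from growing very large while dropout is randomly removing units.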

For more on using dropout in deep learning models with Keras see the post:

Dropout Regularization in Deep Learning Models With Keras

This involves tuning both the dropout rate and the weight constraint. We will try dropout rates between 0.0 and 0.9 (1.0 does not make sense) and MaxNorm weight constraint values between 1 and 5.
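
Searching two parameters at once multiplies the work. The short sketch below counts how many models this grid trains, assuming 3-fold cross-validation as in the other examples in this post.

```python
from itertools import product

dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
weight_constraint = [1.0, 2.0, 3.0, 4.0, 5.0]
cv_folds = 3  # assumed, matching cv=3 used elsewhere in this post

# every pairing of dropout rate and weight constraint is evaluated
combinations = list(product(dropout_rate, weight_constraint))
total_fits = len(combinations) * cv_folds
print("%d combinations x %d folds = %d model fits" % (len(combinations), cv_folds, total_fits))
```

GridSearchCV also refits the best configuration on the full dataset by default, so expect one more training run on top of this count.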

The full code listing is provided below.

# Use scikit-learn to grid search the dropout rate
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm
from scikeras.wrappers import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(dropout_rate, weight_constraint):
	# create model
	model = Sequential()
	model.add(Dense(12, input_shape=(8,), kernel_initializer='uniform', activation='linear', kernel_constraint=MaxNorm(weight_constraint)))
	model.add(Dropout(dropout_rate))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
weight_constraint = [1.0, 2.0, 3.0, 4.0, 5.0]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(model__dropout_rate=dropout_rate, model__weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Running this example produces the following output.

Best: 0.766927 using {'model__dropout_rate': 0.2, 'model__weight_constraint': 3.0}
0.729167 (0.021710) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 1.0}
0.746094 (0.022326) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 2.0}
0.753906 (0.022097) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 3.0}
0.750000 (0.012758) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 4.0}
0.751302 (0.012890) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 5.0}
0.739583 (0.026748) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 1.0}
0.733073 (0.001841) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 2.0}
0.753906 (0.030425) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 3.0}
0.748698 (0.031466) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 4.0}
0.753906 (0.030425) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 5.0}
0.760417 (0.024360) with: {'model__dropout_rate': 0.2, 'model__weight_constraint': 1.0}
nan (nan) with: {'model__dropout_rate': 0.2, 'model__weight_constraint': 2.0}
0.766927 (0.021710) with: {'model__dropout_rate': 0.2, 'model__weight_constraint': 3.0}
0.755208 (0.010253) with: {'model__dropout_rate': 0.2, 'model__weight_constraint': 4.0}
0.750000 (0.008438) with: {'model__dropout_rate': 0.2, 'model__weight_constraint': 5.0}
0.725260 (0.015073) with: {'model__dropout_rate': 0.3, 'model__weight_constraint': 1.0}
0.738281 (0.008438) with: {'model__dropout_rate': 0.3, 'model__weight_constraint': 2.0}
0.748698 (0.003683) with: {'model__dropout_rate': 0.3, 'model__weight_constraint': 3.0}
0.740885 (0.023073) with: {'model__dropout_rate': 0.3, 'model__weight_constraint': 4.0}
0.735677 (0.008027) with: {'model__dropout_rate': 0.3, 'model__weight_constraint': 5.0}
0.743490 (0.009207) with: {'model__dropout_rate': 0.4, 'model__weight_constraint': 1.0}
0.751302 (0.006639) with: {'model__dropout_rate': 0.4, 'model__weight_constraint': 2.0}
0.750000 (0.024910) with: {'model__dropout_rate': 0.4, 'model__weight_constraint': 3.0}
0.744792 (0.030314) with: {'model__dropout_rate': 0.4, 'model__weight_constraint': 4.0}
0.751302 (0.010253) with: {'model__dropout_rate': 0.4, 'model__weight_constraint': 5.0}
0.757812 (0.006379) with: {'model__dropout_rate': 0.5, 'model__weight_constraint': 1.0}
0.740885 (0.030978) with: {'model__dropout_rate': 0.5, 'model__weight_constraint': 2.0}
0.742188 (0.003189) with: {'model__dropout_rate': 0.5, 'model__weight_constraint': 3.0}
0.718750 (0.016877) with: {'model__dropout_rate': 0.5, 'model__weight_constraint': 4.0}
0.726562 (0.019137) with: {'model__dropout_rate': 0.5, 'model__weight_constraint': 5.0}
0.725260 (0.013279) with: {'model__dropout_rate': 0.6, 'model__weight_constraint': 1.0}
0.738281 (0.013902) with: {'model__dropout_rate': 0.6, 'model__weight_constraint': 2.0}
0.743490 (0.001841) with: {'model__dropout_rate': 0.6, 'model__weight_constraint': 3.0}
0.722656 (0.009568) with: {'model__dropout_rate': 0.6, 'model__weight_constraint': 4.0}
0.747396 (0.024774) with: {'model__dropout_rate': 0.6, 'model__weight_constraint': 5.0}
0.729167 (0.006639) with: {'model__dropout_rate': 0.7, 'model__weight_constraint': 1.0}
0.717448 (0.012890) with: {'model__dropout_rate': 0.7, 'model__weight_constraint': 2.0}
0.710938 (0.027621) with: {'model__dropout_rate': 0.7, 'model__weight_constraint': 3.0}
0.718750 (0.014616) with: {'model__dropout_rate': 0.7, 'model__weight_constraint': 4.0}
0.743490 (0.021236) with: {'model__dropout_rate': 0.7, 'model__weight_constraint': 5.0}
0.713542 (0.009207) with: {'model__dropout_rate': 0.8, 'model__weight_constraint': 1.0}
nan (nan) with: {'model__dropout_rate': 0.8, 'model__weight_constraint': 2.0}
0.721354 (0.009207) with: {'model__dropout_rate': 0.8, 'model__weight_constraint': 3.0}
0.716146 (0.009207) with: {'model__dropout_rate': 0.8, 'model__weight_constraint': 4.0}
0.716146 (0.015073) with: {'model__dropout_rate': 0.8, 'model__weight_constraint': 5.0}
0.682292 (0.018688) with: {'model__dropout_rate': 0.9, 'model__weight_constraint': 1.0}
0.696615 (0.011201) with: {'model__dropout_rate': 0.9, 'model__weight_constraint': 2.0}
0.696615 (0.026557) with: {'model__dropout_rate': 0.9, 'model__weight_constraint': 3.0}
0.694010 (0.001841) with: {'model__dropout_rate': 0.9, 'model__weight_constraint': 4.0}
0.696615 (0.022628) with: {'model__dropout_rate': 0.9, 'model__weight_constraint': 5.0}

We can see that a dropout rate of 20% and a MaxNorm weight constraint of 3 resulted in the best accuracy of about 77%. You may notice that some of the results are nan. This is probably because the input is not normalized, so you may run into a degenerate model by chance.
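
When a grid contains nan rows, it can help to separate them out before comparing configurations. The sketch below does this for a few rows copied from the output above, using the same (mean, stdev, params) layout as the summary loop.

```python
import math

# a few (mean, stdev, params) rows copied from the output above
results = [
    (0.760417, 0.024360, {'model__dropout_rate': 0.2, 'model__weight_constraint': 1.0}),
    (float('nan'), float('nan'), {'model__dropout_rate': 0.2, 'model__weight_constraint': 2.0}),
    (0.766927, 0.021710, {'model__dropout_rate': 0.2, 'model__weight_constraint': 3.0}),
    (0.755208, 0.010253, {'model__dropout_rate': 0.2, 'model__weight_constraint': 4.0}),
]

# separate failed configurations from valid ones
failed = [row for row in results if math.isnan(row[0])]
valid = [row for row in results if not math.isnan(row[0])]

# rank the valid rows from best to worst mean accuracy
ranked = sorted(valid, key=lambda row: row[0], reverse=True)
for mean, stdev, params in ranked:
    print("%f (%f) with: %r" % (mean, stdev, params))
print("%d configuration(s) produced nan" % len(failed))
```

The failed configurations are worth rerunning rather than discarding, since a nan caused by chance may not repeat.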

How to Tune the Number of Neurons in the Hidden Layer

The number of neurons in a layer is an important parameter to tune. Generally the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.

Also, generally, a large enough single layer network can approximate any other neural network, at least in theory.

In this example, we will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5.

A larger network requires more training, so ideally at least the batch size and number of epochs should be optimized together with the number of neurons.
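
To get a feel for how quickly the model grows, the sketch below counts the trainable parameters for each hidden layer size in the grid, assuming the architecture used in this post: 8 inputs, one fully connected hidden layer, and a single output neuron. The helper name count_parameters is made up for this illustration.

```python
def count_parameters(neurons, n_inputs=8):
    """Trainable parameters in a network with one hidden layer and one output neuron."""
    hidden = (n_inputs + 1) * neurons   # weights plus one bias per hidden neuron
    output = neurons + 1                # weights plus one bias for the output neuron
    return hidden + output

for neurons in [1, 5, 10, 15, 20, 25, 30]:
    print("%2d neurons -> %3d parameters" % (neurons, count_parameters(neurons)))
```

The parameter count grows linearly with the number of neurons here, from 11 parameters with 1 neuron to 301 with 30.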

The full code listing is provided below.

# Use scikit-learn to grid search the number of neurons
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from scikeras.wrappers import KerasClassifier
from tensorflow.keras.constraints import MaxNorm
# Function to create model, required for KerasClassifier
def create_model(neurons):
	# create model
	model = Sequential()
	model.add(Dense(neurons, input_shape=(8,), kernel_initializer='uniform', activation='linear', kernel_constraint=MaxNorm(4)))
	model.add(Dropout(0.2))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
# load dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(model__neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Running this example produces the following output.

Best: 0.729167 using {'model__neurons': 30}
0.701823 (0.010253) with: {'model__neurons': 1}
0.717448 (0.011201) with: {'model__neurons': 5}
0.717448 (0.008027) with: {'model__neurons': 10}
0.720052 (0.019488) with: {'model__neurons': 15}
0.709635 (0.004872) with: {'model__neurons': 20}
0.708333 (0.003683) with: {'model__neurons': 25}
0.729167 (0.009744) with: {'model__neurons': 30}

We can see that the best results were achieved with a network with 30 neurons in the hidden layer with an accuracy of about 73%.

Tips for Hyperparameter Optimization

This section lists some handy tips to consider when tuning hyperparameters of your neural network.

k-fold Cross Validation. You can see that the results from the examples in this post show some variance. A default cross-validation of 3 was used, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.
Review the Whole Grid. Do not focus only on the best result; review the whole grid of results and look for trends to support configuration decisions.
Parallelize. Use all your cores if you can. Neural networks are slow to train, and we often want to try a lot of different parameters. Consider spinning up a lot of AWS instances.
Use a Sample of Your Dataset. Because networks are slow to train, try training them on a smaller sample of your training dataset, just to get an idea of general directions of parameters rather than optimal configurations.
Start with Coarse Grids. Start with coarse-grained grids and zoom into finer grained grids once you can narrow the scope.
Do not Transfer Results. Results are generally problem specific. Try to avoid favorite configurations on each new problem that you see. It is unlikely that optimal results you discover on one problem will transfer to your next project. Instead look for broader trends like number of layers or relationships between parameters.
Reproducibility is a Problem. Although we set the seed for the TensorFlow random number generator (tf.random.set_seed), the results are not 100% reproducible. There is more to reproducibility when grid searching wrapped Keras models than is presented in this post.
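
As a rough illustration of the "Start with Coarse Grids" tip, the sketch below builds a finer grid around the best value found by a coarse search. The helper name refine_grid and its points argument are hypothetical and not part of scikit-learn or Keras; the coarse values are the neuron counts searched above.

```python
def refine_grid(coarse_values, best_value, points=5):
    """Return integer candidates spanning the neighbours of best_value (hypothetical helper)."""
    values = sorted(coarse_values)
    i = values.index(best_value)
    # Clamp to the ends of the grid when the best value sits on an edge.
    low = values[max(i - 1, 0)]
    high = values[min(i + 1, len(values) - 1)]
    step = (high - low) / (points - 1)
    # Round to whole neurons and drop any duplicates created by rounding.
    candidates = [round(low + k * step) for k in range(points)]
    return sorted(set(candidates))


coarse = [1, 5, 10, 15, 20, 25, 30]
fine = refine_grid(coarse, best_value=30)
print(fine)
```

Because the best coarse value (30) sits on the edge of the grid, it may also be worth extending the search beyond 30 rather than only refining below it.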

Summary

In this post, you discovered how you can tune the hyperparameters of your deep learning networks in Python using Keras and scikit-learn.

Specifically, you learned:

How to wrap Keras models for use in scikit-learn and how to use grid search.
How to grid search a suite of different standard neural network parameters for Keras models.
How to design your own hyperparameter optimization experiments.

Do you have any experience tuning hyperparameters of large neural networks? Please share your stories below.

Do you have any questions about hyperparameter optimization of neural networks or about this post? Ask your questions in the comments and I will do my best to answer.

815 Responses to How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras

Yanbo August 9, 2016 at 9:10 am #

As always, excellent post. I’ve been doing some hyper-parameter optimization by hand, but I’ll definitely give Grid Search a try.

Is it possible to set up a different threshold for sigmoid output in Keras? Rather than using 0.5, I was thinking of trying 0.7 or 0.8.

Reply
- Jason Brownlee August 15, 2016 at 11:10 am #
  
  Thanks Yanbo.
  
  I don’t think so, but you could implement your own activation function and do anything you wish.
  
  Reply
  - Shudhan September 5, 2016 at 6:20 pm #
    
    My question is related to this thread. How do I get the probabilities as the output? I don’t want the class output. I read that for a regression problem no activation function is needed in the output layer. Will a similar implementation give me the probabilities, or will the output fall outside the range 0 to 1?
    
    Reply
    - Jason Brownlee September 6, 2016 at 9:41 am #
      
      Hi Shudhan, you can use a sigmoid activation and treat the outputs like probabilities (they will be in the range of 0-1).
      
      Reply
  - Swapna November 2, 2017 at 11:51 pm #
    
    excellent post
    
    Reply
    - Jason Brownlee November 3, 2017 at 5:18 am #
      
      Thanks Swapna.
      
      Reply
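
On the threshold question at the top of this thread, one option that needs no custom activation function is to take the sigmoid outputs as probabilities and apply the cut-off yourself. This is only a minimal sketch: the probability values are made up for illustration and apply_threshold is a hypothetical helper, not a Keras function.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Convert predicted probabilities into 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical sigmoid outputs, standing in for the model's predicted probabilities.
probs = [0.35, 0.55, 0.72, 0.91]

default_labels = apply_threshold(probs)               # usual 0.5 cut-off
strict_labels = apply_threshold(probs, threshold=0.8)  # stricter cut-off
print(default_labels)
print(strict_labels)
```

A higher threshold trades recall for precision on the positive class, so it is worth choosing it on held-out data rather than on the training set.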
eclipsedu August 18, 2016 at 5:55 pm #

Sounds awesome! Will this grid search method use the full CPU (which can be 8/16 cores)?

Reply
- Jason Brownlee August 19, 2016 at 5:23 am #
  
  It can if you set n_jobs=-1
  
  Reply
  - Hemanth Naidu S August 20, 2019 at 10:52 pm #
    
    Hi Jason,
    
    In grid search, we do get a train score, right?
    Why is it not displayed in model.cv_results_? We only get the test score.
    
    Reply
    - Jason Brownlee August 21, 2019 at 6:42 am #
      
      You get a cross-validation score for each configuration tested.
      
      Reply
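
For the train score question above: GridSearchCV only records training scores when return_train_score=True is passed. The sketch below shows this with a plain scikit-learn LogisticRegression on synthetic data, used here as a lightweight stand-in for the wrapped Keras model; the dataset and the C values are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 8 input features.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,
    return_train_score=True,  # adds mean_train_score and std_train_score to cv_results_
)
grid_result = grid.fit(X, y)

results = grid_result.cv_results_
for train, test, params in zip(results["mean_train_score"], results["mean_test_score"], results["params"]):
    print("train=%f test=%f with: %r" % (train, test, params))
```

Computing training scores adds extra evaluation work for every fold, which is why it is off by default.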
Reza August 18, 2016 at 6:00 pm #

Hi,
Great post,
Can I use these tips on CNNs in Keras as well?
Thanks!

Reply
- Jason Brownlee August 19, 2016 at 5:24 am #
  
  They can be a start, but remember it is a good idea to use a repeating structure in a large CNN and you will need to tune the number of filters and pool size.
  
  Reply
  - maxv April 29, 2019 at 3:30 am #
    
    Hi Jason thanks for everything.
    Could you please explain what you mean by a repeating structure in your reply?
    
    Quick question on GridSearchCV for a CNN: passing param_grid=param_grid with the sklearn wrapper gives this error: “ValueError: filters is not a legal parameter”
    How can we use the wrapper for the filters parameter of Conv1D?
    Thanks
    
    Reply
    - Jason Brownlee April 29, 2019 at 8:25 am #
      
      Yes, see this post:
      https://machinelearningmastery.com/review-of-architectural-innovations-for-convolutional-neural-networks-for-image-classification/
      
      Perhaps try manually grid searching the parameters if you are working with time series, so that you can use walk forward validation:
      https://machinelearningmastery.com/how-to-develop-deep-learning-models-for-univariate-time-series-forecasting/
      
      Reply
  - Salvin Sanjesh Prasad April 8, 2021 at 1:02 pm #
    
    Dear Jason,
    
    This is an excellent post. I have a question: how can we grid search the optimum number of filters in three different layers of a CNN? For example: [60, 70, 80] in layer 1, [20, 30, 40] in layer 2 and [5, 10, 20] in layer 3. I have searched everywhere for code using grid search but could not find this. I really need to use grid search for this. I would be highly grateful for your kind advice. If possible, also reply via the email address that I have provided (as this was a requirement for me to comment).
    
    Reply
    - Jason Brownlee April 9, 2021 at 5:16 am #
      
      Thanks.
      
      You might need to write some for-loops, i.e. do the search manually.
      
      Also, we never find an “optimal” configuration, just a good enough configuration given the time/resources available.
      
      Reply
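
The manual for-loop search suggested above might look something like the sketch below, which enumerates every combination of the three filter lists from the question. The evaluate function is a hypothetical placeholder with a toy score; in practice it would build, train and cross-validate the CNN for the given filter counts.

```python
from itertools import product


def evaluate(filters):
    """Hypothetical stand-in for building, training and scoring a CNN."""
    f1, f2, f3 = filters
    # Toy score that only keeps the example runnable; replace with a real CV score.
    return 1.0 / (1.0 + abs(f1 - 70) + abs(f2 - 30) + abs(f3 - 10))


layer1 = [60, 70, 80]
layer2 = [20, 30, 40]
layer3 = [5, 10, 20]

# Evaluate every combination of filter counts (3 x 3 x 3 = 27 configurations).
results = []
for filters in product(layer1, layer2, layer3):
    results.append((evaluate(filters), filters))

best_score, best_filters = max(results)
print("Evaluated %d configurations, best: %r" % (len(results), best_filters))
```

Keeping the full results list, rather than only the best entry, makes it possible to review the whole grid for trends afterwards.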
Prashant August 22, 2016 at 4:55 pm #

Hi Jason, First of all great post! I applied this by dividing the data into train and test and used train dataset for grid fit. Plan was to capture best parameters in train and apply them on test to see accuracy. But it seems grid.fit and model.fit applied with same parameters on same dataset (in this case train) give different accuracy results. Any idea why this happens. I can share the code if it helps.

Reply
- Jason Brownlee August 23, 2016 at 6:00 am #
  
  You will see small variation in the performance of a neural net with the same parameters from run to run. This is because of the stochastic nature of the technique and how very hard it is to fix the random number seed successfully in python/numpy/theano.
  
  You will also see small variation due to the data used to train the method.
  
  Generally, you could use all of your data to grid search to try to reduce the second type of variation (slower). You could store results and use statistical significance tests to compare populations of results to see if differences are significant to sort out the first type of variation.
  
  I hope that helps.
  
  Reply
vinay August 22, 2016 at 9:05 pm #

Hi, I think this is the best tutorial I have ever found on the web. Thanks for sharing. Is it possible to use these tips on LSTM, BiLSTM and CNN-LSTM models?

Reply
- Jason Brownlee August 23, 2016 at 5:57 am #
  
  Thanks Vinay, I’m glad it’s useful.
  
  Absolutely, you could use these tactics on other algorithm types.
  
  Reply
shudhan September 2, 2016 at 3:26 pm #

Best place to learn the tuning.. my question – is it good to follow the order you mentioned to tune the parameters? I know the most significant parameters should be tuned first

Reply
- Jason Brownlee September 3, 2016 at 6:56 am #
  
  Thanks. The order is a good start. It is best to focus on areas where you think you will get the biggest improvement first – which is often the structure of the network (layers and neurons).
  
  Reply
  - Reed Guo September 2, 2018 at 5:59 pm #
    
    Hi, Jason
    
    Thanks for your post. It is excellent.
    
    I have a question.
    
    You tune batch size and epochs first. But if you set an inappropriate number of neurons or activation function, then batch size and epoch tuning won’t make sense.
    
    So I think we should tune all of these hyper-parameters at the same time.
    
    What do you think about it?
    
    Reply
    - Jason Brownlee September 3, 2018 at 6:13 am #
      
      They are all connected. If we could, we would tune all the parameters, but almost always it requires too many resources.
      
      Reply
      - max v April 21, 2019 at 12:52 am #
        
        Hi Jason,
        Do you recommend any particular order? Which hyperparameter should we tune first?
        There is an order in this article: you start with batch size and training epochs, then optimization, etc.
        Did you find any resource or research paper explaining the best consecutive tuning order?
        thanks
      - Jason Brownlee April 21, 2019 at 8:25 am #
        
        Learning rate!
        
        Yes, I have many. Perhaps start here:
        https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
        
        More here:
        https://machinelearningmastery.com/start-here/#better
Satheesh September 27, 2016 at 12:24 am #

When I use the categorical_crossentropy loss function and run the grid search with n_jobs greater than 1, it throws the error “cannot pickle object class”, but the same code works fine with binary_crossentropy loss. Can you tell me if I am making a mistake in my code?
def create_model(optimizer='adam'):
    # create model
    model.add(Dense(30, input_dim=59, init='normal', activation='relu'))
    model.add(Dense(15, init='normal', activation='sigmoid'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# Create Keras Classifier
print "--------------------- Running Grid Search on Keras Classifier for epochs and batch ------------------"
clf = model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = {"batch_size": range(10, 30, 10), "nb_epoch": range(50, 150, 50)}
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=4)
grid_result = grid.fit(x_train, y_train)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Reply
- Jason Brownlee September 27, 2016 at 7:44 am #
  
  Strange Satheesh, I have not seen that before.
  
  Let me know if you figure it out.
  
  Reply
  - Kai September 18, 2017 at 10:01 pm #
    
    I came across and solved the problem several days ago. Please use “epochs” instead of “nb_epoch” in the param_grid dict. Personally, I guess “cannot pickle object class” means the neural network cannot be built because of some errors. Open to discussion.
    
    Reply
    - Jason Brownlee September 19, 2017 at 7:40 am #
      
      Glad to hear it.
      
      I updated the example to use “epochs” to work with Keras 2.
      
      Reply
L Fenu November 9, 2016 at 7:47 pm #

excellent post, thanks. It’s been very helpful to get me started on hyperparameterisation.

One thing I haven’t been able to do yet is to grid search over parameters which are not proper to the NN but to the training set. For example, I can fine-tune the input_dim parameter by creating a function generator which takes care of creating the function that will create the model, like this:

# fp_subset is a subset of columns of my whole training set.

create_basic_ANN_model = kt.ANN_model_gen(  # defined elsewhere
    input_dim=len(fp_subset), output_dim=1, layers_num=2, layers_sizes=[len(fp_subset)/5, len(fp_subset)/10],
    loss='mean_squared_error', optimizer='adadelta', metrics=['mean_squared_error', 'mean_absolute_error']
)

model = KerasRegressor(build_fn=create_basic_ANN_model, verbose=1)
# define the grid search parameters
batch_size = [10, 100]
epochs = [5, 10]

param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=7)

grid_results = grid.fit(trX, trY)

This works, but only as a for loop over the different fp_subset, which I must define manually.
I could easily pick the best out of every run, but it would be great if I could fold them all inside a big grid definition and fit, so as to automatically pick the largest.

However, until now haven’t been able to figure out a way to get that in my head.
If the wrapper function is useful to anyone, I can post a generalised version here.

Reply
- Jason Brownlee November 10, 2016 at 7:42 am #
  
  Good question.
  
  You might just need to use a loop around the whole lot for different projections/views of your training data.
  
  Reply
  - L Fenu November 11, 2016 at 1:05 am #
    
    Thanks. I ended up coding my own for loop, saving the results of each grid in a dict, sorting the hash by the performance metrics, and picking the best model.
    
    Now, the next question is: How do I save the model’s architecture and weights to a .json .hdf5 file? I know how to do that for a simple model. But how do I extract the best model out of the gridsearch results?
    
    Reply
    - Jason Brownlee November 11, 2016 at 10:04 am #
      
      Well done.
      
      No need. Once you know the parameters, you can use them to train a new standalone model on all of your training data and start making predictions.
      
      Reply
      - Fenu Luca November 15, 2016 at 3:23 am #
        
        I may have found a way. How about this?
        
        best_model = grid_result.best_estimator_.model
        best_model_file_path = 'your_pick_here'
        model2json = best_model.to_json()
        with open(best_model_file_path + '.json', 'w') as json_file:
            json_file.write(model2json)
        best_model.save_weights(best_model_file_path + '.h5')
volador November 14, 2016 at 6:21 pm #

Hi Jason, I think this is the very best deep learning tutorial on the web. Thanks for your work. I have a question: how can heuristic algorithms be used to optimize hyperparameters for deep learning models in Python with Keras? I mean algorithms like genetic algorithms, particle swarm optimization, the cuckoo algorithm, etc. If the idea could be tried out, could you give an example?

Reply
- Jason Brownlee November 15, 2016 at 7:50 am #
  
  Thanks for your support volador.
  
  You could search the hyperparameter space using a stochastic optimization algorithm like a genetic algorithm and use the mean performance as the cost function or fitness function. I don’t have a worked example, but it would be relatively easy to set up.
  
  Reply
Jan de Lange November 15, 2016 at 6:50 am #

Hi Jason, very helpful intro into gridsearch for Keras. I have used your guidance in my code, but rather than using the default ‘accuracy’ to be optimized, my model requires a specific evaluation function to be optimized. You hint at this possibility in the introduction, but there is no example of it. I have followed the SciKit-learn documentation, but I fail to come up with the correct syntax.

I have posted my question at StackOverflow, but since it is quite specific, it requires understanding of SciKit-learn in combination with Keras.

Perhaps you can have a look? I think it would nicely extend your tutorial.

http://stackoverflow.com/questions/40572743/scikit-learn-grid-search-own-scoring-object-syntax

Thanks, Jan

Reply
- Jason Brownlee November 15, 2016 at 8:02 am #
  
  Sorry Jan, I have not used a custom scoring function before.
  
  Here is a list of built-in scoring functions:
  http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
  
  Here is help on defining your own scoring function:
  http://scikit-learn.org/stable/modules/model_evaluation.html#defining-your-scoring-strategy-from-metric-functions
  
  Let me know how you go.
  
  Reply
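
As a rough sketch of the custom scoring approach described in the scikit-learn pages linked above, a metric function can be wrapped with make_scorer and passed to the scoring argument of GridSearchCV. The F2 metric, the synthetic data and the LogisticRegression stand-in estimator are all assumptions chosen to keep the example small; a wrapped Keras model would take the estimator's place.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 8 input features.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# Wrap a metric function as a scorer; extra keyword arguments go to the metric.
f2_scorer = make_scorer(fbeta_score, beta=2)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring=f2_scorer,
    cv=3,
)
grid_result = grid.fit(X, y)
print("Best F2: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```

For a metric where lower is better (such as an error), make_scorer accepts greater_is_better=False so that the grid search still picks the right configuration.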
Jan de Lange November 16, 2016 at 7:31 am #

Yup, same sources as I referenced in my post at Stackoverflow.

Reply
- Jason Brownlee November 16, 2016 at 9:35 am #
  
  Excellent. Good luck Jan.
  
  Reply
Anthony Ohazulike December 6, 2016 at 12:46 am #

Good tutorial again Jason…keep on the good job!

Reply
- Jason Brownlee December 6, 2016 at 8:26 am #
  
  Thanks Anthony.
  
  Reply
nrcjea001 December 13, 2016 at 10:48 pm #

Hi Jason

First off, thank you for the tutorial. It’s very helpful.

I was also hoping you would assist on how to adapt the keras grid search to stateful lstms as discussed in

https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

I’ve coded the following:

# create model
model = KerasRegressor(build_fn=create_model, nb_epoch=1, batch_size=bats,
verbose=2, shuffle=False)

# define the grid search parameters
h1n = [5, 10] # number of hidden neurons
param_grid = dict(h1n=h1n)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=5)

for i in range(100):
    grid.fit(trainX, trainY)
    grid.reset_states()

Is grid.reset_states() correct? Or would you suggest creating a callback function to reset states?

Thanks,

Reply
- Jason Brownlee December 14, 2016 at 8:27 am #
  
  Great question.
  
  With stateful LSTMs we must control the resetting of states after each epoch. The sklearn framework does not open this capacity to us – at least it looks that way to me off the cuff.
  
  I think you may have to grid search stateful LSTM params manually with a ton of for loops. Sorry.
  
  If you discover something different, let me know, i.e. there may be a way in the back door to the sklearn grid search functionality so that we can inject our own custom epoch handling.
  
  Reply
Thomas Maier December 21, 2016 at 2:53 am #

Hi Jason

Thanks a lot for this and all the other great tutorials!

I tried to combine this gridsearch/keras approach with a pipeline. It works if I tune nb_epoch or batch_size, but I get an error if I try to tune the optimizer or something else in the keras building function (I did not forget to include the variable as an argument):

def keras_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(80, input_dim=79, init='normal'))
    model.add(Activation('relu'))
    model.add(Dense(1, init='normal'))
    model.add(Activation('linear'))
    model.compile(optimizer=optimizer, loss='mse')
    return model

kRegressor = KerasRegressor(build_fn=keras_model, nb_epoch=500, batch_size=10, verbose=0)

estimators = []
estimators.append(('imputer', preprocessing.Imputer(strategy='mean')))
estimators.append(('scaler', preprocessing.StandardScaler()))
estimators.append(('kerasR', kRegressor))
pipeline = Pipeline(estimators)

param_grid = dict(kerasR__optimizer=['adam', 'rmsprop'])

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')

Do you know this problem?

Thanks, Thomas

Reply
- Jason Brownlee December 21, 2016 at 8:44 am #
  
  Thanks Thomas. I’ve not seen this issue.
  
  I think we’re starting to push the poor Keras sklearn wrapper to the limit.
  
  Maybe the next step is to build out a few functions to do manual grid searching across network configs.
  
  Reply
  - James April 14, 2018 at 12:26 am #
    
    Has there been a blog post on this?
    
    Reply
    - Jason Brownlee April 14, 2018 at 6:46 am #
      
      Not yet, maybe it’s time.
      
      Reply
- Anastasiya December 12, 2018 at 9:41 pm #
  
  Have you solved this issue? I’m exploring Keras now as well and came across exactly the same problem.
  
  Reply
Jimi December 21, 2016 at 3:26 pm #

Great resource!

Any thoughts on how to get the “history” objects out of grid search? It could be beneficial to plot the loss and accuracy to see when a model starts to flatten out.

Reply
- Jason Brownlee December 22, 2016 at 6:30 am #
  
  Not sure off the cuff Jimi, perhaps repeat the run standalone for the top performing configuration.
  
  Reply
DeepLearning January 4, 2017 at 6:08 am #

Thanks for the post. Can we optimize the number of hidden layers as well on top of number of neurons in each layers?
Thanks

Reply
- Jason Brownlee January 4, 2017 at 9:00 am #
  
  Yes, it just may be very time consuming depending on the size of the dataset and the number of layers/nodes involved.
  
  Try it on some small datasets from the UCI ML Repo.
  
  Reply
  - DeepLearning January 4, 2017 at 12:02 pm #
    
    Thanks. Would you mind looking at below code?
    
    def create_model(neurons1=1, neurons2=1):
        # create model
        model = Sequential()
        model.add(Dense(neurons1, input_dim=8))
        model.add(Dense(neurons2))
        model.add(Dense(1, init='uniform', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model
    
    # define the grid search parameters
    neurons1 = [1, 3, 5, 7]
    neurons2 = [0, 1, 2]
    param_grid = dict(neurons1=neurons1, neurons2=neurons2)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    
    This code runs without error (I excluded certain X, y parts for brevity), but when I run grid.fit(X, Y), it gives an AssertionError.
    
    I’d appreciate if you can show me where I am wrong.
    
    Reply
    - DeepLearning January 4, 2017 at 12:26 pm #
      
      Update: It worked when I deleted 0 from neurons2. Thanks
      
      Reply
      - Jason Brownlee January 5, 2017 at 9:16 am #
        
        Excellent, glad to hear it.
    - Jason Brownlee January 5, 2017 at 9:16 am #
      
      A Dense() with a value of 0 neurons might blow up. Try removing the 0 from your neurons2 array.
      
      A good debug strategy is to cut code back to the minimum, make it work, then add complexity. Here, try searching a grid of 1 and 1 neurons, make it all work, then expand the grid you search.
      
      Let me know how you go.
      
      Reply
DeepLearning January 9, 2017 at 11:04 am #

I keep getting error messages, so I tried big for loops that scan all possible combinations of layer numbers, neuron numbers and other optimization settings within defined limits. It is very time-consuming code, but I could not figure out how to adjust the layer structure and other optimization parameters in the same code using GridSearch. If you would provide code for that in your blog one day, that would be much appreciated. Thanks.

Reply
- Jason Brownlee January 10, 2017 at 8:55 am #
  
  I’ll try to find the time.
  
  Reply
Rajneesh January 11, 2017 at 10:48 am #

Hi Jason,
Many thanks for this awesome tutorial !

Reply
- Jason Brownlee January 12, 2017 at 9:24 am #
  
  I’m glad you found it useful Rajneesh.
  
  Reply
Andy January 22, 2017 at 1:02 pm #

Hi Jason,

Great tutorial! I’m running into a slight issue. I tried running this on my own variation of the code and got the following error:

TypeError: get_params() got an unexpected keyword argument ‘deep’

I copied and pasted your code using the given data set and got the same error. The code is showing an error on the grid_result = grid.fit(X, Y) line. I looked through the other comments and didn’t see anyone with the same issue. Do you know where this could be coming from?

Thanks for your help!

Reply
- YechiBechi January 23, 2017 at 2:18 am #
  
  same issue here,
  
  great tutorial, life saver.
  
  Reply
- Jason Brownlee January 23, 2017 at 8:35 am #
  
  Hi Andy, sorry to hear that.
  
  Is this happening with a specific example or with all of them?
  
  Are you able to check your version of Python/sklearn/keras/tf/theano?
  
  UPDATE:
  
  I can confirm the first example still works fine with Python 2.7, sklearn 0.18.1, Keras 1.2.0 and TensorFlow 0.12.1.
  
  Reply
  - Andy January 25, 2017 at 7:12 am #
    
    The only differences are I am running Python 3.5 and Keras 1.2.1. The example I ran previously was the grid search for the number of neurons in a layer. But I just ran the first example and got the same error.
    
    Do you think the issue is due to the next version of Python? If so, what should my next steps be?
    
    Thanks for your help and quick response!
    
    Reply
    - Jannes January 27, 2017 at 5:51 am #
      
      It’s a bug in Keras 1.2.1. You can either downgrade to 1.2.0 or get the code from their github (where they already fixed it).
      
      Reply
      - Jason Brownlee January 27, 2017 at 12:22 pm #
        
        Yes, I have a write up of the problem and available fixes here:
        http://stackoverflow.com/questions/41796618/python-keras-cross-val-score-error/41841066#41841066
      - Andy January 28, 2017 at 4:21 pm #
        
        Thank you so much for your help!
kono February 8, 2017 at 3:14 am #

Jason,

Can you use early_stopping to decide n_epoch?

Reply
- Jason Brownlee February 8, 2017 at 9:36 am #
  
  Yes, that is a good method to find a generalized model.
  
  Reply
Jayant February 23, 2017 at 4:33 am #

Hi Jason,

Really great article. I am a big fan of your blog and your books. Can you please explain your following statement?

“A default cross-validation of 3 was used, but perhaps k=5 or k=10 would be more stable. Carefully choose your cross validation configuration to ensure your results are stable.”

I didn’t see anywhere cross-validation being used.

Reply
- Jason Brownlee February 23, 2017 at 8:56 am #
  
  Hi Jayant,
  
  Grid search uses k-fold cross-validation to evaluate the performance of each combination of parameters on unseen data.
  
  Reply
Jing February 28, 2017 at 2:09 am #

Hi Jason,
thanks for this awesome tutorial !
I have two questions: 1. In “model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])”, accuracy is used to evaluate results. But GridSearchCV also has a scoring parameter. If I set “scoring='f1'”, which one is used to evaluate the results of the grid search? 2. How can I set two evaluation metrics, e.g. 'accuracy' and 'f1', for evaluating the results of the grid search?

Reply
- Jason Brownlee February 28, 2017 at 8:13 am #
  
  Hi Jing,
  
  You can set the “scoring” argument for GridSearchCV with a string of the performance measure to use, or the name of your own scoring function. You can learn about this argument here:
  http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
  
  You can see a full list of supported scoring measures here:
  http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
  
  As far as I know you can only grid search using a single measure.
  
  Reply
  - Jing February 28, 2017 at 12:50 pm #
    
    Thank you so much for your help!
    
    Reply
  - Jing February 28, 2017 at 1:54 pm #
    
    I find that no matter which metric is passed to the GridSearchCV “scoring” argument, “metrics” in “model.compile” must be ['accuracy']; otherwise the program gives “ValueError: The model is not configured to compute accuracy. You should pass 'metrics=["accuracy"]' to the 'model.compile()' method.” So, if I set:
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='recall')
    then grid_result.best_score_ = 0.72. My question is: is 0.72 accuracy or recall? Thank you!
    
    Reply
    - Jason Brownlee March 1, 2017 at 8:31 am #
      
      Hi Jing,
      
      When using GridSearchCV with Keras, I would suggest not specifying any metrics when compiling your Keras model.
      
      I would suggest only setting the “scoring” argument on the GridSearchCV. I would expect the metric reported by GridSearchCV to be the one that you specified.
      
      I hope that helps.
      
      Reply
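
On the question of evaluating with two measures: more recent scikit-learn releases accept several metrics at once through the scoring argument, with refit naming the metric used to pick the best configuration. The sketch below assumes synthetic data and a LogisticRegression stand-in estimator rather than the wrapped Keras model from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 8 input features.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring={"accuracy": "accuracy", "f1": "f1"},  # report both metrics
    refit="f1",  # best_params_ and best_score_ are chosen by f1
    cv=3,
)
grid_result = grid.fit(X, y)

results = grid_result.cv_results_
for acc, f1, params in zip(results["mean_test_accuracy"], results["mean_test_f1"], results["params"]):
    print("accuracy=%f f1=%f with: %r" % (acc, f1, params))
```

With multiple metrics, the results keys are suffixed with the metric name (for example mean_test_f1) instead of the single mean_test_score key.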
Dan March 8, 2017 at 4:13 am #

Great blog post. Love it. You are awesome, Jason. I have one question about GridSearchCV. As far as I understand, the cross-validation already takes place in there. That’s why we do not need any k-fold anymore.
But with this technique we would have no validation set, correct? E.g. with a default value of 3 we would have 2 training sets and one test set.

Does that mean that in k-fold as well as in GridSearchCV there is no requirement for creating a validation set anymore?

Thanks

Reply
- Jason Brownlee March 8, 2017 at 9:44 am #
  
  Hi Dan,
  
  Yes, GridSearchCV performs cross validation and you must specify the number of folds. You can hold back a validation set to double check the parameters found by the search if you like. This is optional.
  
  Reply
  - Dan March 9, 2017 at 3:25 am #
    
    Thank you for the quick response Jason. Especially considering the huge amount of questions you get.
    
    Reply
    - Jason Brownlee March 9, 2017 at 9:55 am #
      
      I’m here to help, if I can Dan.
      
      Reply
Johan Steunenberg March 22, 2017 at 8:25 pm #

What I’m missing in the tutorial is the info, how to get the best params in the model with KERAS. Do I pickup the best parameters and call ‘create_model’ again with those parameters or can I call the GridSearchCV’s ‘predict’ function? (I will try out for myself but for completeness it would be good to have it in the tutorial as well.)

Reply
- Jason Brownlee March 23, 2017 at 8:49 am #
  
  I see, but we don’t know the best parameters, we must search for them.
  
  Reply
Maycown Miranda April 5, 2017 at 2:09 am #

Hi, Jason. I am getting
/usr/local/lib/python2.7/dist-packages/keras/wrappers/scikit_learn.py in check_params(self=, params={‘batch_size’: 10, ‘epochs’: 10})
80 legal_params += inspect.getargspec(fn)[0]
81 legal_params = set(legal_params)
82
83 for params_name in params:
84 if params_name not in legal_params:
—> 85 raise ValueError(‘{} is not a legal parameter’.format(params_name))
params_name = ‘epochs’
86
87 def get_params(self, _):
88 “””Gets parameters for this estimator.
89

ValueError: epochs is not a legal parameter

Reply
- Jason Brownlee April 9, 2017 at 2:32 pm #
  
  It sounds like you need to upgrade to Keras v2.0 or higher.
  
  Reply
  - Chandra Sutrisno Tjhong November 28, 2017 at 10:46 am #
    
    I experienced the same problem.I upgraded my keras and the same problem still occurs.
    
    Reply
- neumatron11 February 5, 2019 at 12:42 pm #
  
  I was getting the ‘not a legal parameter’ error when I was trying to pass required inputs into my create_model function in the wrapper.
  
  model = KerasClassifier(build_fn=create_model(input_dim = x ), verbose=0)
  
  when I removed it and included it in the grid search instead it ran fine, I just added it to the dictionary of parameters
  
  input_dim = [x]
  
  Reply
Usman May 3, 2017 at 7:56 am #

Nice tutorial. I would like to optimize the number of hidden layers in the model. Can you please guide in this regard, thanks

Reply
- Jason Brownlee May 4, 2017 at 7:59 am #
  
  Thanks Usman.
  
  Consider exploring specific patterns, e.g. small-big-small, etc.
  
  Reply
Carl May 5, 2017 at 12:58 pm #

Do you know any way this could be possible using a network with multiple inputs?

http://imgur.com/a/JJ7f1

Reply
- Sukhpal December 16, 2019 at 2:18 am #
  
  The optimization of network topology, learning rate, batch size and epochs is done in stages. Sir, please tell me why these were done in stages.
  
  Reply
  - Jason Brownlee December 16, 2019 at 6:18 am #
    
    To make the explanation to the reader simpler.
    
    Reply
    - Dan Thomas May 28, 2020 at 7:21 am #
      
      Also probably to reduce search space, and thus computational time.
      
      Reply
DanielP May 9, 2017 at 4:26 pm #

Hi Jason, great to see posts like this – amazing job!

Just noticed, when you tune the optimisation algorithm SGD performs at 34% accuracy. As no parameters are being passed to the SGD function, I’d assume it takes the default configuration, lr=0.01, momentum=0.0.

Later on, as you look for better configurations for SGD, best result (68%) is found when {‘learn_rate’: 0.01, ‘momentum’: 0.0}.

It seems to me that these two experiments use exactly the same network configuration (including the same SGD parameters), yet their resulting accuracies differ significantly. Do you have any intuition as to why this may be happening?

Reply
- Jason Brownlee May 10, 2017 at 8:43 am #
  
  Hi Daniel, yes great point.
  
  Neural networks are stochastic and give different results when evaluated on the same data.
  
  Ideally, each configuration would be evaluated using the average of multiple (30+) repeats.
  
  This post might help:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
Pradanuari May 14, 2017 at 3:13 am #

Hi Jason!
Absolutely love your tutorial! But would you mind giving a tutorial on how to tune the number of hidden layers?

Thanks

Reply
- Jason Brownlee May 14, 2017 at 7:32 am #
  
  I have an example here:
  https://machinelearningmastery.com/exploratory-configuration-multilayer-perceptron-network-time-series-forecasting/
  
  Reply
Pradanuari May 14, 2017 at 11:32 pm #

Thank you so much Jason!

Reply
- Jason Brownlee May 15, 2017 at 5:53 am #
  
  I’m glad it helped Pradanuari.
  
  Reply
Ibrahim El-Fayoumi May 17, 2017 at 12:53 pm #

Hello Jason
I tried to use your idea in a similar problem but I am getting an error: AttributeError: ‘NoneType’ object has no attribute ‘loss’
It looks like the model does not define a loss function?

This is the error I get:
b\site-packages\keras-2.0.4-py3.5.egg\keras\wrappers\scikit_learn.py in fit(self=, x=memmap([[[ 0., 0., 0., …, 0., 0., 0.],
…, 0., 0., …, 0., 0., 0.]]], dtype=float32), y=array([[ 0., 0., 0., …, 0., 0., 0.],
…0.],
[ 0., 0., 0., …, 0., 1., 0.]]), **kwargs={})
135 self.model = self.build_fn(
136 **self.filter_sk_params(self.build_fn.__call__))
137 else:
138 self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
139
–> 140 loss_name = self.model.loss
loss_name = undefined
self.model.loss = undefined
141 if hasattr(loss_name, ‘__name__’):
142 loss_name = loss_name.__name__
143 if loss_name == ‘categorical_crossentropy’ and len(y.shape) != 2:
144 y = to_categorical(y)

AttributeError: ‘NoneType’ object has no attribute ‘loss’
___________________________________________________________________________

Process finished with exit code 1

Regards
Ibrahim

Reply
- Jason Brownlee May 18, 2017 at 8:26 am #
  
  Does the example in the blog post work on your system?
  
  Reply
  - Ibrahim El-Fayoumi May 18, 2017 at 12:18 pm #
    
    Ok, I think your code needs to be placed after
    if __name__ == '__main__':
    
    to work with multiprocessing…
    
    But thanks, the post is great…
    
    Reply
    - Jason Brownlee May 19, 2017 at 8:12 am #
      
      Not on Linux and OS X when I tested it, but thanks for the tip.
      
      Reply
    - Gautam August 25, 2017 at 11:33 pm #
      
      n_jobs=-1 doesn’t work on Windows.
      
      @Ibrahim: Can you please explain what part of the code needs to be behind
      if __name__ == '__main__':
      
      Reply
      - Martin October 19, 2019 at 4:52 am #
        
        Assuming you have several functions (I have a single Python script acting as the main file and the other stuff in a separate file, but at least functions like Jason does), you need to put this at the very beginning of your main routine, where everything comes together and is set up. Note that since it is an if-condition, you need to indent everything below the condition.
        
        @Jason maybe you can add this as a hint for Windows users in the section where you talk about the problems with parallelization.
      - Jason Brownlee October 19, 2019 at 6:53 am #
        
        Thanks. I really don’t know about windows.
        
        I’ve not seen a windows box in a long time and I’m impressed people use them for software development.
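The guard discussed in this thread can be sketched as follows. This is an illustrative layout, not code from the post: `run_search` is a hypothetical helper, and `LogisticRegression` on synthetic data stands in for a wrapped Keras model so the sketch runs quickly. The point is only the structure: definitions at module level, and the code that starts the parallel search inside the guarded block.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def run_search(n_jobs=2):
    """Build the data and run the grid search; returns the fitted search."""
    X, y = make_classification(n_samples=200, n_features=8, random_state=1)
    grid = GridSearchCV(
        estimator=LogisticRegression(max_iter=1000),  # stand-in for a wrapped Keras model
        param_grid={"C": [0.1, 1.0, 10.0]},
        n_jobs=n_jobs,
        cv=3,
    )
    return grid.fit(X, y)


# Worker processes that re-import this file will define run_search again,
# but only the process started as the main script enters this block.
if __name__ == "__main__":
    result = run_search(n_jobs=2)
    print("Best: %f using %s" % (result.best_score_, result.best_params_))
```

If parallel runs still hang on a given machine, setting `n_jobs=1` as other commenters suggest remains the simplest fallback.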
Edward May 21, 2017 at 3:17 am #

Hello Jason!
I did the first step, tuning batch size and number of epochs, and got:
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: 0.707031 using {'epochs': 100, 'batch_size': 40}
After that I did the same and got:
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: 0.688802 using {'epochs': 100, 'batch_size': 20}
And so on.
The problem is in grid_result.best_score_.

I expected that in the second step (for example, tuning the optimizer) I would get a better grid_result.best_score_ than in the first step (in the second step I use grid_result.best_params_ from the first step). But that is not the case.
Tuning all hyperparameters at once takes a very long time.

How to fix it?

Reply
- Jason Brownlee May 21, 2017 at 6:01 am #
  
  Consider tuning different parameters, like network structure or number of input features.
  
  Reply
  - Edward May 21, 2017 at 7:18 pm #
    
    Thanks a lot Jason!
    
    Reply
pattijane May 21, 2017 at 7:44 am #

Hello,

I’d like to have your opinion about a problem:

I have two loss function plots, with SGD and Adamax as optimizer with same learning rate.
Loss function of SGD looks like the red one, whereas Adamax’s looks like blue one.
(http://cs231n.github.io/assets/nn3/learningrates.jpeg)

I have better scores with Adamax on validation data. I’m confused about how to proceed, should I choose Adamax and play with learning rates a little more, or go on with SGD and somehow try to improve performance?

Thanks!

Reply
- Jason Brownlee May 22, 2017 at 7:49 am #
  
  Explore both, but focus on the validation score of interest (e.g. accuracy, RMSE, etc.) over loss.
  
  For example, you can get very low loss and get worse accuracy.
  
  Reply
  - pattijane May 22, 2017 at 6:35 pm #
    
    Thanks for your response! I experimented with different learning rates and found a reasonable one (good for both Adamax and SGD), and now I am trying to fix the learning rate and optimizer and focus on other hyperparameters such as batch size and number of neurons. Or would it be better if I set those first?
    
    Reply
    - Jason Brownlee May 23, 2017 at 7:49 am #
      
      Number of neurons will have a big effect along with learning rate.
      
      Batch size will have a smaller effect and could be optimized last.
      
      Reply
Lotem May 23, 2017 at 1:47 am #

Thanks for this post!

One question – why not use grid search on all the parameters together, rather than performing several grid searches and finding each parameter separately? Surely the results are not the same…

Reply
- Jason Brownlee May 23, 2017 at 7:54 am #
  
  Great question,
  
  In practice, the datasets are large and it can take a long time and require a lot of RAM.
  
  Reply
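The cost difference behind this answer can be made concrete with a little arithmetic. The sketch below uses a hypothetical grid (the value lists are illustrative, not the ones from the post) and assumes the staged approach tunes one hyperparameter at a time while holding the others fixed.

```python
from math import prod

# Hypothetical grid; the value lists are illustrative only.
grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adam"],
    "learn_rate": [0.001, 0.01, 0.1],
    "neurons": [1, 5, 10, 20],
}
cv_folds = 3

# Joint search: every combination is trained once per fold.
joint_fits = prod(len(values) for values in grid.values()) * cv_folds

# Staged search: each hyperparameter is tuned alone, the others held fixed.
staged_fits = sum(len(values) for values in grid.values()) * cv_folds

print("Joint search fits:", joint_fits)
print("Staged search fits:", staged_fits)
```

The staged search needs far fewer model fits, but as the question points out, it can miss interactions between hyperparameters, so the two approaches are not guaranteed to find the same configuration.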
StatsSorceress May 25, 2017 at 6:52 am #

Hi Jason,

Excellent post!

It seems to me that if you use the entire training set during your cross-validation, then your cross-validation error is going to give you an optimistically biased estimate of your validation error. I think this is because when you train the final model on the entire dataset, the validation set you create to estimate test performance comes out of the training set.

My question is: assuming we have a lot of data, should we use perhaps only 50% of the training data for cross-validation for the hyperparameters, and then use the remaining 50% for fitting the final model (and a portion of that remaining 50% would be used for the validation set)? That way we wouldn’t be using the same data twice. I am assuming in this case that we would also have a separate test set.

Reply
- Jason Brownlee June 2, 2017 at 11:38 am #
  
  Yes, it is a good idea to hold back a test set when tuning.
  
  Reply
Yang May 27, 2017 at 5:35 am #

Thanks for your valuable post. I learned a lot from it.
When I wrote my code for grid search, I encountered a question:

I use fit_generator instead of fit in keras.
Is it possible to use grid search with fit_generator ?

I have some Merge layers in my deep learning model.
Hence, the input of the neural network is not a single matrix.
For example:
Suppose we have 1,000 samples
Input = [Input1,Input2]
Input1 is a 1,000 *3 matrix
Input2 is a 1,000*3*50*50 matrix (image)

When I use the fit in your post, there is a bug….because the input1 and input2 don’t have the same dimension. So I wonder whether the fit_generator can work with grid search ?

Thanks in advance!

Reply
Yang May 27, 2017 at 6:46 am #

Please ignore my previous reply.
I find an answer here: https://github.com/fchollet/keras/issues/6451
Right now, GridSearchCV with the scikit-learn wrapper is not available for networks with multiple inputs.

Reply
Kate liu May 28, 2017 at 4:31 pm #

Hi Jason, thank you for your good tutorial on grid search with Keras. I followed your example with my own dataset and it ran. But when I use an autoencoder structure instead of the sequential structure to grid search the parameters with my own data, it does not run. I don’t know the reason. Could you help me? Are there any differences between grid searching a Sequential structure and grid searching a Model structure?

The following is my code:

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
import numpy as np
from keras.optimizers import SGD, Adam, RMSprop, Adagrad
from keras.regularizers import l1,l2
from keras.models import Model
import pandas as pd
from keras.models import load_model

np.random.seed(2017)

def create_model(optimizer='rmsprop'):

    # encoder layers
    encoding_dim = 140
    input_img = Input(shape=(6,))
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(input_img)
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
    encoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoded)
    encoder_output = Dense(encoding_dim, activation='relu', W_regularizer=l1(0.01))(encoded)

    # decoder layers
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(encoder_output)
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
    decoded = Dense(300, activation='relu', W_regularizer=l1(0.01))(decoded)
    decoded = Dense(6, activation='relu', W_regularizer=l1(0.01))(decoded)

    # construct the autoencoder model
    autoencoder = Model(input_img, decoded)

    # construct the encoder model for plotting
    encoder = Model(input_img, encoder_output)

    # Compile model
    autoencoder.compile(optimizer='RMSprop', loss='mean_squared_error', metrics=['accuracy'])

    return autoencoder

Reply
- Jason Brownlee June 2, 2017 at 12:09 pm #
  
  I’m surprised, I would not think the network architecture would make a difference.
  
  Sorry, I have no good suggestions other than try to debug the cause of the fault.
  
  Reply
Kate liu May 28, 2017 at 4:36 pm #

The autoencoder.compile call is modified as follows:
# Compile model
autoencoder.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])

Reply
Rahul May 30, 2017 at 12:07 am #

Can we do this for the functional API as well?

Reply
- Jason Brownlee June 2, 2017 at 12:28 pm #
  
  Perhaps, I have not done this.
  
  Reply
Ian Worthington May 30, 2017 at 10:36 pm #

Thanks for a great tutorial Jason, appreciated.

n_jobs=-1 didn’t work very well on my Windows 10 machine: it took a very long time and never finished.

https://stackoverflow.com/questions/28005307/gridsearchcv-no-reporting-on-high-verbosity seems to suggest this is (or at least was in 2015) a known problem under Windows so I changed to n_jobs=1, which also allowed me to see throughput using verbose=10.

Reply
- Jason Brownlee June 2, 2017 at 12:37 pm #
  
  Thanks for the tip.
  
  Reply
Ian Worthington May 31, 2017 at 1:56 am #

Jason —

Given all the parameters it is possible to adjust, is there any recommendation for which should be fixed first before exploring others, or can ALL results for one change when others are changed?

Reply
- Jason Brownlee June 2, 2017 at 12:39 pm #
  
  Great question, see this paper:
  https://arxiv.org/abs/1206.5533
  
  Reply
  - Ian Worthington June 3, 2017 at 2:39 am #
    
    Thanks Jason, I’ll check it out.
    
    Reply
Mario June 9, 2017 at 12:10 am #

Hi and thank you for the resource.

Am I right in my understanding that this only works on one machine?

Any hints / pointers on how to run this on a cluster? I have found https://goo.gl/Q9Xy7B as a potential avenue using Spark (no Keras though).

Any comment at all? Information on the subject is scarce.

Reply
- Jason Brownlee June 9, 2017 at 6:26 am #
  
  Yes, this example is for a single machine. Sorry, I do not have examples for running on a cluster.
  
  Reply
Shaun June 16, 2017 at 11:54 pm #

Hi Jason,

I’m a little bit confused about the definition of the “score” or “accuracy”. How are they computed? I believe that they are not simply comparing the results with the target, otherwise the overfitting model would be the best (like the more neurons the better).

But on the other hand, the grid search just uses those combinations of parameters to train the model, so what is the difference between manually setting the parameters and checking whether my result is good, with the risk of overfitting, and the grid search that creates an accuracy score to determine which one is the best?

Best regards,

Reply
- Jason Brownlee June 17, 2017 at 7:30 am #
  
  The grid search will provide an estimate of the skill of the model with a set of parameters.
  
  Any one configuration in the grid search can be set and evaluated manually.
  
  Neural networks are stochastic and will give different predictions/skill when trained on the same data.
  
  Ideally, if you have the time/compute the grid search should use repeated k-fold cross validation to provide robust estimates of model skill. More here:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Does that help?
  
  Reply
  - Shaun June 20, 2017 at 2:30 am #
    
    I’m new to neural networks and a little bit puzzled. Say I have too many neurons, which leads to overfitting (good on the train set, bad on the validation or test set). Can grid search detect it by the score?
    
    My guess is yes, because there is a validation set in GridSearchCV. Is that correct?
    
    Reply
    - Jason Brownlee June 20, 2017 at 6:39 am #
      
      A larger network can overfit.
      
      The idea is to find a config that does well on the train and validation sets. We require a robust test harness. With enough resources, I’d recommend repeated k-fold cross validation within the grid search.
      
      Reply
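The recommendation to use repeated k-fold cross-validation within the grid search can be sketched with scikit-learn's `RepeatedStratifiedKFold` passed as the `cv` argument. In this sketch, `LogisticRegression` on a small synthetic dataset is an assumed stand-in for a wrapped Keras classifier, chosen only so the example runs quickly; the fold and repeat counts are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Small synthetic dataset, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# 3 folds repeated 2 times gives 6 train/validation splits per configuration.
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=1)

# LogisticRegression is a fast stand-in; a wrapped Keras model would go here.
grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
    scoring="accuracy",
)
grid_result = grid.fit(X, y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```

Each configuration's score is then the mean over all repeats and folds, which smooths out some of the run-to-run variance at the cost of proportionally more training time.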
Huyen June 19, 2017 at 4:21 pm #

One more very useful tutorial, thanks Jason.

One question about GridSearch in my case. I have tried to tune the parameters of my neural network for regression with 18 inputs of size 800, but the grid search takes a very long time, like forever, even though I have limited the number of values searched. I saw in your code:

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)

Normally n_jobs=1. Can I increase that number to improve performance?

Reply
- Jason Brownlee June 20, 2017 at 6:36 am #
  
  We often cannot grid search with neural nets because it takes so long!
  
  Consider running on a large computer in the cloud over the weekend.
  
  Reply
Bobo June 21, 2017 at 4:57 am #

Hi Jason,

Any idea how to use GridSearchCV if you don’t want cross validation?

Reply
- Jason Brownlee June 21, 2017 at 8:19 am #
  
  GridSearch supports k-fold cross-validation by default. That is what the “CV” is in the name:
  http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
  
  Reply
  - Bobo June 24, 2017 at 3:07 am #
    
    So sklearn has no GridSearch without cross validation?
    In any case I found kind of a hack here to get rid of cv:
    https://stackoverflow.com/questions/44636370/scikit-learn-gridsearchcv-without-cross-validation-unsupervised-learning
    
    Reply
    - Jason Brownlee June 24, 2017 at 8:04 am #
      
      You can configure the k in CV to 1 so it does train/test. Then configure it to repeat.
      
      Reply
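One way to get a single train/validation split inside `GridSearchCV` is to pass a splitter that yields exactly one split. Note that scikit-learn's `KFold` requires at least two splits, so this sketch uses `ShuffleSplit(n_splits=1)` instead. The estimator and data are assumed stand-ins (`LogisticRegression` on synthetic data), and the 33% validation size is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# Small synthetic dataset, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# One shuffled split: 67% of rows for training, 33% for validation.
single_split = ShuffleSplit(n_splits=1, test_size=0.33, random_state=1)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # stand-in for a wrapped Keras model
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=single_split,
)
grid_result = grid.fit(X, y)
print("Number of splits:", grid_result.n_splits_)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```

Raising `n_splits` on the same splitter gives repeated random train/validation splits, which is one way to read the "configure it to repeat" suggestion.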
makis June 28, 2017 at 11:54 pm #

Hello. Thank you for the nice tutorial.

I am trying to combine pipeline and gridsearch.

Inside my Keras model I use kernel_initializer=init_mode.
Then I try to assign values to init_mode in the parameter dictionary in order to perform the grid search.

I get the following error: ValueError: init_mode is not a legal parameter

My code is here: https://www.dropbox.com/s/57n777j9w8bxf4t/keras_grid.py?dl=0

Any tip? Thank you

Reply
Abhijith Darshan Ravindra July 11, 2017 at 6:31 am #

Hi Dr. Brownlee,

When I run this in Spyder IDE nothing happens after grid.fit.

It just appears to do nothing.

Any suggestions as to why?

Reply
- Jason Brownlee July 11, 2017 at 10:34 am #
  
  Consider running from the command line.
  
  The grid search may take a long time.
  
  Reply
  - DY July 14, 2017 at 6:11 am #
    
    Hello Dr Brownlee,
    
    I saved your example code into a .py file and ran it. Nothing happens after grid.fit. However, if I run your example code line by line, it works. Do you know why?
    
    Reply
    - Jason Brownlee July 14, 2017 at 8:36 am #
      
      It may take a long time. Consider reducing the scope of the search to see if you can get results sooner.
      
      Reply
- Tryfon September 18, 2017 at 11:46 pm #
  
  I had the same issue as you (using Spyder and Python 3.6), but after changing the parameter to n_jobs = 1 it worked fine. Also, n_jobs = 2 got stuck although Spyder showed it was running in the background (I checked the CPU usage and it was down to 1%, versus 55-80% when it is actually running).
  
  Don’t ask me why that is. My guess would be that it has to do with your system and the fact that it might not support parallelization (no CUDA GPU).
  
  Reply
  - Jason Brownlee September 19, 2017 at 7:47 am #
    
    Consider running the example from the command line instead.
    
    Reply
Kamal Thapa July 27, 2017 at 3:46 pm #

How can I do Hyper-parameter optimization for MLPRegressor in scikit learn?

Reply
- Jason Brownlee July 28, 2017 at 8:28 am #
  
  Yes.
  
  Reply
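Because `MLPRegressor` is a native scikit-learn estimator, it can be passed to `GridSearchCV` directly with no wrapper. The sketch below is a minimal example on synthetic data; the grid values, `max_iter`, and the scoring choice are illustrative assumptions rather than recommendations.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Small synthetic regression problem, for illustration only.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

# Hypothetical grid over network shape and L2 penalty.
param_grid = {
    "hidden_layer_sizes": [(10,), (20,), (10, 10)],
    "alpha": [0.0001, 0.01],
}

model = MLPRegressor(max_iter=200, random_state=1)
grid = GridSearchCV(model, param_grid, scoring="neg_mean_squared_error", cv=3)

# Short training runs may not converge; silence that warning for the demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    grid_result = grid.fit(X, y)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```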
Josep August 3, 2017 at 2:31 am #

Hi Jason,
I’m unable to apply the grid search to a seq-to-seq LSTM network (KerasRegressor model in the scikit-learn API). When I set the GridSearchCV scoring function to r^2 (or any scoring function for regression problems), model.fit expects a 2-dimensional input, not the 3-dimensional input used in Keras.
Otherwise, if I leave the default scoring function, named “_passthrough_scorer” (I don’t know what it does or even what it is), it works, but the best_score doesn’t match the real best parametrization. I’m really confused… I’ll have to write the grid search manually…

Reply
- Josep August 3, 2017 at 2:42 am #
  
  I’ve solved it, and I’m sharing it in case someone has the same issue: if you set the grid search scoring function to “None”, it uses the scoring metrics of the Keras model.
  
  Reply
  - Josep August 3, 2017 at 2:49 am #
    
    Sorry for bothering, but the results of the approach I’ve said are incorrect. I don’t know what to do.
    
    Reply
- Jason Brownlee August 3, 2017 at 6:54 am #
  
  Hi Josep,
  
  Consider writing your own for loop to iterate over params and run a Cross Validation for the params within the loop.
  
  This is how I do it now for large/complex models.
  
  Reply
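The manual approach described above, a loop over parameter combinations with cross-validation inside the loop, can be sketched as follows. `Ridge` on synthetic 2-dimensional data is an assumed stand-in so the sketch runs quickly; with a Keras model, the inner loop would instead build, fit, and evaluate the network on each fold, which keeps full control over input shapes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, ParameterGrid

# Small synthetic regression problem, for illustration only.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

# Hypothetical grid; ParameterGrid yields one dict per combination.
param_grid = ParameterGrid({"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]})
cv = KFold(n_splits=3, shuffle=True, random_state=1)

results = []
for params in param_grid:
    fold_scores = []
    for train_idx, test_idx in cv.split(X):
        model = Ridge(**params)  # a fresh model for every fold
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 on held-out fold
    results.append((np.mean(fold_scores), np.std(fold_scores), params))

best_mean, best_std, best_params = max(results, key=lambda r: r[0])
print("Best: %f (%f) using %s" % (best_mean, best_std, best_params))
```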
kotb August 8, 2017 at 7:10 pm #

Can I use this grid search without using a Keras model?

Reply
- Jason Brownlee August 9, 2017 at 6:27 am #
  
  For sure!
  
  Reply
Aman Garg August 19, 2017 at 3:35 am #

Hello Jason,

Thanks for such a nice tutorial.

Instead of getting output such as ‘Best: 0.720052 using {‘init_mode’: ‘uniform’}’, it would be really nice if you could show us how to visualize this result with matplotlib so that it is easier to read.

Reply
- Jason Brownlee August 19, 2017 at 6:23 am #
  
  Great suggestion, thanks.
  
  Reply
Michael August 20, 2017 at 4:42 am #

Hi, Jason. Thanks, again, for all of the blog posts and example code. I’m trying to tune my binary classification Keras neural network. My dataset includes about 50,000 entries with 52 (numeric) variables. Using Grid Search, I’ve tested all sorts of combinations of layer size, number of epochs, batch size, optimizers, activations, learning rates, dropout rates, and L2 regularization parameters. My grid search shows every combination performs the same. For example, here is a snippet from my latest results:

Best: 0.876381 using {‘act’: ‘relu’, ‘opt’: ‘Adam’}
0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adam’}
0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘SGD’}
0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adagrad’}
0.876381 (0.003878) with: {‘act’: ‘relu’, ‘opt’: ‘Adadelta’}
0.876361 (0.003880) with: {‘act’: ‘tanh’, ‘opt’: ‘Adam’}
0.876381 (0.003878) with: {‘act’: ‘tanh’, ‘opt’: ‘SGD’}

But I also get 0.876381 whether I have 1000 nodes or 1 node, and for every other combo I’ve tested. I’ve also tried different ways of scaling or transforming my input data with no impact.

Do you have any thoughts on why I’m having trouble finding different combinations of parameters that actually have a difference in performance?

Thank you for your help! You rock!

Reply
- Jason Brownlee August 20, 2017 at 6:09 am #
  
  Very odd results. Double check your train/test data.
  
  Also, see this post for a long list of ideas to try:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
  - Michael August 20, 2017 at 9:20 am #
    
    Thanks
    
    Reply
Shubham Kumar September 3, 2017 at 11:54 am #

Hey Jason.
I was using grid search to tune hyperparameters for a CNN-LSTM classification problem.
I used the code template on your blog about sequence classification.
My original data has 38932 instances, but for tuning I am using only 1000 to save time.
But even then, I am not sure how to best search for those parameters and save time.

Is it a bad idea to search for hyperparameters on a small subset (almost 1/40th of the training data in my case)?
Will the result vary largely when I use actual data size?
Also, I passed in several parameters for the grid search. Left it overnight and it still hadn’t made enough progress, so I stopped the execution.
How can I speed up this process?

Reply
- Jason Brownlee September 3, 2017 at 3:44 pm #
  
  The result will be biased, but perhaps might give you an idea of the direction in which to proceed – this could be enough for you.
  
  I often run a lot of sanity check grid searches on small samples to get ideas on which direction to push.
  
  More data will result in less biased estimates of model skill, often proportionately to some point of diminishing returns.
  
  Reply
  - Shubham Kumar September 4, 2017 at 3:10 am #
    
    Great !
    I did read that one of the sanity checks is to check whether the model overfits on a small sample! If yes, then we are good to go…
    I am slightly new to building proper models and find this part exciting but a little intimidating at the same time !
    I am going to use only a few hyper parameters at a time, and keep the rest constant and check what happens !
    
    Love your posts ! They are amazingly helpful .
    Does the Python LSTM book have code snippets in Python 3 as well?
    Coz it becomes a little difficult to search for the right modules and attributes otherwise :/
    
    Reply
    - Jason Brownlee September 4, 2017 at 4:39 am #
      
      Thanks.
      
      Yes, the code in my LSTM book was tested with Python 2.7 and Python 3.5.
      
      Reply
Kaushal Shetty September 8, 2017 at 12:24 am #

Hi Jason, Is this a valid approach to decide the number of layers?
def neural_train(layer1=1, layer2=1, layer3=1, layers=1):

    input_tensor = Input(shape=(2001,))
    x = Dense(units=layer1, activation='relu')(input_tensor)
    if layers == 2:
        x = Dense(layer2, activation='relu')(x)
    if layers == 3:
        x = Dense(layer2, activation='relu')(x)
        x = Dense(layer3, activation='relu')(x)

    output_tensor = Dense(10, activation='softmax')(x)
    model = Model(input_tensor, output_tensor)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

layer1 = [1024, 512]
layer2 = [256, 100]
layer3 = [60, 40]
epochs = [10, 11]
layers = [2, 3]
param_grid = dict(epochs=epochs, layer1=layer1, layer2=layer2, layer3=layer3, layers=layers)
model = KerasClassifier(build_fn=neural_train)
gsv_model = GridSearchCV(model, param_grid=param_grid)
gsv_model.fit(x_train, y_train)

Reply
- Jason Brownlee September 9, 2017 at 11:45 am #
  
  Maybe, you must have a test harness that you can trust, then explore different configurations of your model.
  
  I have more on robustly evaluating neural nets here:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Reply
ari September 9, 2017 at 1:29 am #

Very helpful post Jason. Thanks for this. Are there any advantages for using gridsearch over something like hyperas/hyperopt ? To your best knowledge is one faster than the other?

Reply
- Jason Brownlee September 9, 2017 at 11:58 am #
  
  Depends on your data and model. Use the tool that you prefer.
  
  Reply
Shubham Kumar September 10, 2017 at 4:38 am #

{‘split0_test_score’: array([ 0.6641791, 0.6641791, 0.6641791, 0.6641791]), ‘split1_test_score’: array([ 0.65413534, 0.65413534, 0.65413534, 0.65413534]), ‘split2_test_score’: array([ 0.69924811, 0.69924811, 0.69924811, 0.69924811]), ‘mean_test_score’: array([ 0.6725, 0.6725, 0.6725, 0.6725]), ‘std_test_score’: array([ 0.01931902, 0.01931902, 0.01931902, 0.01931902]), ‘rank_test_score’: array([1, 1, 1, 1]), ‘split0_train_score’: array([ 0.67669174, 0.67669174, 0.67669174, 0.67669174]), ‘split1_train_score’: array([ 0.68164794, 0.68164794, 0.68164794, 0.68164794]), ‘split2_train_score’: array([ 0.65917602, 0.65917602, 0.65917602, 0.65917602]), ‘mean_train_score’: array([ 0.67250523, 0.67250523, 0.67250523, 0.67250523]), ‘std_train_score’: array([ 0.00963991, 0.00963991, 0.00963991, 0.00963991]), ‘mean_fit_time’: array([ 36.72573058, 37.0244147 , 38.12670692, 40.71116368]), ‘std_fit_time’: array([ 0.4829061 , 0.35207924, 0.13746276, 2.71443639]), ‘mean_score_time’: array([ 1.49508754, 1.76741695, 2.14029002, 2.67426189]), ‘std_score_time’: array([ 0.04907801, 0.11919153, 0.07953362, 0.13931651]), ‘param_dropout’: masked_array(data = [0.2 0.5 0.6 0.7],
mask = [False False False False],
fill_value = ?)
, ‘params’: ({‘dropout’: 0.2}, {‘dropout’: 0.5}, {‘dropout’: 0.6}, {‘dropout’: 0.7})}

Hey. I was tuning a model over 4 different choices of a hyperparameter. However, in the grid_result.cv_results_ dictionary, the rank_test_score key has an array with all the same values. I find that confusing. Shouldn’t it have 4 different values, one in each place?
Something like [1,3,2,4]?
What could be the explanation for this?

Reply
- Shubham Kumar September 10, 2017 at 4:50 am #
  
  It must have something to do with all the mean_test_score values being the same.
  
  Reply
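The tie explanation above can be illustrated with SciPy's `rankdata`. This sketch assumes the ranks are computed from the negated mean test scores with minimum-rank tie handling, which reproduces the values in the posted dictionary; the second array of distinct scores is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

# Mean test scores from the posted results: all four are identical.
tied_scores = np.array([0.6725, 0.6725, 0.6725, 0.6725])
tied_ranks = rankdata(-tied_scores, method="min").astype(int)
print(tied_ranks)

# Hypothetical distinct scores produce distinct ranks.
distinct_scores = np.array([0.70, 0.66, 0.69, 0.65])
distinct_ranks = rankdata(-distinct_scores, method="min").astype(int)
print(distinct_ranks)
```

With tied scores every configuration shares rank 1, so the ranking itself is behaving as expected. The more interesting question is why four different dropout rates give exactly the same score on every fold.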
- Jason Brownlee September 11, 2017 at 12:03 pm #
  
  If you are testing 4 different values for one parameter, then you must build 4 models/complete 4 runs.
  
  Does that help?
  
  Reply
  - Shubham Kumar September 13, 2017 at 5:20 am #
    
    I am sorry. That’s confusing. What does “4 models or complete 4 runs” mean?
    
    Are things different if we are grid searching or random searching over just one hyperparameter?
    
    Does it have something to do with the actual code used to write TensorFlow/Keras?
    
    Reply
    - Jason Brownlee September 13, 2017 at 12:36 pm #
      
      If you have one parameter and you want to test 4 values, each value needs one run. Ideally, we would run many times for each parameter value and take the average skill score given the stochastic nature of ML algorithms.
      
      For a random search, you run for as long as you like.
      
      Does that help?
      
      Reply
Shubham Kumar September 13, 2017 at 11:17 pm #

What I understand is that when we have more than 1 (say 2) hyper-parameters in a grid, then for each combination, the code will complete as many epochs as I have specified, with as many training-cross-validation sets as specified (the CV in GridSearchCV). So, going through all those epochs, for each training-cross-validation set, we get the avg accuracy over all the cross-validation sets for every combination.

So when you say 1 run only in the case of a single hyperparameter, that means only 1 training-crossvalidation set? Because only in this case, there won’t be any averaging involved.

Is that what I have to do? Change the training-crossValidation set to just 1?

Reply
- Jason Brownlee September 15, 2017 at 12:06 pm #
  
  Yes, 1 run is one CV pass (k folds).
  
  Reply
Rishi September 18, 2017 at 5:18 am #

Jason,
would you please post an example of inheriting from KerasClassifier (or KerasRegressor) to create your own class? I’m attempting to do this and it works for the most part:

class MLP_Regressor(KerasRegressor):

    def __init__(self, **sk_params):
        super().__init__(build_fn=None, **sk_params)

    def __call__(self, optimizer='adam', loss='mean_squared_error', **kwargs):
        # more code goes here (that was previously in 'build_fn')

I can include this in a pipeline and it runs perfectly:
MLP Pipeline(memory=None,
steps=[(‘MLP’, )])

Only thing is: The Keras documentation includes the ‘build_fn’ keyword argument:

keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)

While the actual KerasClassifier class definition shows the following in its __init__ method:

def __init__(self, model, optimizer='adam', loss='categorical_crossentropy', **kwargs):
    super(KerasClassifier, self).__init__(model, optimizer, loss, **kwargs)

I’m not sure if my __init__ in MLP_Regressor has been setup correctly (to avoid hidden bugs in the future).

Would greatly appreciate it! (I’ve searched, but couldn’t find a single example of KerasClassifier inheritance).

Reply
- Jason Brownlee September 18, 2017 at 5:49 am #
  
  Thanks for the suggestion, I have not done this but perhaps in the future.
  
  Reply
  - Rishi November 21, 2017 at 12:54 pm #
    
    Jason, managed to get the inherited class working perfectly now:
    
    class MLP_Classifier(KerasClassifier):
    
        def __init__(self, build_fn=None, **sk_params):
            self.sk_params = sk_params
            super().__init__(build_fn=None, **sk_params)
    
        def __call__(self, callbacks=None, layer_sizes=None, activations=None, input_dim=0, init='normal', optimizer='adam', metrics='accuracy', loss='binary_crossentropy', use_dropout_input=False, use_dropout_hidden=False):
            """
            Constructs, compiles and returns a Keras model
            Implements the "build_fn" function
    
            Returns a "Sequential" model
            """
            # Code to build a model (that would typically go in "build_fn") goes here.
            return model
    
    Reply
    - Jason Brownlee November 22, 2017 at 10:47 am #
      
      Well done!
      
      Reply
Tmn September 20, 2017 at 2:45 am #

Hi Jason,

I can not thank you enough. I am sure that there are many people like me who have learnt a lot from your tutorials on both “R” and “Python”. I have been following your tutorials for more than 3 years now. Before, I was using R; however, I recently moved to Python for deep learning. And I find your tutorials, as usual, exceptional. I think Andrew Ng’s and CS231n’s (Andrej Karpathy) theoretical courses and your programming course on deep learning are among the best in the world. You rock! Thanks a lot.

I do have a question 🙂 as well.
The grid search parameter tuning works perfectly with CPU. I agree with your suggestion not to tune everything at once. Now I have moved to a GPU implementation. I was able to execute the code when I chose n_jobs=1. However, with multi-threading (n_jobs=-1), I get “CUDA_ERROR_OUT_OF_MEMORY”. I have a GeForce GTX 1080. Did you happen to encounter a similar error? I will post the error log if needed.

Once, again thank you.

Reply
- Jason Brownlee September 20, 2017 at 6:00 am #
  
  Thanks for all of your support!
  
  Yes, I have the same and I would recommend using a “single thread” and let the GPU do its thing for a given single run.
  
  In general, I’d recommend contrasting different approaches to grid searching (cpu/gpu) and use the approach that is overall faster for your specific tests.
  
  Reply
  - Tmn September 20, 2017 at 11:33 pm #
    
    Hi Jason,
    Thank you for the response. The parameter search using CPU (n_jobs=-1) takes (2.961489-4.977758) seconds, while using GPU (n_jobs=1) it takes (140.101048-142.151023) seconds.
    
    One more thing: after the grid search I have values for the parameters {batch_size, activation, neurons, learn_rate..} and an accuracy of around 90%. However, I wonder why reusing these parameters does not give the same results; the accuracy is now 52%. Even though I executed it many times with the same parameters, the accuracy stays at 52%. I could not reproduce the accuracy shown in the grid search using the best parameters. I am doing 5-fold CV, so I do not expect the accuracy to be identical since it is a stochastic process, but it should be within about SD±5%. What do you think? Have you encountered the same thing?
    
    Also, the best parameter values change on each execution, with accuracy within SD±5%.
    
    Thanks
    
    P.S:
    Below code is something I am doing to limit GPU memory usage and run multiple grid search. However, we should know the memory usage in advance (cs231n.github.io/convolutional-networks/#case). Let me know if it makes sense.
    
    Also, we can use n_jobs. I tried n_jobs=2; however, the GPU memory is allocated as a fraction. I am looking into how to allocate memory in MB. I will do more research on this “CUDA_ERROR_OUT_OF_MEMORY” and update you.
    
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.3
    set_session(tf.Session(config=config))
    
    Thanks!
    
    Reply
    - Jason Brownlee September 21, 2017 at 5:42 am #
      
      The results for the standalone model should fit into the distribution of the grid search results – if you repeated each grid search result many times, e.g. 10-30. See this post on evaluating model skill of neural networks:
      https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
      
      Nice, sorry, I cannot give you good advice on grid searching with the GPU, it is not something I do generally. I am more likely to run instances serially or across AWS instances.
      
      Reply
      - TMN October 6, 2017 at 2:12 am #
        
        Hi Jason,
        
        Could you please help with how to do feature normalization while doing the grid search and cross-validation? Is normalization done automatically here: GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=15, cv=rkf)? If I normalize the features during training with X = scaler.transform(X_train), this will introduce bias in cross-validation. Also, if possible, can you please provide references on using the scikit-learn wrapper with Keras for advanced options? Is there any limitation in the wrapper?
        Thanks
        
        Without normalization:
        Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}
        
        With normalization:
        Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}
      - Jason Brownlee October 6, 2017 at 5:37 am #
        
        Perhaps you can normalize your data prior to the grid search?
      - TMN October 6, 2017 at 10:59 am #
        
        I normalize my data prior to grid search using X = scaler.transform(X_train), but don’t you think it would introduce bias in the performance? Normally, I expect to normalize the train set and use that normalization factor to normalize the test or validation set before prediction. Maybe I did not understand you properly; how do you do normalization prior to grid search?
        
        Thanks
      - Jason Brownlee October 6, 2017 at 11:07 am #
        
        Yes, it’s a struggle or trade-off.
        
        Perhaps you can see if a Pipeline will work in the grid search, it may, but I expect it will error.
        
        Perhaps the bias is minor and you can ignore it.
        
        Perhaps you can implement your own grid search loop to only use training data to calculate data scaling coefficients.
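The last suggestion above can be sketched without any libraries. This is a minimal illustration, not the tutorial's method: `fit_scaler`, `apply_scaler`, and `kfold_indices` are hypothetical helper names, and the data is a toy example. The point is that the scaling coefficients are computed from the training rows of each fold only and then reused on the held-out rows.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Compute per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against zero variance so the division below is always defined.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_scaler(rows, means, stds):
    """Standardize rows using coefficients computed elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = n_samples if fold == k - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx

# Toy data: 6 samples, 2 features (illustrative values only).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]

scaled_folds = []
for train_idx, test_idx in kfold_indices(len(X), k=3):
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    means, stds = fit_scaler(X_train)              # statistics from training rows only
    X_train_s = apply_scaler(X_train, means, stds)
    X_test_s = apply_scaler(X_test, means, stds)   # reuse the training statistics
    scaled_folds.append((X_train_s, X_test_s))
    # A model would be fit on X_train_s and scored on X_test_s here.
```

In scikit-learn itself, wrapping a `StandardScaler` and the estimator in a `Pipeline` and passing that pipeline to `GridSearchCV` gives the same per-fold behaviour, provided the wrapped Keras estimator cooperates with the pipeline.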
      - TMN October 6, 2017 at 6:44 pm #
        
        I started looking at the pipeline (http://scikit-learn.org/stable/modules/pipeline.html) and how it has been used for SVM; let’s see. I would expect the pipeline to work for Keras as well, as this is a classical problem in machine learning. Why do you expect an error here? I wanted to take full advantage of the automatic grid search. Well, the final option will be to implement my own grid search.
        
        The bias is really significant in 5-repeated 10-fold CV. Thanks
        
        Without normalization:
        Best: 0.535211 using {‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘batch_size’: 40, ‘neurons’: 200, ‘init_mode’: ‘lecun_uniform’, ‘optimizer’: ‘SGD’, ‘activation’: ‘relu’, ‘epochs’: 1000}
        
        With normalization:
        Best: 0.695775 using {‘optimizer’: ‘SGD’, ‘batch_size’: 132, ‘init_mode’: ‘lecun_uniform’, ‘epochs’: 1000, ‘learn_rate’: 0.01, ‘dropout_rate’: 25, ‘neurons’: 200, ‘activation’: ‘relu’}
      - Jason Brownlee October 7, 2017 at 5:51 am #
        
        If it works, that is great. I have seen cases where when grid search + keras gets fancy it causes errors.
        
        I have a tutorial on Pipeline here that might help:
        https://machinelearningmastery.com/automate-machine-learning-workflows-pipelines-python-scikit-learn/
HWU September 22, 2017 at 6:52 am #

This is such a great, thorough tutorial. Thanks for keeping your tutorials up to date! It’s so nice finding a resource with examples that you know will work because they’ve been tested on recent versions of required packages.

Reply
- Jason Brownlee September 23, 2017 at 5:32 am #
  
  Thanks!
  
  Reply
Marjan September 29, 2017 at 1:08 pm #

Thank you for your great tutorial. I tried to use it for my model with multiple inputs, but it didn’t work. I found that the scikit-learn wrapper does not work for multiple inputs. It gives me an error for grid.fit([input1,input2],y)
Do you have any suggestion to handle it?
Thanks,

Reply
- Jason Brownlee September 30, 2017 at 7:34 am #
  
  Sorry I do not. Perhaps run the grid search manually (e.g. your own for loop)?
  
  Reply
Buz Fifer October 5, 2017 at 7:06 am #

When I run your code to tune the dropout_rate, I get the following error:
ValueError: dropout_rate is not a legal parameter

In fact, I get this error for all labels except epochs and batch_size. Both of these were recognized and ran fine. I could not find a reference to valid labels anywhere, even in API docs. Any suggestions?

Reply
- Jason Brownlee October 5, 2017 at 5:16 pm #
  
  What do you mean by valid labels exactly?
  
  Reply
  - Buz Fifer October 6, 2017 at 3:02 am #
    
    Sorry, I should have included the code in the first place. I have added comments in the code to show exactly what I tried for each parameter.
    
    # ———— Define Keras Classifier Wrapper
    model1 = KerasClassifier(build_fn=kerasModel1, epochs=5, batch_size=10, verbose=0)
    
    # ———– define the grid search parameters
    mybatchs = [10, 20, 128]
    myepochs = [5, 10, 20, 50, 60, 80, 100]
    mylearn = [0.001, 0.002, 0.0025, 0.003]
    myopts = ['Adam', 'Nadam', 'RMSprop']
    myinits = ['uniform', 'normal', 'lecun_uniform', 'lecun_normal', 'glorot_uniform', 'glorot_normal']
    mydrop = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80]
    
    # ————- Not Recognized
    #param_grid = dict(optimizer=myopts)
    #param_grid = dict(learn_rate=mylearn)
    #param_grid = dict(learning_rate=mylearn)
    #param_grid = dict(init=myinits)
    #param_grid = dict(init_mode=myinits)
    #param_grid = dict(dropout_rate=mydrop)
    
    # ———— Recognized
    #param_grid = dict(epochs=myepochs) # —– OK
    #param_grid = dict(batch_size=mybatchs) # —– OK
    
    I removed comment # and ran each one separately. For example, running the first param_grid values resulted in: Error – optimizer is not a valid parameter. They all got the same rejection notice except for epochs and batch_size.
    I hope that helps.
    
    Reply
Buz Fifer October 6, 2017 at 3:09 am #

Just to be clearer, each parameter had its own name in the error message as follows:

Error – optimizer is not a valid parameter
Error – learn_rate is not a valid parameter
Error – learning_rate is not a valid parameter
Error – init is not a valid parameter
Error – init_mode is not a valid parameter
Error – dropout_rate is not a valid parameter

Reply
- Jason Brownlee October 6, 2017 at 5:39 am #
  
  That is odd, I don’t have any good ideas, other than continue to debug and try different variations to see if you can expose the cause of the issue.
  
  Double check all of your python libraries are up to date.
  
  Reply
ritika October 6, 2017 at 11:49 pm #

Hi Jason, Very nice tutorial..very well explained

Reply
- Jason Brownlee October 7, 2017 at 5:55 am #
  
  Thanks.
  
  Reply
TC October 17, 2017 at 10:27 am #

Hi Jason thanks for the great post.

Let’s say I’m using 5 fold CV on a relatively small dataset (not necessarily for a deep learning model). In this case, the variance of the performance metric might be quite high, and just by chance, a point on the grid that is in reality far from optimal, might be selected as the “best”.

So are there any approaches to smooth out the response surface of the grid search, to deal with “spikes” in performance due to variance?

Reply
- Jason Brownlee October 17, 2017 at 4:05 pm #
  
  Wonderful question.
  
  Yes, we can approach this problem by increasing the number of repeats (not folds) of each param combination.
  
  Reply
  - TC October 20, 2017 at 8:40 am #
    
    Hi Jason, by “number of repeats” do you mean to just repeat the process many times, with perhaps a different random seed?
    
    Reply
    - Jason Brownlee October 21, 2017 at 5:23 am #
      
      Exactly.
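A rough sketch of what repeating each configuration looks like, assuming a hypothetical `evaluate(neurons, seed)` function that stands in for one full cross-validated run (here it is just a toy surface plus random noise). Each parameter value is scored several times with different seeds and the scores are averaged before comparing configurations.

```python
import random
from statistics import mean, stdev

def evaluate(neurons, seed):
    """Hypothetical noisy scorer standing in for one cross-validated training run."""
    rng = random.Random(seed)
    true_score = 0.80 - abs(neurons - 10) / 100   # illustrative response surface
    return true_score + rng.uniform(-0.05, 0.05)  # run-to-run noise

def repeated_score(neurons, n_repeats=30):
    """Average the score over several runs, each with a different seed."""
    scores = [evaluate(neurons, seed) for seed in range(n_repeats)]
    return mean(scores), stdev(scores)

summary = {n: repeated_score(n) for n in [5, 10, 15]}
for neurons, (avg, spread) in summary.items():
    print("neurons=%d mean=%.3f std=%.3f" % (neurons, avg, spread))
```

Within scikit-learn, a similar effect can be had by passing a `RepeatedKFold` (or `RepeatedStratifiedKFold`) object as the `cv` argument of `GridSearchCV`, at the cost of proportionally more training runs.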
      
      Reply
Lea October 20, 2017 at 10:09 pm #

Thank you for this great tutorial! I tried to adapt the code for a CNN, but I am running constantly in the same error. May anyone help?

That is the code:

def create_model(nb_filters=3, nb_conv=2, pool=20):
    model = Sequential()
    model.add(Convolution1D(nb_filters, nb_conv, activation='relu',
                            input_shape=(X.shape[1], X.shape[2]), padding="same"))
    model.add(MaxPooling1D(pool))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

model = KerasClassifier(build_fn=create_model(), verbose=0)

nb_conv = [2, 4, 6, 8, 10]
pool= [10, 20, 30, 50]
param_grid = dict(nb_conv=nb_conv, epochs=pool)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, y)

And the error I am getting is “nb_conv is not a legal parameter”. Unfortunately, I do not understand why.

Reply
- Jason Brownlee October 21, 2017 at 5:37 am #
  
  The API has changed:
  https://keras.io/layers/convolutional/
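Separately from any API changes, a likely cause of “nb_conv is not a legal parameter” in the code above is that `build_fn=create_model()` passes an already-built model rather than the function itself, so the wrapper has no function signature in which to find `nb_conv`. The sketch below is a simplified stand-in for that kind of check, not the wrapper's actual code; `accepted_names` is a hypothetical helper and `create_model` here returns a plain dict instead of a Keras model.

```python
import inspect

def create_model(nb_filters=3, nb_conv=2, pool=20):
    """Hypothetical stand-in; a real build function would return a compiled Keras model."""
    return {"nb_filters": nb_filters, "nb_conv": nb_conv, "pool": pool}

def accepted_names(build_fn):
    """Names a grid search could route to build_fn (empty if it is not a function)."""
    if not inspect.isfunction(build_fn):
        return set()
    return set(inspect.signature(build_fn).parameters)

from_function = accepted_names(create_model)    # function object: arguments are visible
from_instance = accepted_names(create_model())  # already-built object: nothing to tune

print(sorted(from_function))
print(sorted(from_instance))
```

Under that assumption, passing `build_fn=create_model` (no parentheses) would be the thing to try first. Note also that the grid in the question assigns the `pool` list to `epochs`, which is probably not what was intended.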
  
  Reply
went October 22, 2017 at 2:55 am #

Hi Jason,

Great post and Thank you.

What do you think is the best sequence when tuning all those hyperparameters? I think different sequences will lead to different final hyperparameters.

Reply
- Jason Brownlee October 22, 2017 at 5:32 am #
  
  This post has some ideas (at the end):
  https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
  
  Also see the referenced paper.
  
  Reply
Bgie October 23, 2017 at 6:28 am #

Hi Jason,

What a great blog, I very much appreciate you sharing some of your expertise!

I want to grid search the hyperparams from my CNN, but I’m using data augmentation with ImageDataGenerator. So I’m not calling model.fit but model.fit_generator for the actual training.
This does not seem to be supported through the grid search..
Am I forced to write my own KerasClassifier implementation?

Would you advise to just fall back to using (nested) for loops instead, or would I be missing some ‘magic’ from the existing scikit gridsearch?

Reply
- Jason Brownlee October 23, 2017 at 4:10 pm #
  
  I would recommend writing your own for loops to grid search instead.
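A minimal sketch of such a loop, assuming a hypothetical `evaluate(params)` function. In practice `evaluate` would build the model, train it however you like (for example with a data generator), and return a validation score; here it is a toy formula so the example runs on its own.

```python
from itertools import product

def evaluate(params):
    """Hypothetical scorer; replace with code that trains a model and returns a validation score."""
    # Toy surface that peaks at batch_size=20, epochs=50 (illustrative only).
    return 1.0 - abs(params["batch_size"] - 20) / 100 - abs(params["epochs"] - 50) / 1000

param_grid = {"batch_size": [10, 20, 40], "epochs": [10, 50, 100]}

names = list(param_grid)
results = []
# itertools.product enumerates every combination, like nested for loops.
for values in product(*(param_grid[name] for name in names)):
    params = dict(zip(names, values))
    results.append((evaluate(params), params))

best_score, best_params = max(results, key=lambda item: item[0])
print("Best: %f using %s" % (best_score, best_params))
```

What you give up compared with `GridSearchCV` is mainly the built-in cross-validation and result bookkeeping, both of which can be added to the loop if needed.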
  
  Reply
Shubham Kumar October 26, 2017 at 3:58 am #

Hey Jason!

Needed help with model improvement!
Can you help me in understanding how to realize whether your model is suffering from
bad local minima, vanishing/exploding gradient problem?

Reply
- Jason Brownlee October 26, 2017 at 5:34 am #
  
  If you have exploding or vanishing gradients, then you will have NaN outputs.
  
  This post will give you ideas on how to lift skill:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  This post will give you advice on how to effectively evaluate your model:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Reply
  - Shubham Kumar November 6, 2017 at 7:05 am #
    
    NaN outputs as in my predictions ?
    Or the weights ?
    If exploding gradient then weight will be very large (probably NaN) hence output would also be NaN.
    But how would this logic apply to vanishing gradients? In this case the weights basically stop changing, right?
    
    Reply
    - Shubham Kumar November 6, 2017 at 7:07 am #
      
      Should I use some kind of code that checks by how much the weights at each layer are changing…and if after a certain threshold they haven’t changed by a certain amount, I’ll declare vanishing gradient !
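The check described here could look roughly like the sketch below. It is only an illustration: `mean_abs_change` and `stalled_layers` are hypothetical helpers, the snapshots are made-up numbers, and the threshold is arbitrary. A layer whose weights barely move may also simply have converged or be training with a very small learning rate, so a result like this is a hint rather than proof of vanishing gradients.

```python
def mean_abs_change(before, after):
    """Mean absolute difference between two equally sized flat lists of weights."""
    return sum(abs(a - b) for a, b in zip(after, before)) / len(before)

def stalled_layers(weights_before, weights_after, threshold=1e-6):
    """Names of layers whose weights moved less than threshold between snapshots."""
    return [
        name
        for name in weights_before
        if mean_abs_change(weights_before[name], weights_after[name]) < threshold
    ]

# Illustrative snapshots keyed by layer name; in Keras, layer.get_weights()
# returns NumPy arrays that could be flattened into lists like these.
epoch_1 = {"dense_1": [0.10, -0.20, 0.30], "dense_2": [0.50, 0.40, -0.10]}
epoch_2 = {"dense_1": [0.10, -0.20, 0.30], "dense_2": [0.45, 0.42, -0.05]}

stalled = stalled_layers(epoch_1, epoch_2)
print(stalled)
```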
      
      Reply
    - Jason Brownlee November 7, 2017 at 9:44 am #
      
      Try gradient clipping on the optimization algorithm.
      
      Reply
Mustafa Murat ARAT October 28, 2017 at 12:45 am #

I have a question for you, Jason, and for the general audience. I tried to find the optimal number of neurons for one of the hidden layers. I looped over my function, which contains my deep learning model. It is fast enough for the values I define, and I get a result based on accuracy. However, when I use your code, it is extremely slow and never finishes. How long does it take on your computer?

Reply
- Jason Brownlee October 28, 2017 at 5:14 am #
  
  You could try to test fewer parameters or try to search on a smaller dataset?
  
  Reply
  - Mustafa Murat ARAT October 29, 2017 at 4:12 am #
    
    Hey Jason,
    
    Thank you for your quick reply. I try grid search for number of neurons on Iris data set for the purpose of learning. I scale the data first and then transform and encode the dependent variable. However, first of all, even though I use small data set or fewer parameters, it is slow; second of all, when I get the results, it is all zero. This is very basic example and I am pretty much sure that my code is correct but I guess I am missing out something.
    
    Best: 0.000000 using {‘neurons’: 3}
    0.000000 (0.000000) with: {‘neurons’: 3}
    0.000000 (0.000000) with: {‘neurons’: 5}
    
    THE CODE:
    
    from pandas import read_csv
    import numpy
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import StandardScaler
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.utils import np_utils
    from sklearn.model_selection import GridSearchCV
    
    dataframe=read_csv("iris.csv", header=None)
    dataset=dataframe.values
    X=dataset[:,0:4].astype(float)
    Y=dataset[:,4]
    
    seed=7
    numpy.random.seed(seed)
    
    #encode class values as integers
    encoder = LabelEncoder()
    encoder.fit(Y)
    encoded_Y = encoder.transform(Y)
    #one-hot encoding
    dummy_y = np_utils.to_categorical(encoded_Y)
    
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    
    def create_model(n_neurons):
        model = Sequential()
        model.add(Dense(n_neurons, input_dim=X.shape[1], activation='relu')) # hidden layer
        model.add(Dense(3, activation='softmax')) # output layer
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model
    
    model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, initial_epoch=0, verbose=0)
    # define the grid search parameters
    neurons=[3, 5]
    
    #this does 3-fold classification. One can change k.
    param_grid = dict(n_neurons=neurons)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, dummy_y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))
    
    Reply
    - Jason Brownlee October 29, 2017 at 5:59 am #
      
      Sorry, I cannot debug your code/problem for you.
      
      Reply
      - Mustafa Murat ARAT October 30, 2017 at 8:29 am #
        
        I totally understand you. Thank you so much, though. I figured out my mistake. Iris dataset is very well balanced so I need to shuffle the data because GridSearchCV is using 3-Fold Cross Validation.
      - Jason Brownlee October 30, 2017 at 3:49 pm #
        
        Glad to hear it.
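To see why shuffling matters here, consider what unshuffled, contiguous 3-fold splitting does to a file whose rows are sorted by class, as the usual iris.csv is (50 rows per species). The sketch below uses only plain Python; `contiguous_folds` is a hypothetical helper that mimics splitting without shuffling, not scikit-learn's own code.

```python
def contiguous_folds(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous, unshuffled test folds."""
    fold_size = n_samples // k
    return [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]

# Labels sorted by class: 50 of each, in order.
labels = [0] * 50 + [1] * 50 + [2] * 50

fold_report = []
for test_idx in contiguous_folds(len(labels), k=3):
    held_out = set(test_idx)
    test_classes = {labels[i] for i in test_idx}
    train_classes = {labels[i] for i in range(len(labels)) if i not in held_out}
    fold_report.append((sorted(train_classes), sorted(test_classes)))
    print("train classes:", sorted(train_classes), "test classes:", sorted(test_classes))
```

Every test fold contains exactly the one class the model never saw in training, which is consistent with the 0.000000 scores reported above. Shuffling the rows first, or passing something like `cv=KFold(n_splits=3, shuffle=True, random_state=seed)` to `GridSearchCV`, avoids this.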
jenny November 8, 2017 at 4:12 am #

Thanks for sharing such a wonderful tutorial. Learnt many new things.

How can I save all the models that the grid search generates, with identifiers for each model?

I am an R user. This how I do it in R to save models with passing parameter values to its names.

xgb.object <- paste0('/path/xgb_disc20_new_',

sample.sizes[i], '_', s,'_',nrounds[j],'_',max.depth[k],'_',eta[l], '.RData')

write.table(cbind(sample.sizes[i], s,nrounds[j],max.depth[k],eta[l],tpr, tnr, acc, roc.area,

concordance), paste0('/path/xgb_disc20_new_', min.sample.size,'_', max.sample.size,

'.csv'), append=TRUE, sep=",",row.names=FALSE,col.names=FALSE)

How can this be achieved in python for keras(neural network) and other models in other libraries?

Reply
- Jason Brownlee November 8, 2017 at 9:29 am #
  
  I would recommend using grid search to find the parameters for a well performing model then train a new standalone model with those parameters that you can then save.
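For the naming part of the question, a small sketch of building a filename from a parameter dictionary, similar in spirit to the `paste0` call above. The helper name `model_filename`, the prefix, and the `.h5` extension are assumptions for illustration; the parameter values are made up.

```python
def model_filename(params, prefix="model", ext=".h5"):
    """Build a filename that encodes each hyperparameter name and value."""
    # Sort the names so the same parameters always give the same filename.
    parts = ["%s-%s" % (name, params[name]) for name in sorted(params)]
    return prefix + "_" + "_".join(parts) + ext

best_params = {"epochs": 50, "batch_size": 20, "neurons": 10}  # illustrative values
filename = model_filename(best_params)
print(filename)
```

The resulting string could then be handed to the Keras `model.save()` method after training the standalone model with those parameters.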
  
  Reply
jenny November 8, 2017 at 9:34 pm #

thank you jason for your quick reply . I will try that way.

Reply
Wassim November 16, 2017 at 11:26 pm #

Hi Jason,
Thank you for the great tutorial. I just have an issue when using exactly your code: when I try to parallelize the grid search with n_jobs=-1, I end up with the error “AttributeError: Can’t get attribute ‘create_model’ on ” while it works well without parallelization. Any idea where the issue comes from?
Thank you,
Wassim

Reply
- Jason Brownlee November 17, 2017 at 9:25 am #
  
  I’m not sure, perhaps you cannot parallelize the grid search with Keras models.
  
  Reply
Sangwon Chae November 28, 2017 at 9:51 pm #

Hi Jason,

The example code calculates the best score for accuracy to obtain the hyperparameter.

In my problem, I want to find RMSE rather than accuracy because it is regression problem (numerical prediction).

However, ‘grid_result.cv_results_’ only provides ‘fit_time’ and ‘score’, so I cannot calculate RMSE from it.

What should I do?

Thank you.

Reply
- Jason Brownlee November 29, 2017 at 8:23 am #
  
  You can change the configuration to calculate MSE (e.g. scoring=’neg_mean_squared_error’) and then take the square root.
  
  Learn more here:
  http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
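The conversion is a one-liner once the scores are in hand. Scores under `scoring='neg_mean_squared_error'` are negated MSE values (so that larger is better), which means the sign has to be flipped before taking the square root. The numbers below are illustrative only, and `rmse_from_neg_mse` is a hypothetical helper name.

```python
import math

def rmse_from_neg_mse(neg_mse_scores):
    """Convert scores reported under scoring='neg_mean_squared_error' to RMSE."""
    return [math.sqrt(-score) for score in neg_mse_scores]

# Illustrative values in the shape of cv_results_['mean_test_score'].
mean_test_score = [-4.0, -2.25, -9.0]
rmse = rmse_from_neg_mse(mean_test_score)
print(rmse)
```

Keep in mind that the square root of a mean MSE across folds is not quite the same as the mean of per-fold RMSE values. More recent scikit-learn releases also accept `scoring='neg_root_mean_squared_error'`, which reports the negated RMSE directly.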
  
  Reply
Estelle December 5, 2017 at 7:37 am #

Hi Jason,

Thank you for this post.

Is there anything that prevents me to use Grid Search with train_on_batch() instead of fit()?

Thank you for letting me know.

All the best,

Estelle

Reply
- Jason Brownlee December 5, 2017 at 10:26 am #
  
  I think the wrapper is quite limited and does not offer this facility via sklearn.
  
  Reply
  - Estelle December 6, 2017 at 8:15 am #
    
    Thanks for your quick answer.
    
    All the best,
    
    Estelle
    
    Reply
    - Jason Brownlee December 6, 2017 at 9:08 am #
      
      No problem.
      
      Reply
Peter December 8, 2017 at 1:56 pm #

Thanks very much for the tutorial. It is extremely helpful for my work. I came across a problem with grid search with Keras (tensorflow backend). I want to run the same grid search on different datasets. Everything works fine on the first dataset. But when I fit the grid search to the second dataset, the program got stuck there. I run the grid search with n_jobs=-1 and put keras.backend.clear_session() between two fits. You can replicate this issue by fit to the data twice in your examples. Could you please kindly help me with this issue?

Reply
- Jason Brownlee December 8, 2017 at 2:30 pm #
  
  I’m sorry to hear that, perhaps change n_jobs to 1?
  
  Reply
  - Peter December 8, 2017 at 2:40 pm #
    
    Thanks for the quick reply. It works when n_jobs=1, but I do need parallel threads for speed.
    
    Reply
    - Jason Brownlee December 9, 2017 at 5:34 am #
      
      The neural network will be using all the cores, so running multiple threads may not offer any benefit.
      
      Reply
      - Peter December 11, 2017 at 9:53 am #
        
        I got it to work by just fitting one dataset in the python script and looping the python script over multiple datasets in a bash script. I am still not clear why second fitting fails in python, but this is a not-so-beautiful workaround.
      - Jason Brownlee December 11, 2017 at 4:52 pm #
        
        Glad to hear that you made some progress.
Daniel Pamplona December 13, 2017 at 1:37 am #

Hi Jason

Thank you so much for sharing your knowledge.
I am trying to optimize the number of hidden layers.
I can’t figure out how to do it with Keras (actually I am wondering how to set up the function create_model in order to optimize the number of hidden layers)
Could you please help me?
Thank you

Reply
- Jason Brownlee December 13, 2017 at 5:42 am #
  
  Perhaps the number of layers could be a parameter to your function.
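As a dependency-free sketch of that idea, the function below turns a layer count into a list of layer widths. The names `layer_sizes` and `n_hidden_layers` are hypothetical; in a real `create_model(n_hidden_layers, neurons)` the same loop would add one `Dense` layer per hidden width instead of building a list.

```python
def layer_sizes(n_hidden_layers=1, neurons=12, n_outputs=1):
    """Return the width of every Dense layer for a given number of hidden layers."""
    return [neurons] * n_hidden_layers + [n_outputs]

# Inside a real create_model(n_hidden_layers, neurons), each hidden width would
# become one Dense(width, activation='relu') layer and the final width one
# Dense(n_outputs, activation='sigmoid') output layer.
param_grid = dict(n_hidden_layers=[1, 2, 3], neurons=[8, 12])
for n in param_grid["n_hidden_layers"]:
    print(n, layer_sizes(n_hidden_layers=n))
```

The grid search then treats `n_hidden_layers` like any other argument of the build function.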
  
  Reply
Sean December 15, 2017 at 1:43 am #

Hi Jason,

Thanks for this insightful and useful tutorial as always

No doubt your blog posts are arguably the best in the field of data sciences

Best wishes

Reply
- Jason Brownlee December 15, 2017 at 5:36 am #
  
  Thanks Sean.
  
  Reply

Sean December 16, 2017 at 12:06 am #

Hello Jason,
I decided to try the code on a textual data of about 3000 tweets having binary classification (Y) and the text corpus as (X). Started off with tuning the batch size and number of epochs

but got the following error:

Name: 0, Length: 2185, dtype: object, names=['dense_1_input'], shapes=[(None, 8)], check_batch_axis=False, exception_prefix='input')
    148                         raise ValueError(
    149                             'Error when checking ' + exception_prefix +
    150                             ': expected ' + names[i] +
    151                             ' to have shape ' + str(shapes[i]) +
    152                             ' but got array with shape ' +
--> 153                             str(array.shape))
        array.shape = (2185, 1)
    154     return arrays
    155 
    156 
    157 def _standardize_sample_or_class_weights(x_weight, output_names, weight_type):

ValueError: Error when checking input: expected dense_1_input to have shape (None, 8) but got array with shape (2185, 1)

Here’s the modified code below:

# Use scikit-learn to grid search the batch size and epochs
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load dataset
dataset = pd.read_csv('training.txt', delimiter = '\t', quoting = 3, header = None)
# split into input (X) and output (Y) variables
X = dataset.iloc[:,0]
Y = dataset.iloc[:,1]
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Thanks

- Jason Brownlee December 16, 2017 at 5:29 am #
  
  Sorry to hear that, it’s not clear to me. Perhaps post to stackoverflow to get help debugging your code?
  
  Reply

Olivier Blais December 16, 2017 at 3:24 am #

Hi Jason, first thanks for your articles! Super useful!

I tried to execute the grid search but came up against parallelism issues. I have a Windows OS and I get this error when I try to run the script on multiple CPUs:

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using “if __name__ == ‘__main__'”. Please see the joblib documentation on Parallel for more information.

Do you know how I should address that?

Thanks in advance

Reply
- Jason Brownlee December 16, 2017 at 5:35 am #
  
  Perhaps try setting the number of jobs to 1?
  
  Reply
  - Olivier Blais December 28, 2017 at 5:30 am #
    
    Hi Jason! Yes this works but it is very slow as this is not parallel. Do you understand why it cannot run in parallel and how to fix that?
    
    Thanks again !
    Olivier
    
    Reply
    - Jason Brownlee December 28, 2017 at 2:10 pm #
      
      The backend is parallelized and the two levels of parallelization are in conflict.
      
      Reply
Shabnam December 18, 2017 at 2:21 pm #

Thanks a lot for such a wonderful post. Overall, there are a lot of parameters that need to be tuned. I was thinking to use RandomizedSearchCV instead of GridSearchCV. Still, it will be time consuming for a lot of simulations. Do you have any suggestion for fast parameter tuning? For example, can we say that specific parameters have more effect on scores, so lets try to Grid/RandomizedSearchCV them first?

Reply
- Jason Brownlee December 18, 2017 at 3:32 pm #
  
  Yes, there are some great tips at the end of this post:
  https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
  
  Reply
Henry December 20, 2017 at 10:51 pm #

Dear Jason,

Fantastic post, thank you for this wonderful tutorial.

I was wondering if it would be more appropriate to tune all the hyperparameters at one go instead of breaking it up into various parts as shown above – you may be doing it for the sake of visibility of how each component is tuned but would it be better to tune everything together since there might be “interactions between the hyperparameters” which would not be captured if they were tuned separately?

Reply
- Jason Brownlee December 21, 2017 at 5:25 am #
  
  If you have the resources, then sure.
  
  Reply
Hao January 3, 2018 at 2:26 am #

Hi Jason,

Many thanks for a series of excellent posts!

I have an extremely imbalanced data set to study, in which #negative : #positive is about 100:1. When I built the first model, I performed 10-fold validation, and in each validation round I used oversampling to add positive samples to the training data, but not to the testing data. Now my question is: if I want to perform a hyperparameter search, how do I tell GridSearchCV() to do oversampling for each round of cross-validation?

Many thanks

Reply
- Jason Brownlee January 3, 2018 at 5:40 am #
  
  Good question, you might need to use a Pipeline and have data prep happen within it.
  
  Reply
Justin Solms January 7, 2018 at 11:24 pm #

Hello Jason

A good 2018 to you. I have a question about how Keras early stopping callbacks might be able to use the GridSearchCV k-fold generated validation data set as their val_loss or val_acc. The question I posted on StackOverflow but I wished to call your attention to it – should you so wish.

https://stackoverflow.com/questions/48127550/how-do-i-implement-early-stopping-with-keras-and-the-sklearn-gridsearchcv-cross

Kind regards,
Justin

Reply
- Jason Brownlee January 8, 2018 at 5:43 am #
  
  I would suggest not combining CV and early stopping.
  
  Reply
  - James March 11, 2018 at 6:16 am #
    
    Could early stopping be used as a substitute for grid searching epoch size?
    
    Reply
    - Jason Brownlee March 11, 2018 at 6:30 am #
      
      Yes, but you might need to code it up yourself. sklearn might blow up.
      
      Reply
shwetabh shekhar January 19, 2018 at 12:03 am #

Hello sir
if i have large dataset the also we can do this hyperparameter tunning .
If i have 70 to 80 feature column and about 50000 rows.
can we apply this tunnig

Reply
- Jason Brownlee January 19, 2018 at 6:31 am #
  
  Sure, you might need a large computer or to split the work up across many computers.
  
  Perhaps you can work with a sample of your data.
  
  Reply
shwetabh shekhar January 19, 2018 at 1:13 am #

how to select the hidden layer if i have largedataset mentioned as above

Reply
- Jason Brownlee January 19, 2018 at 6:33 am #
  
  I don’t follow, what do you mean exactly?
  
  Reply
Kafeel Basha January 29, 2018 at 5:48 pm #

Very good post.

Hyper Parameter Tuning: How can I do grid search on number of neuron/epochs or batch size using Keras interface in R.

Reply
- Jason Brownlee January 30, 2018 at 9:48 am #
  
  Sorry, I don’t have an example in R.
  
  Reply
neha February 2, 2018 at 6:34 am #

Hi,I am facing a basic query where i have training and test set.i built lstm on training and using history = model.fit(trainX, trainY, epochs=100, batch_size=50,
validation_data=(testX, testY), verbose=0, shuffle=False) to fit my model.
After this i tried to model.predict(testX) to get predicted Y values.Now that was basic code.i am now trying to apply gridsearch.what variation in the history statement code i have to make to apply grid =
GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(testX, testY, verbose=0, shuffle=False)

Reply
neha February 2, 2018 at 6:50 am #

can gridsearchcv work for time series as well?

Reply
- Jason Brownlee February 2, 2018 at 8:25 am #
  
  Not really. You will have to write your own for loops and perform walk forward validation.
  
  Reply
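
As a rough illustration of the walk-forward validation mentioned in the reply above, the sketch below generates expanding-window splits in plain Python. The function name `walk_forward_splits` and its arguments are assumptions made for this example. Each split trains on everything up to a cut point and tests on the next block, so the model never sees the future.

```python
def walk_forward_splits(n_samples, n_test, min_train):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    start = min_train
    while start + n_test <= n_samples:
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + n_test))
        yield train_idx, test_idx
        start += n_test


# Hypothetical series of 10 observations, tested 2 steps at a time.
for train_idx, test_idx in walk_forward_splits(n_samples=10, n_test=2, min_train=4):
    # Fit on `train_idx` (the past), evaluate on `test_idx` (the next block).
    print("train up to", train_idx[-1], "test on", test_idx)
```

A hyperparameter search then becomes an outer for-loop over configurations with this loop inside it. scikit-learn also provides a TimeSeriesSplit splitter that can be passed as the `cv` argument, which may be enough when no custom refitting logic is needed.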
Jack February 2, 2018 at 8:01 pm #

Hi Jason, thank you for your great tutorial! My question here is about ‘grid_result.best_score’. In this article the best score seems to be the best mean score, but in a regression problem, the mean score is irrelevant, so I have to look for the best std score. Is that correct?

Reply
- Jason Brownlee February 3, 2018 at 8:35 am #
  
  Mean score in regression will be mean error. Not irrelevant.
  
  Reply
  - Jack February 3, 2018 at 8:15 pm #
    
    I see. But when I run the code, the ‘grid_result.best_score’ printed out the biggest score. I don’t think that’s right, cause in a regression problem I should look for the smallest mean error. Am I understanding this right?
    Below are the results:
    Best: 0.062234 using {‘optimizer’: ‘Nadam’}
    0.059561 (0.017101) with: {‘optimizer’: ‘SGD’}
    0.056818 (0.013662) with: {‘optimizer’: ‘RMSprop’}
    0.059617 (0.014734) with: {‘optimizer’: ‘Adagrad’}
    0.061506 (0.014503) with: {‘optimizer’: ‘Adadelta’}
    0.059331 (0.014835) with: {‘optimizer’: ‘Adam’}
    0.057696 (0.014828) with: {‘optimizer’: ‘Adamax’}
    0.062234 (0.010834) with: {‘optimizer’: ‘Nadam’}
    
    Reply
    - Jason Brownlee February 4, 2018 at 5:09 am #
      
      Yes, for regression it should be the smallest error. Are you using negative mse as the score function?
      
      Reply
      - Jack February 5, 2018 at 6:52 pm #
        
        I’m not sure. I just copied the codes from this tutorial and changed ‘KerasClassifier’ to ‘KerasRegressor’.I didn’t make any change other than that. I don’t understand how score function works and I’m not familiar with the concept of negative mse. Would you please elaborate?
      - Jason Brownlee February 6, 2018 at 9:12 am #
        
        You must specify a scoring function in sklearn, learn more about the API here:
        http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
        
        Here are examples:
        http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
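
The sign convention discussed in this thread can be shown with a few lines of plain Python. GridSearchCV always keeps the configuration with the highest score, so error metrics are negated (for example with `scoring='neg_mean_squared_error'`). The error values below are hypothetical and only serve to show that the largest negated error is the smallest error.

```python
# Hypothetical cross-validated mean squared errors (lower is better).
mse_by_optimizer = {"SGD": 0.0596, "Adam": 0.0593, "Nadam": 0.0622}

# Negate each error so that "greater is better" holds, as a search that
# maximises the score expects.
scores = {name: -mse for name, mse in mse_by_optimizer.items()}

# Picking the maximum score now selects the smallest error.
best = max(scores, key=scores.get)
print(best, scores[best])
```

If no scoring function is given, the search falls back to whatever the wrapped estimator's own score method returns, which is worth checking before reading "best" as "lowest error".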
Mohamed Abd-Allah February 4, 2018 at 9:09 am #

very good tutorial, But I have a small question. can I tune all these hyperparameters together or I should take a part of the dataset and tune them separately like the examples you mentioned.

Reply
- Jason Brownlee February 5, 2018 at 7:43 am #
  
  Ideally, you would tune them all together, but this is often too computationally expensive.
  
  Reply
Vidar February 8, 2018 at 7:40 am #

Is there a way to do similar things in R using the Caret package? Or other package that can help you with hyperparameter grid search when using Keras in R?

Reply
- Jason Brownlee February 8, 2018 at 8:33 am #
  
  I don’t know if Keras and caret are compatible, sorry.
  
  Reply
joseph February 8, 2018 at 3:17 pm #

hi Jason,

do i need to split the training data for cross validation, or only perform splitting on the input data.

Reply
- Jason Brownlee February 9, 2018 at 8:59 am #
  
  Why do you want to split exactly? Your goals will help me answer your question.
  
  Reply
  - joseph February 9, 2018 at 10:36 am #
    
    Thanks Jason for the quick reply…i will figure that out.. Just another minor question, is there any way to perform data preprocessing on 3d input (due to the input shape for lstm)
    
    Reply
    - Jason Brownlee February 10, 2018 at 8:49 am #
      
      Sure, but it might be easier (or make more sense) to perform data prep prior to shaping data for the LSTM.
      
      Reply
      - joseph February 10, 2018 at 12:12 pm #
        
        Thanks Jason..i will try that out.. Is it a good idea to tune the hyperparameter using the keras wrapper, then apply those tuned parameters on lstm model? Hope to get some comments on it. Thank you.
      - Jason Brownlee February 11, 2018 at 7:51 am #
        
        You can. Or you can write your own for loop and tune the model directly.
      - joseph February 12, 2018 at 1:10 pm #
        
        Thanks a lot Jason.. i will definitely try that one out..
Boris Branson February 15, 2018 at 7:27 am #

Hi Jason, wonderful post. I love your books – amazing.

I wish to include callbacks in the Grid Search (one for TensorBoard and one for logging losses on every combination over the params).

I have something like:

loggerCB = keras.callbacks.TensorBoard(log_dir='logs', histogram_freq=0, write_graph=True)
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []
    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

historyCB = LossHistory()

grid_search = GridSearchCV(estimator=model,
                           param_grid=fit_params,
                           scoring='accuracy',
                           cv=10)
grid_search = grid_search.fit(X_train, y_train, fit_params={'callbacks': [loggerCB, historyCB]})

BUT I got this error:
TypeError: Unrecognized keyword arguments: {‘fit_params’: {‘callbacks’: [, ]}}

How can I pass callbacks using Grid Search?

Thanks,
Boris Branson

Reply
- Jason Brownlee February 15, 2018 at 8:52 am #
  
  Sorry, I have not used callbacks with a grid search. You might need to write your own for-loops for the search.
  
  Reply
Alessandro February 17, 2018 at 4:02 am #

Hello Jason,
let me congratulate for the good post.

I am curious about the use of CV . Each time you call
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
you are compiling a new keras model with the new set of parameters.

Are these different models of keras, compiled one after another, accumulating in the memory? Would this imply a memory usage problem in the case of an extensive grid search with bigger models? Any tips?

Best,
Alessandro

Reply
- Jason Brownlee February 17, 2018 at 8:50 am #
  
  Yes, each model is evaluated and discarded.
  
  For larger models, you could run each fold on a different machine (e.g. run the eval manually).
  
  Reply
Boris Branson February 23, 2018 at 9:26 pm #

Hello Jason,

I see you have used only SGD in the example of learning rate parameterization. Is it possible to combine different values for lr with different optimizers (not only SGD) in one grid search or i’d need a for loop?

Reply
- Jason Brownlee February 24, 2018 at 9:11 am #
  
  Yes, but the more parameters you grid search at once, the slower the search.
  
  Reply
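
One way to combine optimizers with their own learning-rate ranges is to give the grid as a list of dictionaries, one per optimizer. GridSearchCV accepts `param_grid` in that form. The sketch below expands such a grid in plain Python to show which combinations would be tried; `expand_grid` is a made-up helper, and the parameter names `optimizer` and `learn_rate` are assumed to be arguments of your own model-building function.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict, or a list of dicts, of value lists into combinations."""
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    combos = []
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            combos.append(dict(zip(keys, values)))
    return combos


# Each optimizer gets its own (hypothetical) learning-rate candidates.
param_grid = [
    {"optimizer": ["SGD"], "learn_rate": [0.01, 0.1, 0.2]},
    {"optimizer": ["Adam"], "learn_rate": [0.0001, 0.001]},
]

for params in expand_grid(param_grid):
    print(params)
```

Because the sub-grids are expanded separately, SGD is never paired with the Adam learning rates, which keeps the search smaller than a full cross product.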
Priyansh February 25, 2018 at 7:39 pm #

Hi Jason, your article is super useful, but I am having a problem using it for the MNIST data set, which is a three-dimensional data set. When I try to ‘fit’, it gives me a dimension error. Can you do one for the MNIST data set? Thanks a lot

Reply
- Jason Brownlee February 26, 2018 at 6:05 am #
  
  Perhaps try this tutorial:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
TonyWang February 27, 2018 at 10:20 pm #

Hi Jason, Great tutorial, always learn a lot from your post. I have question, is it possible to combine all the parameters and with gridsearch? Seems more than thousands of combinations. For some models it will cost few days or weeks. Is there any better solution for this? randomgridsearch or something else? Thanks again!

Reply
- Jason Brownlee February 28, 2018 at 6:04 am #
  
  Yes, but as you say, you will need a lot of time or a lot of parallel compute resources to get a result.
  
  Random search is often preferred because you can uniformly sample the domain and get good enough results quickly.
  
  Reply
  - TonyWang March 1, 2018 at 3:18 am #
    
    Thanks for your reply. Googled a lot but didn’t find any method to search optimizers and their params, say different optimizer, adam and it’s learning rates. Is there any suggestions? Thanks!
    
    Reply
    - Jason Brownlee March 1, 2018 at 6:16 am #
      
      Yes, just start searching for viable params on your model/data. No need to find confirmation.
      
      Reply
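
To illustrate the random search idea from this thread, here is a small plain-Python sketch that draws a fixed number of configurations from a search space instead of enumerating all of them. The helper `sample_params` and the value lists are assumptions for illustration; in scikit-learn the equivalent tool is RandomizedSearchCV with its `n_iter` argument.

```python
import random


def sample_params(space, n_iter, seed=0):
    """Draw n_iter random configurations, one value per hyperparameter."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]


# Hypothetical search space: a full grid would need 6 * 3 * 3 * 3 = 162 fits.
space = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adam"],
    "learn_rate": [0.001, 0.01, 0.1],
}

candidates = sample_params(space, n_iter=10)
for params in candidates:
    # Train and score a model with `params` here.
    print(params)
```

Ten sampled configurations cost roughly one sixteenth of the full grid here, and the budget can be raised step by step if the results are still improving.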
Johnson Muthii March 2, 2018 at 8:56 am #

Hello Jason,

Thanks for this awesome tutorial. Am very fresh in machine learning and your tutorials are so simplified and easy to follow.

Am encountering an error when i run the epochs and batch size tuning code. Kindly help

This the code part bringing the error…

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs= 1)
grid_result = grid.fit(X_train, y_train)

TypeError: __call__() missing 1 required positional argument: ‘inputs’

Reply
- Jason Brownlee March 2, 2018 at 3:20 pm #
  
  Sorry, I have not seen this error. Are you able to confirm that you have copied all of the code and that your development environment is up to date?
  
  Reply
- Nathan Rasch August 27, 2018 at 5:52 am #
  
  I ran into this over the weekend, and hopefully this will save someone else some pain down the road:
  
  I kept getting the following error when working the prediction section of my code, which frankly was driving me nuts:
  
  TypeError: call() missing 1 required positional argument: ‘inputs’
  
  After researching the error message I came upon this comment which led me to the resolution:
  
  _The thing here is that KerasRegressor expects a callable that builds a model, rather than the model itself. By wrapping your function in this way you can return the build function (without calling it)._ [Source](https://stackoverflow.com/questions/47944463/specify-input-argument-with-kerasregressor)
  
  Solution: I needed to **wrap** my buildModel() function! 🙁
  
  Once I ‘wrapped’ the buildModel() function the prediction code blocks finally started working. Give it a try, and it should resolve your issue. The link I provided above should give you a working code example. If not let me know, and I’ll post my working example for you.
  
  Thanks!
  
  Reply
  - Jason Brownlee August 27, 2018 at 6:15 am #
    
    It might be easier to write your own for loops to grid search Keras models.
    
    Reply
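
The "wrap the build function" advice in this thread comes down to passing a callable that builds a model, not a model that has already been built. The sketch below uses a dependency-free stand-in to show the pattern; `TinyWrapper` and `build_model` are hypothetical and only mimic how a scikit-learn style wrapper calls `build_fn()` with no arguments at fit time. A Keras model object is itself callable and expects inputs, which would explain the "missing 1 required positional argument: 'inputs'" message when a built model is passed instead.

```python
from functools import partial


class TinyWrapper:
    """Hypothetical stand-in for a scikit-learn style Keras wrapper."""

    def __init__(self, build_fn):
        self.build_fn = build_fn

    def fit(self):
        # The wrapper calls build_fn with no arguments to get a fresh model.
        self.model_ = self.build_fn()
        return self


def build_model(n_inputs, hidden):
    """Hypothetical builder; returns a dict in place of a compiled model."""
    return {"n_inputs": n_inputs, "hidden": list(hidden)}


# Bind the arguments now, but let the wrapper decide when to build.
wrapper = TinyWrapper(build_fn=partial(build_model, 8, [8, 12]))

first = wrapper.fit().model_
second = wrapper.fit().model_
print(first == second, first is second)
```

Passing the function also means every fold and every parameter combination gets a freshly initialised model, which is what a grid search needs.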
sonia March 7, 2018 at 2:05 am #

dear jason
how much time this program run while tunning ?like tuning epoch and batch size?

Reply
- Jason Brownlee March 7, 2018 at 6:16 am #
  
  It depends on the size of the dataset, the size of the model and the speed of your system.
  
  Reply
Yumlembam Rahul March 12, 2018 at 1:59 pm #

Hi,

As you mention in your blog “As we proceed through the examples in this post, we will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.” Does this mean we should do the hyperparameter search in one grid instead of dividing it?

regards,

Yumlembam Rahul

Reply
- Jason Brownlee March 12, 2018 at 2:28 pm #
  
  Ideally, if you have the time and resources.
  
  Reply
jessy March 15, 2018 at 8:51 pm #

sir,

I have tried above code. it is executing ,but not displaying results..i don’t know the reason ..

Reply
- Jason Brownlee March 16, 2018 at 6:17 am #
  
  Perhaps try from the command line, then be patient.
  
  Perhaps try to reduce the data set size or use fewer combinations?
  
  Reply
Yumlembam Rahul March 16, 2018 at 5:25 pm #

hi, in your example optimizer parameter are not specified while doing grid search do they assume default values if not specified??

and for reproducibility of result i added the following code and have been able to get same result

import os
os.environ[‘PYTHONHASHSEED’] = ‘0’
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

Reply
jessy March 16, 2018 at 6:05 pm #

sir,

I have doubt ,Whether LSTM concept could be used for prediction of diabetes dataset(PIMA INDIAN DATASET)…I don’t know how LSTM Learns data from dataset..is it possible to put an hands on calculation..

Reply
- Jason Brownlee March 17, 2018 at 8:32 am #
  
  LSTMs are not appropriate for tabular classification problems like this one. They are intended for sequence prediction problems:
  https://machinelearningmastery.com/sequence-prediction/
  
  Reply
jessy March 19, 2018 at 7:32 pm #

Is it possible to put an hands on calculation particularly for hidden layers and LSTM layers..Is it possible to put manual calculation on weights(how it transfer weight from one layer to another layer)…

Reply
- Jason Brownlee March 20, 2018 at 6:15 am #
  
  Sure, but you will need to code these as extensions to the Keras library.
  
  Reply
jessy March 23, 2018 at 9:26 pm #

sir ,
i have tried above code without n_jobs==-1 parameter .it is working …I have doubt ,that is above code can be run using LSTM model …is that possible…

Reply
- Jason Brownlee March 24, 2018 at 6:26 am #
  
  Perhaps set it to 1 thread and let Keras have all of the cores?
  
  Reply
Max March 25, 2018 at 1:12 am #

Hi Jason,

I’m sure it’s possible – but I can’t figure it out.
The above code gives me as a result the best hyper-parameters as measured on the cross-validation.
Now which adjustments to the code would be necessary to additionally calculate the optimum hyper-parameters on a test set?
The optimum hyper-parameters seem to lead to significantly different results when applied to my model that I use to predict values.

Thanks
Max

Reply
jessy March 28, 2018 at 7:12 pm #

sir ,
I have an doubt that is multivariate time series data can be used for classification or prediction .whether we can use that data for prediction or classification or both.

Reply
- Jason Brownlee March 29, 2018 at 6:32 am #
  
  You can learn the difference between classification and regression here:
  https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
  
  Reply
jessy March 28, 2018 at 7:18 pm #

sir,
In LSTM model you are using only RMSE loss function …..why you are not used other loss function ..In particular sequence prediction problem (forecasting) you used only RMSE loss function ….why sir.

Reply
- Jason Brownlee March 29, 2018 at 6:33 am #
  
  I use MSE not RMSE. You can try other loss functions if you prefer. I find MSE loss function works well for most problems.
  
  Reply
hamidi March 29, 2018 at 4:57 am #

Hi
Thanks for your nice post.

Could you please let me know how to incorporate class_weight and tune it?

Reply
- Jason Brownlee March 29, 2018 at 6:38 am #
  
  Sorry, I do not have a worked example.
  
  Reply
Prabha April 13, 2018 at 8:58 pm #

Hello, great post as always!
I had a query regarding this. So I have a training set and a test set, and I am using a stacking ensemble for predictions.
So when I run GridSearchCV on this, should I fit just the training set on this and print CV score on the training set ONLY? And not touch the test set at all?
Also should I fit the new grid classifier on the set before printing the CV score or after?

Reply
- Jason Brownlee April 14, 2018 at 6:43 am #
  
  Yes, hold back the test set, and use the training set for CV.
  
  More on this here:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
  Reply
Aditya Jain April 17, 2018 at 1:53 am #

model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20]
epochs = [10, 20, 30]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(x_train, y_train)

When I am running this code snippet I am getting error as
AttributeError: ‘NoneType’ object has no attribute ‘loss’

Can you please help me on that ?

Reply
- Jason Brownlee April 17, 2018 at 6:02 am #
  
  Sorry, I have not seen this error before.
  
  Reply
Marshall April 28, 2018 at 5:28 am #

Hi Jason,

First and foremost, this is an incredible writeup – very informative.

I’m getting an error that reads “can’t pickle _thread.RLock objects”

When I use the following code:

————————————————————————–

def build_neural_network(n_predictors, hidden_layer_neurons):
    """
    Builds a Multi-Layer-Perceptron utilizing Keras.

    Parameters:
    x_train: (2D numpy array) A n x p matrix, with n observations
    and p features
    y_train: (1D numpy array) A numpy array of length n with the
    target training values.
    hidden_layer_neurons: (list) List of ints for the number of
    neurons in each hidden layer.

    Returns:
    model: A MLP with 2 hidden layers
    """
    model = Sequential()
    input_layer_neurons = n_predictors

    model.add(Dense(units=hidden_layer_neurons[0],
                    input_dim=input_layer_neurons,
                    kernel_initializer='uniform',
                    activation='relu'))

    model.add(Dense(units=hidden_layer_neurons[1],
                    kernel_initializer='uniform',
                    activation='relu'))

    model.add(Dense(units=1))

    model.compile(optimizer='rmsprop',
                  loss='mse')

    return model

# columns variable defined elsewhere, works as expected

mlp = build_neural_network(len(columns), [8, 12])

model = KerasRegressor(build_fn=mlp)

# create parameter lists for GridSearchCV
batch_size = list(np.arange(10, 250, 10))
epochs = list(np.arange(5, 20, 5))

neural_net_grid_dict = {‘batch_size’: batch_size,
‘epochs’: epochs}

neural_net_grid = GridSearchCV(estimator=model,
param_grid=neural_net_grid_dict,
scoring=’neg_mean_squared_error’,
verbose=1,
n_jobs=-1)

mask = df[‘Date’] == ‘2006-11-06’
X, y = create_X_y(df[mask], columns)

grid_result = neural_net_grid.fit(X, y)

——————————————————–

Any idea what might be going on?

Reply
- Jason Brownlee April 29, 2018 at 6:21 am #
  
  Sorry, I have not seen this error. Perhaps try posting to stackoverflow?
  
  Reply
Cristiana April 29, 2018 at 4:52 am #

Thanks so much ! This post helped me a lot !

Reply
- Jason Brownlee April 29, 2018 at 6:28 am #
  
  I’m glad to hear that.
  
  Reply
- Sandra July 26, 2018 at 4:56 am #
  
  I am experiencing the same error “can’t pickle _thread.RLock objects”, may I know how you solved it?
  
  Reply
Juan May 9, 2018 at 4:16 am #

Hi Jason,

how can tune your model to found hyperparameters (learning rate, epoch and output dim in hidden layer) using RandomizedSearchCV?

Thanks !!
Regards

Juan

Reply
- Jason Brownlee May 9, 2018 at 6:28 am #
  
  Specify ranges and search. What is the problem exactly?
  
  Reply
June May 12, 2018 at 6:18 am #

Hi Jason, I got a help from this blog post. Thank you very much!

I have one question though. What if I want to test with optimizers that has customized parameters and not default parameters. From your example, it’s just an array of Strings of optimizers name.

Do you know how I can do this?

Best,
June

Reply
- Jason Brownlee May 12, 2018 at 6:52 am #
  
  You can provide lists of strings with optimizer names if you wish.
  
  Reply
  - June May 12, 2018 at 7:14 am #
    
    Yes. Isn’t this what’s provided in the example code?
    optimizer = [‘SGD’, ‘RMSprop’, ‘Adagrad’, ‘Adadelta’, ‘Adam’, ‘Adamax’, ‘Nadam’]
    
    What I meant was not with default ones but like when I have my own optimizer defined as follows:
    
    sgd_custom = SGD(lr_rate=0.7)
    adam_custom = (decay=0.005)
    
    How can I give optimizer list for this setting? optimizer=[sgd_custom, adam_custom]?
    
    Reply
    - Jason Brownlee May 13, 2018 at 6:02 am #
      
      Good question.
      
      Yes, you could provide a list of pre-configured objects to use instead of strings.
      
      Reply
Philipp May 15, 2018 at 3:52 pm #

Hi Jason,

Your posts are really helpful – thanks a lot!

1. I’m using grid search on my own Keras CNN and everything is working. One thing that keep’s confusing me though: The F1 measures reported by grid search are always a bit (3-4%) higher than when running the same network configurations in Keras directly. I know that Keras isn’t using CV, but this shouldn’t lead to systematic deviations in one direction but to deviations in both directions I think.

2. Also I found that my network is always performing slightly better (accuracy) when using the TF-Layers API instead of Keras, even though the network configurations are exactly the same (as far as I can control this in Keras).

Any ideas why Keras seems to perform poorer? Have others experienced the same issues with Keras? I just can’t figure it out…

Cheers,
Philipp

Reply
- Jason Brownlee May 16, 2018 at 5:58 am #
  
  No good idea sorry. It might be statistical chance, or it might be real. See if you can tease this out with some hypothesis tests on the results.
  
  Reply
Philipp May 19, 2018 at 4:33 am #

Thanks, Jason.

Just to let you know: Apparently it has something to do with the F1 score. Accuracy scores reported by grid search are pretty much the same as my results in Keras.

Reply
- Jason Brownlee May 19, 2018 at 7:45 am #
  
  Interesting.
  
  Reply
Ng Minh Hieu May 28, 2018 at 3:43 am #

Hi Jason, thank you for very detailed and interesting tutorial.
1. I tried to grid hyperparameters of epochs and batch size as your code. No result was launched and no error message appeared. after that, i changed n_jobs equal 1, python gave me the result. I do not understand why value of n_jobs = -1 prevented the calculation process.

2. If i have complicated network (with two layers for example), could you tell me how grid can be implemented with number of epochs and batch size?

Thank you a lot!

Reply
- Jason Brownlee May 28, 2018 at 6:03 am #
  
  Might have caused a deadlock internally.
  
  I don’t understand your second question sorry, perhaps you can rephrase it?
  
  Reply
Sumit May 28, 2018 at 7:21 pm #

Hi, Jason, excellent post and help lot for improving my predictive model.

I have one question, is there any way I can optimise number of layer in network ?

Reply
- Jason Brownlee May 29, 2018 at 6:25 am #
  
  Yes, use a grid search and choose the configuration with the lowest loss.
  
  Reply
John May 30, 2018 at 2:20 pm #

I tried the grid search but got this error

<ipython-input-49-ea7e264ec276> in <module>()
3 param_grid = dict(batch_size=batch_size, epochs=epochs)
4 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
----> 5 grid_result = grid.fit(xs, testY)
6 # summarize results
7 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
612 refit_metric = 'score'
613
--> 614 X, y, groups = indexable(X, y, groups)
615 n_splits = cv.get_n_splits(X, y, groups)
616 # Regenerate parameter iterable for each fit

~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
196 else:
197 result.append(np.array(X))
--> 198 check_consistent_length(*result)
199 return result
200

~\Anaconda3\envs\tfdeeplearning\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
171 if len(uniques) > 1:
172 raise ValueError("Found input variables with inconsistent numbers of"
--> 173 " samples: %r" % [int(l) for l in lengths])
174
175

ValueError: Found input variables with inconsistent numbers of samples: [17, 1]

Reply
- Jason Brownlee May 30, 2018 at 3:09 pm #
  
  I have some suggestions here John:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
amina May 31, 2018 at 1:38 am #

Hey,
What does the 8 in the input dim refer to? I have a time series problem, a dataset with 41 observations. How could I deal with this?

Reply
- Jason Brownlee May 31, 2018 at 6:20 am #
  
  It refers to 8 input variables.
  
  You could define a window of lag obs as input features. Perhaps experiment with different window sizes.
  
  Reply
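
The "window of lag observations" suggestion can be sketched in a few lines of plain Python. The function name `make_lag_features` is an assumption for this example. It turns a univariate series into rows of `window` past values with the next value as the target, so the window size plays the role of the input dimension and can be tuned like any other hyperparameter.

```python
def make_lag_features(series, window):
    """Build (X, y) where each row of X holds `window` past observations."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # the lagged inputs
        y.append(series[i])             # the value to predict
    return X, y


# Hypothetical series with 41 observations, as in the question above.
series = list(range(41))

for window in (2, 4, 8):
    X, y = make_lag_features(series, window)
    print(window, len(X), X[0], y[0])
```

Larger windows give the model more context but leave fewer training rows, which matters with only 41 observations.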
lara May 31, 2018 at 2:08 am #

Could we use only one hidden layer that contains an LSTM block? I want to grid search hyperparameters for my LSTM architecture. How could I specify this in code?

Reply
- Jason Brownlee May 31, 2018 at 6:23 am #
  
  Yes, you could adapt the above examples to search layers/nodes in an LSTM.
  
  Reply
Angelo June 18, 2018 at 5:06 am #

Astounding post, thank you! I wonder how I could evaluate the loss and accuracy evolution of the KerasClassifier according to epoch. Is there something like the history class returned from the model.fit method from SciKitLearn?

Reply
- Jason Brownlee June 18, 2018 at 6:45 am #
  
  Not that I am aware, I believe you would need to use the Keras API directly and collect history objects from each run.
  
  Reply
Babu July 2, 2018 at 6:49 pm #

Dear Jason,

I found this article as very useful for my research. Thank you very much.

Is it possible to find the best CNN architecture (No.of layers, Kernel size, Kernel initialization, Pooling Technique etc) for a given dataset by using GridSearch or RandomSearch?

Reply
- Jason Brownlee July 3, 2018 at 6:23 am #
  
  There is no “best”, just good enough based on the time and resources we have available.
  
  Reply
  - prateek bhadauria July 13, 2018 at 8:50 pm #
    
    Hello Jason Sir, I want to know how I could apply the CNN concept to non-image data that contains large datasets in the form of rows and columns, and how I could apply padding to 50,000 rows and 20 columns. Kindly suggest an approach.
    
    Reply
    - Jason Brownlee July 14, 2018 at 6:17 am #
      
      CNN is not appropriate unless there is some spatial relationship between the observations, e.g. time or space.
      
      Reply
      - maxv April 29, 2019 at 3:01 am #
        
        Hi
        
        thanks for this post and the replies to questions.
        
        I have a question on the properties of the cnn, if you have a dataset like the pollution dataset.
        
        If we have one binary variable as target in a classification with 10 exogenous variables and it is a daily forecast.
        Let us say we have 500 days of data.
        
        I can create a multivariate timeseries forecast and have 5 timesteps in my window so that my train shape will be (500,5,10)
        
        If I apply Conv1D, it should extract features out of all the 10 variables right ?
        or does it apply a Conv1D on each exogenous variable separately.
        
        What I try to understand is : does it capture interactions of exogenous variables ?
        
        Does the Conv2D only work for images or for times series too ?
        
        For each window of 5 timesteps, we have 5 timesteps and 10 exogenous variables so we could think this is 2D.
        
        Thanks J
      - Jason Brownlee April 29, 2019 at 8:24 am #
        
        Yes, you can get started here:
        https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
      - maxv May 1, 2019 at 6:52 am #
        
        Hi
        I think you are pointing me again to the same tutorial but my questions come from this one.
        
        Questions see above.
        
        Question 1 :
        If I apply Conv1D, it should extract features out of all the 10 variables right ?
        or does it apply a Conv1D on each exogenous variable separately.
        
        Question 2 : does it capture interactions of exogenous variables ?
        
        Question 3 :
        
        Does the Conv2D only work for images or for times series too ?
      - Jason Brownlee May 1, 2019 at 7:11 am #
        
        If you have multiple parallel time series, you can either use a separate Conv1D layer for each series and merge them into the model, or use one Conv1D layer and treat each time series as a separate channel.
        
        Test both, but I recommend the latter.
        
        In both cases, the model will capture interactions.
        
        No, Conv2D can work for any data that has a temporal or spatial relationship in two dimensions.
James July 10, 2018 at 12:34 am #

Thanks for the tutorial Jason, very informative. I wonder if you know of a relatively un-intrusive way of reducing the memory footprint of Grid (or equivalently Random) SearchCV, since they seem to store every model produced during the search in memory, instead of e.g just the best. I’m handling 3d data and trying 3d cnns, so the models quickly get too big to have e.g 25 in memory at once.

Wondered about hacky divide and conquer strategies on a higher level, e.g if the full space for a parameter is

[1,5,10,15,20,25],

do a grid search of [1,5,10], keep best model (m1) and discard the rest, search [15,20,25], keep best (m2), then keep best of [m1,m2], but this would still be fiddly/somewhat arbitrary to get correct for a given amount of memory and parameter space. I’d rather not have to implement my own parameter search, but if I go too far down this route I may as well end up doing so

Thanks

Reply
- Jason Brownlee July 10, 2018 at 6:49 am #
  
  Split the search across multiple scripts and machines or implement the for-loops of the search yourself (preferred).
  
  Reply
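
A hand-written search loop of the kind suggested above can be quite short. The sketch below is plain Python under stated assumptions: `evaluate` is a hypothetical stand-in for "build, train, and score one model", and a toy formula replaces real training so the example runs on its own. Only the best score and its parameters are kept, so at most one model needs to exist at a time.

```python
from itertools import product


def evaluate(params):
    """Hypothetical stand-in: train one model and return its score.

    A toy formula replaces real training so the sketch is self-contained.
    """
    return -abs(params["batch_size"] - 20) - abs(params["epochs"] - 50) / 10


grid = {"batch_size": [10, 20, 40], "epochs": [10, 50, 100]}
keys = list(grid)

best_score, best_params = float("-inf"), None
for values in product(*(grid[k] for k in keys)):
    params = dict(zip(keys, values))
    score = evaluate(params)  # the model exists only inside evaluate()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

With real Keras models it may also help to call `keras.backend.clear_session()` after each evaluation so that state from earlier models is released.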
Kemas Farosi July 11, 2018 at 8:50 pm #

Hi Jason,

Great tutorial, I have a question, is it possible to find how many hidden layers in my deep neural networks by grid search ? because i want to find the best layer numbers in my DNN.

thanks

Reply
- Jason Brownlee July 12, 2018 at 6:23 am #
  
  Sure.
  
  Reply
Vugar Bayramov July 19, 2018 at 11:33 pm #

Hi Jason!!

Awesome content. Thanks very much for your effort.

I have a question regarding a model with multidimensional output. What I mean is that my y_train is an array of [value1, value2, value3] which I am trying to predict. While using the example above to select the best activation function for my problem, I got the error below:

ValueError: y_true and y_pred have different number of output

How can i solve this issue?

Regards

Vugar

Reply
- Jason Brownlee July 20, 2018 at 5:59 am #
  
  I believe scikit-learn does not support models that predict multiple outputs.
  
  Reply
- Helmut August 15, 2018 at 12:33 am #
  
  Did you try using the KerasRegressor instead of the KerasClassifier?
  
  https://keras.io/scikit-learn-api/
  
  This worked for me for predicting multiple values.
  
  Reply
  - Jason Brownlee August 15, 2018 at 6:05 am #
    
    Nice.
    
    Reply
Nick July 27, 2018 at 9:37 pm #

While doing the grid search, some combinations lead to a:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

so the grid search stops. Do you know if it’s possible just to skip these combinations to prevent the search from stopping, or why this happens with some NN hyperparameters?

Regards

Reply
- Jason Brownlee July 28, 2018 at 6:35 am #
  
  Perhaps. It might be easier to run the grid search yourself with some for-loops.
  
  Reply
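With a hand-written loop, skipping a failing combination comes down to catching the error and moving on. The sketch below assumes a hypothetical `evaluate(params)` callable that trains and scores one configuration; `safe_score` and `toy_evaluate` are illustrative names, and the toy function only imitates a diverging run.

```python
import math


def safe_score(evaluate, params):
    """Return evaluate(params), or None if the run fails or is not finite."""
    try:
        score = evaluate(params)
    except ValueError as err:
        print(f"skipping {params}: {err}")
        return None
    if math.isnan(score) or math.isinf(score):
        print(f"skipping {params}: non-finite score")
        return None
    return score


def toy_evaluate(params):
    # Stand-in for training: a large learning rate "diverges".
    if params["lr"] >= 1.0:
        raise ValueError("Input contains NaN, infinity or a value too large")
    return 1.0 - params["lr"]


candidates = [{"lr": 0.01}, {"lr": 0.1}, {"lr": 1.0}]
results = [(p, safe_score(toy_evaluate, p)) for p in candidates]
valid = [(p, s) for p, s in results if s is not None]
best_params, best_score = max(valid, key=lambda item: item[1])
print(best_params, best_score)
```

scikit-learn's `GridSearchCV` also has an `error_score` argument that assigns a fallback score when fitting raises an error; whether it covers this particular failure depends on the scikit-learn version and on where the error is raised.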
- Pramod Hankare May 20, 2020 at 2:45 pm #
  
  Hi Nick, did you eventually find a solution for this?
  
  Reply
billa July 30, 2018 at 10:29 pm #

Is it possible to tune the neurons inside the convolution layer for image classification?

Reply
- Jason Brownlee July 31, 2018 at 6:01 am #
  
  Sure.
  
  Reply
Zenon Uchida July 31, 2018 at 8:56 pm #

Do filters (in the code below) denote the number of neurons?
conv = Conv1D(filters=64, kernel_size=5, activation='relu')(embedding)
If not, should filters also be tuned?
I’m pretty sure kernel_size should be tuned.

Reply
- Jason Brownlee August 1, 2018 at 7:43 am #
  
  No, they are the number of filters.
  
  Yes, the number of filters and the kernel size can and should be tuned.
  
  Reply
Khaw August 7, 2018 at 11:53 pm #

Thank you for your awesome explanation.

Is it possible to do the same grid search for hyperparameters in the R package Keras? I cannot find the equivalent of the GridSearchCV function.

Reply
- Jason Brownlee August 8, 2018 at 6:22 am #
  
  It may be, I don’t have an example, sorry.
  
  Reply
Beatriz August 15, 2018 at 10:05 am #

Hi Jason,

I’m trying to do a grid search in my Seq2Seq model.

I’m not sure if I understand the values X,Y I should put inside the grid.fit() function.

In my case, I tried two numpy arrays with three dimensions (samples, max length of words, number of characters)

Anyway, I’m not sure if that is the reason it is not working for me. I get the following error:

TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

What do you think is going wrong?

Reply
- Jason Brownlee August 15, 2018 at 1:53 pm #
  
  You might need to implement the for-loops of your grid search manually in order to have more control over the process.
  
  Reply
ammara August 15, 2018 at 9:16 pm #

Thanks for such a great content!!
I have a query: what is the “random_state” used in deep models? Is it a hyper-parameter? If it is, how important is it for model training? Kindly guide me.
Thanks in advance.

Reply
- Jason Brownlee August 16, 2018 at 6:05 am #
  
  It seeds the random number generator, you can learn more here:
  https://machinelearningmastery.com/introduction-to-random-number-generators-for-machine-learning/
  
  Most algorithms use randomness in some way, and if you fix the seed, you get the same randomness each run. You can learn more here:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
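The effect of fixing the seed can be seen with Python's standard `random` module alone. In this small sketch, `draw_weights` is a hypothetical stand-in for the random weight initialization a network performs; it is not a Keras or scikit-learn function.

```python
import random


def draw_weights(seed, n=3):
    """Return n pseudo-random 'initial weights' from a generator seeded with seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


same_a = draw_weights(seed=7)
same_b = draw_weights(seed=7)
different = draw_weights(seed=8)

print(same_a == same_b)     # same seed, same sequence
print(same_a == different)  # different seed, different sequence
```

Seen this way, the seed is usually fixed to make a run repeatable rather than tuned like the learning rate or batch size.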
Hoo Yu Heng August 21, 2018 at 4:17 am #

For those who face the error ‘cannot pickle object class’, make sure you use create_model and not create_model() in the KerasClassifier constructor:

model = KerasClassifier(build_fn=create_model, verbose=0, epochs=100)

not
model = KerasClassifier(build_fn=create_model(), verbose=0, epochs=100)

Reply
- Jason Brownlee August 21, 2018 at 6:20 am #
  
  Great tip.
  
  Reply
Natanos August 21, 2018 at 8:31 pm #

Sorry but when I run this program, it ends in “Using TensorFlow backend” and not finished in almost 3 hours.

Is this normal? if not, what should I do? thanks

Reply
- Jason Brownlee August 22, 2018 at 6:11 am #
  
  Perhaps try searching fewer parameters?
  
  Reply
- clemm September 19, 2018 at 7:00 pm #
  
  Hello,
  
  Same problem here with a grid search reduced to one epoch and one batch_size: the fit function never ends (Keras version: 2.2.2). But the same code worked on another computer (Keras version: 2.0.5).
  
  Reply
  - Jason Brownlee September 20, 2018 at 7:56 am #
    
    Perhaps run the grid search manually? Just some for-loops.
    
    Reply
Nathan Rasch August 27, 2018 at 10:28 am #

Has anyone had a chance to combine RandomizedSearchCV with SelectKBest?

I have a “FeatureUnion” that includes “SelectKBest”, but then the “model.add(Dense….” call in the model build function complains about the “input_dim” being incorrect. I’m not sure how to attach to the value “SelectKBest” is currently considering as part of the random search, so that I can feed it to the build model function as a param for “input_dim”.

Ex:
features = []
features.append(('Scaler', StandardScaler()))
features.append(('SelectKBest', SelectKBest(k=5)))
featureUnion = FeatureUnion(features)

def buildModel(optimizer='Adam', lr=0.001, decay=0.0, epsilon=None):
    opt = None
    model = Sequential()
    model.add(Dense(20, input_dim = ???? …)

We get a nice, juicy error about the input dim when running this. 🙁

If anyone has a working example or link to some one who does I’d be very grateful.

Thanks!
Nathan

Reply
- Nathan Rasch August 27, 2018 at 11:36 am #
  
  OK, solved my own issue:
  
  The key is just to remove the “input_dim” param from the “model.add” method call. Then you can pass whatever values you want to test with as part of the params dict.
  
  Ex:
  
  # Notice we don't have an "input_dim" param on the model.add call anymore
  def buildModel():
      model = Sequential()
      model.add(Dense(20, kernel_initializer='normal', activation='relu'))
  
  # We add the SelectKBest__k values we want to test to the "params" dict:
  params = {
      'housingModel__epochs': [1, 2],
      'housingModel__batch_size': [15, 30, 65],
      'FeatureUnion__SelectKBest__k': [5, 6, 7, 8, 9, 10]
  }
  
  # And create the FeatureUnion
  features = []
  features.append(('Scaler', StandardScaler()))
  features.append(('SelectKBest', SelectKBest()))
  featureUnion = FeatureUnion(features)
  
  And that’s that. 🙂
  
  Thanks!
  
  Reply
  - Jason Brownlee August 27, 2018 at 1:57 pm #
    
    Nice tip.
    
    Reply
- Jason Brownlee August 27, 2018 at 1:56 pm #
  
  Perhaps write your own for-loop or use regularization to let the model ignore irrelevant features?
  
  Reply
Piyush September 11, 2018 at 2:15 am #

@Jason Brownlee

Great tutorial, though I suggest to combine all chunks of code and give a one final code which tunes all hyper parameters at once, e.g., define a grid with all hyper parameters rather than focusing on them one by one.

Also, once the tuned hyper parameters are found, provide a code with predictive model with tuned hyper parameters which can be used in actual problem to predict class labels.

Reply
- Jason Brownlee September 11, 2018 at 6:31 am #
  
  Thanks for the suggestion.
  
  Reply
Michael Pappas September 29, 2018 at 7:23 am #

Does anyone else have two problems with the first example? I’m using Theano as the backend and I run into two errors:

1) RuntimeError: You can’t initialize the GPU in a subprocess if the parent process already did it (goes away when I change .theanorc to cpu instead of cuda0)

2) sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Any ideas?

Reply
- Jason Brownlee September 30, 2018 at 5:59 am #
  
  Perhaps try running on the CPU as a first step?
  
  Reply
  - Michael Pappas October 2, 2018 at 1:04 am #
    
    Then I get the second error as mentioned above.
    
    Reply
    - FERNANDO FREGAPANE SCALIA October 6, 2018 at 2:52 am #
      
      I have the same error with all libraries updated.
      
      Any ideas, please?
      
      Reply
Vasileios Papanikolaou October 13, 2018 at 11:23 am #

Hey Jason, thank you for this excellent post and your whole contribution to the ML/DL community! It really means a lot. I have a quick question: let’s say you define the model architecture and perform your first grid search over, say, one hyperparameter. How can you redefine the model using the optimal hyperparameter, without rewriting the ‘create_model’ function? Thanks a lot in advance.

Reply
- Jason Brownlee October 14, 2018 at 5:59 am #
  
  You can create the model directly, using the hyperparameters found via the search.
  
  Perhaps I’m missing something in your question?
  
  Reply
Janosh Riebesell November 4, 2018 at 9:04 pm #

Slight correction:

> We can see that the dropout rate of 0.2% and the maxnorm weight constraint of 4 resulted in the best accuracy of about 72%.

Should be either 0.2 or 20 %.

Reply
- Jason Brownlee November 5, 2018 at 6:12 am #
  
  Thanks, fixed.
  
  Reply
Robert Guenther November 6, 2018 at 5:51 am #

Jason,

Ditto all the good things said above. You definitely are fulfilling your mission of making us (data scientist) better at machine learning.

Thank you,
Robert

Reply
- Jason Brownlee November 6, 2018 at 6:36 am #
  
  Thanks Robert.
  
  Reply
sukhpal November 15, 2018 at 12:43 am #

when i run the above code i got this message

model = Sequential()
^
IndentationError: expected an indented block
kindly help me to remove this error

Reply
- Jason Brownlee November 15, 2018 at 5:35 am #
  
  Ensure you indent the code correctly.
  
  Here’s help on how to copy-paste the code:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
  
  Reply
Long November 15, 2018 at 2:03 pm #

Great tutorial as always,

I also had 1 experience with Keras & scikit-learn wrapper when doing the train-test split. It turned out that I should not use params like validation_split/validation_data in Keras because cross validation from GridSearchCV already takes care of that.

I would like to ask, should I use scoring metrics from Keras itself or should I use metrics provided by GridSearchCV?
The docs here are not really clear: https://keras.io/scikit-learn-api

And how about other parameters (if available) that appear to be overridden by the scikit-learn wrapper? Which ones should I pick, Keras or scikit-learn?

Thank you so much Jason.

Reply
- Jason Brownlee November 16, 2018 at 6:11 am #
  
  Probably use sklearn’s metrics.
  
  What other parameters exactly?
  
  Reply
sukhpal November 16, 2018 at 9:12 pm #

when i run the code i receive this message instead of output.kindly help me

runfile(‘C:/Users/sukhpal/untitled9.py’, wdir=’C:/Users/sukhpal’)
Using Theano backend.
C:\Users\sukhpal\Anaconda2\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
“This module will be removed in 0.20.”, DeprecationWarning)

Reply
- Jason Brownlee November 17, 2018 at 5:46 am #
  
  Looks like a warning, you can ignore for now.
  
  Reply
sukhpal November 17, 2018 at 1:34 pm #

but sir no output is displayed on screen

Reply
- Jason Brownlee November 18, 2018 at 6:38 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
kamal November 17, 2018 at 4:44 pm #

sir as my program also gives no error but no output is displayed on screen

Reply
- Jason Brownlee November 18, 2018 at 6:38 am #
  
  Ensure you are running from the command line and wait a few minutes.
  
  Reply
kamal November 19, 2018 at 12:57 am #

sir when i run the code from command prompt it gives me this error
Traceback (most recent call last):
File “C:/Python27/oop1.py”, line 3, in
from sklearn.model_selection import GridSearchCV
File “C:\Python27\lib\site-packages\sklearn\__init__.py”, line 134, in
from .base import clone
File “C:\Python27\lib\site-packages\sklearn\base.py”, line 11, in
from scipy import sparse
ImportError: No module named scipy

Reply
- Jason Brownlee November 19, 2018 at 6:47 am #
  
  Looks like you need to install scipy, this might help:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Reply
Kareem JEIROUDI November 20, 2018 at 12:08 am #

Hey Jason,

A very helpful post, thanks for your efforts. However, I’m still wondering if you can put these optimizations together, do you think that’s possible?? And if so, how?
The problem is that in your examples, you could configure the learning rate and momentum only as you used SGD but not any other optimizer.
I’ll try to write a function such that one can specify all these parameters before grid-searching, plus I’d like to modify the number of layers in a network.
Let me know what you think about all this.
Thanks again for this awesome post!

Reply
- Jason Brownlee November 20, 2018 at 6:36 am #
  
  I’m not sure I understand, sorry. Perhaps you can elaborate?
  
  Reply
kamal November 20, 2018 at 2:15 am #

sir as i have installed theano keras now my run option in editor window of python disappear.sir how i run my program as there is no direct option of run.

Reply
- Jason Brownlee November 20, 2018 at 6:38 am #
  
  Perhaps it will take a while to run?
  
  Reply
kamal November 20, 2018 at 12:09 pm #

sir my program give this error now…plz help me
======================== RESTART: C:\Python27\oop1.py ========================
Using Theano backend.

You can find the C code in this temporary file: c:\users\sukhpal\appdata\local\temp\theano_compilation_error_ei4ugz

Traceback (most recent call last):
File “C:\Python27\oop1.py”, line 4, in
from keras.models import Sequential
File “C:\Python27\lib\site-packages\keras\__init__.py”, line 3, in
from . import utils
File “C:\Python27\lib\site-packages\keras\utils\__init__.py”, line 6, in
from . import conv_utils
File “C:\Python27\lib\site-packages\keras\utils\conv_utils.py”, line 9, in
from .. import backend as K
File “C:\Python27\lib\site-packages\keras\backend\__init__.py”, line 86, in
from .theano_backend import *
File “C:\Python27\lib\site-packages\keras\backend\theano_backend.py”, line 7, in
import theano
File “C:\Python27\lib\site-packages\theano\__init__.py”, line 110, in
from theano.compile import (
File “C:\Python27\lib\site-packages\theano\compile\__init__.py”, line 12, in
from theano.compile.mode import *
File “C:\Python27\lib\site-packages\theano\compile\mode.py”, line 11, in
import theano.gof.vm
File “C:\Python27\lib\site-packages\theano\gof\vm.py”, line 674, in
from . import lazylinker_c
File “C:\Python27\lib\site-packages\theano\gof\lazylinker_c.py”, line 140, in
preargs=args)
File “C:\Python27\lib\site-packages\theano\gof\cmodule.py”, line 2388, in compile_str
(status, compile_stderr.replace(‘\n’, ‘. ‘)))
Exception: Compilation failed (return status=1): The system cannot find the path specified.

Reply
- Jason Brownlee November 20, 2018 at 2:06 pm #
  
  Looks like your environment is not setup correctly, perhaps this will help:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Reply
kamal November 22, 2018 at 7:28 pm #

Sir, when I run the program in the Python editor window I encounter this problem:
Using Theano backend.
WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain

Warning (from warnings module):
File “C:\Python27\lib\site-packages\theano\configdefaults.py”, line 560
warnings.warn(“DeprecationWarning: there is no c++ compiler.”
UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

Reply
- Jason Brownlee November 23, 2018 at 7:47 am #
  
  Perhaps ignore these warnings for now.
  
  Reply
Steven Veenma November 26, 2018 at 1:50 am #

Thanks for this excellent tutorial. It helped me get a feeling for using different parameters. I use Keras/TensorFlow/GPU and with smaller grids this works fine. But when I search a larger grid I run into errors. The GPU seems to keep old models in memory. clear_session() apparently has not been implemented in keras.wrappers.scikit_learn.KerasClassifier. I raised an issue about this at https://github.com/keras-team/keras/issues/11693 where you can find more details. If this can’t be solved I have two options:
1. Try it with Theano
2. Program a function myself that does the job and implement clear_session() in it
Or do you have another advice?

Reply
- Jason Brownlee November 26, 2018 at 6:20 am #
  
  Nice discovery.
  
  For larger searches, I recommend writing a custom for-loop, output results to file and even spread the search across multiple machines (sub-grids).
  
  Reply
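The sub-grid idea can be sketched with the standard library alone. The example below assumes a hypothetical `evaluate(params)` callable and a simple round-robin split of the combinations across workers; `sub_grid` and `run_worker` are illustrative names. Each worker writes one CSV row per finished combination, so partial results survive a crash.

```python
import csv
import io
from itertools import product


def sub_grid(param_grid, worker_index, n_workers):
    """Return the combinations assigned to one worker (round-robin split)."""
    names = sorted(param_grid)
    combos = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]
    return combos[worker_index::n_workers]


def run_worker(param_grid, worker_index, n_workers, evaluate, out_file):
    """Evaluate this worker's share and write one CSV row per combination."""
    names = sorted(param_grid)
    writer = csv.writer(out_file)
    writer.writerow(names + ["score"])
    for params in sub_grid(param_grid, worker_index, n_workers):
        score = evaluate(params)  # build, fit, score, then free the model here
        writer.writerow([params[n] for n in names] + [score])
        out_file.flush()  # keep finished results even if a later run crashes


grid = {"batch_size": [16, 32, 64], "epochs": [10, 50]}
# StringIO stands in for open("results_worker0.csv", "w", newline="").
buffer = io.StringIO()
run_worker(grid, worker_index=0, n_workers=2,
           evaluate=lambda p: p["batch_size"] / p["epochs"], out_file=buffer)
print(buffer.getvalue())
```

Running the same script on a second machine with `worker_index=1` would cover the remaining combinations; the CSV files can then be concatenated and sorted by score. To address the GPU memory issue described above, `evaluate` could call `tensorflow.keras.backend.clear_session()` after scoring each model.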
Nate Star November 29, 2018 at 5:37 am #

Thanks for the great post! However, when we do hyper-parameter tuning, shouldn’t we be utilizing cross-fold validation and optimizing for the average validation error across folds? In this article we are optimizing for training accuracy which would bias our model towards the training data and may lead to parameters that do not generalize well.

Reply
- Jason Brownlee November 29, 2018 at 7:48 am #
  
  Yes, we are using cross-validation for tuning in this tutorial.
  
  Reply
MK December 24, 2018 at 6:24 am #

Hi,

First of all thank you so much for this great post! I have one question:

Is it possible to optimize both the learning rate and the optimizer type together? I get an error when I try; it tells me that “learning_rate is not a legal parameter”. Could you please give me some hint about it?

Thank you in advance

Reply
- Jason Brownlee December 25, 2018 at 7:15 am #
  
  Not really, pick an optimizer (e.g. SGD), then tune the learning rate.
  
  Reply
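Tuning one optimizer at a time, as suggested above, is the simpler route. If you do want to search both together, one possible approach is to give `GridSearchCV` a list of grids, one per optimizer, and to make every searched name an argument of the model-building function; a "not a legal parameter" error is likely to appear when the name is missing from that function's signature. The sketch below is illustrative only: `expand` is a hypothetical helper that mimics how a list of grids unrolls, and the network shape (8 inputs, binary output) is an assumption.

```python
from itertools import product


def expand(param_grids):
    """Expand a list of grids into individual parameter combinations."""
    combos = []
    for grid in param_grids:
        names = sorted(grid)
        for values in product(*(grid[n] for n in names)):
            combos.append(dict(zip(names, values)))
    return combos


# One grid per optimizer, so each only receives arguments it understands.
param_grids = [
    {"optimizer": ["SGD"], "learning_rate": [0.001, 0.01, 0.1],
     "momentum": [0.0, 0.9]},
    {"optimizer": ["Adam"], "learning_rate": [0.001, 0.01]},
]


def create_model(optimizer="SGD", learning_rate=0.01, momentum=0.0):
    # Lazy import; every searched name must be an argument of this function.
    from tensorflow.keras import optimizers
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Sequential

    if optimizer == "SGD":
        opt = optimizers.SGD(learning_rate=learning_rate, momentum=momentum)
    else:
        opt = optimizers.Adam(learning_rate=learning_rate)
    model = Sequential([Input(shape=(8,)),
                        Dense(12, activation="relu"),
                        Dense(1, activation="sigmoid")])
    model.compile(loss="binary_crossentropy", optimizer=opt,
                  metrics=["accuracy"])
    return model


combos = expand(param_grids)
print(len(combos))
```

Keeping a separate grid per optimizer avoids wasting runs on combinations such as Adam with a momentum value it would ignore.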
Jitendra December 25, 2018 at 8:49 pm #

Hello Jason, I am building a stateful model and have set batch_size to 20. This works well when I fit the model after passing batch_size=batch_size to model.fit.

However, batch_size=batch_size doesn’t seem to work while I am predicting on test set.

Is there a rule or something which I am missing which states that we have to use different batch sizes for train and test. One that I am aware of explains that train and test lengths have to be multiples of batch size. Request your help please.

Reply
- Jason Brownlee December 26, 2018 at 6:43 am #
  
  Batch size really only matters during training, it is part of SGD. It is only used for memory efficiency at test time – no effect on model skill.
  
  Reply
kamal December 26, 2018 at 4:08 pm #

Sir, can we combine all the code into one to produce a single optimal model?

Reply
- Jason Brownlee December 27, 2018 at 5:40 am #
  
  The code finds a set of hyperparameters for configuring a model for your problem.
  
  Reply
Paul January 3, 2019 at 11:58 am #

Would it be beneficial to do nested cross validation instead? So first doing a gridsearch, and then cross validating the gridsearch results.
Thanks!

Reply
- Jason Brownlee January 4, 2019 at 6:24 am #
  
  It might be, it depends on how much data you have. Not enough and the results may be optimistic.
  
  Reply
Martin January 21, 2019 at 5:17 am #

In those examples there are no hidden layers. The first layer, i.e. the input layer, isn’t a hidden layer. Is that right?

Reply
- Jason Brownlee January 21, 2019 at 5:36 am #
  
  There are input, hidden and output layers.
  
  Recall, the input layer is defined via the “input_dim” argument on the first hidden layer.
  
  More details here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-define-the-input-layer-in-keras
  
  Reply
Martin January 21, 2019 at 5:53 am #

Thanks Jason. In keras the first input layer is the first hidden layer!

Reply
- Jason Brownlee January 21, 2019 at 11:57 am #
  
  No, the input layer is defined via an argument on the first hidden layer. A hidden layer is not an input layer.
  
  Reply
Alex January 31, 2019 at 11:36 pm #

Jason, thank you for the excellent post. This was exactly what I needed. I have a feed-forward MLP network that I use to predict dam water inflow (for energy generation in Brazil) from past rainfall. By combining Keras and GridSearchCV I managed to find the best set of hyperparameters for my task.

Reply
- Jason Brownlee February 1, 2019 at 5:39 am #
  
  Well done!
  
  Reply
jessy February 1, 2019 at 9:47 am #

sir,
i have tried above in anaconda prompt. its taking lot of time to execute …pls tell me another way(or) to execute same code in anaconda

Reply
- Jason Brownlee February 1, 2019 at 11:05 am #
  
  I have some suggestions:
  
  Perhaps try testing fewer hyperparameters?
  Perhaps try running on less data?
  Perhaps try running on a faster computer?
  
  Reply
jagon February 1, 2019 at 10:08 am #

jagon
Give me an idea of how to execute the same code in different ways. Please tell me the steps.

Reply
- Jason Brownlee February 1, 2019 at 11:05 am #
  
  What do you mean exactly?
  
  Reply
jagon February 8, 2019 at 9:54 am #

jagon
sir i have executed the above code in anaconda prompt,another way of executing same code ..tell me steps pls…

Reply
- Jason Brownlee February 8, 2019 at 2:08 pm #
  
  I explain how to run code here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  
  Reply
Wonbin February 11, 2019 at 9:18 pm #

Thank you so much for this really helpful post! I’ve been learning all about ML here in your posts since the beginning. Thank you 🙂

I have still one question now even though I read all your comments and links about the question that people have asked.
When it comes to regression tasks, we might configure in

———————————————-
scoring=’neg_mean_squared_error’
———————————————-

and the result will return ‘the negated value of the metric’ (this maybe means negative value?) like below.

——————————————————————————————
Best: -19222385.393424 using {‘optimizer’: ‘Adamax’}
-3704635991.649002 (2334839512.648289) with: {‘optimizer’: ‘SGD’}
-21009285.029564 (9977061.839532) with: {‘optimizer’: ‘RMSprop’}
-19966418.799906 (9785063.647908) with: {‘optimizer’: ‘Adagrad’}
-21064977.853754 (9371950.402550) with: {‘optimizer’: ‘Adadelta’}
-19659670.962081 (9634316.972027) with: {‘optimizer’: ‘Adam’}
-19222385.393424 (9437755.930065) with: {‘optimizer’: ‘Adamax’}
-19785109.598847 (9571852.777559) with: {‘optimizer’: ‘Nadam’}
——————————————————————————————

So should I just use 19222385.393424 as MSE instead of minus 19222385.393424?
Is the value (after deleting the minus) MSE of the model?
I couldn’t really get what the minus is meaning…

I look forward to your reply. Thank you for your help!

Reply
- Jason Brownlee February 12, 2019 at 8:01 am #
  
  Yes, scikit-learn negates the metric so that larger values are always better. More here:
  https://machinelearningmastery.com/faq/single-faq/why-are-some-scores-like-mse-negative-in-scikit-learn
  
  That is a very large loss, perhaps the model can be improved?
  
  Reply
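scikit-learn treats every score as "higher is better", so error metrics such as MSE are reported with the sign flipped. A short worked example using the best score reported above (variable names are illustrative):

```python
import math

best_score = -19222385.393424  # value reported by the grid search above

mse = -best_score      # undo scikit-learn's sign flip
rmse = math.sqrt(mse)  # back in the units of the target variable

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Under this convention, -19222385 (Adamax) ranks above -3704635991 (SGD) because it is the larger number, which corresponds to the smaller error.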
  - Wonbin February 12, 2019 at 2:36 pm #
    
    Cheers mate! I checked your Frequently Asked Questions section and it looks like the minus can just be ignored. But there is another reason I was confused by the number (‘Best: -19222385.393424′ here); please see the output of my code below.
    
    In:
    ——————————————————————————————
    def create_model(optimizer='adam'):
        …
        # Compile model
        model.compile(loss='mae', optimizer=optimizer, metrics=['mse'])
        …
    
    # create model
    model = KerasRegressor(build_fn=create_model, verbose=2, epochs=300, batch_size=256)
    ——————————————————————————————
    
    Out:
    ——————————————————————————————
    Epoch 1/1
    – 41s – loss: 2958.7100 – mean_squared_error: 45877955.0626
    ——————————————————————————————
    (I just set the epoch to 1)
    
    In:
    ——————————————————————————————
    # summarize results
    …
    ——————————————————————————————
    
    Out:
    ——————————————————————————————
    Best: -19222385.393424 using {‘optimizer’: ‘Adamax’}
    
    -3704635991.649002 (2334839512.648289) with: {‘optimizer’: ‘SGD’}
    -21009285.029564 (9977061.839532) with: {‘optimizer’: ‘RMSprop’}
    -19966418.799906 (9785063.647908) with: {‘optimizer’: ‘Adagrad’}
    -21064977.853754 (9371950.402550) with: {‘optimizer’: ‘Adadelta’}
    -19659670.962081 (9634316.972027) with: {‘optimizer’: ‘Adam’}
    -19222385.393424 (9437755.930065) with: {‘optimizer’: ‘Adamax’}
    -19785109.598847 (9571852.777559) with: {‘optimizer’: ‘Nadam’}
    ——————————————————————————————
    
    So, my question is: why is the best score (19222385.393424) quite different from the mean_squared_error (45877955.0626) that was output by the first code?
    
    About your comment “That is a very large loss, perhaps the model can be improved?”,
    I didn’t transform the target variable (like log-transformation). Is transforming the target variable necessary in neural networks?
    
    Reply
    - Jason Brownlee February 13, 2019 at 7:48 am #
      
      Yes, I recommend scaling the target variable (and input vars) before modeling. See this post:
      https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
      
      Reply
    - Mohammad May 26, 2020 at 7:54 am #
      
      I have the same problem.” why the best score(19222385.393424) quite different with the mean_squared_error (45877955.0626) which was the output of the first code?”
      
      what is the answer to this question?
      
      Reply
Wonbin February 15, 2019 at 2:34 am #

Thanks for your help always!
I spent about 2 weeks tuning hyperparameters, but very unfortunately I got in serious troubles and may have to redo all the things because of the two reasons…
I want to ask you two questions..
————————————————-
Q1) Is there a good guideline on the sequence of hyperparameters to be tuned?
e.g. What parameters should I last dive into especially to prevent from wasting time?

Q2) Between loss and metrics(like ‘mse, ‘mape’ and so on), which one should I see when choosing the best parameters?
————————————————-
Could you please give me some advice on the above questions….?

(p.s. I’m doing on a regression task)

Reply
- Jason Brownlee February 15, 2019 at 8:14 am #
  
  Yes, learning rate is key. More details here:
  https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
  
  Minimizing loss (the thing optimized) is the area to focus on. The metric is the thing you care about when describing the performance of the model to others. Often better loss is better metrics, but not always.
  
  Reply
  - Wonbin February 15, 2019 at 1:48 pm #
    
    Thank you so much!!! You’re an angel
    
    Reply
Edu February 22, 2019 at 10:21 am #

Hi, Jones. Congratulations on the content. If possible, I would like to ask a question. I read and reread the part “How to Tune the Number of Neurons in the Hidden Layer” and I could not understand what would change in the code if the output was a value between 0 and 100 and not just a clipping of 0 and 1. I need to do ” gridsearch “in a time series using univariate LSTM. Sorry if my question is too simple. Thank you.

Reply
- Jason Brownlee February 22, 2019 at 2:46 pm #
  
  Good question, I recommend this post for tuning an LSTM:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
Adrian February 22, 2019 at 10:51 pm #

Hi, I am trying to use the dropout selection in my functional CNN and I get this error when I execute:
Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

I have no idea how to solve it.
Thanks

Reply
- Jason Brownlee February 23, 2019 at 6:32 am #
  
  Sorry to hear that, perhaps try manually grid searching:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
sukhpal February 23, 2019 at 11:32 pm #

sir please provide me the python code of plots for comparison of various algorithm like adam adagrad,adadelta rmsprop for different epochs

Reply
- Jason Brownlee February 24, 2019 at 9:09 am #
  
  Thanks for the suggestion.
  
  I believe you can perform this comparison yourself, perhaps start here:
  https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
  
  Reply
daifeng March 4, 2019 at 7:15 pm #

Hi, recently I’ve been training with Keras on a large data set, so only the “fit_generator” function in Keras can be used. I am wondering how to use grid search with that function, since only “fit” is offered in the GridSearchCV class.

Reply
- Jason Brownlee March 5, 2019 at 6:34 am #
  
  I recommend performing the grid search yourself with for-loops. Here’s an example:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
ismetb March 13, 2019 at 9:03 pm #

Excellent post again Jason, thank you. My two questions are:

1) How can I search optimizer and learning rate together? When I write optimizer=optimizer(lr=lr), the code does not run.

2) Is there an importance order of parameters? I would like to grid search for several parameters and I want to group the most important ones

Regards

Reply
- Jason Brownlee March 14, 2019 at 9:21 am #
  
  I recommend sticking with SGD and testing different learning rates.
  
  Yes, order is critical, more here (under the section “how to configure”):
  https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
  
  Reply
Teresa Lisanti March 15, 2019 at 9:12 am #

Hi Jason, I am using a LeNet and I would like to know how to use the grid search. Is it necessary to write a function that returns a model? I have a file named lenet.py where I wrote my neural network with the class LeNet; can I use this file instead of making a new function? Thank you for your answer.

Reply
- Jason Brownlee March 15, 2019 at 2:26 pm #
  
  You can develop the grid search any way you wish.
  
  Reply
Neel March 16, 2019 at 7:44 pm #

Hi Jason, super helpful tutorial! Thanks for investing the time in writing this.

I am working on a multiclass classification problem using Keras.

Grid search provides the best parameters for the metrics defined in model.compile. I tried disabling that and adding scoring= in the grid.fit command line. However, as others have also pointed out, grid search only accepts “accuracy” as a valid score for optimisation and for providing the best hyperparameters. For me, the optimum hyperparameters would be the ones providing high accuracy, precision, and recall on my unseen test data.

Is there a way I can save all models trained using gridsearch or as gridsearch iterates through a set of hyperparameter, I can extract the model and run a classification_report for that model. In the end, I want a model which gives the best results in the classification_report

Reply
- Jason Brownlee March 17, 2019 at 6:19 am #
  
  I believe you can provide any sklearn metric for the scoring function:
  https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
  
  Listed here:
  https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
  
  I recommend picking one metric, not using the report.
  
  Reply
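A minimal sketch of scoring a grid search on several metrics at once. It assumes scikit-learn is installed and uses MLPClassifier on synthetic data purely as a lightweight stand-in for a wrapped Keras model; the dataset, grid values, and metric names in the dictionary are illustrative assumptions, not taken from the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data, standing in for a real multiclass dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=7)

# MLPClassifier is only a stand-in for a wrapped Keras model.
model = MLPClassifier(max_iter=200, random_state=7)
param_grid = {"hidden_layer_sizes": [(8,), (16,)], "alpha": [1e-4, 1e-2]}

# Several metrics are recorded; `refit` names the one used to pick the winner.
scoring = {"accuracy": "accuracy",
           "precision": "precision_macro",
           "recall": "recall_macro"}
grid = GridSearchCV(model, param_grid, scoring=scoring, refit="recall", cv=3)
grid.fit(X, y)

print("Best params (by macro recall):", grid.best_params_)
for name in scoring:
    print(name, grid.cv_results_[f"mean_test_{name}"])
```

All metrics end up in cv_results_, so every configuration can be compared afterwards, while a single metric still decides which model is refit.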
Neel March 16, 2019 at 7:53 pm #

Part 2 of the question:

I am currently splitting my data using train_test_split (Note1) and passing the testX and testY as a cross validation in grid_result. Purpose is to define a cross validation set to a model.

Note 1:

(trainX, testX, trainY, testY) = train_test_split(datatraincv, labeltraincv, test_size=0.40, random_state=42, shuffle=True)

grid_result = grid.fit(trainX, trainY, validation_data=(testX, testY), callbacks=[early_stopping_monitor])

*Question*: Is this required or sklearn automatically splits data in Train/Cross Val when we use grid.fit?

__

Additionally I have “datatest” and “labeltest” data which I use to predict and get the actual results on unseen data (Note 2)

Note 2:
predictions = grid_result.best_estimator_.model.predict(datatest, batch_size=32)
print(classification_report(labeltest.argmax(axis=1), predictions.argmax(axis=1), target_names=target_names))

*Question*: Is this required or the cross val data that I feed in is also unseen to the Keras algo? I used to earlier code in Matlab and CV data was for Theta selection such that Theta Train = the one which gets the highest accuracy in the Cross Val data.

Reply
- Jason Brownlee March 17, 2019 at 6:20 am #
  
  The grid search will use k-fold cross validation to split the data.
  
  Typically a new model is refit using the best parameters after the tuning process.
  
  Reply
Arian March 19, 2019 at 3:04 am #

Hey Jason,

thanks for the nice tutorial, i really enjoyed it!
I would like to parallelize the process by setting n_jobs to -1 or something else than 1, but when i try to run fit on the grid i get this error:
“_pickle.PicklingError: Could not pickle the task to send it to the workers.”

I did some research and found out that this has something to do with Keras objects not being pickle-compatible.

Do you know a solution for my problem or a different method to parallelize grid search for Keras on CPUs?

Thank you very much!

Reply
- Jason Brownlee March 19, 2019 at 8:59 am #
  
  You might have to run the grid search manually with for-loops.
  
  Reply
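A manual grid search with for-loops could be sketched as follows. The evaluate() function is a hypothetical placeholder with a made-up scoring formula so the sketch runs without Keras; in practice it would build, train, and cross-validate the model. Because a plain loop runs in a single process, nothing has to be pickled and sent to workers.

```python
from itertools import product


def evaluate(batch_size, epochs):
    """Hypothetical placeholder: train a model and return a validation score.

    A made-up formula is used here so the sketch runs without Keras.
    """
    return 1.0 / (1.0 + abs(batch_size - 20) / 20 + abs(epochs - 50) / 50)


batch_sizes = [10, 20, 40]
epoch_options = [10, 50, 100]

# Evaluate every combination and remember the score of each one.
results = []
for batch_size, epochs in product(batch_sizes, epoch_options):
    score = evaluate(batch_size, epochs)
    results.append((score, {"batch_size": batch_size, "epochs": epochs}))

best_score, best_params = max(results, key=lambda item: item[0])
print(f"Best: {best_score:.3f} using {best_params}")
```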
Patrick March 26, 2019 at 8:55 am #

Thanks for the detailed explanation! It’s very helpful. I have two questions for you:

In what order would you suggest to tune the parameters?
Which parameters should be tuned together?

Reply
- Jason Brownlee March 26, 2019 at 2:16 pm #
  
  Great question, I recommend focusing on ensuring your model has enough capacity (layers/nodes) then tune learning rate, then later start adding regularization.
  
  This section will give you tons of ideas:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Jaime April 12, 2019 at 11:59 pm #

Hello Jason,

Is it possible to change the metric from accuracy to mse? I am using LSTMs for trajectory forecasting, so the values are continuous. I am getting the error “ValueError: The model is not configured to compute accuracy. You should pass metrics=["accuracy"] to the model.compile() method.”

Reply
- Jason Brownlee April 13, 2019 at 6:32 am #
  
  You must remove accuracy for regression problems.
  
  Reply
Jaime April 13, 2019 at 2:00 am #

Hello Jason,

Sorry for the silly question. I found out how to do it.

Reply
- Jason Brownlee April 13, 2019 at 6:37 am #
  
  No problem.
  
  Reply
- LIFEN HUANG June 4, 2019 at 1:20 pm #
  
  Hello, do you know how to do it? Could you share it with me? I also face this problem of “The model is not configured to compute accuracy”. Thanks a lot!
  
  Reply
abhijit April 20, 2019 at 5:10 pm #

Hello Jason,

I tried a custom AUC metric with GridSearchCV in Keras. Can you tell me where I am going wrong? I am getting an error for this.

Reply
- Jason Brownlee April 21, 2019 at 8:20 am #
  
  Perhaps try running the grid search manually?
  
  Reply
Bagus April 25, 2019 at 10:23 pm #

Hi Jason,

I don’t see splitting the dataset for test/train there. Is this (data splitting) done within cross-validation? If so, what is the percentage of division between train/test/validate inside the gridSearchCV?

Reply
- Jason Brownlee April 26, 2019 at 8:34 am #
  
  In the above tutorial, the dataset is split using k-fold cross validation as part of the grid search.
  
  Reply
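To make the split proportions concrete, here is a small sketch assuming 5-fold cross-validation on a dataset with 768 rows and 8 input features (the size of the Pima Indians diabetes data). It uses scikit-learn's KFold directly; the number of folds and the random seed are illustrative assumptions. Each fold trains on roughly 80% of the rows and validates on the remaining 20%, and no separate test set is held out unless you split one off yourself beforehand.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder inputs with the same number of rows as the Pima Indians data.
X = np.zeros((768, 8))

kfold = KFold(n_splits=5, shuffle=True, random_state=7)
split_sizes = [(len(train_idx), len(val_idx))
               for train_idx, val_idx in kfold.split(X)]

for i, (n_train, n_val) in enumerate(split_sizes, start=1):
    print(f"fold {i}: train={n_train} validation={n_val}")
```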
thiagu May 3, 2019 at 10:48 am #

HI JASON,

IN THE ABOVE EXAMPLE CAN WE USE LSTM MODEL.IS THAT POSSIBLE..

Reply
- Jason Brownlee May 3, 2019 at 2:43 pm #
  
  I would not recommend it, as sequence prediction is best evaluated using walk-forward validation or careful management of samples.
  
  For example, see this post:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
  - thiagu May 3, 2019 at 3:04 pm #
    
    thanks a lot jason
    
    Reply
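For readers who want to see what order-preserving splits look like, here is a short sketch using scikit-learn's TimeSeriesSplit. This is an expanding-window splitter, a close relative of walk-forward validation and not necessarily the method used in the linked post; the toy series and the number of splits are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# A toy series of 12 time-ordered observations.
series = np.arange(12)

# Each split trains on the past and tests on the block that follows it.
splits = list(TimeSeriesSplit(n_splits=3).split(series))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled k-fold, the test indices always come after the training indices, so the model is never evaluated on observations that precede its training data.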
krs reddy May 20, 2019 at 7:33 pm #

jason,

How do I optimize the number of neurons in different layers? When a model has 2 or more hidden layers and the task is to find the best number of neurons in each layer, how should I proceed?

Reply
- Jason Brownlee May 21, 2019 at 6:33 am #
  
  We never get an optimal model, we get a good enough model.
  
  Try a suite of configurations and see what works well.
  
  Reply
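One way to build such a suite of configurations is to enumerate tuples of layer sizes, one entry per hidden layer. The sketch below uses only the standard library; the candidate sizes, the maximum depth, and the idea of passing each tuple to a model-building function through a hypothetical `layers` argument are all assumptions for illustration.

```python
from itertools import product

neuron_options = [8, 16, 32]   # candidate sizes for each hidden layer
max_hidden_layers = 2          # try networks with 1 and 2 hidden layers

# Every tuple is one architecture, e.g. (16, 8) means 16 then 8 neurons.
layer_configs = []
for depth in range(1, max_hidden_layers + 1):
    layer_configs.extend(product(neuron_options, repeat=depth))

print(len(layer_configs), "architectures")
print(layer_configs[:5])

# Inside a model-building function with a hypothetical `layers` argument,
# the tuple could be consumed like this:
#     for units in layers:
#         model.add(Dense(units, activation="relu"))
```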
krs reddy May 20, 2019 at 7:37 pm #

jason,

how to hyper parameter tune for optimization algorithm, learning rate and momentum in a single go??

here optimization algo is hyperparameter to a model , learning rate and momentum are hyperparameters to optimization function.

Reply
- Jason Brownlee May 21, 2019 at 6:34 am #
  
  Perhaps use multiple nested loops.
  
  Reply
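Instead of hand-written nested loops, the combinations can also be generated with scikit-learn's ParameterGrid, which accepts a list of dictionaries so that momentum is only varied for optimizers that use it. The optimizer names and value lists below are illustrative assumptions; the model-building code would still need to accept these arguments and construct the optimizer from them.

```python
from sklearn.model_selection import ParameterGrid

# Momentum only applies to SGD here, so Adam gets its own sub-grid.
param_grid = [
    {"optimizer": ["sgd"],
     "learning_rate": [0.001, 0.01, 0.1],
     "momentum": [0.0, 0.5, 0.9]},
    {"optimizer": ["adam"],
     "learning_rate": [0.001, 0.01, 0.1]},
]

configs = list(ParameterGrid(param_grid))
print(len(configs), "configurations")  # 9 for SGD plus 3 for Adam
for config in configs[:3]:
    print(config)
```

GridSearchCV accepts the same list-of-dictionaries form for its param_grid argument, which avoids evaluating meaningless combinations.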
Prem Alphonse May 21, 2019 at 2:31 pm #

Hi Jason,
As you have shown one by one tuning of each parameters, can it be done in same way and combine the best values to build final model, or each parameter change may depend on others which we have to do nested loops to find the best set of parameters.

Reply
- Jason Brownlee May 21, 2019 at 2:47 pm #
  
  Yes, you can use nested loops across each hyperparameter if you wish.
  
  Reply
Marco Sabatini May 31, 2019 at 5:09 am #

Set cv=5 (or 3) instead of n_jobs=-1.

Reply
Niez Ghabi June 12, 2019 at 10:21 pm #

Hello,

You didn’t mention how we could tune many parameters together; it’s as if each parameter is tuned on its own. Moreover, you haven’t given an example of tuning the number of hidden layers or the number of neurons in more than one hidden layer. Can you please give an example?

Thank you

Reply
- Jason Brownlee June 13, 2019 at 6:17 am #
  
  It is easier to tune one parameter at a time, you can train more if you like, but it will require more time/compute.
  
  You can tune the number of layers if you wish, I left out that example.
  
  Reply
  - Niez Ghabi June 13, 2019 at 11:33 pm #
    
    When I tune the number of layers, do I have to choose a particular number of layers in the current model/estimator ? Can this parameter be also tuned along with others ?
    
    What combination would be better ? Tuning every parameter on it’s own or tuning all parameters together ?
    
    Thank you.
    
    Reply
    - Jason Brownlee June 14, 2019 at 6:45 am #
      
      I often recommend using a large model with a big capacity and use regularization, like weight decay to reduce overfitting.
      
      Nevertheless, you can tune the number of layers and nodes at the same time if you wish.
      
      Reply
      - Niez Ghabi June 14, 2019 at 5:31 pm #
        
        Okay, but since tuning many parameters at the same time will require more time to compute, I was wondering whether tuning every parameter on its own would provide the same result as tuning them all together.
        
        Thank you.
      - Jason Brownlee June 15, 2019 at 6:27 am #
        
        No, tuning parameters one at a time is an approximation of tuning all parameters at once.
        
        We typically avoid tuning all parameters at once because of the computational cost.
Niez Ghabi June 18, 2019 at 1:54 am #

Thank you very much, this helped a lot. I will try tuning parameters one at a time then.

Reply
- Jason Brownlee June 18, 2019 at 6:41 am #
  
  I’m happy to hear that.
  
  Reply
Guirado June 27, 2019 at 11:35 pm #

Hello Jason! Thank you very much for your posts, they are the best teaching source I have found.

Do you know how could I use Grid Search without defining my model as you explain at the beginning of the post? This is because I did transfer learning to MobileNet freezing the weights of all the layers except the last ones.

Then, if you could also explain how to adapt the code for images to use it for my CNN MobileNet it would be really helpful.

Thank you.

Reply
- Jason Brownlee June 28, 2019 at 6:03 am #
  
  Yes, you can use your own for-loops, I give examples here:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply

DJ June 28, 2019 at 12:58 am #

Hi Jason

First of all thank you for providing this post.
It really helped me a lot. I adjusted the many parameters through gridsearch as you did and got a better model.

As shown in this post, when I train with train data, the accuracy is more than 70%. Then I tried to predict the test data with the best model I got. I then compared the actual label of the test data with the predicted value from the model through the test data.

By the way! Accuracy goes up and down by nearly 20-30%. Why is this accuracy so low? Both train data and test data are of the same type (pima-indians-diabetes). I even used dropout to prevent overfitting. I’m so embarrassed.

Jason Brownlee June 28, 2019 at 6:05 am #

Perhaps the model is a little unstable based on the small sample size.

Perhaps try some weight decay?

DJ July 2, 2019 at 6:05 pm #

Thank you, Jason.

First, the data is the same pima-indians-diabetes data as the example in this post.
The training data was divided into 461 samples and the test data was divided into 307 samples.

Then I created the KerasClassifier model in the following way.

def create_model(hidden_layers = 1, neurons =1, init_mode = 'uniform', activation = 'elu'):
  model = Sequential()
  model.add(Dense(neurons, input_dim=len(tr_data.T), kernel_initializer=init_mode, activation=activation, kernel_regularizer=l2(0.001)))
  model.add(Dropout(0.2))

  for i in range(hidden_layers):
    
    model.add(Dense(neurons, kernel_initializer=init_mode, kernel_regularizer=l2(0.001)))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(0.2))
  
  if class_count == 2:  
    model.add(Dense(1,activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  elif class_count != 2:
    model.add(Dense(class_count-1, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

  return model

I used L2 regularization (Weight Decay) as you told me.

Then I did the Gridsearch with the following parameters and found the optimal parameters.

leaky_relu = tf.nn.leaky_relu
hidden_layers = [10,15,20]
neurons = [30, 70, 110]
activation = ['elu', leaky_relu]
init_mode = ['glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']


param_grid = dict(hidden_layers = hidden_layers, neurons = neurons, init_mode = init_mode, activation = activation)
grid = GridSearchCV(estimator=keras_model, param_grid=param_grid, n_jobs= 1, cv=5)

Then I predicted the test data with that model and got the results.

pred = grid.predict(ts_data)

accuracy = accuracy_score(pred, ts_ans)
ts_ans = ts_ans.astype(float)
precision, recall, fbeta_score, support = precision_recall_fscore_support(ts_ans, pred)
conf_mat = confusion_matrix(ts_ans, pred)

However, when I predict the test data, the accuracy is still about 20%. That is the opposite of the 70-80% I get when I predict the train data.

Even when I was trying to prevent overfitting using both weight decay and dropout.

Is it because the amount of data is inevitably too small? I still think there is a problem if the accuracy is only 20%.

Or is there a problem with my model and learning process? I do not know. I even did cross validation …

Thank you, Jason. Can you tell me what the problem is?

DJ July 2, 2019 at 7:15 pm #

I’m sorry, jason.
Do not mind the comments above!
I found a mistake.
I accidentally saved labels backwards.
So the test result was only 20%.

I’m so sorry and thank you.
Ignore the above!

Reply
- Jason Brownlee July 3, 2019 at 8:31 am #
  
  No problem.
Jason Brownlee July 3, 2019 at 8:27 am #

It might be the case that the dataset is too small.

This may help you diagnose what is going on:
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/

Reply

SOA July 4, 2019 at 2:20 am #

Hello Doctor Jason. I have sequential data. Can I tune the batch size and the epochs using the KerasClassifier or the KerasRegressor? I tried the KerasClassifier but it did not work.

The error was in grid_result = grid.fit(X, Y)

ValueError: Found input variables with inconsistent numbers of samples: [10584, 30246]

My ultimate goal is to predict the next time step or sequence using previous time steps. Thanks.

Reply
- Jason Brownlee July 4, 2019 at 7:51 am #
  
  I recommend tuning the model manually, for example:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
  - Samuel Alfred July 9, 2019 at 5:28 am #
    
    Thank you
    
    Reply
zeinab July 20, 2019 at 3:30 pm #

Hi Jason,
Can I use grid search for selecting the best network (cnn, lstm, gru … )?

Reply
- Jason Brownlee July 21, 2019 at 6:26 am #
  
  If you have the resources.
  
  Reply
  - zeinab July 22, 2019 at 3:32 am #
    
    sorry, What do you mean by the resources? Can you give me an example?
    
    Reply
    - Jason Brownlee July 22, 2019 at 8:27 am #
      
      Compute resources – e.g. time and access to big machines.
      
      Reply
      - Chris Connolly October 23, 2019 at 1:28 am #
        
        Just a note, it’s easy enough to signup to Google Colab which allows you to run your jupyter code on some pretty powerful machines. There’s a little config that allows you to take advantage of using the GPU. Very useful for deep learning when you don’t have the compute resources available.
      - Jason Brownlee October 23, 2019 at 6:53 am #
        
        Thanks for the tip.
        
        Not a fan.
Charlotte Vereecke August 10, 2019 at 12:23 am #

Hello,
I would like to use this for chosing the best parameters to predict stock market prices.

But now I was wondering if the ‘accuracy’ metric is suited to my case?
I would rather use mse, but apparently that doesn’t work.
I did some research and found that I can use negative mse, but I don’t understand the outcome then.
I get Best: -0.211220 using {‘batch_size’: 10}

Reply
- Charlotte August 10, 2019 at 1:04 am #
  
  this is the code i used:
  
  def create_model():
      model = Sequential()
      model.add(Dense(100, input_dim=1, activation='relu'))
      model.add(Dense(1, activation='sigmoid'))
      model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
      return model
  
  seed = 7
  np.random.seed(seed)
  
  df = pd.read_csv(“ACKB.BR_LONG”, parse_dates = True, index_col=0)
  print(df.head())
  
  data = df.values
  
  data_train, data_test = train_test_split(data, train_size=0.8, test_size=0.2, random_state=1)
  
  scaler_X_Test = MinMaxScaler()
  scaler_X_Train = MinMaxScaler()
  scaler_Y_Test = MinMaxScaler()
  scaler_Y_Train = MinMaxScaler()
  
  X_train = data_train[:, 3]
  X_train = X_train.reshape(-1,1)
  X_train = scaler_X_Train.fit_transform(X_train)
  
  X_test = data_test[:, 3]
  X_test = X_test.reshape(-1,1)
  X_test = scaler_X_Test.fit_transform(X_test)
  
  Y_train = data_train[:, 6]
  Y_train = Y_train.reshape(-1,1)
  Y_train = scaler_Y_Train.fit_transform(Y_train)
  
  Y_test = data_test[:, 6]
  Y_test = Y_test.reshape(-1,1)
  Y_test = scaler_Y_Test.fit_transform(Y_test)
  
  model = KerasClassifier(build_fn=create_model, epochs=10)
  
  batch_size = [10, 20, 40, 60, 80, 100]
  epochs = [10, 50, 100]
  
  param_grid = dict(batch_size=batch_size)
  grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, scoring= ‘neg_mean_squared_error’)
  grid_result = grid.fit(X_train, Y_train)
  
  print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
  
  Reply
- Jason Brownlee August 10, 2019 at 7:21 am #
  
  Generally you cannot predict stock prices:
  https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
  
  Additionally, classification is not appropriate for regression problems, instead you must calculate error:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression
  
  I hope that helps.
  
  Reply
  - Charlotte August 10, 2019 at 10:40 pm #
    
    So you mean I can not use gridsearch at all?
    
    Reply
    - Jason Brownlee August 11, 2019 at 5:57 am #
      
      You can grid search regression models. Perhaps re-read my previous comment, it seems you missed my point 🙂
      
      Reply
    - Charlotte August 11, 2019 at 6:36 am #
      
      Hello
      
      I changed my code to this
      
      model = KerasRegressor(build_fn=create_model, epochs=10)
      
      batch_size = [10, 20, 40, 60, 80, 100]
      epochs = [10, 50, 100]
      
      param_grid = dict(batch_size=batch_size, epochs=epochs)
      grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, scoring=’neg_mean_squared_error’)
      grid_result = grid.fit(X_train, Y_train)
      
      but I still get a negative score.
      I don’t understand why, because when I program one model without using grid search, I get a positive mse.
      
      Can you please help me
      
      Reply
      - Jason Brownlee August 12, 2019 at 6:31 am #
        
        This is a common question that I answer here:
        https://machinelearningmastery.com/faq/single-faq/why-are-some-scores-like-mse-negative-in-scikit-learn
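
The sign convention can be seen in a small sketch. scikit-learn always treats larger scores as better, so error metrics such as MSE are reported negated under the name neg_mean_squared_error. The example below assumes scikit-learn and NumPy are installed and uses LinearRegression on synthetic data as a stand-in for a Keras regressor; flipping the sign recovers the ordinary, positive MSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data: y is a noisy linear function of X.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# scikit-learn maximizes scores, so MSE is returned with its sign flipped.
neg_mse = cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5)
mse = -neg_mse  # flip the sign to get the usual positive MSE

print("Scores as reported:", neg_mse)
print("Mean MSE:", mse.mean())
```

A "best" score of -0.21 therefore corresponds to an MSE of 0.21, and scores closer to zero are better.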
Prem August 21, 2019 at 9:35 am #

Hi Jason,
May I know which way is better,

-Tune each hyperparameter individually and find optimum as you explained, then put together to build the final model

– Tune all parameters together to build the model

Second Takes longer time than first

Thanks
Prem

Reply
- Jason Brownlee August 21, 2019 at 2:06 pm #
  
  Typically all parameters together is preferred, but if the search space is large, we can sacrifice some purity and test subsets of params, or even just the most important parameter, such as learning rate, then other parameters.
  
  Reply
  - Prem August 21, 2019 at 3:54 pm #
    
    Thanks Jason
    
    Reply
ABHIJEET NAYAK August 24, 2019 at 3:27 am #

Hey Jason,

First of all, thanks for sharing such a detailed and balanced post. I think almost 90% of the time your posts are recommended by Google when searching for anything related to “Deep Learning”.

I have one query though, like you have said in this post:
“This is not the best way to grid search because parameters can interact, but it is good
for demonstration purposes”.

So, I tried to combine most of the parameters and run a grid search, but it’s taking a hell of a long time to run. It’s not showing any errors either. Can you please take a look at the code and let me know if I have made any errors? Or is there any way to make it run faster?

Thanks Again!!

from keras.layers import Dropout
def create_model(learn_rate=0.01, momentum=0, dropout_rate=0.0, neurons=1):
    # create model
    model = Sequential()
    model.add(Dense(neurons, input_dim=18, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(neurons, kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(neurons, kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(4, kernel_initializer='he_normal', activation='softmax'))
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

X = x_train2
Y = y_train1
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [2,5,10]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
neurons = [1, 5, 10, 15, 20, 25, 30]

#Grid_Search
param_grid = dict(batch_size=batch_size, epochs=epochs, learn_rate=learn_rate, momentum=momentum, dropout_rate=dropout_rate, neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)

# summarize results
print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_[‘mean_test_score’]
stds = grid_result.cv_results_[‘std_test_score’]
params = grid_result.cv_results_[‘params’]
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Reply
- ABHIJEET NAYAK August 24, 2019 at 5:57 am #
  
  Hey Jason,
  
  It worked fine but it took a very long time even after I reduced both the data_set and the parameters variables to bare minimum. But it worked out finally!!
  
  Reply
  - Jason Brownlee August 24, 2019 at 8:03 am #
    
    Nice work!
    
    Reply
- Jason Brownlee August 24, 2019 at 7:57 am #
  
  I’m happy to answer specific questions, but I don’t have the capacity to review and debug your code, sorry.
  
  I have some suggestions here that may help:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
  - ABHIJEET NAYAK August 28, 2019 at 7:43 am #
    
    Yeah I get that.
    Thanks for your reply though!!
    
    Reply
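The long run time follows directly from the size of the grid in the comment above. A quick count, using only the standard library and the list lengths from that comment (the 3-fold cross-validation is an assumption), shows how a joint search multiplies the options while tuning one parameter at a time only adds them. Note also that in the posted create_model(), learn_rate and momentum are accepted but never used because the optimizer is hard-coded as 'adam', so those two dimensions add cost without changing the model.

```python
from math import prod

# Number of candidate values per hyperparameter, taken from the comment above.
grid_sizes = {
    "learn_rate": 5,
    "momentum": 6,
    "batch_size": 6,
    "epochs": 3,
    "dropout_rate": 10,
    "neurons": 7,
}
cv_folds = 3  # assumed number of cross-validation folds

joint_configs = prod(grid_sizes.values())       # all combinations together
one_at_a_time_configs = sum(grid_sizes.values())  # each parameter on its own

print("Joint search configurations:", joint_configs)
print("Joint search model fits:", joint_configs * cv_folds)
print("One-at-a-time configurations:", one_at_a_time_configs)
print("One-at-a-time model fits:", one_at_a_time_configs * cv_folds)
```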
Suraj Pawar September 1, 2019 at 3:03 am #

What is the default scoring metrics that KerasClassifier uses? is it accuracy?

Reply
- Jason Brownlee September 1, 2019 at 5:45 am #
  
  It might be accuracy.
  
  It’s probably a good idea to always specify a metric.
  
  Reply
John White September 3, 2019 at 2:11 pm #

Hi Jason, awesome tutorial! I have a conceptual question:

Even if we do find the best model after tuning, the weights will be different, yielding different models and results. So the best model for this maybe wouldn’t be the best if we compiled and ran it again with the “best parameters”. If we seed the weights with the parameters for reproducibility, we don’t know if those would be the best weights. On the other hand, if we tune the weights, then the “best parameters” won’t be best parameters anymore? I am stuck in a loop. Is there a general guideline on what parameters to tune first as opposed to others?

Or is this whole logic flawed somewhere and I am way overthinking? Thanks for your time!

Reply
- Jason Brownlee September 4, 2019 at 5:55 am #
  
  Thanks.
  
  Yes, this is why we try to find the best model on average, over multiple CV runs.
  
  You can also use techniques to reduce the variance of the final model, e.g. ensembles.
  
  Reply
  - John White September 4, 2019 at 12:22 pm #
    
    This makes sense. So essentially:
    
    1. Run GridSearchCV on say batch_size.
    2. Run the model with the best batch_size param multiple times.
    3. Take the best model repeat Step 1 with another param, say optimizers? Thank you!
    
    Reply
    - Jason Brownlee September 4, 2019 at 1:45 pm #
      
      That is one approach, and can be effective. Learning rate is a good parameter to start with:
      https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
      
      Reply
Jack September 4, 2019 at 12:09 pm #

Hi Jason,
I would like to ask you a question. I used Keras with GridSearchCV to tune the parameters of a ConvLSTM network. Previously, my input had shape [1000,33,1,11,8] and the output had shape [1000,11,33], and the grid search ran normally. After I changed the input to [1000*33,1,1,11,8] and the output to [1000*33,11,1], the error 'Found array with dim 3. Estimator expected <= 2.' appeared after training the first parameter combination. I would like to know how to solve this. Thanks.

Reply
- Jason Brownlee September 4, 2019 at 1:44 pm #
  
  Perhaps try a manual grid search instead? I’m not sure sklearn supports 3d input data.
  
  Reply
  - Jack September 4, 2019 at 7:40 pm #
    
    Oddly enough, I removed the scoring parameter from GridSearchCV, allowing the grid search to directly use the ‘rmse’ in model.compile, which seems to solve the problem. I don’t know if my approach is correct.
    
    Reply
    - Jack September 4, 2019 at 7:41 pm #
      
      Correction: it’s ‘mse’ in model.compile.
      
      Reply
    - Jason Brownlee September 5, 2019 at 6:51 am #
      
      Interesting. Thanks for sharing.
      
      Reply
Anirban September 6, 2019 at 7:32 pm #

Hi Jason,
Thanks for this really helpful tutorial.
While running the following code in google colab I am getting this error

“/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.
warnings.warn(CV_WARNING, FutureWarning)”

here is the code

# ———————————————
# define the model
# ———————————————
def create_model(learn_rate=0.01, momentum=0, dropout_rate=0.0, weight_constraint=0, epochs=10, verbose=2):
    model = Sequential()
    model.add(LSTM(50, input_shape=(1000, 6), return_sequences=True))
    model.add(LSTM(50, return_sequences=True))
    model.add(LSTM(50, return_sequences=True))
    model.add(Dense(1))
    # Compile model
    adam = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# ————————
# create model
# ————————
model = KerasRegressor(build_fn=create_model, verbose=0)
# define the grid search parameters
#batch_size = [10 , 30]
#epochs = [10 , 20]
learn_rate = [0.001, 0.01]
dropout_rate = [0.0, 0.2]
# —————————————-
# grid search
# —————————————-
param_grid = dict(batch_size=batch_size, epochs=epochs, learn_rate=learn_rate, dropout_rate=dropout_rate)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(input_matrix, output_matrix,)

model.summary()
plt.figure(figsize=(12,6))
plt.plot(grid_result.history[‘loss’], label=’train’)
plt.legend()
plt.show()

# ————————————
# summarize results
# ————————————
print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_[‘mean_test_score’]
stds = grid_result.cv_results_[‘std_test_score’]
params = grid_result.cv_results_[‘params’]
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

if you can help me out with this, it would be greatly appreciated.
Thanks
Regards
Anirban

Reply
- Anirban September 6, 2019 at 7:58 pm #
  
  And I forgot to mention that the execution of the code seems to have stopped: no epoch results or anything else appear, and no error is shown either.
  
  Reply
  - Jason Brownlee September 7, 2019 at 5:26 am #
    
    Perhaps try running it on your workstation from the command line instead?
    
    Reply
- Jason Brownlee September 7, 2019 at 5:25 am #
  
  That looks like a warning that you can safely ignore. This may also help:
  https://machinelearningmastery.com/how-to-fix-futurewarning-messages-in-scikit-learn/
  
  Reply
mustafa mohammed September 8, 2019 at 6:48 am #

hello Jason Brownlee
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
def create_model():
    model = Sequential()
    model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))  # kernel_initializer='uniform' kernel_constraint=min_max_norm(min_value=s.all(),max_value=d.all()
    model.add(Dropout(0.2))
    model.add(Dense(1))  # , activation='sigmoid'))
    # model.add(Activation('sigmoid'))
    # layer.get_weights()
    # weight = model.get_weights()
    # np.savetxt('f:\\weight.csv', weight, fmt='%s', delimiter=',')
    # model.get_layer(0).set_weights(y, r)
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])  # mean_squared_error
    # create model
    return model

# Fit the model
#history = model.fit(train_X, train_y, epochs=150,validation_data=(test_X, test_y),batch_size=24,verbose=2,shuffle=False)

#pyplot.plot(history.history[‘loss’], label=’train’)
#pyplot.plot(history.history[‘val_loss’], label=’test’)
#pyplot.legend()
#pyplot.show()

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset

# split into input (X) and output (Y) variables
X = train_X#dataset.iloc[:,0:6]

Y = train[:, -1]#dataset.iloc[:,]
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [24, 48 ,40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
print (“dhdh”,grid_result.best_score_ )
print (“dhbbbbbbbbdh”,grid_result.best_params_ )

means = grid_result.cv_results_[‘mean_test_score’]
stds = grid_result.cv_results_[‘std_test_score’]
params = grid_result.cv_results_[‘params’]
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
history = model.fit(train_X, train_y, epochs=150,validation_data=(test_X, test_y),batch_size=batch_size,verbose=2,shuffle=False)

—> 56 history = model.fit(train_X, train_y, epochs=150,validation_data=(test_X, test_y),batch_size=batch_size,verbose=2,shuffle=False)

TypeError: unsupported operand type(s) for +: ‘int’ and ‘list’

What caused this error?

Reply
- Jason Brownlee September 9, 2019 at 5:08 am #
  
  Sorry to hear that, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
mustafa mohammed September 9, 2019 at 7:44 pm #

I am very tired of searching for a solution.
How do I load initial weights from a CSV file and biases from a CSV file into an LSTM network for regression? Note that the input consists of 6 nodes, there is one hidden layer of 100 nodes, and one output node.
I hope you can help me.

Reply
- Jason Brownlee September 10, 2019 at 5:44 am #
  
  No problem:
  
  1. You can load your csv file as a numpy array.
  2. Then define your model in keras.
  3. Then reshape the weights into the required format for each layer.
  4. Then call set_weights() on each layer with your weights.
  
  To discover the expected shape for weight arrays in each layer, use layer.get_weights() and check the shape attribute of each returned array.
  
  If this is a challenge, perhaps post to stackoverflow or hire a freelancer?
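
The steps above can be sketched roughly as follows. This is a NumPy-only illustration, not tested against the poster's files: the helper names `to_csv` and `load_lstm_weights` are made up for this example, the CSV contents are zero-filled stand-ins, and it assumes the usual Keras LSTM layout of three arrays (kernel, recurrent kernel, bias) with the four gates stacked along the last axis.

```python
import io
import numpy as np

n_features, units = 6, 100  # sizes taken from the question above


def to_csv(array):
    """Write an array to an in-memory CSV (stand-in for a file on disk)."""
    buf = io.StringIO()
    np.savetxt(buf, array, delimiter=",")
    buf.seek(0)
    return buf


def load_lstm_weights(kernel_csv, recurrent_csv, bias_csv, n_features, units):
    """Load CSV data and reshape it into [kernel, recurrent_kernel, bias]."""
    kernel = np.loadtxt(kernel_csv, delimiter=",").reshape(n_features, 4 * units)
    recurrent = np.loadtxt(recurrent_csv, delimiter=",").reshape(units, 4 * units)
    bias = np.loadtxt(bias_csv, delimiter=",").reshape(4 * units)
    return [kernel, recurrent, bias]


# Zero-filled placeholders; real values would come from the user's CSV files.
weights = load_lstm_weights(
    to_csv(np.zeros((n_features, 4 * units))),
    to_csv(np.zeros((units, 4 * units))),
    to_csv(np.zeros(4 * units)),
    n_features,
    units,
)
print([w.shape for w in weights])
```

With a Keras model whose first layer is an LSTM with 100 units and 6 input features, a list like this could then be passed to that layer's `set_weights()`. Comparing the shapes against `layer.get_weights()` first is a cheap sanity check, and the gate order inside the `4 * units` axis has to match however the CSV files were produced.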
  
  Reply
  - mustafa mohammed September 10, 2019 at 9:26 pm #
    
    Can you give me the code?
    Note that the reshape of the input layer are (samples , timestep , feature)
    
    Reply
    - Jason Brownlee September 11, 2019 at 5:35 am #
      
      Sorry, I don’t have the capacity to prepare custom code for you.
      
      Perhaps hire a freelance programmer?
      
      Reply
kadar September 13, 2019 at 8:57 pm #

Hi, this is a very good explanation and code. But when I tried it, I got an error for every parameter I tuned, saying that it is not a legal parameter. Can I know why this is happening?

1.
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(x, y)

Error: neurons is not a legal parameter.

2.
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(x, y)

Error: learn_rate is not a legal parameter.

The same happened with every parameter. Can you please help with this?

Reply
- Jason Brownlee September 14, 2019 at 6:17 am #
  
  Sorry to hear that, did you try copying the complete example?
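
A common cause of the "is not a legal parameter" error with the Keras scikit-learn wrapper is that a name in `param_grid` is not an argument of the function passed as `build_fn`. The sketch below only imitates that check with the standard library; `create_model` is a stand-in that builds no real model, `illegal_params` is a made-up helper, and the real wrapper additionally accepts `fit` arguments such as `epochs` and `batch_size`.

```python
import inspect


def create_model(neurons=1):
    # Stand-in for a function that builds and compiles a Keras model.
    return {"neurons": neurons}


def illegal_params(build_fn, param_grid):
    """Return the grid keys that build_fn does not accept as arguments."""
    accepted = inspect.signature(build_fn).parameters
    return [name for name in param_grid if name not in accepted]


param_grid = {"neurons": [1, 5, 10], "learn_rate": [0.001, 0.01]}
print(illegal_params(create_model, param_grid))
```

Here `learn_rate` is reported because `create_model` has no such argument; adding `learn_rate=0.01` to the function signature (and using it when building the optimizer) would make it tunable.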
  
  Reply
Sahil September 15, 2019 at 1:37 am #

Hi Jason,

I get the below error at grid.fit() while running the code for a project on google colab:

PicklingError: Could not pickle the task to send it to the workers.

What could be the reason for this?

Reply
- Jason Brownlee September 15, 2019 at 6:23 am #
  
  Perhaps try running the example on your workstation from the command line?
  
  Reply
Naresh Kumar September 20, 2019 at 9:17 pm #

Thanks Jason for the amazing blog. But I have a couple of questions.

1. How do I interpret the mean and standard deviation of my results? Say I tune the activation function and the best one is 'relu', which is reported with a mean and a standard deviation.

2. Should I also tune the hyperparameters of the autoencoder, since I don't know the best parameters for that network?

Reply
- Jason Brownlee September 21, 2019 at 6:52 am #
  
  Mean is the expected value and standard deviation is the average variation from the mean.
  
  Means can be compared directly, or statistical tests can be used to see if two “samples” are indeed different or if it is likely a statistical fluke:
  https://machinelearningmastery.com/statistical-significance-tests-for-comparing-machine-learning-algorithms/
  
  Most people just use the means to make a decision – for better or worse.
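
As a small illustration of comparing configurations by their mean score, here is a sketch using only the standard library. The per-fold accuracies are invented for the example and are not results from the tutorial.

```python
import statistics

# Hypothetical per-fold accuracies for two configurations (not real results).
fold_scores = {
    "relu": [0.70, 0.74, 0.72],
    "tanh": [0.71, 0.65, 0.74],
}

summary = {}
for name, scores in fold_scores.items():
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)  # spread of the fold scores around the mean
    summary[name] = (mean, std)
    print("%f (%f) with: %r" % (mean, std, {"activation": name}))

# Pick the configuration with the highest mean score.
best = max(summary, key=lambda name: summary[name][0])
print("Best by mean:", best)
```

A large standard deviation relative to the gap between two means is a hint that the difference may be noise, which is where the statistical tests linked above come in.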
  
  Reply
  - Naresh Kumar September 23, 2019 at 4:40 pm #
    
    Thank you for your reply. Please give me an answer for the second question as well.
    
    Should I hyper-parameter the parameters for the AUTO ENCODER as well, as I don’t know the best parameters for the network ?
    
    Reply
    - Jason Brownlee September 24, 2019 at 7:39 am #
      
      Perhaps test it and see?
      
      Reply
Pooria October 2, 2019 at 8:44 pm #

Dear Jason, thanks a lot for your wonderful blog. I have learned a lot here.
Unfortunately, I have a small problem when using grid search, and I would be grateful if you could help me.
When I use the grid search for, say, optimizer tuning, I get:
0.165123 (0.233519) with: {'optimizer': 'SGD'}
0.165123 (0.233519) with: {'optimizer': 'RMSprop'}
0.165123 (0.233519) with: {'optimizer': 'Adagrad'}
0.165123 (0.233519) with: {'optimizer': 'Adadelta'}
0.165123 (0.233519) with: {'optimizer': 'Adam'}
0.165123 (0.233519) with: {'optimizer': 'Adamax'}
0.165123 (0.233519) with: {'optimizer': 'Nadam'}

I get the same numbers for every optimizer. But when I try them in a network with the same structure (same number of layers and neurons, changing the optimizer manually) and compare the results, I get different results. For instance, Adam is way better than all the others, yet according to the grid search all of the optimizers give the same error, and I don't know what is happening here!
One point to add: I have already checked the training and testing datasets, and they are correct.

Reply
- Jason Brownlee October 3, 2019 at 6:46 am #
  
  The results suggest tuning the optimizer might not be useful, perhaps try the learning rate or model capacity.
  
  Reply
Prem October 4, 2019 at 10:04 am #

Hi Jason,

Can you publish a paper for tensorflow keras with Scikit-Learn in Python along with Grid Search Hyperparameters for similar diabetes dataset please.

Thanks

Reply
- Jason Brownlee October 6, 2019 at 8:06 am #
  
  I believe you can adapt the above example.
  
  Reply
shahad October 20, 2019 at 11:15 am #

*How to Tune Batch Size and Number of Epochs*

on this step “grid_result = grid.fit(X, Y)” I get this error

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

how can I fix it 🙁 🙁

Reply
- Jason Brownlee October 21, 2019 at 6:13 am #
  
  Ouch, sorry I have not seen that problem before.
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply

poya October 25, 2019 at 3:37 am #

Hi Jason, thanks for your awesome post.
I was wondering if you could help me choose loss instead of accuracy as the quantity optimized in the grid search. Which section should I change?

Thanks.

Jason Brownlee October 25, 2019 at 6:49 am #

Great question!

On the GridSearchCV set the “scoring” argument to one of these:
https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Or run the grid search without sklearn – manually and tune the keras result directly.

amir November 5, 2019 at 1:19 am #

Hi, I have the same question.

I did what you suggested above, but when I change the scoring to 'neg_mean_squared_error' from the link you posted, I expect to get values close to my loss value, and they are not!
loss: 4.2473e-05
Best: -0.039595 using {'neurons': 10}
Am I doing something wrong here?
Also, can you explain the second suggestion a bit more?

thanks

Jason Brownlee November 5, 2019 at 6:56 am #

I meant that you can write a for loop and fit and evaluate a model manually for each config.
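
A manual grid search along those lines might look like the sketch below. The `evaluate` function is a placeholder with a toy formula so that the loop runs without data; in practice it would build a Keras model from the config, fit it, and return the validation loss. The grid values are arbitrary.

```python
import itertools


def evaluate(config):
    # Placeholder: build, fit and score a model for this config.
    # A toy formula stands in for the validation loss here.
    return abs(config["neurons"] - 10) / 10.0 + abs(config["batch_size"] - 20) / 100.0


grid = {"neurons": [5, 10, 20], "batch_size": [10, 20, 40]}

results = []
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    loss = evaluate(config)
    results.append((loss, config))
    print("%f with: %r" % (loss, config))

# Lower loss is better, so take the minimum.
best_loss, best_config = min(results, key=lambda item: item[0])
print("Best: %f using %r" % (best_loss, best_config))
```

Because the loop is written by hand, the score can be anything the model reports, including the Keras loss itself, and callbacks such as checkpoints or early stopping can be used freely inside `evaluate`.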

amir November 9, 2019 at 2:46 am #

Actually, I am relatively new to Python, so I was wondering if you could point me in the right direction again.

I tried the code below. It kind of works, but it gives me wrong results. Am I doing it the right way (I mean the for loop)?
Thanks

neurons=[40,100]
es = EarlyStopping(monitor='val_loss', mode='min', verbose=2, patience=20)
def test_model(neurons=20):
    model=Sequential()
    model.add(Dense(neurons,activation='relu',input_shape=(n_cols,)))
    model.add(Dense(1,activation='relu'))
    model.compile(optimizer='Adam',loss='mean_squared_error')
    return model

for neurons in neurons:
    model=test_model(neurons=neurons)
    model.fit(X_norm,Y,validation_split=0.3,batch_size=15,epochs=100,verbose=2,callbacks=[time_callback,es])

Jason Brownlee November 9, 2019 at 6:17 am #

I believe I have examples on the blog you can use as a starting point, for example:
https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/

SUBHANKAR BHATTACHARYA October 30, 2019 at 3:39 am #

Hi Jason,

I see that in cross_val_score and GridSearchCV you have used n_jobs = -1, which means all cores of the CPU are used.

However, if I intend to use this on a GPU, this parameter has to be set to the default (None or 1).
I find the GPU training to be awfully slow compared with what happens on the CPU.

I have successfully installed tensorflow-gpu with all the needed CUDA and cuDNN libraries, and I am quite confused by the performance…

Reply
- Jason Brownlee October 30, 2019 at 6:06 am #
  
  You might need to run the grid search on one CPU thread and let Keras/TF run on all GPU cores.
  
  Reply
  - SUBHANKAR BHATTACHARYA October 30, 2019 at 3:30 pm #
    
    How do i ensure.. that Keras/TF is running on all GPU cores.. ? I have a single GPU device in my system, that is getting identified.. but how do i know, that all the cores are getting used for parallelism …
    
    Reply
    - Jason Brownlee October 31, 2019 at 5:27 am #
      
      Good question.
      
      I show how to monitor GPU performance here:
      https://machinelearningmastery.com/command-line-recipes-deep-learning-amazon-web-services/
      
      Reply
pooria October 30, 2019 at 8:52 am #

Hi Jason, thanks for answering all comments/questions.

I have a major problem here that I don’t understand at all:

I used the Grid search using the following structure for instance just for tuning the epochs:

def create_model(neurons=50):
    # create model
    model=Sequential()
    model.add(Dense(neurons,activation='relu',input_shape=(n_cols,),kernel_initializer='uniform'))
    model.add(Dense(neurons,activation='relu',kernel_initializer='uniform'))
    model.add(Dense(neurons,activation='relu',kernel_initializer='uniform'))
    model.add(Dense(neurons,activation='relu',kernel_initializer='uniform'))
    model.add(Dense(1,activation='relu',kernel_initializer='uniform'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

model = KerasClassifier(build_fn=create_model, epochs=50, batch_size=15, verbose=2)
neurons = [10,90]
param_grid = dict(neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1,cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

and I am getting really ridiculous results such as:
loss: 892786.6804
which makes no sense, because I am just trying to tune epochs (and that is the smallest loss; for some folds I get even bigger errors).
Then, when I just try KFold with the same structure, I get a reasonable loss: 3.9826e-06
I am using the following structure:
def test_model():
    model=Sequential()
    model.add(Dense(50,activation='relu',input_shape=(n_cols,)))
    model.add(Dense(50,activation='relu'))
    model.add(Dense(50,activation='relu'))
    model.add(Dense(50,activation='relu'))
    model.add(Dense(1,activation='relu'))
    model.compile(optimizer='adam',loss='mean_squared_error')
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=test_model, epochs=50, batch_size=15, verbose=2)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)

It is driving me crazy, can you help me out please? I don't understand what's wrong. I used both samples from your blog and just changed the variable names to match my inputs.

Thanks in advance.

Reply
- Jason Brownlee October 30, 2019 at 1:57 pm #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Tom November 3, 2019 at 9:53 pm #

Hi,
Thanks for the great post, good to see the comments are still active 🙂

I have a question regarding the score during training.

When I’m training my CNN without grid search, I train it for 10 epochs with a validation set and use a ModelCheckpoint callback to save the weights with the best validation accuracy/loss (which usually occurs at an earlier epoch than 10, but not always the same one).

I would like to do something similar with grid search: to keep the number of epochs set to 10, and perform grid search on other hyper-parameters, with the score used for the search being the best validation accuracy/loss during the training (and not the score after the last epoch).

I couldn’t find a straightforward way to do this, do you know of one?

Thanks!

Reply
- Jason Brownlee November 4, 2019 at 6:41 am #
  
  You might want to grid search the model manually rather than use the sklearn grid search functionality. Just to give you more control over things like the checkpoint.
  
  Reply
Osman November 4, 2019 at 7:29 am #

Hello,
thanks for the great post. I could learn a lot about deep learning from this post.

But I have an issue with my model. It works fine with n_jobs=1, but it takes forever to finish. With n_jobs=-1 it computes really fast, but I get this error: [BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.] I get this error when I run this command:

grid_result = grid.fit(ip_train1, op_train1)

I'm working on Windows and using the GPU's CUDA cores.

this is my model

def create_model():
    model = Sequential()
    model.add(Dense(27, input_dim=54, activation='relu'))
    model.add(Dense(14, activation='relu'))
    model.add(Dense(7, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=100)
grid_result = grid.fit(ip_train1, op_train1)

and I get this error

Fitting 3 folds for each of 18 candidates, totalling 54 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to new file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to new file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63710,), dtype=int32).
Pickling array (shape=(31854,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63710,), dtype=int32).
Pickling array (shape=(31854,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63710,), dtype=int32).
Pickling array (shape=(31854,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63710,), dtype=int32).
Pickling array (shape=(31854,), dtype=int32).
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(54, 95564), dtype=int32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-8cbde13cf0af4e1497191139143e62d1.pkl
Pickling array (shape=(54,), dtype=object).
Memmapping (shape=(95564, 7), dtype=float32) to old file C:\Users\osman\AppData\Local\Temp\joblib_memmapping_folder_20416_5342399261\20416-2216706148448-a4606d1f13994127a2117d592ec1f062.pkl
Pickling array (shape=(63709,), dtype=int32).
Pickling array (shape=(31855,), dtype=int32).
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 2.8s
[Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 2.8s
Traceback (most recent call last):

File "", line 1, in
    grid_result = grid.fit(ip_train1, op_train1)

File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 688, in fit
    self._run_search(evaluate_candidates)

File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1149, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))

File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 667, in evaluate_candidates
    cv.split(X, y, groups)))

File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py", line 934, in __call__
    self.retrieve()

File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))

File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)

File "C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()

File "C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

how to resolve this problem?

Reply
- Jason Brownlee November 4, 2019 at 1:29 pm #
  
  Perhaps set the number of jobs to 1, and let Keras/TensorFlow have access to all other cores?
  
  Reply
sukhpal December 16, 2019 at 10:52 am #

Optimising epochs: maybe the number of epochs does not need to be optimised, as model training could be stopped when the validation loss plateaus, which may happen at a different epoch depending on the optimiser and/or the dataset used.
This may explain why the SGD optimiser might not have converged at 200 epochs, which was selected by the GSO. Please explain why the number of epochs should be optimised.

Reply
- Jason Brownlee December 16, 2019 at 1:36 pm #
  
  Yes, you could use early stopping to choose the number of epochs for you:
  https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
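
The stopping rule itself is simple enough to sketch without Keras. The function below is an illustration of patience-based early stopping, not the library's code: `early_stopping_epoch` is a made-up name and the validation losses are invented. In Keras, the `EarlyStopping` callback with `monitor='val_loss'` and a `patience` value plays this role during `fit()`.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # Never triggered: training runs for all epochs.
    return len(val_losses)


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59, 0.58]
print(early_stopping_epoch(val_losses, patience=2))
print(early_stopping_epoch(val_losses, patience=3))
```

With a small patience the run stops at the first brief plateau; a larger patience lets training continue through it, which is why the patience value itself is often worth a little experimentation.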
  
  Reply
sukhpal December 16, 2019 at 10:59 am #

"We will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5." Sir, please clarify what you mean here by the resulting values ranging from 1 to 30.

Reply
- Jason Brownlee December 16, 2019 at 1:37 pm #
  
  The chosen range was arbitrary.
  
  Reply
Sukhpal December 17, 2019 at 1:50 am #

Why is grid search optimization done in stages?

Reply
- Jason Brownlee December 17, 2019 at 6:36 am #
  
  You can do all at once if you like. I break it down to make it easier for beginners to understand what is going on.
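
One practical reason to be careful about searching everything at once is that the number of model fits is the product of the list lengths times the number of folds. A small sketch of that arithmetic, using the batch size and epoch lists from the examples in these comments and assuming 3-fold cross-validation:

```python
import itertools

param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
}
cv_folds = 3  # assumed number of cross-validation folds

# Every combination of one value per parameter is one candidate.
candidates = [
    dict(zip(param_grid, values))
    for values in itertools.product(*param_grid.values())
]
total_fits = len(candidates) * cv_folds
print(len(candidates), "candidates,", total_fits, "fits")

# Adding one more parameter with 7 values multiplies the work by 7.
print("with a 7-value parameter added:", total_fits * 7, "fits")
```

Each extra parameter multiplies the total, so a combined search over many hyperparameters can become very slow even when each individual list is short.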
  
  Reply
kanda December 17, 2019 at 4:36 am #

Hi Mr. Brownlee, and thank you for all your tutorials!

I'm running GridSearchCV from sklearn to try to find the best model parameters, following this tutorial. The grid search best score was 0.8404 (R2 score). However, I can't reach this R2 score again: predicting on the test data I got 0.4696, and on the training data I got 0.7521, using:

p=gridCNN.best_estimator_.predict(xtest)
r2_score(np.asarray(ytest).ravel(), p)

I also tried to rebuild the model using the best parameters and got a score of 0.6764.

So how can I reach the grid search score (0.8404) again? Note that I eliminated cross-validation during the grid search by using the following, so I should get the same score:

cv=ShuffleSplit(1, test_size=0.2, random_state=584)

Thank you in advance.

Reply
- kanda December 17, 2019 at 4:37 am #
  
  here is the problem detailed : https://stackoverflow.com/questions/59349364/getting-low-accruacy-than-the-gridsearchcv
  
  Reply
- Jason Brownlee December 17, 2019 at 6:39 am #
  
  You’re welcome.
  
  It is possible that the grid search evaluation was optimistic. Perhaps change the grid search cv config to make it more robust?
  
  No, recall that we are estimating the performance of the model on unseen data, there will be noise in that estimate. More repeats/folds will give a more robust mean estimate of performance.
  
  Reply
  - kanda December 17, 2019 at 11:44 am #
    
    thank you for your fast reply !!! I didn’t really get what you want to say !! What should I do please !!
    
    Reply
    - Jason Brownlee December 17, 2019 at 1:37 pm #
      
      Increase the repeats (and maybe folds) of your cross-validation process to better estimate the mean performance.
      
      Reply
sukhpal December 22, 2019 at 12:52 pm #

Loss minimisation is important in determining how well a deep learning model will perform. The post mentions selecting the best parameters by recording accuracies. Could you please justify why accuracy is used instead of loss to determine the best model parameters?
It also mentions observing the validation loss and terminating training to avoid overfitting. Can you please clarify these contradicting statements?

Reply
- Jason Brownlee December 23, 2019 at 6:44 am #
  
  It’s just an example of how to use the API, you can use any metric you like.
  
  Reply
sukhpal December 28, 2019 at 7:11 pm #

Sir, is it possible to apply grid search on my input dataset and then validate the optimized model on another dataset?

Reply
- Jason Brownlee December 29, 2019 at 6:01 am #
  
  Perhaps.
  
  Reply
kamal January 1, 2020 at 10:33 pm #

This sentence is not clear: "The resulting values ranging from 1 to 30 can be utilized as a number of neurons in steps of 5". Please clarify what you mean here by the resulting values ranging from 1 to 30. Sir, I have received a revision request, so please help me clarify this.

Reply
- Jason Brownlee January 2, 2020 at 6:42 am #
  
  5, 10, 15, 20, 25, 30.
  
  Reply
Sweta January 10, 2020 at 11:52 pm #

Hi Jason,
A very helpful tutorial. Thanks a lot!

I need to know: now that we have done hyperparameter tuning as above, how does one incorporate the results into the actual training code? Do you have an example tutorial for it? I mean, where exactly do we add them in our actual code below?

For example:
# Layer 1
model.add(Dense(128, input_dim=806, activation='relu'))
model.add(Dropout(0.6))

# Layer 2
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.6))

Question 2: do we need to add the Dropout layer after every layer, or just once? Similarly for other hyperparameters: do we need to set them in every layer?

Thanks!

Reply
- Jason Brownlee January 11, 2020 at 7:26 am #
  
  Thanks.
  
  Once you find a config that works well, you can fit a standalone model with that config.
  
  Test different uses of dropout and use what works best. Typically it is used after each hidden layer.
  
  Reply
  - sweta January 11, 2020 at 6:16 pm #
    
    Thanks for the response.
    
    I have already found the best config with the above examples. But i need to know how to fit a standalone model with that config? Do you have an example tutorial for it?can you help with this?
    
    Reply
    - Jason Brownlee January 12, 2020 at 7:59 am #
      
      Yes, perhaps use this as a starting point:
      https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
      
      Reply
      - sweta January 12, 2020 at 8:21 pm #
        
        Thanks a lot, Jason. Your tutorials and prompt replies help me learn better and faster.
      - Jason Brownlee January 13, 2020 at 8:19 am #
        
        You’re welcome.
George January 13, 2020 at 2:08 pm #

Hi Jason,
What's the difference between using 'softmax' with 'categorical_crossentropy' and 'sigmoid' with 'binary_crossentropy'?
Which accuracy matches the accuracy from a confusion matrix?
When should each combination be used?

Reply
- Jason Brownlee January 14, 2020 at 7:13 am #
  
  Cross entropy is a loss function that can be used for binary or multi-class classification.
  
  Sigmoid and Softmax are activation functions. Sigmoid is for a binomial probability distribution, softmax is for a multi-class classification.
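
To make the pairing concrete, here is a small standard-library sketch of the two activations and the two forms of cross-entropy. The input values are arbitrary, and this is the textbook math rather than the Keras implementation.

```python
import math


def sigmoid(z):
    """Squash one value into (0, 1): probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [z - max(zs) for z in zs]  # subtract the max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true, p):
    """Loss for one binary label (0 or 1) and a predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_crossentropy(one_hot, probs):
    """Loss for one one-hot encoded label and a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


p = sigmoid(0.0)                  # one output node, two classes
probs = softmax([2.0, 1.0, 0.1])  # three output nodes, three classes

print(p, binary_crossentropy(1, p))
print(probs, categorical_crossentropy([1, 0, 0], probs))
```

In short, one sigmoid output with binary cross-entropy suits a two-class problem, while a softmax over several outputs with categorical cross-entropy suits a problem with more than two mutually exclusive classes.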
  
  Sorry, I don't follow your final questions, perhaps you can elaborate?
  
  Reply
George January 14, 2020 at 10:55 am #

Got it, Thanks Jason

Reply
- Jason Brownlee January 14, 2020 at 1:48 pm #
  
  Happy to hear that.
  
  Reply
Surajit Chakraborty January 15, 2020 at 10:24 pm #

Hi,

Please find below my code that performs GridSearch along with Cross Validation using sklearn.model_selection.GridSearchCV for the mnist dataset that works perfectly fine.

x———————–Code Start ———————————–x——————————————————-x

# Build function to create the model, required by KerasClassifier
def create_model(optimizer_val='RMSprop',
                 hidden_layer_size=16,
                 activation_fn='relu',
                 dropout_rate=0.1,
                 regularization_fn=tf.keras.regularizers.l1(0.001),
                 kernel_initializer_fn=tf.keras.initializers.glorot_uniform,
                 bias_initializer_fn=tf.keras.initializers.zeros):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(units=hidden_layer_size, activation=activation_fn,
                              kernel_regularizer=regularization_fn,
                              kernel_initializer=kernel_initializer_fn,
                              bias_initializer=bias_initializer_fn),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(units=hidden_layer_size, activation='softmax',
                              kernel_regularizer=regularization_fn,
                              kernel_initializer=kernel_initializer_fn,
                              bias_initializer=bias_initializer_fn)
    ])
    optimizer_val_final = optimizer_val
    model.compile(optimizer=optimizer_val, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Create the model with the wrapper
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=2)

# Initialize the parameter grid
nn_param_grid = {
    'epochs': [10],
    'batch_size': [128],
    'optimizer_val': ['Adam', 'SGD'],
    'hidden_layer_size': [128],
    'activation_fn': ['relu'],
    'dropout_rate': [0.2],
    'regularization_fn': ['l1', 'l2', 'L1L2'],
    'kernel_initializer_fn': ['glorot_normal', 'glorot_uniform'],
    'bias_initializer_fn': [tf.keras.initializers.zeros]
}

# Perform GridSearchCV
grid = GridSearchCV(estimator=model, param_grid=nn_param_grid, verbose=2, cv=3,
                    scoring=precision_custom, return_train_score=False, n_jobs=-1)
grid_result = grid.fit(x_train, y_train)

x———————–Code End ———————————–x——————————————————-x

My idea is to pass different optimizers with different learning rates, say Adam with learning rates 0.1, 0.01 and 0.001. I also want to try SGD with different learning rates and momentum values.

When I pass 'optimizer_val': [tf.keras.optimizers.Adam(0.1)], I get the error given below:

Cannot clone object , as the constructor either does not set or modifies parameter optimizer_val

Please advise on how I can rectify this error.

Thanks

Surajit

Reply
- Jason Brownlee January 16, 2020 at 6:16 am #
  
  I’m happy to answer questions, but I don’t have the capacity to review/debug your code.
  
  Reply
  - Surajit Chakraborty January 16, 2020 at 7:01 am #
    
    Hi,
    Thanks for your reply. I only included the code for context. What I am struggling with is how to do the following with sklearn GridSearchCV using a single parameter grid.
    
    My idea is to pass different optimizers with different learning rates, say Adam with learning rates 0.1, 0.01 and 0.001. I also want to try SGD with different learning rates and momentum values.
    
    Thanks
    Surajit
    
    Reply
    - Jason Brownlee January 16, 2020 at 1:30 pm #
      
      I recommend not tuning the optimizer; instead, use SGD and focus on tuning the learning rate and momentum:
      https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
      
      Reply
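One possible way to search optimizers together with their own learning rates in a single grid is to pass plain strings and floats and construct the optimizer inside the model-building code. The clone error above most likely arises because scikit-learn cannot safely copy an optimizer instance placed in the grid. The sketch below is hedged: `optimizer_name` and `make_optimizer` are hypothetical names, and `create_model` would need to accept `optimizer_name`, `learning_rate` and `momentum` as arguments and call `make_optimizer` before compiling.

```python
from sklearn.model_selection import ParameterGrid

# A list of dicts lets each optimizer have its own sub-grid, so momentum
# is only combined with SGD.
param_grid = [
    {"optimizer_name": ["Adam"], "learning_rate": [0.1, 0.01, 0.001]},
    {"optimizer_name": ["SGD"], "learning_rate": [0.1, 0.01, 0.001],
     "momentum": [0.0, 0.5, 0.9]},
]

def make_optimizer(optimizer_name, learning_rate, momentum=0.0):
    """Hypothetical helper: build a fresh optimizer from plain values."""
    import tensorflow as tf  # imported lazily; only needed when a model is built
    if optimizer_name == "SGD":
        return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=momentum)
    if optimizer_name == "Adam":
        return tf.keras.optimizers.Adam(learning_rate=learning_rate)
    raise ValueError("Unsupported optimizer: %s" % optimizer_name)

# Enumerate the combinations that a grid search would evaluate.
configs = list(ParameterGrid(param_grid))
```

`GridSearchCV` accepts the same list-of-dicts structure for its `param_grid` argument, so the list can be passed to it directly.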
Surajit Chakraborty January 18, 2020 at 11:42 am #

Hi,

Thanks for your reply. Just a quick question: can you help me with use cases for when to prefer adaptive learning rate optimizers like Adam and when to opt for SGD?

Thanks
Surajit

Reply
- Jason Brownlee January 19, 2020 at 7:08 am #
  
  When you reach the limit of SGD try adaptive methods and see if they can do better.
  
  Or, use adaptive methods first to get a good result fast, then see if you can do better manually.
  
  Reply
abbas January 23, 2020 at 3:49 am #

Thanks, Jason, for always posting the best content.
I want to know: can I run all of these at once?

batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
neurons = [1, 5, 10, 15, 20, 25, 30]
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

Reply
- Jason Brownlee January 23, 2020 at 6:41 am #
  
  Perhaps, but it will be slow and most of the configs would be a waste of time to test.
  
  Reply
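To make "it will be slow" concrete, the short calculation below counts how many configurations the lists above produce when crossed, and how many model fits that implies. The 3-fold cross-validation is an assumption taken from the tutorial's examples.

```python
from math import prod

# Number of candidate values per hyperparameter, taken from the lists above.
grid_sizes = {
    "batch_size": 6,
    "epochs": 3,
    "optimizer": 7,
    "learn_rate": 5,
    "momentum": 6,
    "activation": 8,
    "dropout_rate": 10,
    "neurons": 7,
    "init_mode": 8,
}

n_configs = prod(grid_sizes.values())  # every combination of every value
cv_folds = 3                           # assumed, as in the tutorial
n_fits = n_configs * cv_folds          # one model trained per config per fold

print(n_configs)  # 16934400
print(n_fits)     # 50803200
```

At tens of millions of training runs, searching a few hyperparameters at a time is usually far more practical than searching all of them jointly.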
Raman January 27, 2020 at 12:01 pm #

Thanks Jason, this is a really good tutorial.
Could you please help me? I am working on a regression problem using deep learning, but I am getting the error below. Essentially, I am trying a grid search for a regression problem.

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, scoring='neg_mean_absolute_error')

TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

Reply
- Jason Brownlee January 27, 2020 at 2:33 pm #
  
  It might suggest that you are trying to grid search a non-sklearn model.
  
  Reply
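The error can be reproduced without Keras. In the sketch below, `PlainModel` and `WrappedModel` are hypothetical stand-ins: the first plays the role of a raw Keras model passed directly to `GridSearchCV`, the second plays the role of a model wrapped in `KerasClassifier` or `KerasRegressor`.

```python
from sklearn.base import BaseEstimator, clone

class PlainModel:
    """Stand-in for a raw model object: it can fit, but has no get_params()."""
    def fit(self, X, y):
        return self

class WrappedModel(BaseEstimator):
    """Stand-in for a wrapped model: BaseEstimator supplies get_params()/set_params()."""
    def __init__(self, epochs=10):
        self.epochs = epochs

    def fit(self, X, y):
        return self

# The grid search clones its estimator before fitting; this fails for PlainModel.
try:
    clone(PlainModel())
    plain_error = None
except TypeError as exc:
    plain_error = str(exc)

# The wrapped version clones cleanly and keeps its parameters.
wrapped_copy = clone(WrappedModel(epochs=50))
```

If this is the cause, passing the wrapper object rather than the compiled Keras model as `estimator` should resolve it.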
Jerry February 9, 2020 at 11:46 am #

Hey Jason, great tutorial.
Can you make another one using randomized search, as the literature states that it is more efficient?

Reply
- Jason Brownlee February 9, 2020 at 1:05 pm #
  
  Great suggestion, thanks!
  
  Reply
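As a rough sketch of the idea, scikit-learn's `ParameterSampler` draws a fixed number of random configurations from the same kind of dictionary used for a grid search. The value lists below are borrowed from the tutorial's examples, and the budget of 10 is arbitrary.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_distributions = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "neurons": [1, 5, 10, 15, 20, 25, 30],
}

# Size of the exhaustive grid, for comparison.
full_grid_size = len(ParameterGrid(param_distributions))

# Randomly sample a fixed budget of configurations instead of trying them all.
sampled = list(ParameterSampler(param_distributions, n_iter=10, random_state=7))
```

`RandomizedSearchCV` wraps this sampling together with cross-validation and takes the dictionary through its `param_distributions` argument, so it can stand in for `GridSearchCV` with only small changes.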
Randa February 16, 2020 at 11:03 pm #

Hello,
What is the solution to this error?
TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

Thank you

Reply
- Jason Brownlee February 17, 2020 at 7:48 am #
  
  You need to specify the “scoring” argument to the grid search.
  
  Reply
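The sketch below reproduces the situation with a toy estimator that has `fit` and `predict` but no `score` method, then shows the search succeeding once `scoring` is given. `ShiftedMeanRegressor` is a hypothetical stand-in, not a Keras model, and the data is random.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV

class ShiftedMeanRegressor(BaseEstimator):
    """Toy estimator: predicts the training mean plus a shift. Has no score()."""
    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X, y):
        self.mean_ = float(np.mean(y)) + self.shift
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))
y = rng.normal(loc=5.0, size=30)
param_grid = {"shift": [-1.0, 0.0, 1.0]}

# Without scoring, GridSearchCV falls back to estimator.score(), which is missing.
try:
    GridSearchCV(ShiftedMeanRegressor(), param_grid, cv=3).fit(X, y)
    error_without_scoring = None
except TypeError as exc:
    error_without_scoring = str(exc)

# With an explicit scoring string, only fit() and predict() are needed.
grid = GridSearchCV(ShiftedMeanRegressor(), param_grid, cv=3,
                    scoring="neg_mean_absolute_error")
grid.fit(X, y)
```

Note that the same message can also appear when a raw Keras model is passed to the grid search instead of the scikit-learn wrapper, since the raw model has no `score` method either.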
aggelos papoutsis February 25, 2020 at 5:23 pm #

Hi jason,

Can I use a train/test split with your examples above?

So instead of grid_result = grid.fit(X, Y),

change it to grid_result = grid.fit(x_train, y_train)

and then test with x_test, y_test?

Reply
- Jason Brownlee February 26, 2020 at 8:15 am #
  
  The sklearn grid search only uses cross-validation, not train-test splits.
  
  You will have to grid search manually, I believe.
  
  Reply
  - AGGELOS PAPOUTSIS February 26, 2020 at 4:59 pm #
    
    OK, thank you, I see. So grid search only estimates the best parameters, and then you can run another experiment and use all the other things like a confusion matrix, etc.
    
    Reply
    - Jason Brownlee February 27, 2020 at 5:38 am #
      
      Yes.
      
      Reply
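The workflow summarized above can be sketched as follows: search with cross-validation on the training portion only, then evaluate the best configuration once on data held out from the search. `LogisticRegression` and the synthetic dataset are stand-ins for a wrapped Keras model and real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data, split into a training set and a held-out test set.
X, y = make_classification(n_samples=300, n_features=8, random_state=7)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)

# Cross-validated grid search runs on the training set only.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid_result = grid.fit(x_train, y_train)

# With the default refit=True, best_estimator_ is refit on all of x_train.
y_pred = grid_result.best_estimator_.predict(x_test)
matrix = confusion_matrix(y_test, y_pred)
```

The test set plays no part in choosing the hyperparameters, so the confusion matrix gives a cleaner estimate of how the chosen configuration performs on unseen data.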
debmalya March 4, 2020 at 1:04 am #

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=n_input_1))
#model.add(Dense(50, activation='relu'))
#model.add(Dense(25, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

seed = 7
np.random.seed(seed)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(scaled_X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

This gives the following error:

TypeError Traceback (most recent call last)
in
11 param_grid = dict(batch_size=batch_size, epochs=epochs)
12 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
—> 13 grid_result = grid.fit(scaled_X, y)
14 # summarize results
15 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
607
608 scorers, self.multimetric_ = _check_multimetric_scoring(
–> 609 self.estimator, scoring=self.scoring)
610
611 if self.multimetric_:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\scorer.py in _check_multimetric_scoring(estimator, scoring)
340 if callable(scoring) or scoring is None or isinstance(scoring,
341 str):
–> 342 scorers = {“score”: check_scoring(estimator, scoring=scoring)}
343 return scorers, False
344 else:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\scorer.py in check_scoring(estimator, scoring, allow_none)
293 “If no scoring is specified, the estimator passed should ”
294 “have a ‘score’ method. The estimator %r does not.”
–> 295 % estimator)
296 elif isinstance(scoring, Iterable):
297 raise ValueError(“For evaluating multiple scores, use ”

TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

Reply
- Jason Brownlee March 4, 2020 at 5:57 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - debmalya March 12, 2020 at 3:01 pm #
    
    I am using this model:
    
    model = Sequential()
    model.add(Dense(128, activation='relu', input_dim=n_input))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    
    What should I use as my metric then? This is an MLP for time series forecasting.
    
    Reply
    - Jason Brownlee March 13, 2020 at 8:12 am #
      
      Choose a metric that allows you to see if a model meets the goals of your project.
      
      RMSE, MAE, and MAPE are common.
      
      Reply
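For reference, the three metrics named above can be computed directly with NumPy. The values below are made up for illustration and are not results from any model.

```python
import numpy as np

# Illustrative actual and forecast values (not real results).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 380.0])

errors = y_pred - y_true

# Root mean squared error: penalizes large errors more heavily.
rmse = float(np.sqrt(np.mean(errors ** 2)))

# Mean absolute error: average error size in the original units.
mae = float(np.mean(np.abs(errors)))

# Mean absolute percentage error: error relative to the actual value, in percent.
# Note it is undefined when any actual value is zero.
mape = float(np.mean(np.abs(errors / y_true)) * 100.0)

print(mae)   # 17.5
print(mape)  # 7.5
```

RMSE and MAE are in the same units as the target, while MAPE is unitless, which can make it easier to compare across series with different scales.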
debmalya March 12, 2020 at 3:20 pm #

Hi

I used this model-

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=n_input))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mse'])

This is an MLP for time series forecasting. Now, when I do hyperparameter tuning, I get this error:

TypeError: If no scoring is specified, the estimator passed should have a ‘score’ method. The estimator does not.

on this line-

grid_result = grid.fit(scaled_X, y)

Thanks in advance for your reply!

Reply
- Jason Brownlee March 13, 2020 at 8:12 am #
  
  Grid searching using cross-validation for time series forecasting problems would not be valid. See this:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
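The core of walk-forward validation (backtesting) is that each test point is predicted only from observations that came before it. A minimal sketch of the index splits, assuming one-step-ahead forecasts and an expanding training window; `walk_forward_splits` is a hypothetical helper name:

```python
def walk_forward_splits(n_samples, min_train):
    """Yield (train_indices, test_index) pairs in time order.

    The training window expands by one observation at each step, and the
    test point always comes after every training point.
    """
    for test_index in range(min_train, n_samples):
        yield list(range(test_index)), test_index

# Example: 10 observations, at least 7 used for the first fit.
splits = list(walk_forward_splits(n_samples=10, min_train=7))
for train_indices, test_index in splits:
    print("train on 0..%d, test on %d" % (train_indices[-1], test_index))
```

Unlike shuffled k-fold cross-validation, no split here trains on data from the future of its test point.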
ahmed March 12, 2020 at 9:03 pm #

Thanks for this post.

Can you please tell me how we can do a grid search if we have a dataset in the form of images?

Reply
- Jason Brownlee March 13, 2020 at 8:14 am #
  
  Manually, with for-loops over configs you want to test.
  
  Reply
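A manual grid search of this kind can be sketched as below. To keep the example small it uses scikit-learn's bundled 8x8 digit images and a k-nearest-neighbors classifier as a stand-in. With a Keras CNN, the body of the loop would instead build, fit and evaluate the network for each configuration, and the images would not need flattening.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small image dataset: 1797 grayscale images of shape 8x8.
digits = load_digits()
images, labels = digits.images, digits.target

# The stand-in model needs flat vectors; a CNN would keep the image shape.
X = images.reshape(len(images), -1)
x_train, x_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=7)

# Loop over every configuration and record its validation accuracy.
results = []
for config in ParameterGrid({"n_neighbors": [1, 3, 5],
                             "weights": ["uniform", "distance"]}):
    model = KNeighborsClassifier(**config)
    model.fit(x_train, y_train)
    results.append((model.score(x_val, y_val), config))

best_score, best_config = max(results, key=lambda item: item[0])
```

Because the loop is explicit, it is easy to swap the single validation split for any evaluation scheme or data-loading approach that the automated grid search does not support.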
debmalya March 14, 2020 at 1:08 am #

Hi

I am using MLP for time series forecasting and I wanted to do grid search for hyper parameter tuning-

from sklearn.model_selection import GridSearchCV
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, scoring='neg_mean_absolute_error')

grid_result = grid.fit(scaled_train, y_train_c)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

and I got this error-

TypeError Traceback (most recent call last)
in
9 param_grid = dict(batch_size=batch_size, epochs=epochs)
10 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3,scoring=’neg_mean_absolute_error’)
—> 11 grid_result = grid.fit(scaled_train,y_train_c)
12 # summarize results
13 print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
631 n_splits = cv.get_n_splits(X, y, groups)
632
–> 633 base_estimator = clone(self.estimator)
634
635 parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py in clone(estimator, safe)
58 “it does not seem to be a scikit-learn estimator ”
59 “as it does not implement a ‘get_params’ methods.”
—> 60 % (repr(estimator), type(estimator)))
61 klass = estimator.__class__
62 new_object_params = estimator.get_params(deep=False)

TypeError: Cannot clone object ” (type ): it does not seem to be a scikit-learn estimator as it does not implement a ‘get_params’ methods.

Please help me; I couldn’t find anything related to this. Thanks in advance!

Reply
- Jason Brownlee March 14, 2020 at 8:14 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Bianca March 27, 2020 at 9:46 am #

Hi Jason

Thank you for this post. I really enjoy all of them actually, as you manage to explain everything at a beginner’s level. However, am I right that this approach cannot be used for multi-class classification problems? Keras requires one-hot encoding, and this confuses sklearn 🙁 It throws an error because my labels are one-hot encoded.

I found the Keras Tuner (https://github.com/keras-team/keras-tuner), would you recommend this package?

Thanks in advance 🙂

Reply
- Jason Brownlee March 28, 2020 at 6:09 am #
  
  You’re welcome.
  
  You can use this approach for multi-class classification. You can convert the predictions back to class labels by calling predict_classes().
  
  Reply
  - Mahmoud April 28, 2020 at 2:39 am #
    
    Hi Jason,
    
    Would it be possible for you to give a working example, please? I have the same problem as Bianca.
    
    Thank you!
    
    Reply
    - Jason Brownlee April 28, 2020 at 6:49 am #
      
      See this:
      https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
      
      Reply
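One way to reconcile the two label formats is to convert between one-hot vectors and integer class labels with `argmax`. In the sketch below, `probs` is a made-up array standing in for the output of `model.predict(X)`. In recent TensorFlow releases `predict_classes()` is no longer available, and taking the `argmax` of the predicted probabilities is the usual replacement.

```python
import numpy as np

# One-hot encoded labels, as Keras expects with categorical_crossentropy.
y_onehot = np.array([[1, 0, 0],
                     [0, 0, 1],
                     [0, 1, 0],
                     [0, 0, 1]])

# Integer class labels, as scikit-learn's metrics and splitters expect.
y_int = np.argmax(y_onehot, axis=1)

# Illustrative class probabilities standing in for model.predict(X).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.6, 0.3, 0.1]])

# Predicted class = index of the largest probability in each row.
predicted = np.argmax(probs, axis=1)
accuracy = float(np.mean(predicted == y_int))

print(y_int.tolist())      # [0, 2, 1, 2]
print(predicted.tolist())  # [0, 2, 1, 0]
print(accuracy)            # 0.75
```

Alternatively, compiling the model with the `sparse_categorical_crossentropy` loss lets the labels stay as integers throughout, which avoids the conversion.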
ElHanzo March 27, 2020 at 11:32 pm #

hello,
i have a question of understanding , when using keras and gridsearchcv from scikit-learn.
as far as i understand, you are passing a keras model to the gridsearchcv function. the keras model then evaluates each set of hyperparameters the gridsearchcv function is passing to the model.(so the gridsearcher is just some kind of nesting of for loops that automatically saves the best combination of hyperparams)
when i.e. using 3 folds for cross validation, the model evaluates all three subsets (i.e. fold one and fold two for training and fold three for testing and so on) and calculates the mean/average of i.e. the accuracy score for all evaluations. and then you can see what the best hyperparam combination is and so on.
Now, if I want to display the accuracy (or loss) for each epoch (when using these params), I would normally do something like this (pseudocode):
-define a model
-grid_result = grid_searcher.fit this model with trainingset
-best_model = best_estimator_.model.model
-predicting model on testset

so when getting the history with:
-history = best_model.history.history

I could now get the [“acc”, “loss”, etc.] as a function of the epochs and display them in a plot (epochs on the x axis, accuracy on the y axis).
My first question is: which acc value is saved in history? When using 3 folds, there should be three accuracies for each epoch. Or is it the mean/average for each epoch that is saved?

and if so, is there a way to get the accuracy for each epoch for mean/average of the test(=validation and not the testset i later want to predict) per epoch?

because i want to compare the accuracy for the training and test sets for each epoch.

and another question: you could also use “validation_split” for the model itself. if using this option with crossvalidation, would this mean, that the gridsearcher function is splitting into training and testing sets,
and then the model itself also is splitting into a training and test set and evaluating with respect to this test set? would this automatically cause some kind of overfitting, because the validation set is part of the training?

thanks in advance and kind regards

Reply
- Jason Brownlee March 28, 2020 at 6:19 am #
  
  sklearn calculates the accuracy as part of CV.
  
  You can collect and plot the histories across multiple cross-validation runs, I have demonstrated this here:
  https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/
  
  Reply
Nick Yang April 5, 2020 at 3:59 am #

Hi Jason, in the example you used a KerasClassifier wrapper, would a KerasRegressor wrapper work the same? In grid search, what is the scoring function for the regression? I’m trying to make sense of the score.

Thanks

Reply
- Jason Brownlee April 5, 2020 at 5:48 am #
  
  Yes.
  
  Use MSE as the regression scoring function. scikit-learn implements it as negative mean squared error (“neg_mean_squared_error”), so that a higher score is always better:
  https://scikit-learn.org/stable/modules/model_evaluation.html
  
  Reply
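The sign convention can be checked on a tiny example. `DummyRegressor`, which always predicts the training mean, is used here only as a stand-in for a wrapped Keras regressor.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer, mean_squared_error

X = np.zeros((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])

# The stand-in model predicts 2.5 for every sample.
model = DummyRegressor(strategy="mean").fit(X, y)

# Plain MSE: lower is better.
mse = mean_squared_error(y, model.predict(X))

# The scorer used by GridSearchCV negates it: higher (closer to 0) is better.
score = get_scorer("neg_mean_squared_error")(model, X, y)

print(mse)    # 1.25
print(score)  # -1.25
```

So a `best_score_` of -1.25 from the grid search corresponds to an MSE of 1.25; flip the sign to report it.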
Joel April 23, 2020 at 5:03 am #

Hi Jason, I’m running into an issue that the sklearn scoring metrics need 2d-array, whereas my training samples are 3d (for CNN). How do i make use of the metrics in this case?

Reply
- Joel April 23, 2020 at 5:58 am #
  
  I guess the problem is that I’m developing an autoencoder, in which the input and output have the same shape. So I guess I need to reshape the scoring metric input into a 2D array first.
  
  Reply
- Jason Brownlee April 23, 2020 at 6:12 am #
  
  Perhaps reshape your data to meet the expectations of the metrics?
  
  Reply
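A minimal sketch of that reshape: each sample keeps its own row and all remaining dimensions are flattened into columns. The arrays are random stand-ins for autoencoder inputs and reconstructions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)

# Stand-in data: 5 samples of shape 8x8, and reconstructions that are off by 0.1.
y_true = rng.random((5, 8, 8))
y_pred = y_true + 0.1

# Flatten everything except the sample axis: (5, 8, 8) -> (5, 64).
y_true_2d = y_true.reshape(len(y_true), -1)
y_pred_2d = y_pred.reshape(len(y_pred), -1)

mse = mean_squared_error(y_true_2d, y_pred_2d)
```

The same reshape can be applied inside a custom scoring function, so the model itself can keep working with the original 3D shape.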
Zhang April 30, 2020 at 2:31 am #

Hi Jason,
Thank you very much for sharing your knowledge !
I have one question about tuning the epochs and batch_size. Even though I set the numpy seed, every time I get different results in terms of best_parameters and best_accuracy.
My understanding is that if we don’t control the randomness, the parameter tuning makes no sense, because with each pair of epochs and batch_size, the test runs on a different model (random initial weights, among other things).
Could you confirm my thought?
Do you have a way to avoid randomness in the grid search?
Best regards,

Reply
- Jason Brownlee April 30, 2020 at 6:51 am #
  
  Go the other way and control for the random nature of learning/searching – use repeated stratified k-fold cv and calculate the average.
  
  Reply
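Repeated stratified k-fold can be passed to the grid search through its `cv` argument, so that each configuration's score is an average over several differently shuffled runs. In this sketch `LogisticRegression` and the synthetic data are stand-ins for a wrapped Keras model and a real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# 5 folds repeated 3 times with different shuffles: 15 scores per configuration.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=7)

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1.0, 10.0]}, cv=cv)
grid_result = grid.fit(X, y)

# mean_test_score now averages over all 15 runs for each configuration.
n_scores_per_config = cv.get_n_splits()
means = grid_result.cv_results_["mean_test_score"]
stds = grid_result.cv_results_["std_test_score"]
```

Comparing the means together with their standard deviations shows whether the difference between two configurations is larger than the run-to-run noise.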
manar May 2, 2020 at 2:35 pm #

Hi Jason,
I’m working on multi-task learning. What about “grid.fit(x, y)” when I have two outputs?
And are (x, y) the training data or all of the data?

Reply
- Jason Brownlee May 3, 2020 at 6:06 am #
  
  You might have to perform the grid search manually.
  
  Reply
Soothy May 2, 2020 at 11:38 pm #

Hi Jason, Can you maybe make a blog on how we can visualize the results of the hyperparameter tuning through interactive graphs?
Thanks!

Reply
- Jason Brownlee May 3, 2020 at 6:13 am #
  
  Thanks for the suggestion.
  
  I’m not a big fan of interactive graphs though, sorry.
  
  Reply
Richard May 16, 2020 at 6:07 am #

Hi Jason. Thank you so much for all this information, it is literally a lifesaver for my ML class!

I am a bit confused though and hoping you could clarify. I am currently setting up a NN to predict house prices. I have 1800 observations and 11 input parameters.

Does it make sense to start straight away with the hyperparameter optimization using GridSearchCV? I am optimizing neurons, epochs, hidden_layers and dropout_rate at the same time to find the best model to use based on MSE. Does that make sense?

Then after selecting these hyperparameters above I use gridsearchcv again to select the best learning rate and momentum for the GD by selecting the model with the lowest MSE.

Am I skipping any necessary steps?

Reply
- Jason Brownlee May 16, 2020 at 6:26 am #
  
  You’re welcome.
  
  Yes, perhaps.
  
  There are many things to try:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
ll May 20, 2020 at 2:45 pm #

Great post!

Reply
- Jason Brownlee May 21, 2020 at 6:09 am #
  
  Thanks!
  
  Reply
Adrian Garcia Badaracco May 20, 2020 at 6:25 pm #

Hi Jason,

First off, thank you for this article. It was very helpful when I was first learning how to integrate Keras and Scikit-Learn.

I actually ended up submitting a PR to the tensorflow team to fix a lot of the issues with these wrappers, some of which are reported in other comments in this very post (ex: “the can’t pickle _thread.RLock objects” issues).

That PR ended up turning into an entire package, that now fixes dozens of open issues in tf. If you can, I’d appreciate it if you took a look, and maybe updated the article to use it, if you think that is appropriate. Any feedback is welcome!

https://github.com/adriangb/scikeras
https://pypi.org/project/scikeras/

Thanks!

Reply
- Jason Brownlee May 21, 2020 at 6:13 am #
  
  You’re welcome.
  
  Nice one!
  
  Reply
sukhpal May 21, 2020 at 10:21 pm #

Sir, I applied grid search optimization with deep learning to produce an optimal model. How can I further improve the performance of the optimized model?
Are there any other machine learning techniques to apply to the optimized deep model after grid search optimization?

Reply
- Jason Brownlee May 22, 2020 at 6:08 am #
  
  Yes, the suggestions here will give you ideas:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Mohammad May 26, 2020 at 8:09 am #

Hello,

Why is the MSE in the result different from the scoring value?
I used a grid search for the regression model. To see how the model works, I gave all the hyperparameters only one value, as follows:
———
# Create hyperparameter space
epochs = [80]
learning_rates = [0.001]
n_filters= [32]
———

then I run the grid search as the following:

———
# Create grid search
grid = GridSearchCV(estimator=my_network, scoring='neg_mean_squared_error', refit='neg_mean_squared_error',
                    cv=5, param_grid=hyperparameters)

# Fit grid search
grid_result = grid.fit(X_train, y_train, batch_size=batch_size)
———-

The MSE in the results is different from the score of each fold. For example, the last MSE is 0.007 and the last score is ('split4_test_score': array([-0.27008066]),).

Why is it different?

thanks

Reply
- Jason Brownlee May 26, 2020 at 1:20 pm #
  
  We are minimizing cross entropy and evaluating using accuracy.
  
  Reply
  - Mohammad May 27, 2020 at 12:25 am #
    
    Sorry, I don’t follow. How can I make it use MSE rather than accuracy, then? As far as I know, regression models should not be evaluated with “accuracy”.
    
    Thanks
    
    Reply
    - Jason Brownlee May 27, 2020 at 7:56 am #
      
      I thought you were referring to the above tutorial where all examples are classification.
      
      Sorry, I don’t understand your question. Perhaps you can simplify or rephrase it?
      
      Reply
      - Mohammad May 27, 2020 at 9:11 am #
        
        Sorry for not being clear.
        
        - I wrote a CNN regression model, in which MSE is used to evaluate the model.
        - I used the grid search code from my first post, in which neg_mean_squared_error is used.
        - While the code is running, I can see the MSE value, which is low (e.g. 0.007).
        - In the grid search results, the score is high (e.g. 0.2).
        
        Why is the score value different from the MSE value?
        
        ——
        I hope I make it clear.
        by the way, I have similar problem to this
        https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/#comment-536437
      - Jason Brownlee May 27, 2020 at 1:27 pm #
        
        A CNN is inappropriate for regression, unless it is a sequence input.
        
        If it is a sequence input, using a grid search in sklearn would be inappropriate, as you will need to use walk-forward validation manually to evaluate the model.
        
        Finally, to answer the specific question, perhaps you are using CV in the grid search which will average across multiple runs and you are comparing this to one value from one model at one point during training?
      - Martha June 6, 2020 at 1:50 am #
        
        Hello Jason,
        
        I am working on my Master’s thesis about the prediction of pollution, basically a regression problem, and I need to find the best parameters for different models.
        
        I’m surprised about your last answer…Because I have prepared a grid search over a KerasRegressor using TimeSeriesSplit as CV.
        
        What is the alternative to find the best parameter in a timeseries problem?
        
        Thanks in advance!
      - Jason Brownlee June 6, 2020 at 7:57 am #
        
        My preference is a manual grid search where models are evaluated using walk-forward validation.
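A manual grid search with walk-forward validation can be sketched in a few lines. To keep the example self-contained it tunes the window size of a simple moving-average forecaster on a synthetic repeating series. With a Keras model, the forecast line would be replaced by fitting the network on the history up to time t and predicting the next step.

```python
import numpy as np

def walk_forward_mae(series, window, start):
    """Mean absolute error of one-step-ahead forecasts made in time order."""
    errors = []
    for t in range(start, len(series)):
        # Only observations before time t are used to forecast series[t].
        forecast = np.mean(series[t - window:t])
        errors.append(abs(series[t] - forecast))
    return float(np.mean(errors))

# Synthetic series that repeats 0, 1, 2, 3 (illustrative only).
series = np.array([t % 4 for t in range(40)], dtype=float)

# Manual grid search: evaluate every candidate with the same walk-forward scheme.
results = {window: walk_forward_mae(series, window, start=8)
           for window in [1, 2, 4]}
best_window = min(results, key=results.get)

print(results)      # {1: 1.5, 2: 1.5, 4: 1.0}
print(best_window)  # 4
```

Every candidate is scored on exactly the same forecast points, which keeps the comparison between configurations fair.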
tbob May 26, 2020 at 6:37 pm #

I’ve been trying to run the code with my own parameters, specifically learning rate and weight initialisation. However, I keep running into an error which states that learn_rate isn’t a legal parameter. I understand that it’s not a legal parameter in the fit method for grid, but I don’t quite understand how you got yours working in the code.

Reply
- Jason Brownlee May 27, 2020 at 7:45 am #
  
  Perhaps try tuning the parameter manually with your own loop?
  
  Reply
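A likely cause of the "not a legal parameter" message with the legacy Keras wrapper is that a key in the grid does not match any argument of the model-building function. The helper below is a hypothetical debugging aid, not part of Keras or scikit-learn: it lists grid keys that are neither arguments of the build function nor one of a few common fit arguments (the `FIT_ARGS` set is an assumed, incomplete list).

```python
import inspect

# Assumption: a small subset of fit-time arguments the wrapper also accepts.
FIT_ARGS = {"epochs", "batch_size", "verbose"}

def unknown_grid_keys(build_fn, param_grid):
    """Return grid keys that are neither build_fn arguments nor known fit arguments."""
    build_args = set(inspect.signature(build_fn).parameters)
    return sorted(set(param_grid) - build_args - FIT_ARGS)

def create_model(learn_rate=0.01, init_mode="uniform"):
    """Placeholder: a real version would build and compile a Keras model."""
    return None

def create_model_fixed_lr(init_mode="uniform"):
    """Placeholder whose signature lacks learn_rate."""
    return None

param_grid = {"learn_rate": [0.001, 0.01, 0.1],
              "init_mode": ["uniform", "normal"],
              "epochs": [10]}

ok_keys = unknown_grid_keys(create_model, param_grid)
bad_keys = unknown_grid_keys(create_model_fixed_lr, param_grid)

print(ok_keys)   # []
print(bad_keys)  # ['learn_rate']
```

If the key is missing, adding `learn_rate` as an argument of `create_model` and using it when constructing the optimizer is the usual fix. With SciKeras, parameters are instead routed by prefix, for example `model__` for build-function arguments and `optimizer__` for optimizer settings.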
Shadi June 21, 2020 at 11:36 pm #

Hello Jason! Thank you very much for your posts 🙂
This is my question: how can I use early stopping in my code? Where should I put it?

# callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, mode="auto")]

################################################################
### define the model:

from numpy.random import seed
seed(1)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))

    model.compile(loss='mse', optimizer=optimizer)

    return model

clf = KerasRegressor(build_fn=create_model, epochs=500, callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])

param_grid = {
    'clf__optimizer': ['adam', 'rmsprop'],
    'clf__batch_size': [500, 45, 77]
}

pipeline = Pipeline([
    ('clf', clf)
])

from sklearn.model_selection import TimeSeriesSplit, GridSearchCV

tscv = TimeSeriesSplit(n_splits=5)

grid = GridSearchCV(pipeline, cv=tscv, param_grid=param_grid, return_train_score=True, verbose=10,
                    scoring='neg_mean_squared_error')

grid.fit(Xtrain2, ytrain.values)

grid.cv_results_

#####################################################################

Reply
- Jason Brownlee June 22, 2020 at 6:15 am #
  
  Generally early stopping is not used with cross-validation, it gets messy.
  
  This can help you use it manually:
  https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
  
  Reply
Abinash June 22, 2020 at 9:29 pm #

Sir, instead of ‘accuracy’ I want to use ‘AUC’ in the metrics in model.compile because I have highly imbalanced data. But I am unable to do that as it throws an error: ‘Could not pickle the task to send it to the workers’. How to solve this??

Reply
- Jason Brownlee June 23, 2020 at 6:19 am #
  
  You might need to run the grid search manually with for loops and calculate AUC using the sklearn function manually.
  
  Reply
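A manual search that scores each configuration by AUC can be sketched as follows. `LogisticRegression` and the synthetic imbalanced dataset are stand-ins. With a Keras model, the probabilities would come from `model.predict` on the held-out fold instead of `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.9, 0.1], random_state=7)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)

results = {}
for C in [0.01, 0.1, 1.0]:  # outer loop: configurations to test
    fold_scores = []
    for train_idx, test_idx in cv.split(X, y):  # inner loop: CV folds
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        # AUC needs scores or probabilities for the positive class, not labels.
        probs = model.predict_proba(X[test_idx])[:, 1]
        fold_scores.append(roc_auc_score(y[test_idx], probs))
    results[C] = float(np.mean(fold_scores))

best_C = max(results, key=results.get)
```

Because everything runs in a single process, this also sidesteps the pickling step that parallel workers (`n_jobs=-1`) require.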
Abinash June 24, 2020 at 2:29 am #

Sir, is there any way of using classification metrics like ‘Recall’, ‘Precision’, ‘TruePositives’, etc. in place of ‘accuracy’ in the metrics of model.compile in the code for the section ‘How to Tune Batch Size and Number of Epochs’ on this page? I do not want to use ‘accuracy’ because of the dataset I am using. If ‘accuracy’ can be used, then why not ‘Recall’? I just do not get it. Help needed.

Reply
- Jason Brownlee June 24, 2020 at 6:36 am #
  
  Yes, you can specify them directly as a list of metrics.
  
  Reply
  - Ronald May 1, 2021 at 5:20 am #
    
    Hi Jason,
    
    If you select multiple metrics, how can you view them when reviewing the results? I tried to look through the cv_results_ dictionary but it always seems to only store the accuracy metric (which is the first metric in my list).
    
    Reply
    - Jason Brownlee May 1, 2021 at 6:11 am #
      
      I would expect all metrics to be stored in there, I’m surprised.
      
      Reply
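One thing worth checking: the contents of `cv_results_` are determined by the `scoring` argument of `GridSearchCV`, not by the metrics passed to `model.compile`. A sketch of multi-metric scoring, with `LogisticRegression` and synthetic data as stand-ins for the wrapped Keras model and real data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# A dict of scorers makes the search record every metric for every configuration.
scoring = {"accuracy": "accuracy", "recall": "recall", "precision": "precision"}

# With several metrics, refit names the one used to pick the best configuration.
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]},
                    cv=3, scoring=scoring, refit="accuracy")
grid_result = grid.fit(X, y)

# Each metric gets its own set of columns in cv_results_.
mean_accuracy = grid_result.cv_results_["mean_test_accuracy"]
mean_recall = grid_result.cv_results_["mean_test_recall"]
mean_precision = grid_result.cv_results_["mean_test_precision"]
```

With a single string (or no `scoring` at all), only one metric is recorded, under the key `mean_test_score`.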
Marco July 9, 2020 at 12:51 am #

Hi, many thanks for your post. My question is:

I am trying to grid search almost all hyper-parameters at once:

def create_model(optimizer, activation, dropout_rate, neurons, init_mode):
    # create model
    model = Sequential()
    model.add(Dense(neurons=neurons, input_dim=8, kernel_initializer=init_mode, activation=activation))  # neuron activation
    model.add(Dropout(dropout_rate=dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

… load data…
… create model…

batch_size = [5, 10]
epochs = [1, 2]
optimizer = ['SGD', 'Adam']
##learning_rate = [0.01, 0.1]  ## it won't let me set LR and momentum when there is dropout
##momentum = [0.2, 0.4]
activation = ['softmax', 'relu']
dropout_rate = [0.1, 0.2]
neurons = [5, 10]
init_mode = ['uniform', 'normal']

param_grid = dict(batch_size=batch_size,
                  epochs=epochs,
                  optimizer=optimizer,
                  activation=activation,
                  dropout_rate=dropout_rate,
                  neurons=neurons,
                  init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)

But I get this error: TypeError: __init__() missing 1 required positional argument: 'units'

Do you know what could be the reason and possible solution? Or if I am doing anything wrong?

Thank you!
Marco

Reply
- Jason Brownlee July 9, 2020 at 6:41 am #
  
  You may need to debug your code:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
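The message most likely comes from the keyword names in the layer constructors: `Dense` takes its size as `units` (usually passed positionally) and `Dropout` takes `rate`, so `Dense(neurons=...)` leaves `units` unset. A corrected sketch of the build function is below. The default argument values are assumptions added so the function can be called without arguments, and TensorFlow is imported inside the function so it is only required when a model is actually built.

```python
def create_model(optimizer="adam", activation="relu", dropout_rate=0.1,
                 neurons=10, init_mode="uniform"):
    # Imported lazily so that defining the function does not require TensorFlow.
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.models import Sequential

    model = Sequential()
    # The layer size is the first positional argument (named 'units'), not 'neurons'.
    model.add(Dense(neurons, input_dim=8, kernel_initializer=init_mode,
                    activation=activation))
    # The dropout fraction is the first positional argument (named 'rate').
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model
```

The grid keys (`neurons`, `dropout_rate`, and so on) can stay as they are, since they only need to match the arguments of `create_model`, not the Keras layer arguments.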
Jack July 23, 2020 at 7:34 am #

Hi Jason, very insightful tutorial. I have customised your grid search example to test almost all hyperparameters, and I have fed it a dataset of 300,000+ data points. The problem is that the process is extremely slow, even though I am using an NVIDIA GPU on CentOS 7 through an SSH connection. I have two questions in this regard:

1. Is there a technique to accelerate the grid search process?

2. Would it make sense to decrease the size of my dataset for hyperparameter optimisation? Or would it make no sense at all, because my model will then be trained on a different (much larger) dataset than the one used for the grid search?

Thanks Jason,
Jack

Reply
- Jason Brownlee July 23, 2020 at 2:39 pm #
  
  Some ideas:
  – Test fewer hyperparameters
  – Use a faster machine
  – Use a smaller dataset
  
  It’s a trade off of your expected improvement from searching and the time/resources you want to spend.
  
  Reply
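If a smaller dataset is used for the search, a stratified subsample keeps the class balance close to that of the full data. A minimal sketch with synthetic stand-in data and an arbitrary subsample size of 300 rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic stand-in for a large dataset: 3000 rows, about 30% positives.
X = rng.normal(size=(3000, 8))
y = (rng.random(3000) < 0.3).astype(int)

# Keep 300 rows for the search; stratify=y preserves the class proportions.
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=300, stratify=y, random_state=7)
```

The configuration found on the subsample can then be confirmed, or tuned further over a narrower range, on the full dataset.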
Srikar July 24, 2020 at 4:27 pm #

Hey jason,

Thank you very much for the tutorial. I am using a dataset of only 400 points to optimize my LSTM model, and when I fit the grid search to my data, the error score is printed as NaN. What could be the issue? Also, did you scale your data in the tutorial, or did you feed it in as it is?

Also, instead of the binary_crossentropy loss, I am using mean squared error for my model. Kindly let me know. I am trying to optimize my number of epochs and batch size first.

Reply
- Jason Brownlee July 25, 2020 at 6:12 am #
  
  You’re welcome.
  
  Perhaps the data has a NaN? – check your data.
  Perhaps the gradients are exploding (overflow)? – try gradient clipping and scaling the data.
  Perhaps the gradients are vanishing (underflow)? – use relu and scale your data.
  
  Cross entropy loss is appropriate for classification tasks:
  https://machinelearningmastery.com/cross-entropy-for-machine-learning/
  
  Reply
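
The first two checks can be sketched with NumPy alone. The array below is made-up data with a deliberately injected NaN; the column-wise standardization is one common scaling choice, not necessarily what your data needs.

```python
import numpy as np

# Made-up data with one missing value injected for illustration.
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 600.0], [4.0, 800.0]])

# 1. Look for NaN or infinite values before fitting anything.
bad_rows = ~np.isfinite(X).all(axis=1)
print("rows with NaN/inf:", np.flatnonzero(bad_rows))

# 2. Drop (or impute) the bad rows, then standardize each column.
X_clean = X[~bad_rows]
X_scaled = (X_clean - X_clean.mean(axis=0)) / X_clean.std(axis=0)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

For exploding gradients, Keras optimizers also accept `clipnorm` and `clipvalue` arguments.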
Ella July 27, 2020 at 1:11 am #

Hi Jason, this below is just your code put into two functions.

I am wondering why it does not work. It says that activation is not defined! But when I run it as you wrote it (with the create_model function and the rest of the code outside), it works amazingly.

def create_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation=activation))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

def keras_model(model):
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
    # define the grid search parameters
    activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
    param_grid = dict(activation=activation)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

Reply
- Jason Brownlee July 27, 2020 at 5:48 am #
  
  I’m sorry to hear that you’re having trouble, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Ella July 27, 2020 at 7:04 am #
    
    The code works fine! It’s just not working when wrapped in one or two functions (as in the example above).
    
    Reply
Isaac August 10, 2020 at 5:16 am #

Hi Jason, thank you for this amazing tutorial, it has been super useful!

I have a question: I am trying to tune GaussianNoise(gn) and LeakyReLU(alpha) in my LSTM model, but I get errors saying that both gn and alpha are not legal parameters.

Do you know if they are supported by GridSearchCV? Because for all other hyperparameters it works perfectly!

Thank you!
Isaac

Reply
- Jason Brownlee August 10, 2020 at 5:58 am #
  
  Sorry to hear you’re having trouble. I would expect that grid search is not concerned about the specific layers used in your model.
  
  Reply
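
A common cause of the "not a legal parameter" error is that the name in the grid is not an argument of the model-building function, so the wrapper has nowhere to send it. The sketch below illustrates the idea with the standard library only; `build_model`, `gn`, `alpha`, and the helper `tunable_names` are hypothetical, and this is not the wrapper's actual implementation.

```python
import inspect

def build_model(units=32, gn=0.1, alpha=0.3):
    """Hypothetical build function: gn and alpha are exposed as arguments
    so they could be passed on to GaussianNoise(gn) and LeakyReLU(alpha)."""
    return {"units": units, "gn": gn, "alpha": alpha}  # stand-in for a model

def tunable_names(build_fn):
    """Names a grid could target: the build function's own arguments."""
    return set(inspect.signature(build_fn).parameters)

param_grid = {"gn": [0.05, 0.1], "alpha": [0.1, 0.3], "dropout": [0.2]}
unknown = sorted(set(param_grid) - tunable_names(build_model))
print("not accepted by build_model:", unknown)
```

With SciKeras, arguments intended for the build function are typically addressed with a `model__` prefix in the grid, for example `model__alpha`.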
Michael August 19, 2020 at 1:37 am #

Thank you for this great post. This will certainly help me make better deep learning models.

Is it possible to use this same procedure with a TimeSeriesGenerator?

In particular, when I try to fit the model using fit_generator(), I get an error saying that gridsearchcv object has no attribute ‘fit_generator’.

And I’m not sure what my x and y variables would be if I were to use .fit() because my dataset is a one-variable time series.

Reply
- Jason Brownlee August 19, 2020 at 6:04 am #
  
  You’re welcome.
  
  No, I’d recommend grid searching time series models manually, see this example:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  
  Reply
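
On the question of what X and y would be for a one-variable series: a common framing is a sliding window, where each sample is the previous `n_steps` values and the target is the next value. Below is a minimal sketch of that framing together with a manual grid search over the window length. The naive "mean of the window" forecaster is only a stand-in for a real model, and all names are illustrative.

```python
def make_windows(series, n_steps):
    """Split a univariate series into (X, y) pairs using a sliding window."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return X, y

def evaluate(series, n_steps, n_test):
    """Mean absolute error on the last n_test targets (no shuffling)."""
    X, y = make_windows(series, n_steps)
    errors = [abs(sum(x) / len(x) - target)      # stand-in forecaster
              for x, target in zip(X[-n_test:], y[-n_test:])]
    return sum(errors) / len(errors)

series = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
X, y = make_windows(series, n_steps=3)
print(X[0], y[0])   # first window and its target

# Manual grid search: try each window length and keep the lowest error.
scores = {n: evaluate(series, n, n_test=3) for n in [1, 2, 3]}
best = min(scores, key=scores.get)
print(scores, best)
```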
GRIGORIY SOKOLOV August 28, 2020 at 3:22 am #

Very useful, thank you very much.

Reply
- Jason Brownlee August 28, 2020 at 6:53 am #
  
  You’re welcome.
  
  Reply
Abhi Bhagat September 6, 2020 at 8:08 pm #

Can you please explain what ” random seed ” actually does ?

Reply
- Jason Brownlee September 7, 2020 at 8:29 am #
  
  Yes, you can learn more about pseudo random number generators here:
  https://machinelearningmastery.com/introduction-to-random-number-generators-for-machine-learning/
  
  In this case, it controls the stochastic nature of the evaluation procedure, you can learn more here:
  https://machinelearningmastery.com/different-results-each-time-in-machine-learning/
  
  Reply
Jessy September 20, 2020 at 4:18 pm #

Hi Jason,
Can I use a feature selection technique with an LSTM? My dataset contains 50 features. My question is whether I can do feature selection followed by an LSTM and then perform classification. Does the LSTM do feature engineering on its own?

Reply
- Jason Brownlee September 21, 2020 at 8:07 am #
  
  Yes, you can use a step-wise procedure or RFE-like procedure to see which combination of input time series results in the best performing model.
  
  Yes, LSTM will perform automatic feature extraction.
  
  Reply
George September 22, 2020 at 4:58 pm #

Hi Jason,
When fitting with GridSearchCV, can we include a validation set? Which is better: using it or not?

grid_result = grid.fit(X_train, y_train,validation_data=(X_valid, y_valid))
or
grid_result = grid.fit(X_train, y_train)
Thanks

Reply
- Jason Brownlee September 23, 2020 at 6:35 am #
  
  It is better to draw the validation set from the train set in each fold.
  
  i.e. I think it’s better to write the grid search manually if you want to use a validation set.
  
  Reply
George September 23, 2020 at 9:21 am #

Thanks Jason,
We can try sklearn.model_selection.PredefinedSplit
also, this is very useful
https://machinelearningmastery.com/train-to-the-test-set-in-machine-learning/

Reply
- Jason Brownlee September 23, 2020 at 1:43 pm #
  
  Nice.
  
  Reply
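
For reference, `PredefinedSplit` takes an array with one entry per sample: `-1` keeps a row in training for every split, and `0` assigns it to the single validation fold. A minimal sketch, with random data and a logistic regression standing in for the wrapped Keras model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 4)), rng.integers(0, 2, 80)
X_valid, y_valid = rng.normal(size=(20, 4)), rng.integers(0, 2, 20)

# Stack both sets and mark which rows form the validation fold.
X_all = np.vstack([X_train, X_valid])
y_all = np.concatenate([y_train, y_valid])
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_valid), dtype=int)])
cv = PredefinedSplit(test_fold)

grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=cv)
grid.fit(X_all, y_all)
print(cv.get_n_splits(), grid.best_params_)
```

Note that with the default `refit=True`, the best configuration is retrained on all rows passed to `fit`, including the validation rows.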
Jessy October 6, 2020 at 1:50 pm #

Hi Jason,
I have a question about my research problem, which is sequence classification (sleep stage and EEG eye state). Can I use simulated annealing to select a subset of features and then pass them into an LSTM? Is that correct, or can the LSTM itself select the required features for sequence prediction? I am a little confused about feature engineering in deep learning.

Reply
- Jason Brownlee October 6, 2020 at 1:59 pm #
  
  Perhaps try it and compare the results to using all features.
  
  Reply
Parijat October 22, 2020 at 8:22 pm #

Hi Jason,

Thanks for such an elaborate explanation. I am facing a problem: my epochs and batch_size combination are not changing.

params = {'epochs':[100,150],'batch_size':[16,32]}

Problem snap:

lr=0.01,n1=26,n2=13,p=0.2,activation1=sigmoid,activation2=linear
epochs=100,batch_size=16
lr=0.01,n1=26,n2=13,p=0.2,activation1=sigmoid,activation2=linear
epochs=100,batch_size=16

————————————–

The code:

def build_regressor(batch_size,epochs,n1=26,n2=13,p=0.2,lr=1e-02,activation1='sigmoid',activation2='linear'):
    print(f'lr={lr},n1={n1},n2={n2},p={p},activation1={activation1},activation2={activation2}')
    print(f'epochs={epochs},batch_size={batch_size}')
    model = Sequential([
        Dense(n1, activation=activation1, input_shape=(x_train.shape[1],)),
        Dropout(p),
        Dense(n2, activation=activation1),
        Dropout(p),
        Dense(1, activation=activation2),#elu
    ])

    optimizer=Adam(lr=lr)
    model.compile(optimizer=optimizer,loss='mse',metrics=['mse']) #mean_squared_error
    return model

model=KerasRegressor(build_fn= build_regressor,verbose=-1)
params = {'epochs':[100,150],'batch_size':[16,32]}
model=GridSearchCV(estimator=model,param_grid=params,cv=5)
hist=model.fit(x_train,y_train,verbose=0)

As the params combination is changing, the prog is running in an infinite loop. I am not able to find any syntax error.

Reply
- Jason Brownlee October 23, 2020 at 6:07 am #
  
  Sorry to hear that, perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
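
One plausible reading of the repeated printout, offered as a guess rather than a diagnosis: with `cv=5`, GridSearchCV builds and fits the model five times for each parameter combination before moving to the next, so the same `epochs` and `batch_size` are printed repeatedly, and the full search takes many fits rather than looping forever. The enumeration can be inspected with `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

params = {"epochs": [100, 150], "batch_size": [16, 32]}
cv = 5

combos = list(ParameterGrid(params))
for combo in combos:
    print(combo, "-> fitted", cv, "times, once per fold")

# One extra fit happens at the end when refit=True (the default).
total_fits = len(combos) * cv + 1
print("total fits:", total_fits)
```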
George November 13, 2020 at 10:06 am #

Hi Jason,
I have a question on tuning the number of neurons in the hidden layer. Under that heading you wrote:
“In this example, we will look at tuning the number of neurons in a single hidden layer.”
Does “single hidden layer” mean the input layer plus one hidden layer, or only the input layer?
Looking at the model, it seems to be tuning the input layer. Please correct me if I am wrong. Thanks.
model = Sequential()
model.add(Dense(neurons, input_dim=8, kernel_initializer='uniform', activation='linear', kernel_constraint=maxnorm(4)))
model.add(Dropout(0.2))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

Reply
- George November 13, 2020 at 10:12 am #
  
  My confusion is this:
  First I tuned the number of neurons as in the model above.
  Next I tuned the number of hidden layers as follows:
  model.add(Dense(neurons_1L, ....))
  model.add(Dropout(rate=dropout_rate))
  for i in range(int(hidden_layers)):
      model.add(Dense(neurons_1L, ....))
      model.add(Dropout(dropout_rate))
  model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
  If I get 2 hidden layers as the best result, does that mean
  the input layer plus 2 hidden layers (3 layers in total), or only 2 layers?
  Thanks
  
  Reply
- Jason Brownlee November 13, 2020 at 10:28 am #
  
  The first hidden layer and visible layer are defined on one line.
  
  We are tuning the number of nodes in the first hidden layer, not the visible layer.
  
  Reply
So so Slowly November 13, 2020 at 12:44 pm #

My code is below. It runs very slowly, taking about 8 to 9 hours, both when I use the GPU and when I use the CPU. What should I do?

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Dropout
from keras.layers import LSTM
from numpy import mean
from numpy import std
from keras.utils import to_categorical
from sklearn.model_selection import GridSearchCV
from tensorflow import keras
from tensorflow.keras import layers
from keras.optimizers import Adam
from keras.optimizers import SGD
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.constraints import maxnorm

def create_model(dropout_rate=0.0):
    verbose, epochs, batch_size = 0, 40, 5
    n_timesteps, n_features,n_outputs = trainX.shape[1],trainX.shape[2], testY.shape[1]
    model = Sequential()
    model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    #opt = keras.optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    opt = keras.optimizers.Adam(learning_rate= 0.000001)
    model.compile(loss='categorical_crossentropy',optimizer = opt,metrics=['accuracy'])
    return model

# run an experiment
def run_experiment():
    print((trainX).shape,(trainY).shape,(testX).shape, (testY).shape)
    model = KerasClassifier(build_fn=create_model, epochs=40, batch_size=5, verbose=0)
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    param_grid = dict(dropout_rate=dropout_rate)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
    grid_result = grid.fit(trainX, trainY)
    print("a")
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

run_experiment()

Reply
- So so Slowly November 13, 2020 at 12:45 pm #
  
  * 8/ 9 hours
  
  Reply
- Jason Brownlee November 13, 2020 at 12:49 pm #
  
  Here are some ideas:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-speed-up-the-training-of-my-model
  
  Reply
Erwan Delh November 14, 2020 at 3:48 am #

Hello,

Thank you for this post. I knew the key concepts of grid search optimisation, but I didn’t yet know how to implement it.

I have a few questions regarding a very specific application of ML that I’m working on (I have already done some research but found few answers):

– Is it possible, instead of using k-fold for the CV, to provide a CV set directly? I do Human Activity Recognition, so I need to think at the subject level rather than the sample level in order to be confident in my ability to generalise. I have the impression that I can do so by passing a list of indexes that specify which data should be part of the CV set.

– Is it possible to give a TensorFlow dataset as input?

thank you for your answers

Reply
- Jason Brownlee November 14, 2020 at 6:37 am #
  
  Sure, you can design the test harness any way you like.
  
  Reply
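
For subject-level splits specifically, scikit-learn's group-aware splitters keep all rows from one subject on the same side of each split. A minimal sketch follows, assuming one subject ID per row; the data is random and a logistic regression stands in for the wrapped Keras model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = rng.integers(0, 2, 120)
subjects = np.repeat(np.arange(6), 20)   # 6 subjects, 20 rows each

cv = GroupKFold(n_splits=3)
grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=cv)
grid.fit(X, y, groups=subjects)          # groups are forwarded to the splitter

# Check that no subject appears in both the train and test part of a split.
overlaps = []
for train_idx, test_idx in cv.split(X, y, groups=subjects):
    shared = set(subjects[train_idx]) & set(subjects[test_idx])
    overlaps.append(len(shared))
    print("test subjects:", sorted(set(subjects[test_idx])), "shared:", len(shared))
```

The `cv` argument also accepts an explicit iterable of `(train_indices, test_indices)` pairs, which matches the "list of indexes" idea in the question.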
Prajna November 20, 2020 at 9:24 pm #

Thanks, Jason, for the post. If we are using only GridSearchCV, can we calculate precision, recall, and F1 score for the CNN model once hyperparameter tuning is done? If yes, please let me know how to do that. I have not used train_test_split; I only used GridSearchCV.

Reply
- Jason Brownlee November 21, 2020 at 6:39 am #
  
  Yes, see this tutorial:
  https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
  
  Reply
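
Within GridSearchCV itself, several metrics can also be collected per fold by passing a dictionary to `scoring` and naming the one used to pick the winner in `refit`. A minimal sketch on synthetic binary data, with a logistic regression standing in for the wrapped CNN:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

scoring = {"precision": "precision", "recall": "recall", "f1": "f1"}
grid = GridSearchCV(
    LogisticRegression(), {"C": [0.1, 1.0, 10.0]},
    scoring=scoring, refit="f1", cv=3,
)
grid.fit(X, y)

# Cross-validated mean of each metric for the selected configuration.
best = grid.best_index_
for name in scoring:
    print(name, round(grid.cv_results_[f"mean_test_{name}"][best], 3))
```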
Fatima November 23, 2020 at 1:46 am #

Hi Jason,
Thanks a lot for this amazing tutorial, it helps me a lot to start applying deep learning.
I have a dataset that contains 2,738 rows and 26 columns (column number 26 is the target variable). I want to apply a Deep Neural Network (DNN) to do the classification (to predict a binary class label).
My question is how to tune the number of neurons in each hidden layer. Is it dependent on the size of the dataset? Do you have a method that would help to tune the number of neurons in each hidden layer?

Thanks for your help!

Reply
- Jason Brownlee November 23, 2020 at 6:18 am #
  
  You’re welcome.
  
  Good question, perhaps use a little trial and error and see what works well enough, then try tuning that.
  
  Reply
snehal November 25, 2020 at 6:44 pm #

hello @Jason can we create grid search for numbers of hidden layers? I tried to create but it didn’t work.

Reply
- Jason Brownlee November 26, 2020 at 6:30 am #
  
  Yes. Perhaps try adapting the above example.
  
  Reply
progammer newbie December 3, 2020 at 12:13 am #

Hi Jason,

thanks very much for your precious work. I’ll definitely buy your e-book set to support you and learn more!

Our Institute offers additional processing power, since we have many programmers here. Do you think that it should be possible to do the optimization with all hyperparameters at once?

Reply
- Jason Brownlee December 3, 2020 at 8:19 am #
  
  You’re welcome.
  
  Yes, if you have the resources, tune everything at once.
  
  Reply
Abdullah December 16, 2020 at 1:40 am #

Amazing explanation, Jason. Thank you so much for that. I have just one question: can we also find out how many hidden layers to use?

Reply
- Jason Brownlee December 16, 2020 at 7:52 am #
  
  Thanks.
  
  I recommend testing different numbers of hidden layers and discover what works well/best for your model and dataset.
  
  Reply
Carlos Castro December 17, 2020 at 12:12 pm #

Thank you Jason, I have a question. How do you set the proportion for training and test? I need a 75 25 rate. Thank you!!

Reply
- Jason Brownlee December 17, 2020 at 1:00 pm #
  
  You’re welcome!
  
  Tough question. It depends on the data, you want both train and test to be “representative” of the problem.
  
  Start with 50/50, evaluate, then try more aggressive splits like 70/30 etc and compare results (variance of repeated evals).
  
  Reply
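
If the goal is a single 75/25 train/test evaluation inside the grid search rather than k-fold cross-validation, one option is to pass a one-split `ShuffleSplit` as `cv`. A minimal sketch on random data, with a logistic regression standing in for the wrapped Keras model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ShuffleSplit

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4))
y = rng.integers(0, 2, 400)

# A single random split: 75% of rows for training, 25% for testing.
cv = ShuffleSplit(n_splits=1, test_size=0.25, random_state=3)
train_idx, test_idx = next(cv.split(X))
print(len(train_idx), len(test_idx))

grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=cv)
grid.fit(X, y)
print(grid.best_params_)
```

A single split gives a noisier estimate than k-fold cross-validation, which is the variance Jason's reply suggests checking with repeated evaluations.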
Xerxes December 25, 2020 at 12:37 pm #

Hello,

Is it possible to do this using data from ImageDataGenerator in the grid.fit?

Reply
- Jason Brownlee December 26, 2020 at 5:06 am #
  
  I don’t think so.
  
  Reply
Saransh Gupta December 29, 2020 at 3:52 pm #

Thanks Jason, it is a very helpful tutorial

Reply
- Jason Brownlee December 30, 2020 at 6:33 am #
  
  You’re welcome.
  
  Reply
Greg December 31, 2020 at 8:58 am #

How to handle multiple inputs with the GridSearch. I can handle one input one output for keras model, but then when I have a model with 2 inputs, things go badly wrong. I get something like:

AssertionError: Could not compute output Tensor(“dense_9/Sigmoid:0”, shape=(None, 1), dtype=float32)

Reply
- Jason Brownlee December 31, 2020 at 9:27 am #
  
  You may need to write your own for loop to enumerate the configurations to test.
  
  Reply
  - Greg December 31, 2020 at 11:16 pm #
    
    Looks like GridSearch is still like that. I can get the same functionality with Keras Tuner for multiple inputs, but the reason I was trying to use GridSearch directly from scikit-learn is that I want something that plays nicely with Dask, and from what I can tell Dask does not handle Keras Tuner but does handle the normal GridSearch from Keras.
    
    Reply
Helene January 7, 2021 at 9:12 pm #

Thanks a lot for this excellent post!

I am working on a regression problem using keras.wrappers.scikit_learn.KerasRegressor. I don’t completely understand the grid_result.best_score_ in this case: is it a mean gap in % between the prediction and the real solution?

I also noticed in the source code of GridSearchCV function that version 0.22 changed the cv default value from 3-fold to 5-fold.

Reply
- Jason Brownlee January 8, 2021 at 5:44 am #
  
  You’re welcome.
  
  Best score is the largest mean score across the CV folds for the corresponding configuration.
  
  Reply
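
To make that concrete: `best_score_` is the highest entry in `cv_results_['mean_test_score']`, in whatever units the scorer uses. It is a percentage gap only if the scorer measures one. In the sketch below, a ridge regression stands in for a wrapped Keras regressor, and the scorer is negated mean squared error, so scores are zero or negative and closer to zero is better.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=150)

grid = GridSearchCV(
    Ridge(), {"alpha": [0.01, 1.0, 100.0]},
    scoring="neg_mean_squared_error", cv=5,
)
grid.fit(X, y)

means = grid.cv_results_["mean_test_score"]
print(means)                                  # one mean score per configuration
print(grid.best_score_, grid.best_params_)
```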
Ritesh February 4, 2021 at 6:25 pm #

Hi Jason,

Thanks for this amazing post!

I am stuck here, could you please share any idea to solve the below issues:

1. I get this error when specifying the optimizer:
optimizers = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizers)

ValueError: optimizer is not a legal parameter

2. I get this error when specifying learn_rate and momentum:
learn_rate = [0.001, 0.01, 0.1]
momentum = [0.0, 0.2, 0.4]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)

ValueError: learn_rate is not a legal parameter

Reply
- Jason Brownlee February 5, 2021 at 5:36 am #
  
  Thanks!
  
  Sorry to hear that you’re having trouble, perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
sanglok February 5, 2021 at 4:42 pm #

Thanks for the great post.

I have a question regarding the GridSearchCV outputs.

I am using KerasRegressor

I can see the outputs with ‘loss’ and ‘mse’.
However, is this loss the training loss or the validation loss?

If the loss is the training loss, how can I show the validation loss in the GridSearchCV?

Reply
- Jason Brownlee February 6, 2021 at 5:44 am #
  
  GridSearchCV won’t return model loss, it will report the cross-validation performance for your model on your chosen metric.
  
  Reply
Sonika Jha February 16, 2021 at 1:24 am #

Say, for example, we are searching over the learning rate using grid search. In that case each model would be trained until the last epoch (say 200), and if a model overfits by the time it reaches the 200th epoch, its final validation performance would be poor even though the model might have performed really well earlier on (say, at the 100th epoch). So I believe early stopping should be mandatory for a better grid search. But going by Google searches on this, it seems very uncommon to combine grid search with early stopping. Please share your thoughts on this.

Reply
- Jason Brownlee February 16, 2021 at 6:08 am #
  
  It can be a good idea.
  
  Reply
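
The effect described in the question can be illustrated without any framework. The sketch below applies a patience-based stopping rule to a made-up validation-loss curve; the numbers and the helper `early_stop_epoch` are purely illustrative. In Keras, the equivalent behaviour comes from the `EarlyStopping` callback.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a patience rule:
    stop once the loss has not improved for `patience` consecutive epochs."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up curve: improves until epoch 4, then overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
stop, best = early_stop_epoch(val_losses, patience=2)
print("stopped at epoch", stop, "best epoch", best)
print("final-epoch loss", val_losses[-1], "vs best loss", val_losses[best - 1])
```

Without the stopping rule, a search that scores only the final epoch would judge this configuration by the last, worse loss rather than by its best one.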
Marek March 2, 2021 at 9:50 pm #

I am wondering whether GridSearchCV evaluates the best parameter setting on some internally held-out test split, or whether it trains and tests on the whole dataset that is passed in. If the latter, how is it actually useful?

Thanks!

Reply
- Jason Brownlee March 3, 2021 at 5:35 am #
  
  Yes, internally it uses k-fold cv, and if you want you can specify the cross-validation procedure via the “cv” argument.
  
  Reply
ali March 4, 2021 at 12:01 am #

Hi Jason, and thank you for this amazing tutorial!

I only have one question: is there any way to use grid search or Bayesian optimization for Functional API models?
This method doesn’t seem to work in that case.

I wonder what is the best way to tune hyperparameters for functional models?

thanks a lot!

Reply
- Jason Brownlee March 4, 2021 at 5:50 am #
  
  You’re welcome.
  
  You would use grid search or bayes optimization, but not both. They are two different solutions to the same problem.
  
  Reply
Imran Khan March 16, 2021 at 5:31 pm #

Hi..
It was a very good experience in reading this article, I gained a lot.
Thank you for this wonderful post.
I was looking for one more thing in this post i.e. How to optimize the number of layers.

Reply
- Jason Brownlee March 17, 2021 at 6:01 am #
  
  Thanks!
  
  You can write a for-loop and iterate over different numbers of layers to try in your model.
  
  Reply
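
A minimal sketch of that loop is shown below. To keep it self-contained it uses scikit-learn's `MLPClassifier` as a stand-in, where the number of hidden layers is the length of the `hidden_layer_sizes` tuple; with Keras, the loop body would instead build and evaluate a model with `n_layers` hidden `Dense` layers. The data, layer width, and iteration budget are arbitrary choices.

```python
import warnings
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore")  # hide convergence warnings in this toy run

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

results = {}
for n_layers in [1, 2, 3]:
    model = MLPClassifier(hidden_layer_sizes=(12,) * n_layers,
                          max_iter=300, random_state=5)
    scores = cross_val_score(model, X, y, cv=3)
    results[n_layers] = scores.mean()
    print(n_layers, "hidden layer(s): mean accuracy", round(results[n_layers], 3))

best_layers = max(results, key=results.get)
print("best:", best_layers)
```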
Imran Khan March 16, 2021 at 5:41 pm #

Hi.
I have a few queries regarding tuning the model with RandomizedSearchCV or GridSearchCV:
1. What is the difference between metrics and scores with respect to RandomizedSearchCV or GridSearchCV?

2. What metrics and scores should we use for regression problems when performing RandomizedSearchCV or GridSearchCV?

3. RandomizedSearchCV saves a lot of time, but how close to the best are the parameters it selects? I mean, how can we verify that they are the best among the lot?

Reply
- Jason Brownlee March 17, 2021 at 6:02 am #
  
  The difference in scores is due to the difference in the search algorithms.
  
  You must choose a metric that best captures what is important to you and your project stakeholders about a final model.
  
  We cannot know the best parameters or how long they might take to locate; instead, we can only run a search for as long as we have resources available.
  
  Reply
JG April 20, 2021 at 7:19 pm #

Hi Jason,

An old but very useful tutorial about Sklearn GridSearchCV() module. Thank you.

The Keras wrappers KerasRegressor and KerasClassifier try to take advantage of all the tools coming from sklearn:
KFold(), cross_val_score(), GridSearchCV(), Pipeline(), …

My questions are:

1) I see that sklearn is also advancing its own neural network implementations with MLPRegressor and MLPClassifier, so you probably no longer need KerasRegressor and KerasClassifier given the direct integration of MLP models in sklearn. What do you think?

2) Do you have any post showing the scope of these sklearn steps, for example trying to include more advanced ANNs such as convolutional (CNN) or recurrent (LSTM) networks?

3) Or do these sklearn steps lose their meaning, because the Keras or TensorFlow libraries could develop or wrap modules like GridSearchCV, KFold, and cross_val_score better, which they currently do not have as their own?

regards,

Reply
- Jason Brownlee April 21, 2021 at 5:55 am #
  
  Perhaps. Keras is still a better lib for custom neural nets I believe.
  
  No, CNNs and LSTMs don’t play nice with sklearn given the structure of the input for the models.
  
  sklearn still offers useful data transforms and metrics I believe.
  
  Reply
JG April 21, 2021 at 5:52 pm #

thanks

Reply
- Jason Brownlee April 22, 2021 at 5:37 am #
  
  You’re welcome.
  
  Reply
alex May 22, 2021 at 4:27 pm #

…. How can you find the best dropout & weight when not comparing training with validation result ?????? I think there is a mistake here no ?

Reply
- Jason Brownlee May 23, 2021 at 5:23 am #
  
  You can find a good drop out rate with trial and error.
  
  Reply
Negin May 26, 2021 at 5:25 am #

Hello,

If we have weighted loss and we need to find the suitable weight for each part of the loss function using GridSearchCV, is it possible? how can I do it?

Reply
- Jason Brownlee May 26, 2021 at 5:57 am #
  
  I don’t know.
  
  Reply
Irfan Tariq May 31, 2021 at 7:46 am #

KerasClassifier doesn’t support sample_weight.

I want to use an AdaBoost classifier to boost my LSTM model, but I am getting the error above. How can I solve this problem?

Reply
- Jason Brownlee June 1, 2021 at 5:26 am #
  
  Perhaps you can use the Keras API directly?
  
  Reply
sara July 17, 2021 at 7:15 pm #

Thanks Jason,

Do you know if this GridSearch is applicable for data with shape like this (samples, steps, features)

When I tried it out, I got the following error:

ValueError: Found array with dim 3. Estimator expected <= 2.

Did I miss something?

Reply
- Jason Brownlee July 18, 2021 at 5:21 am #
  
  No, you may have to write custom code, e.g. some for loops.
  
  Reply
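
A minimal sketch of such a loop for 3D input is shown below. `KFold` is only asked for row indices, so the `(samples, steps, features)` array is never validated by scikit-learn. The `fit_and_score` function is a hypothetical placeholder: it scores a trivial threshold rule so that the example runs without Keras, and in practice it would build, fit, and evaluate the LSTM for the given configuration.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 10, 3))            # (samples, steps, features)
y = (X[:, -1, 0] > 0).astype(int)

def fit_and_score(X_tr, y_tr, X_te, y_te, threshold):
    """Placeholder for: build model, fit on train fold, score on test fold."""
    pred = (X_te[:, -1, 0] > threshold).astype(int)
    return (pred == y_te).mean()

kfold = KFold(n_splits=3, shuffle=True, random_state=6)
results = {}
for threshold in [-0.5, 0.0, 0.5]:          # the "grid" of configurations
    fold_scores = []
    for train_idx, test_idx in kfold.split(np.arange(len(X))):
        fold_scores.append(fit_and_score(X[train_idx], y[train_idx],
                                         X[test_idx], y[test_idx], threshold))
    results[threshold] = float(np.mean(fold_scores))

best = max(results, key=results.get)
print(results, "best:", best)
```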
Mahdi Arjomandazar July 20, 2021 at 6:39 pm #

Dear Jason,
I have adapted your code for my dataset, which comprises 8 variables. Since it’s a regression analysis, I changed the loss in the model’s compile step to ‘mean_squared_error’.
Nonetheless, it yields lower accuracy than expected (~0.002).
I therefore also changed the metrics to ‘mse, cosine’, which doesn’t seem to be compatible with the KerasClassifier structure.

Kindly assist me how to tune the code for regression.
All the best,
Mahdi

Reply
- Jason Brownlee July 21, 2021 at 5:44 am #
  
  You can adapt it for regression by first choosing a regression algorithm to tune, then configure the model for regression with MSE loss, a linear activation function in the output layer and remove the accuracy metric.
  
  This may help:
  https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
  
  Reply
Jack August 23, 2021 at 8:15 am #

Dear Jason,

I’ve been trying to crack this one for a while and reading your code and articles have really helped.

I’ve tried to combine some of the examples above, but they fail very quickly, declaring that “All estimators failed to fit” (I’ve put my code below).

Is there an underlying reason why (apart from compute resource) these can’t work together?

Kind regards,
Jack

# Make a scorer
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

# Walk-forward validation: How many splits
tss_cv = TimeSeriesSplit(n_splits=3)

LSTM_random_grid = {
    'lstm_units': [5, 6],
    'activation': ['relu', 'tanh'],
    'batch_size': [50, 100],
    'epochs': [1, 2],
}

# Function to create and returns a Keras model
def build_model(activation='relu', lstm_units=150):
    # Design network
    model = tf.keras.models.Sequential()
    # Add LSTM hidden layer
    model.add(tf.keras.layers.LSTM(lstm_units, input_shape=(train_X_nn.shape[1], train_X_nn.shape[2])))
    # Add the dense layer
    model.add(tf.keras.layers.Dense(train_y.shape[1], activation=activation))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
    return model

# Create model
model = KerasRegressor(build_fn=build_model, verbose=10)

# Prepare the search
LSTM_random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=LSTM_random_grid,
    n_iter=20,  # E.g. For n_iter=20, fitting 5 folds for each of 20 candidates, totals 100 fits
    cv=tss_cv,
    verbose=10,
    scoring=mse_scorer,
    random_state=seedVal,
    n_jobs=-1,  # -1 to use all cores
    pre_dispatch=2,  # Limit the number of jobs despatched in parallel
)

# Best fit
LSTM_random_result = LSTM_random_search.fit(traintest_X_nn, traintest_y.values.ravel())

Reply
- Adrian Tam August 24, 2021 at 8:18 am #
  
  Because GridSearchCV() expects the scoring function to go up when it improves. Usually something failing fast is due to set up error.
  
  Reply
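
A note on the sign convention, since it is easy to trip over: scikit-learn searches always maximize the score, and `make_scorer(..., greater_is_better=False)` handles this by negating the metric, so a scorer built that way already returns negative MSE. The short check below uses a ridge regression on random data purely as a stand-in estimator.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 2))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=50)

model = Ridge().fit(X, y)
mse = mean_squared_error(y, model.predict(X))

mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
score = mse_scorer(model, X, y)
print(mse, score)   # score is the negated MSE
```

When every fit fails, passing `error_score='raise'` to the search makes it surface the underlying exception instead of recording NaN, which usually points at the set-up error directly.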
Victor Soeby November 12, 2021 at 10:22 pm #

Hi.

I have followed this amazing post in my project, where I have a model composed of LSTM layers. Here I’ve used the KerasRegressor in pretty much the same way you use the classifier.
I currently have a grid search running on 14 of the 16 available cores on my PC, and it has been going for two days straight. I’m interested in getting the best performance, which I’ve read can be achieved using the GPU when working with Keras models (I have a 1070 Ti).

Do you have any links or information regarding how one would implement GPU utilization in this grid search example of yours? I am completely new to implementing GPU’s, as this is my first Keras project, since i ‘graduated’ from Scikit Learn.

Reply
- Adrian Tam November 14, 2021 at 2:25 pm #
  
  Unfortunately, you probably can’t do that with a GPU. You can run ONE model on a GPU over a lot of data (provided your machine learning library supports GPUs, as TensorFlow does), but a GPU, as a SIMD machine, inherently cannot run two different programs in parallel.
  
  Reply
Murilo Souza December 7, 2021 at 9:13 pm #

I am getting the following error when trying to tune the activation function:

ValueError: Invalid parameter activation for estimator KerasClassifier.
This issue can likely be resolved by setting this parameter in the KerasClassifier constructor:
KerasClassifier(activation=relu)
Check the list of available parameters with estimator.get_params().keys()

This happens only when trying to tune activation functions. It doesn’t matter which activation function I use there; it always throws this error.

Reply
- Adrian Tam December 8, 2021 at 8:06 am #
  
  What did you set in the parameter?
  
  Reply
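
The last line of that error message points at a useful check: a grid key is only valid if it appears in `estimator.get_params().keys()`. The sketch below shows the check on scikit-learn's `MLPClassifier`, used here only because it is self-contained; `not_a_param` is a deliberately invalid name. With SciKeras, an argument of the model-building function is usually reachable either by passing a default to the constructor, as the message suggests, or by using the routed name `model__activation` in the grid.

```python
from sklearn.neural_network import MLPClassifier

est = MLPClassifier()
valid = est.get_params().keys()
print("activation" in valid)      # a name the estimator knows about
print("not_a_param" in valid)     # a name it does not

# set_params is what the search calls for each candidate; unknown names fail.
try:
    est.set_params(not_a_param=1)
except ValueError as err:
    print("rejected:", str(err)[:60])
```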
Adnan Abid February 1, 2022 at 5:21 pm #

Hi Jason,

A great tutorial indeed!

Can you please let me know if we can tune all the parameters collectively,
i.e. run a single experiment while providing different variations of all the above parameters in [.., .., …]?
By all the different parameters I mean a separate list for each of:
batch size […..]
optimization algorithm […..]
activation function […..]
learning rate […..]
no. of neurons [….]
dropout [….]
epochs [….]

This can result in a large number of experiments.
Or should we fix a few parameters and then find the optimum for the others?

Reply
- James Carmichael February 2, 2022 at 10:20 am #
  
  Hello Adnan…For a larger number of parameters, I would suggest a Bayesian approach:
  
  https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f
  
  Reply
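
To get a feel for how quickly a joint grid grows, the size can be counted before running anything. The value lists and the per-fit time below are made-up placeholders.

```python
from itertools import product

# Hypothetical candidate values for each hyperparameter.
grid = {
    "batch_size": [16, 32, 64],
    "optimizer": ["SGD", "RMSprop", "Adam"],
    "activation": ["relu", "tanh", "sigmoid"],
    "learn_rate": [0.001, 0.01, 0.1],
    "neurons": [8, 16, 32],
    "dropout": [0.0, 0.2, 0.4],
    "epochs": [50, 100],
}
cv_folds = 3
seconds_per_fit = 20  # assumed average; measure on your own model

n_combos = len(list(product(*grid.values())))
n_fits = n_combos * cv_folds
print(n_combos, "combinations,", n_fits, "fits")
print("estimated hours:", round(n_fits * seconds_per_fit / 3600, 1))
```

With this many fits, a random or Bayesian search over the same space, or tuning in stages, tends to be more practical than an exhaustive grid.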
Mohamad Darouich February 24, 2022 at 12:25 am #

Hi Jason,

A great tutorial indeed, thank you very much!

I tried to use KerasRegressor with GridSearchCV instead of KerasClassifier,
but unfortunately this didn’t work. I got the following error:

FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://aea24a7a-9df6-4443-ac2d-d6011f6fa569/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as ‘/job:localhost’.

I’d be thankful if you could tell me how to fix this problem.

Reply
- James Carmichael February 24, 2022 at 12:52 pm #
  
  Hi Mohamad,
  
  Thanks for asking.
  
  I’m eager to help, but I just don’t have the capacity to debug code for you.
  
  I am happy to make some suggestions:
  
  Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
  Consider cutting the problem back to just one or a few simple examples.
  Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
  Consider posting your question and code to StackOverflow.
  
  Reply
Jessica March 15, 2022 at 2:42 am #

Dear Jason and team. I have 2 questions:

1. Is it a good practice to obtain “mean_train_score” and “mean_test_score” to know if the model evaluated in “GridSearchCV” suffers from overfitting or underfitting? Is it good practice to also get the model evaluation metric using “GridSearchCV”?

2. AFTER performing “GridSearchCV” it is necessary to create a model in the traditional way (example: rfr = RandomForestRegressor.set_params(**optimal_params); rfr.fit(X, y); rfr.predict(X_pred)) considering ALL the data (different to what is done in “GridSearchCV”, where the data was divided into train and test, and therefore the data for train consisted of k-1 folds) as TRAIN data and the optimal parameters obtained with GridSearchCV?

Thank you so much!!! Your information is fantastic, thank you for making it freely available.

Reply
Ayenew March 17, 2022 at 6:20 am #

It’s an interesting and helpful tutorial. I think there are some changes to the KerasClassifier class. First, it’s now deprecated and has been separated from the main Keras library. There are also some changes in how to use it.

Reply
Tyler July 5, 2022 at 12:51 pm #

Hi

Almost the same blocks of code for tuning multiple params. Is that just for educational purposes or a limitation?

Using D.R.Y, can we not tune all of the parameters at the same time with GridSearch like we can with other ML models?

If not, what are the params we can technically tune together in a single code-block?

Thanks for the tutorial.
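
Several hyperparameters can be searched together by placing them in a single `param_grid`; the cost is that the number of model fits grows multiplicatively. The sketch below only counts the combinations using scikit-learn's `ParameterGrid`. The parameter names follow the SciKeras routing convention (`optimizer__...`, `model__...`), and the values are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid

# One grid covering several hyperparameters at once (illustrative values)
param_grid = {
    "batch_size": [10, 20, 40],
    "epochs": [10, 50],
    "optimizer__learning_rate": [0.001, 0.01, 0.1],
    "model__neurons": [5, 10, 20],
}

combinations = list(ParameterGrid(param_grid))
cv_folds = 3
total_fits = len(combinations) * cv_folds  # one fit per combination per fold

print(len(combinations), "combinations,", total_fits, "model fits")
print("first combination:", combinations[0])
```

The same dictionary could be passed as `param_grid` to `GridSearchCV`; whether the resulting run time is acceptable depends on how long a single fit takes.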

Reply
- James Carmichael July 6, 2022 at 3:11 am #
  
  Hi Tyler…For educational purposes.
  
  Reply
Tyler July 5, 2022 at 1:37 pm #

Also, in your GridSearch for neurons you have

model = Sequential()
model.add(Dense(neurons, input_shape=(X_train.shape[1],), kernel_initializer='uniform', activation='sigmoid', kernel_constraint=MaxNorm(4)))
model.add(Dropout(0.2))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

This is for regression it seems. For my binary classification should I change the first layer to activation='sigmoid'? Why do you have linear in the first layer and sigmoid in the third layer? And should I remove model.add(Dropout(0.2))?

Thanks

Reply
Gian Paolo July 8, 2022 at 3:36 am #

When migrating from tf.keras.wrappers.scikit_learn to Scikeras as per your latest update, I have installed the SciKeras package in my env.
My kernel dies after importing Tensorflow in my notebook.
Can you tell how you installed Scikeras to make it work in your code?

Thanks

Reply
- James Carmichael July 8, 2022 at 5:49 am #
  
  Hi Gian…Please clarify exactly what you mean by “kernel dies” so that we may better assist you.
  
  Reply
Gian Paolo July 8, 2022 at 12:20 pm #

It is a pop up message from the jupyter notebook. I run my scripts with that.
– I pip install scikeras and tensorflow
– when I execute the line ‘import tensorflow’ in my Jupyter notebook, a pop-up message appears saying the kernel has died and will automatically restart, but it does not.
– a red box appears in the upper right corner of the Jupyter notebook reading “dead kernel”
– same behaviour when I shutdown and restart the kernel and repeat the command

I am using a M1 mac with an Anaconda installation.

Reply
Tarun July 12, 2022 at 7:55 pm #

Dear Jason,

As always, excellent post. I have a query. Can I use CV if I have a time series? As far as I know, standard CV doesn’t work for time series.

Please let me know.

Regards
Tarun
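
For time-ordered data, scikit-learn provides `TimeSeriesSplit`, which keeps every training fold strictly earlier than its test fold. The sketch below uses a toy array of eight observations (an assumption for illustration) to show the resulting indices.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight time-ordered observations (toy data for illustration)
X = np.arange(8).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

# Each training window ends before its test window begins
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

An object like `tscv` can be passed as `cv=tscv` to `GridSearchCV` in place of an integer, so the grid search itself respects the time ordering.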

Reply
jackie July 20, 2022 at 3:10 am #

Hi Jason,

Thanks for your tutorials, they are very helpful !

I used grid.fit(x, y) to tune the batch size and number of epochs for my CNN. One strange thing I encountered is that when I used all my data, the ‘mean_test_score’ was only 30%.
But if I split the data and only use the training set for grid.fit(x_train, y_train), then my accuracy is over 70%, even if I use a test size of 0.001 for the train/test split (meaning my training set is 99.9% of the data).

I can’t figure out the reason for this. I have tried multiple times: it always gave me 30% when using all samples, and over 70% whenever I used only part of the data.

Thanks!

Reply
- James Carmichael July 20, 2022 at 9:10 am #
  
  Hi jackie…I would recommend first understanding whether your model is under or overfitting:
  
  https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
  
  Reply
Richard July 24, 2022 at 3:40 am #

Hi Jason,
This is an excellent article and resource.
I have a very large data set (900,000 rows, 10 columns).
The grid search method takes all available CPU and memory on my machine.
I’ve tried using AWS SageMaker instead, but tensorflow > 2.7 is not supported for scikeras.
I’d like to use a sequential ANN.
Any tips on how I can get this to work for me?
Thanks

Reply
- James Carmichael July 24, 2022 at 9:32 am #
  
  Hi Richard…You may want to consider some other options as presented in the following resource:
  
  https://www.kdnuggets.com/2020/05/hyperparameter-optimization-machine-learning-models.html
  
  Also, you may want to consider Google Colab Pro with its GPU option and additional memory allocation.
  
  Reply
Larissa Benevides September 30, 2022 at 12:09 pm #

Hello Jason,

Great tutorials, very didactic and explanatory. Anyone interested and with little knowledge can understand your tutorials. Very good!

Reply
- James Carmichael October 1, 2022 at 6:54 am #
  
  You are very welcome Larissa! We greatly appreciate your feedback and support!
  
  Reply
Ugur October 2, 2022 at 5:30 am #

The create_model function call needs ().
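
For context on the parentheses: the tutorial's code passes the function object `create_model` to the wrapper without calling it, which lets the wrapper call it later with each candidate set of hyperparameters. The toy class below is a hypothetical stand-in that illustrates this pattern in plain Python; it is not SciKeras code, and the dictionary "model" is only a placeholder for a Keras network.

```python
class ToyWrapper:
    """Hypothetical stand-in for a wrapper such as KerasClassifier."""

    def __init__(self, model, **params):
        self.model = model    # a function that builds a model (not yet called)
        self.params = params  # hyperparameters to pass to that function

    def fit(self):
        # The wrapper calls the function itself, once per fit,
        # so every fit starts from a freshly built model
        self.built_ = self.model(**self.params)
        return self


def create_model(neurons=1):
    # Placeholder "model": a dict instead of a Keras network
    return {"neurons": neurons, "weights": []}


# Passing create_model (no parentheses) hands over the function itself
small = ToyWrapper(model=create_model, neurons=5).fit()
large = ToyWrapper(model=create_model, neurons=30).fit()
print(small.built_, large.built_)
```

Writing `create_model()` instead would build one model up front, before the search has chosen any hyperparameter values to try.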

Reply
- James Carmichael October 2, 2022 at 8:11 am #
  
  Thank you for the feedback Ugur!
  
  Reply
christian October 26, 2022 at 12:05 am #

If we use a sample of our dataset for tuning, will the parameters obtained still work well with the whole dataset?
For example, when tuning the number of neurons: we know the number of neurons depends on the size of our data, so the number of neurons obtained with the sample may not suit the whole dataset. Please clarify this point.
Thanks

Reply
- James Carmichael October 26, 2022 at 7:10 am #
  
  Hi christian…The number of neurons in the “input layer” is determined by the number of input features in the training dataset; the “hidden layers”, however, may contain any number of neurons.
  
  Reply
christian October 26, 2022 at 12:21 am #

Suppose we tune the number of neurons in the hidden layers and we get 50. How do we decide whether to use 1, 2 or 3 hidden layers? And let’s say we decide to have 2 hidden layers: will each layer have 50 neurons?

Reply
- James Carmichael October 26, 2022 at 7:08 am #
  
  Hi christian…You may find the following resource helpful:
  
  https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e
  
  Reply
Johnny B November 17, 2022 at 10:13 am #

Hello –
This is a great post, thanks for making it. I have a couple of questions I’m hoping you can answer.

1.) Is there a way to see which data points are used as the training data points for each k-fold?

2.) Could we just put this: tf.random.set_seed(7), inside of the create_model() function to get more consistent results? Either way, is there an issue/problem with doing so?

3.) When this code is executed:
model = KerasClassifier(model=create_model, verbose=0)
Is there a reason why it is create_model instead of create_model()?

4.) In the following two lines of code:
model = KerasClassifier(model=create_model, verbose=0)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
Is there a reason why we set the KerasClassifier equal to the variable ‘model’, couldn’t that just be any variable name? Just wondering if there is a reason because it gets a little confusing since the ‘model’ inside of KerasClassifier is asking for a model name too, or is this just the convention?

Thanks!
Johnny
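
On the first question, one way to inspect fold membership is to build the splitter explicitly and pass it as `cv`. This sketch assumes a binary classification target and an estimator that scikit-learn recognises as a classifier, in which case an integer `cv=3` is turned into an unshuffled `StratifiedKFold`. The toy arrays are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy binary dataset: 9 samples, 2 features (illustrative values)
X = np.arange(18).reshape(9, 2)
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0])

cv = StratifiedKFold(n_splits=3)  # no shuffling, so the splits are repeatable
folds = list(cv.split(X, y))

# Row indices used for training and validation in each fold
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

The same `cv` object can be passed to `GridSearchCV(..., cv=cv)` so that the search uses exactly these folds. On the fourth question, the variable name on the left-hand side is arbitrary; only the keyword `model=` inside the call is fixed by the wrapper.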

Reply
- James Carmichael November 17, 2022 at 12:15 pm #
  
  Hi Johnny…The following resource may be of interest:
  
  http://man.hubwiz.com/docset/TensorFlow.docset/Contents/Resources/Documents/api_docs/python/tf/keras/wrappers/scikit_learn/KerasClassifier.html
  
  Reply
Johnny B November 18, 2022 at 5:04 am #

Hi James –
That link was helpful, but didn’t really answer any of my questions, but I might have missed something. Thanks anyways. Still a great post!

Reply
Jim November 25, 2022 at 7:19 am #

Hi, Jason. Thank you for this tutorial. I’m getting a couple of errors that I can’t solve:

1: I’ve tried installing scikeras with conda install conda-forge scikeras and pip install scikeras[tensorflow] --user but I’m still getting a ModuleNotFoundError: No module named 'scikeras' error.

2: I also get NameError: name '_read' is not defined when running dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=',')

Any ideas on how to resolve these errors?
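
A `ModuleNotFoundError` after a seemingly successful install is often a sign that the package went into a different Python environment than the one running the script or notebook. The standard-library sketch below prints the active interpreter and whether each package is visible to it. The helper name `module_report` is made up for illustration.

```python
import importlib.util
import sys


def module_report(names):
    """Return {module name: True/False} for whether each module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The interpreter that is actually running this code
print("Interpreter:", sys.executable)

report = module_report(["scikeras", "tensorflow", "numpy"])
for name, found in report.items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If the printed interpreter path does not belong to the environment where `pip` or `conda` installed the package, installing with `python -m pip install ...` from that same interpreter is one way to line the two up.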

Reply
- James Carmichael November 25, 2022 at 9:15 am #
  
  Hi Jim…You may want to try your code in Google Colab until you can resolve the installation issue.
  
  Reply
zen November 28, 2022 at 4:17 am #

Is it better to use grid search, random search, or Bayesian search? I am currently using an RNN model with a GRU architecture.

Reply
- James Carmichael November 29, 2022 at 9:37 am #
  
  Hi Zen…My recommendation would be to investigate Bayesian methods for this case.
  
  Reply
Sara December 14, 2022 at 11:34 pm #

Hello, thank you for your nice and informative website.
I learned that when using a train-test split without an explicit validation set, we still do not have any data leakage if we use GridSearchCV for hyperparameter tuning.
My question is how I can still prevent data leakage with just train and test sets and no explicit validation set if I want to ensemble the classifiers.
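
One possible arrangement, sketched under assumptions: hold out the test set first, then run `GridSearchCV` on the training portion only, with the whole ensemble (including any scaling) inside the estimator being searched. The synthetic data, the two member models, and the grid values below are illustrative choices, not part of the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

# The scaler lives inside the pipeline, so it is re-fit on each CV training fold
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("tree", DecisionTreeClassifier(random_state=7)),
    ],
    voting="soft",
)
param_grid = {
    "lr__logisticregression__C": [0.1, 1.0],
    "tree__max_depth": [2, 4],
}

# Tuning only ever sees the training portion
grid = GridSearchCV(ensemble, param_grid=param_grid, cv=3)
grid.fit(X_train, y_train)

# The test set is used once, at the end
test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_, round(test_accuracy, 3))
```

The cross-validation folds inside `GridSearchCV` play the role of the validation set, which is why no explicit third split appears.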

Reply
- James Carmichael December 15, 2022 at 9:28 am #
  
  Hi Sara…The following resource may be of interest to you:
  
  https://machinelearningmastery.com/data-preparation-without-data-leakage/
  
  Reply
AGGELOS PAPOUTSIS January 16, 2023 at 4:20 am #

Hi Jason,

Why do you apply the grid fit procedure to X, y rather than doing a train/test split and applying it to X_train, y_train?

Thanks

Reply
Sue October 26, 2023 at 2:29 am #

Hi, I can’t run this code either in Kaggle or on my laptop with Python 3.11.3. Sorry to ask this, but I am a newbie in Python. Would it be possible for Mr Jason/Mr James to show this in Kaggle?

Reply
- James Carmichael October 26, 2023 at 10:48 am #
  
  Hi Sue…Please clarify the issue you are having with running the code. This will better enable us to guide you.
  
  Reply
Filbe January 4, 2024 at 10:19 pm #

Hello!
This post is awesome; it covers literally everything I wanted to learn. However, I get errors when I try to replicate the learning rate optimization.

This is my test setup:

optimizer = tf.keras.optimizers.Adam()

simple_model_wrapper = KerasRegressor(model_OG_trained, optimizer=optimizer, metrics=['mae'], verbose=False)

# Define hyperparameters
batch_size = [2, 4, 6, 8, 10]
learning_rate = scipy.stats.uniform()
epochs = np.random.randint(10, 51)
loss = ['mse', dual_loss]  # dual_loss is a custom loss object
param_distributions = dict(optimizer__learning_rate=learning_rate, batch_size=batch_size, epochs=epochs, loss=loss)

simple_model_grid = RandomizedSearchCV(estimator=simple_model_wrapper, param_distributions=param_distributions)

And when calling fit I get this issue:
> 51 model.optimizer.build(model.trainable_variables)
52 return model

AttributeError: 'NoneType' object has no attribute 'build'

Could you recommend some way of fixing this, or more info so I can debug please?
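
One detail in the setup above may be worth checking, although it does not by itself explain the `AttributeError`: `np.random.randint(10, 51)` returns a single integer rather than a distribution, whereas `RandomizedSearchCV` expects each entry of `param_distributions` to be a list or an object with an `rvs` method. The sketch below uses `scipy.stats.randint` for the epochs and, as a swapped-in assumption, `scipy.stats.loguniform` for the learning rate, then draws a few candidates with scikit-learn's `ParameterSampler` to show what the search would try.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "batch_size": [2, 4, 6, 8, 10],                      # list: sampled uniformly
    "epochs": randint(10, 51),                           # integers from 10 to 50
    "optimizer__learning_rate": loguniform(1e-4, 1e-1),  # assumed range
}

# Draw five candidate settings, as RandomizedSearchCV would with n_iter=5
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=7))
for candidate in samples:
    print(candidate)
```

The same `param_distributions` dictionary can be passed to `RandomizedSearchCV`; the custom loss list from the original setup is omitted here to keep the sketch self-contained.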

Reply
- James Carmichael January 5, 2024 at 11:28 am #
  
  Hi Filbe…You are very welcome! You may find the following discussion of interest.
  
  https://stackoverflow.com/questions/8949252/why-do-i-get-attributeerror-nonetype-object-has-no-attribute-something
  
  Reply
Edie March 19, 2024 at 2:10 am #

I am trying to run the first version of the example on how to tune the training optimization algorithm. Everything works fine on Colab (with Python 3.10.2, Keras 2.15, Scikit 1.2.2 and tensorflow 2.15.0). When I tried to run the same code on my laptop (Macbook Pro M1 Max with Python 3.11.7, Keras 3.0.5, Scikit 1.4.1.post1 and tensorflow 2.16.1), I got the following error when grid.fit is executed:
ValueError: Could not interpret metric identifier: loss

I have tried many solutions, but apparently nothing works. I would appreciate any help. Thanks in advance.

Reply
- James Carmichael March 19, 2024 at 8:14 am #
  
  Hi Edie…did you copy and paste the code or type it? Also…keep the following in mind:
  
  The error ValueError: Could not interpret metric identifier: loss typically occurs in the context of using machine learning libraries like Keras (TensorFlow) when configuring the model for training. This error suggests there’s an issue with how the loss function or a metric is specified. Here are some common reasons and solutions for this error:
  
  ### Common Causes and Solutions
  
  1. **Incorrectly Specifying the Loss Function**: Ensure that the loss function is correctly specified when compiling the model. For example, use 'binary_crossentropy' for binary classification problems, 'categorical_crossentropy' for multi-class classification problems, etc.
  
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  
  2. **Incorrectly Specifying Metrics**: If you intended to use ‘loss’ as a metric, ensure that you are using it correctly. Generally, ‘loss’ is not directly specified in the metrics argument since it is inherently a part of the training process. If you want to monitor the loss during training, just ensure that the loss argument is correctly set. The training process automatically includes loss as a metric to monitor.
  
  3. **Custom Loss Function or Metric Not Defined Properly**: If you’re using a custom loss function or metric, ensure it is correctly defined. A custom loss function should take the true labels and predicted labels as arguments and return a loss value. After defining it, you can pass it directly to the loss or metrics argument by its function name.
  
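  import tensorflow as tf
  
  def custom_loss_function(y_true, y_pred):
      # Example calculation (mean squared error); replace with your own loss
      y_true = tf.cast(y_true, y_pred.dtype)
      loss_value = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
      return loss_value
  
  model.compile(optimizer='adam', loss=custom_loss_function, metrics=['accuracy'])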
  
  4. **Typographical Error**: Check for typographical errors in specifying the loss function or metrics. This includes incorrect spelling or using unsupported values in the context of your specific library version.
  
  5. **Library Version Mismatch**: Ensure that the syntax and identifiers you are using are compatible with the version of the library (like TensorFlow or Keras) you are working with. Sometimes, the way to specify loss functions or metrics changes between versions.
  
  6. **Misplaced Argument**: Make sure that the loss argument is correctly placed within the compile() method of your model, and not mistakenly placed within the fit() method or any other method.
  
  ### General Advice
  
  – **Consult Documentation**: Always consult the official documentation of the library you are using (e.g., TensorFlow, Keras) for the correct syntax and available options for loss functions and metrics.
  – **Update Libraries**: If you suspect a version mismatch issue, consider updating your machine learning library to the latest version, but ensure your code is compatible with the update.
  
  If these solutions don’t resolve your issue, it might help to review the exact context in which the error occurs, including how you’ve defined and compiled your model.
  
  Reply
