Binary Classification Tutorial with the Keras Deep Learning Library

By Jason Brownlee on August 5, 2022 in Deep Learning

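Keras is a Python library for deep learning. It was originally a wrapper around efficient numerical libraries such as TensorFlow and Theano; the code in this tutorial uses the Keras API that ships with TensorFlow 2.x (tensorflow.keras).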

Keras allows you to quickly and simply design and train neural networks and deep learning models.

In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.

After completing this tutorial, you will know:

How to load training data and make it available to Keras
How to design and train a neural network for tabular data
How to evaluate the performance of a neural network model in Keras on unseen data
How to perform data preparation to improve skill when using neural networks
How to tune the topology and configuration of neural networks in Keras

Let’s get started.

Jun/2016: First published
Update Oct/2016: Updated for Keras 1.1.0 and scikit-learn v0.18.
Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
Update Sep/2019: Updated for Keras 2.2.5 API.
Update Jul/2022: Update for TensorFlow 2.x syntax

Binary classification worked example with the Keras deep learning library
Photo by Mattia Merlo, some rights reserved.

1. Description of the Dataset

The dataset you will use in this tutorial is the Sonar dataset.

This dataset describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.

It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string: “M” for mine (metal cylinder) and “R” for rock. It will need to be converted to the integers 0 and 1.

A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve an accuracy of around 84%, with an upper bound of around 88% for custom models.

2. Baseline Neural Network Model Performance

Let’s create a baseline model and result for this problem.

You will start by importing all the classes and functions you will need.

import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...

Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.
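To see why the feature columns are cast with .astype(float), here is a minimal sketch on a tiny, made-up CSV held in memory (the values and the 4-column width are assumptions standing in for the real 60-column sonar.csv). Because the last column holds strings, dataframe.values comes back as a NumPy object array, so the numeric columns have to be converted explicitly.

```python
import io

import pandas as pd

# A tiny, made-up stand-in for sonar.csv: 3 rows, 4 numeric features, 1 string label.
# (The real file has 60 feature columns followed by the "M"/"R" label.)
csv_text = """0.02,0.03,0.04,0.05,R
0.45,0.52,0.61,0.70,M
0.10,0.12,0.09,0.11,R
"""

dataframe = pd.read_csv(io.StringIO(csv_text), header=None)
dataset = dataframe.values

# Mixing numeric and string columns produces an object-dtype array.
print(dataset.dtype)

# Slice off the label column and cast the features to float, as in the tutorial.
n_features = dataset.shape[1] - 1
X = dataset[:, 0:n_features].astype(float)
Y = dataset[:, n_features]
print(X.shape, list(Y))
```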

...
# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

The output variable contains string values, which you must convert into the integer values 0 and 1.

You can do this using the LabelEncoder class from scikit-learn. Its fit() function learns the encoding from the entire dataset, and its transform() function then applies that encoding to create a new output variable.
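Here is a minimal sketch of what LabelEncoder does with labels like the sonar output column (the five labels below are made up). LabelEncoder sorts the class names, so “M” (mine) is encoded as 0 and “R” (rock) as 1. A model trained on encoded_Y therefore outputs the probability of the rock class. The same encoder can map integer predictions back to strings with inverse_transform().

```python
from sklearn.preprocessing import LabelEncoder

# A handful of made-up labels in the same format as the sonar output column.
Y = ["R", "M", "M", "R", "M"]

encoder = LabelEncoder()
encoder.fit(Y)                    # learns the sorted set of class names
encoded_Y = encoder.transform(Y)  # replaces each label with its class index

print(encoder.classes_)  # ['M' 'R']: "M" becomes 0 and "R" becomes 1
print(encoded_Y)         # [1 0 0 1 0]

# inverse_transform() maps integer predictions back to the original strings.
print(encoder.inverse_transform([0, 1]))  # ['M' 'R']
```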

...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

You are now ready to create your neural network model using Keras.

You will use scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k-times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data.
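The stratification step can be seen on a toy example. This sketch assumes a made-up, imbalanced label array and uses only 2 folds to keep the output short. Each held-out fold keeps the same 60/40 class ratio as the full set of labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy, imbalanced labels: 6 examples of class 0 and 4 of class 1.
y = np.array([0] * 6 + [1] * 4)
X = np.zeros((len(y), 1))  # feature values do not affect how the folds are stratified

kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
fold_counts = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X, y)):
    # Count how many examples of each class landed in this held-out fold.
    counts = np.bincount(y[test_idx], minlength=2).tolist()
    fold_counts.append(counts)
    print(f"fold {fold}: test size={len(test_idx)}, class counts={counts}")
```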

To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.

Let’s start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.

The weights are initialized using the Keras default scheme for Dense layers (Glorot uniform, with biases set to zero). The rectified linear (ReLU) activation function is used in the hidden layer. The output layer contains a single neuron to make predictions. It uses the sigmoid activation function to produce a probability in the range of 0 to 1 that can easily and automatically be converted to crisp class values.
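For intuition, the forward pass of this baseline topology can be sketched in plain NumPy. This illustrates the computation only; it is not how Keras is implemented. The weights are untrained random draws using the Glorot uniform formula (the default kernel initializer for Keras Dense layers), and the input rows are made up. The sketch shows the ReLU hidden layer, the sigmoid output squashing each score into (0, 1), and a 0.5 threshold turning probabilities into crisp class labels.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def glorot_uniform(fan_in, fan_out):
    # Glorot (Xavier) uniform: draw from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Baseline topology: 60 inputs -> 60 ReLU units -> 1 sigmoid unit, zero-initialized biases.
W1, b1 = glorot_uniform(60, 60), np.zeros(60)
W2, b2 = glorot_uniform(60, 1), np.zeros(1)

X = rng.uniform(0.0, 1.0, size=(4, 60))   # 4 made-up rows in the sonar value range
hidden = relu(X @ W1 + b1)                # shape (4, 60), all values >= 0
probs = sigmoid(hidden @ W2 + b2).ravel() # shape (4,), each value in (0, 1)
classes = (probs > 0.5).astype(int)       # crisp 0/1 labels via a 0.5 threshold
print(probs.round(3), classes)
```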

Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.
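The binary cross-entropy (log loss) can be computed by hand to see what it rewards and penalizes. This is a minimal NumPy sketch on made-up labels and probabilities; the 1e-7 clipping constant is an assumption chosen to keep log() finite, similar in spirit to the small epsilon Keras applies.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean log loss: -[y*log(p) + (1 - y)*log(1 - p)], averaged over examples.
    # Probabilities are clipped away from 0 and 1 so log() stays finite.
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# Confident, correct predictions give a small loss...
good = binary_crossentropy([1, 0], [0.9, 0.2])
# ...while confident, wrong predictions are penalized heavily.
bad = binary_crossentropy([1, 0], [0.1, 0.8])
print(round(good, 3), round(bad, 3))  # 0.164 1.956
```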

# baseline model
def create_baseline():
	# create model
	model = Sequential()
	model.add(Dense(60, input_shape=(60,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.

Pass the number of training epochs (100) and the batch size (5) to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, because the model will be created and trained ten times for the 10-fold cross validation being performed.
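The evaluation mechanics can be tried without TensorFlow. This sketch swaps in scikit-learn's LogisticRegression for the KerasClassifier and uses synthetic data from make_classification in place of sonar.csv (both are assumptions for illustration). It shows that cross_val_score refits a fresh copy of the estimator on each of the 10 folds and returns one accuracy score per fold. Setting random_state on StratifiedKFold makes the fold assignment repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the sonar data: 208 rows, 60 features, two classes.
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)

# LogisticRegression stands in for the KerasClassifier so the sketch runs without TensorFlow.
estimator = LogisticRegression(max_iter=1000)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)  # fixed seed: repeatable folds

# cross_val_score clones and refits the estimator once per fold: 10 fits, 10 accuracy scores.
results = cross_val_score(estimator, X, y, cv=kfold)
print(len(results))
print("Stand-in: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))
```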

...
# evaluate baseline model with stratified 10-fold cross validation (no standardization)
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.

# Binary Classification with Sonar Dataset: Baseline
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
	# create model
	model = Sequential()
	model.add(Dense(60, input_shape=(60,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
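Rather than rerunning the script by hand, scikit-learn's RepeatedStratifiedKFold can repeat the whole 10-fold procedure with different shuffles and pool the scores. The sketch below again uses a LogisticRegression stand-in and synthetic data as assumptions. The same cv object could be passed to cross_val_score in place of kfold in the Keras examples. Note that random_state fixes only the fold splits, not the random weight initialization of a neural network.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the sonar data.
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)

# 10-fold cross validation repeated 3 times with different shuffles: 30 scores in total.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(len(scores), "%.2f%% (%.2f%%)" % (scores.mean() * 100, scores.std() * 100))
```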

Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.

Baseline: 81.68% (7.26%)

This is an excellent score without doing any hard work.

3. Re-Run the Baseline Model with Data Preparation

It is a good practice to prepare your data before modeling.

Neural network models generally perform best when their input values are consistent, both in scale and in distribution.

Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.
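Standardization is a per-column z-score, z = (x - mean) / std. This minimal sketch on made-up data with very different column scales computes it by hand and checks that StandardScaler gives the same result (StandardScaler uses the population standard deviation, ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
# Made-up data: 3 attributes with very different offsets and spreads.
X = rng.normal(loc=[0.1, 5.0, -20.0], scale=[0.05, 2.0, 10.0], size=(100, 3))

# Standardization by hand: subtract each column's mean, divide by its standard deviation.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler computes the same transformation.
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))  # True
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```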

You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.

Rather than performing the standardization on the entire dataset, it is good practice to fit the standardization procedure on the training data within each pass of the cross-validation run and then use it to prepare the “unseen” test fold. This makes standardization a step in model preparation within the cross-validation process. It prevents the model from gaining knowledge of the “unseen” data during evaluation, knowledge that could otherwise leak in through the data preparation scheme (for example, through a mean and standard deviation computed over all rows, including the test fold).

You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.
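To check that a Pipeline fits the scaler on training data only, this sketch fits a StandardScaler plus LogisticRegression pipeline on a training split of synthetic data and inspects the fitted scaler. LogisticRegression, the step name "clf", and the synthetic data are stand-ins; in the tutorial the second step is the KerasClassifier named "mlp". cross_val_score does the same thing once per fold, on a fresh clone of the pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=208, n_features=60, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # stand-in for the KerasClassifier step
])
pipeline.fit(X_train, y_train)

# The scaler's statistics come from the training rows only...
scaler = pipeline.named_steps["standardize"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
print(np.allclose(scaler.mean_, X.mean(axis=0)))        # False: the test rows shift the mean
# ...and score() reuses those statistics to transform the held-out rows.
print("held-out accuracy: %.2f" % pipeline.score(X_test, y_test))
```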

...
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.

# Binary Classification with Sonar Dataset: Standardized
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
	# create model
	model = Sequential()
	model.add(Dense(60, input_shape=(60,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the results below.

You now see a small but very nice lift in the mean accuracy.

Standardized: 84.56% (5.74%)

4. Tuning Layers and Number of Neurons in the Model

There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.

One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.

These are good experiments to perform when tuning a neural network on your problem.

4.1. Evaluate a Smaller Network

Note that there is likely a lot of redundancy in the input variables for this problem.

The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.
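One way to quantify what halving the hidden layer means is to count trainable parameters. Each Dense layer has one weight for every input/output pair plus one bias per neuron. The two helper functions below are hypothetical, written only for this calculation.

```python
def dense_params(n_in, n_out):
    # A fully connected layer: one weight per input/output pair plus one bias per output.
    return n_in * n_out + n_out

def count_params(layer_sizes):
    # layer_sizes lists the width of every layer, starting with the input.
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

baseline = count_params([60, 60, 1])  # (60*60 + 60) + (60*1 + 1)
smaller = count_params([60, 30, 1])   # (60*30 + 30) + (30*1 + 1)
print(baseline, smaller)              # 3721 1861
```

The smaller network has almost exactly half the parameters of the baseline, because nearly all of them sit in the first hidden layer.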

You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.

...
# smaller model
def create_smaller():
	# create model
	model = Sequential()
	model.add(Dense(30, input_shape=(60,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.

# Binary Classification with Sonar Dataset: Standardized Smaller
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# smaller model
def create_smaller():
	# create model
	model = Sequential()
	model.add(Dense(30, input_shape=(60,), activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.

This is a great result because you are doing slightly better with a network half the size, which has roughly half as many weights and is therefore cheaper to train.

Smaller: 86.04% (4.00%)

4.2. Evaluate a Larger Network

A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.

You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.

Your network now has the topology:

60 inputs -> [60 -> 30] -> 1 output

The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.

Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.
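The shapes flowing through this deeper topology can be traced with a short NumPy sketch. The random weights and made-up input rows are assumptions for illustration only. The sketch confirms the 60 -> 30 bottleneck and counts the parameters: 5,521 here, compared with 3,721 for the 60 -> [60] -> 1 baseline.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
sizes = [60, 60, 30, 1]  # input -> hidden 1 -> hidden 2 (bottleneck) -> output

# Random weights and zero biases, just to trace shapes through the network.
weights = [rng.normal(0.0, 0.1, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

X = rng.uniform(0.0, 1.0, size=(5, 60))  # 5 made-up rows
h = X
shapes = []
for i, (W, b) in enumerate(zip(weights, biases)):
    z = h @ W + b
    # ReLU on the hidden layers, sigmoid on the single output unit.
    h = 1.0 / (1.0 + np.exp(-z)) if i == len(weights) - 1 else np.maximum(0.0, z)
    shapes.append(h.shape)

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(shapes)    # [(5, 60), (5, 30), (5, 1)]
print(n_params)  # 5521
```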

...
# larger model
def create_larger():
	# create model
	model = Sequential()
	model.add(Dense(60, input_shape=(60,), activation='relu'))
	model.add(Dense(30, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.

# Binary Classification with Sonar Dataset: Standardized Larger
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
	# create model
	model = Sequential()
	model.add(Dense(60, input_shape=(60,), activation='relu'))
	model.add(Dense(30, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example produces the results below.

You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.

Larger: 83.14% (4.52%)

With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?
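A grid search is one systematic way to run such tuning experiments. SciKeras's KerasClassifier is scikit-learn compatible, so it can be placed inside GridSearchCV, but to keep this sketch runnable without TensorFlow it swaps in scikit-learn's MLPClassifier and synthetic data (both assumptions). The grid covers the three topologies from this post and two training lengths; for the adam solver, MLPClassifier's max_iter sets the maximum number of epochs. Only 3 folds are used to keep the search fast.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sonar data.
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)

pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("mlp", MLPClassifier(solver="adam", random_state=1)),  # stand-in for the Keras model
])
# Topologies mirroring the baseline, smaller, and larger experiments, plus two epoch budgets.
param_grid = {
    "mlp__hidden_layer_sizes": [(60,), (30,), (60, 30)],
    "mlp__max_iter": [100, 300],
}
search = GridSearchCV(pipeline, param_grid,
                      cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=1))

with warnings.catch_warnings():
    # Short training budgets may stop before convergence; silence that warning here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print(search.best_params_)
print("best CV accuracy: %.2f%%" % (search.best_score_ * 100))
```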

Summary

In this post, you discovered the Keras deep learning library in Python.

You learned how you can work through a binary classification problem step-by-step with Keras, specifically:

How to load and prepare data for use in Keras
How to create a baseline neural network model
How to evaluate a Keras model using scikit-learn and stratified k-fold cross validation
How data preparation schemes can lift the performance of your models
How experiments adjusting the network topology can lift model performance

Do you have any questions about deep learning with Keras or this post? Ask your questions in the comments, and I will do my best to answer.

198 Responses to Binary Classification Tutorial with the Keras Deep Learning Library

Matt June 15, 2016 at 12:21 pm

Excellent post with straightforward examples. Thanks for posting Jason!

- Jason Brownlee June 15, 2016 at 1:41 pm
  
  You’re very welcome Matt.
  
Shanky SHarma July 11, 2016 at 4:11 pm

Hi Jason,

How can we use a test dataset here, I am new to machine Learning and so far I have only come across k-fold methods for accuracy measurements, but I’d like to predict on a test set, can you share an example of that.

Thank you.

- Jason Brownlee July 12, 2016 at 5:24 am
  
  Hi Shanky,
  
  There is an example of evaluating a neural network on a manual verification dataset while the model is being fit here:
  https://machinelearningmastery.com/evaluate-performance-deep-learning-models-keras/
  
  You can use the model.evaluate() function to evaluate your fit model on new data, there is an example at the end of this deep learning tutorial:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  You can learn more about test options for evaluating machine learning algorithms here:
  https://machinelearningmastery.com/how-to-choose-the-right-test-options-when-evaluating-machine-learning-algorithms/
  
Paul July 12, 2016 at 9:08 am

Hi Jason,

After following this tutorial successfully I started playing with the model to learn more.

Eventually I got to the point where I added model.predict inside the baseline.

However when I print back the predicted Ys they are scaled. Is there a way to use standard scalar and then get your prediction back to binary?

Thanks

- Jason Brownlee August 15, 2016 at 11:19 am
  
  Hi Paul, I would advise you to scale your data before hand and keep the coefficients used to scale, then reuse them later to reverse the scaling of predictions.
  
Cedric August 8, 2016 at 3:06 am #

Hi Jason,

great post! Very helpful introduction to binary classification in Keras.

I was wondering, how would one print the progress of the model training the way Keras usually does in this example particularly?

Reply
- Jason Brownlee August 8, 2016 at 5:49 am #
  
  Thanks Cedric.
  
  You can print progress within an epoch by setting verbose=1 in the call to model.fit(). You can see just the progress across epochs by setting verbose=2, and turn off output with verbose=0.
  
  Progress is turned off here because we are using k-fold cross validation, which results in many more models being created and, in turn, very noisy output.
  
  Reply
Aakash Nain August 10, 2016 at 3:51 am #

Hello Jason,
Excellent tutorial. Consider a situation now. Suppose the data set loaded by you is the training set and the test set is given to you separately. I created the model as you described but now I want to predict the outcomes for test data and check the prediction score for the test data. How can I do that ?

Reply
- Jason Brownlee August 15, 2016 at 11:18 am #
  
  You can use model.predict() to make predictions and then compare the results to the known outcomes.
  
  This post provides an example of what you want:
  https://machinelearningmastery.com/5-step-life-cycle-neural-network-models-keras/
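Once predictions have been converted to class labels, comparing them with the known outcomes is a job for scikit-learn's metrics. A small sketch with hypothetical 0/1 labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical known outcomes for a separately supplied test set (0/1-encoded classes)
y_test = np.array([0, 0, 1, 1, 1, 0])
# Hypothetical predicted labels for the same rows, e.g. model.predict() output thresholded at 0.5
y_pred = np.array([0, 1, 1, 1, 0, 0])

acc = accuracy_score(y_test, y_pred)
# Rows are the true classes, columns the predicted classes
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", acc)
print(cm)
```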
  
  Reply
Sally October 26, 2016 at 4:14 am #

Dear Jason,

Thanks for this excellent tutorial. May I ask about this network model: to which family of deep learning models does it belong? Is it a Deep Belief Network, a CNN, a stacked autoencoder, or something else?

Thanks in advance

Reply
- Jason Brownlee October 26, 2016 at 8:32 am #
  
  It is a deep neural network.
  
  Note that the DBN and autoencoders are generally no longer mainstream for classification problems like this example.
  
  CNNs are state of the art and are used with image data.
  
  I hope this helps.
  
  Reply
  - Partha Shankar Nayak September 28, 2018 at 3:21 pm #
    
    Hello Dr. Brownlee,
    
    I’m sorry, but I don’t follow your point in the statement “…DBN and autoencoders are generally no longer mainstream for classification problems…”. I read a paper in which DBNs were used to predict the success of movies. They mentioned that a 2-layer DBN yielded the best accuracy.
    
    Reply
    - Jason Brownlee September 29, 2018 at 6:32 am #
      
      Yes, my understanding is that CNNs are currently state of the art for text-classification.
      
      That does not stop new papers coming out on old methods.
      
      Reply
Sally November 7, 2016 at 11:33 pm #

Thanks Jason for your reply. I have another question regarding this example. How can I know the reduced features after making the network smaller, as in section 4.1? You have obliged the network to reduce the features in the hidden layer from 60 to 30. How can I know which features are chosen after this step? Also, can I know the weight each feature contributes to the classification process?

Reply
- Jason Brownlee November 8, 2016 at 9:53 am #
  
  Hi Sally,
  
  The features are weighted, but the weighting is complex, because of the multiple layers. It would not be accurate to take just the input weights and use that to determine feature importance or which features are required.
  
  The hidden layer neurons are not the same as the input features, I hope that is clear. Perhaps I misunderstand your question and you can elaborate what you mean?
  
  Reply
Sally November 8, 2016 at 9:51 pm #

Hi Jason,

My case is as follows: I have something similar to your example, a deep neural network with 11 features. I used a hidden layer to reduce the 11 features to 7 and then fed it to a binary classifier to classify each record as class A or class B. The first thing I need to know is which 7 of the 11 features were chosen. Is there a way in the code to list them? The second thing I need to know is the average value of each feature when a record is classified as class A or B. In more detail: when feature 1 has an average value of 0.5, feature 2 an average of 0.2, feature 3 an average of 0.3, etc., the record is classified as class A. I need something like that; how can I get such values?

Reply
- Jason Brownlee November 9, 2016 at 9:51 am #
  
  Hi Sally,
  
  The number of nodes in a hidden layer is not a subset of the input features. They are an entirely new nonlinear recombination of input data. You cannot list out which features the nodes in a hidden layer relate to, because they are new features that relate to all input features. Does that make sense?
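A small numpy illustration of that point, with hypothetical random weights and the 11-input, 7-unit layer from Sally's description: the weight matrix connects every input to every hidden unit, so nudging a single input feature moves all hidden units at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 11, 7                             # 11 input features feeding 7 hidden units
W = rng.normal(scale=0.1, size=(n_inputs, n_hidden))   # one weight per (input, hidden unit) pair
b = np.zeros(n_hidden)

def hidden_layer(x):
    # Each hidden unit is a nonlinear function of a weighted sum over *all* inputs
    # (tanh is used here for illustration; the tutorial's hidden layers use relu)
    return np.tanh(x @ W + b)

x = rng.normal(size=n_inputs)
h = hidden_layer(x)

# Nudge a single input feature: every hidden unit responds, because none of them
# is tied to a subset of the inputs
x_nudged = x.copy()
x_nudged[3] += 1.0
changed = ~np.isclose(hidden_layer(x_nudged), h)
print(changed.all())
```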
  
  Reply
Sally November 9, 2016 at 6:20 pm #

Oh yup!! I thought it was a kind of feature selection done via the hidden layers, so if I need feature selection I have to do it before creating the model. The second question, which has not been answered yet, is how I can measure the contribution of each feature to the prediction; in other words, how can I get the “features_importance”? I tried to do it in the code, but it does not apply to the “pipeline” model in line 16. Where can I use the “features_importance” function to view each feature’s contribution to the prediction?

Reply
- Jason Brownlee November 10, 2016 at 7:39 am #
  
  Hi Sally, you may be able to calculate feature importance for a neural net, but I don’t know how. You may have to research this question yourself, sorry.
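One model-agnostic technique that may help here is permutation importance (not something the tutorial uses): shuffle one input column at a time and measure how much the accuracy drops. A minimal numpy sketch, where predict is any function returning 0/1 labels; for the Keras model it could be something like lambda d: (model.predict(d, verbose=0) >= 0.5).astype(int).ravel(). scikit-learn also provides sklearn.inspection.permutation_importance for fitted estimators, though whether it works with the wrapped pipeline here is an assumption.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each input column is shuffled (larger drop = more important)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link with y while keeping its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy check with a hypothetical "model" that only looks at column 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
toy_predict = lambda data: (data[:, 0] > 0).astype(int)
y = toy_predict(X)
imp = permutation_importance(toy_predict, X, y)
print(imp)
```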
  
  Reply
Sally November 10, 2016 at 8:40 am #

I searched for it, but unfortunately I did not find anything 🙁 .. Thanks for your cooperation

Reply
Sunil Manikani November 16, 2016 at 10:47 pm #

Hi Jason,
Excellent tutorial indeed!!!

While using PyDev in Eclipse I ran into trouble with the following imports …

from keras.models import Sequential
from keras.layers import Dense

I downloaded the latest keras-master from git and ran
sudo python setup.py install, because my latest pip install of Keras gave me import errors. [I had to remove it.]

Hope it helps someone. My two cents, contributing to your excellent post.

Thanks a ton! Once again.

Warm regards,
Sunil M

Reply
- Jason Brownlee November 17, 2016 at 9:53 am #
  
  Thanks for the note Sunil.
  
  I’m not an IDE user myself, command line all the way.
  
  Reply
Sally January 5, 2017 at 12:47 pm #

Dear Jason,

I have another question regarding this example. As you know, deep learning performs well with large datasets and mostly overfits with small datasets. The dataset in this example has only 208 records, and the deep model achieved pretty good results. How does this square with the idea of deep learning needing large datasets?

Reply
- Jason Brownlee January 6, 2017 at 9:04 am #
  
  Hi Sally,
  
  Don’t read too much into it. It is a demonstration of an MLP on a small binary classification problem.
  
  MLPs scale. If the problem was sufficiently complex and we had 1000x more data, the model performance would continue to improve.
  
  Reply
Sally January 7, 2017 at 8:12 am #

Thanks Jason for the reply, but could you please explain how you found out that the data is 1000x? You have 208 records with 60 input values each; did you multiply them to get this number, so that we can determine whether data is complex or not? Also, could you point me to published articles showing that MLPs scale if the problem is complex? Sorry for all these questions, but I am working on something relevant in my project and I need to prove and cite it.

Reply
- Jason Brownlee January 7, 2017 at 8:42 am #
  
  Sorry, no, I meant if we had one thousand times the amount of data.
  
  Reply
Sidharth Kumar February 3, 2017 at 12:01 am #

In multi-class classification like MNIST we have 10 outputs, one for each digit from 0 to 9.
Why do we have only 1 output in binary classification? Shouldn’t we have 2 outputs, one each for 0 and 1? Can you explain?

Reply
- Jason Brownlee February 3, 2017 at 10:01 am #
  
  Great question Sidharth.
  
  We can use two output neurons for binary classification.
  
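  Alternatively, because there are only two outcomes, we can simplify and use a single output neuron with an activation function that squashes the output into a fixed range, like sigmoid (0 to 1, read as the probability of class 1) or tanh (-1 to 1), and interpret that value as a binary response.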
  
  They are generally equivalent, although the simpler approach is preferred as there are fewer weights to train.
  
  Finally, you can use one output neuron for a multi-class classification problem if you like, and design a custom activation function or interpret a linear output value as the classes. This approach often does not capture sufficient complexity in the problem, e.g. the network may want to suggest that an input has potential membership in more than one class (a confusing input pattern), and it assumes an ordinal relationship between classes, which is often invalid.
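The two binary designs can be sketched side by side in tf.keras (assuming TensorFlow 2.x): one sigmoid unit trained with binary_crossentropy, versus two softmax units trained with sparse_categorical_crossentropy on the same integer 0/1 labels. The hidden layer mirrors the tutorial's baseline; counting the parameters shows the single-output version has fewer weights.

```python
import numpy as np
import tensorflow as tf

def single_output_model():
    # One sigmoid unit: the output is P(class 1); pair with binary_crossentropy
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(60,)),
        tf.keras.layers.Dense(60, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def two_output_model():
    # Two softmax units: one probability per class, summing to 1;
    # sparse_categorical_crossentropy accepts the same integer 0/1 labels
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(60,)),
        tf.keras.layers.Dense(60, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

one, two = single_output_model(), two_output_model()
x = np.random.default_rng(0).uniform(size=(3, 60)).astype("float32")
p_one = one.predict(x, verbose=0)   # shape (3, 1)
p_two = two.predict(x, verbose=0)   # shape (3, 2), each row sums to 1
print(one.count_params(), two.count_params())
```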
  
  Reply
  - Pablo March 18, 2017 at 3:02 am #
    
    I don’t get it: how and where do you do that?
    
    Do you use 1 output node, so that if the sigmoid output is <0.5 it is considered class A, and if it is >=0.5 it is considered class B?
    
    Is that correct? Where in the code do you do that?
    
    Reply
- KUNDAN KUMAR September 26, 2019 at 12:35 am #
  
  What are you saying, man? If you have to test whether a bulb is on or off when testing circuit rules, do you need two different bulbs, or is one sufficient?
  
  Reply
SYKim February 6, 2017 at 10:47 pm #

Hi Jason. Thanks. Your tutorials are really helpful!

I made a small network (2-2-1) which fits the XOR function.

I found that without numpy.random.seed(seed) the accuracy results can vary a lot.

Sometimes it learns quickly, but in most cases its accuracy just remains near 0.25, 0.50, 0.75 etc…

So I needed to try several times to find a seed value that leads to high accuracy.

Is it common to try several times with the same model until it succeeds?

Also, there was a case where it got trapped in a local optimum, but after a long time it got out of it and the accuracy reached 1.0.

What if there’s a very big network and it takes 2~3 weeks to train it?

Do people just start training and start it again if there is not much improvement for some time?

Do people run the same model with different initialization values on different machines?

Is there any method to know if its accuracy will go up after a week?

Thank you!

Reply
- Jason Brownlee February 7, 2017 at 10:16 am #
  
  Great questions, see this post on randomness and machine learning:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  I hope that helps as a start.
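For the reproducibility part of the question, a sketch assuming TensorFlow 2.7 or later, where tf.keras.utils.set_random_seed() seeds Python, NumPy and TensorFlow together. The 2-2-1 layout comes from the question; the tanh hidden activation is an assumption. The same seed gives identical initial weights and a different seed does not. A fixed seed only makes one run repeatable, so judging a model still means repeating training across several seeds and summarizing the results, as the linked post describes.

```python
import numpy as np
import tensorflow as tf

def build_xor_model():
    # The 2-2-1 topology from the question (hidden activation assumed)
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(2, activation="tanh"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

def initial_weights(seed):
    # Seeds Python's random module, NumPy and TensorFlow in one call
    tf.keras.utils.set_random_seed(seed)
    return build_xor_model().get_weights()

run_a, run_b, run_c = initial_weights(1), initial_weights(1), initial_weights(2)
same_seed_equal = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
diff_seed_equal = all(np.array_equal(a, c) for a, c in zip(run_a, run_c))
print(same_seed_equal, diff_seed_equal)
```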
  
  Reply
Mark February 8, 2017 at 4:26 pm #

Hi Jason. Thanks for the tutorial.

I want to implement an autoencoder to measure image similarity. Could you please provide some tips/directions/suggestions on how to figure this out? Thanks

Reply
- Jason Brownlee February 9, 2017 at 7:21 am #
  
  Sorry, I do not have an example of using autoencoders.
  
  Reply
Chan February 9, 2017 at 10:13 pm #

Hi Brownlee:

How would I save and load a KerasRegressor model?

estimator = KerasRegressor(…)

I use estimator.model.save(), and it works,
but only after calling estimator.fit(X, Y) first; otherwise it throws a “no model” error.

Besides, I have no idea how to load the model back into the estimator.

It is easy to save/load a normal Keras model, but saving/loading with the Keras scikit-learn wrapper is more difficult for me.

Would you please tell me how to do this.
Thanks a lot.

Reply
- Jason Brownlee February 10, 2017 at 9:53 am #
  
  Hi Chan, you could try pickle?
  
  I find it easier to use KerasClassifier to explore models and tuning, and then use native Keras with save/load for larger models and for finalizing the model.
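A minimal sketch of the native Keras route, assuming TensorFlow 2.12 or later (where the single-file .keras format is available); the layer sizes mirror the tutorial's baseline and the filename is hypothetical. The model is saved, reloaded, and checked to give the same predictions.

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(60,)),
    tf.keras.layers.Dense(60, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")

# (fit the model on your training data here)

path = os.path.join(tempfile.gettempdir(), "sonar_model.keras")  # hypothetical filename
model.save(path)                                   # architecture, weights and compile settings
restored = tf.keras.models.load_model(path)

x = np.random.default_rng(0).uniform(size=(2, 60)).astype("float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```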
  
  Reply
Emerson February 10, 2017 at 4:23 pm #

Awesome tutorial, one of the first I’ve been able to follow the entire way through.

I would love to see a tiny code snippet that uses this model to make an actual prediction. I figured it would be as easy as estimator.predict(X[0]), but I’m getting errors about the shape of my data being incorrect: (None, 60) vs (60, 1).

Reply
- Jason Brownlee February 11, 2017 at 4:54 am #
  
  I’m glad to hear it Emerson.
  
  Yes, you can make a prediction with:
  
  yhat = model.predict(X)
  
  A single sample must first be reshaped into a 2D array with one row and 60 columns, matching the (None, 60) input shape the model expects:
  
  data = data.reshape(1, 60)
  
  Reply
  - Carlos Castellanos July 3, 2017 at 7:24 am #
    
    Hi Jason, such an amazing post, congrats! I have some doubts regarding Emerson’s question and your answer.
    
    I want to separate cross-validation and prediction into different stages, because they are executed at different moments; for prediction I will receive a non-standardized input vector X with a single sample. I was able to save the model using callbacks so it can be reused for prediction, but I’m a bit lost on how to standardize the input vector without loading the entire dataset before predicting. I tried to pickle the pipeline state, but nothing good came of that. Is this possible? Do you have an example of how to do it? Thanks!
    
    Reply
    - Jason Brownlee July 6, 2017 at 9:56 am #
      
      To standardize, all you need is the mean and standard deviation of each variable in the training data.
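A sketch of that with numpy, using hypothetical data and a hypothetical filename: compute the per-column mean and standard deviation at training time, store them, and apply them to one raw sample later. The sample is then reshaped to (1, 60), since Keras expects a batch dimension even for a single row.

```python
import os
import tempfile
import numpy as np

# Hypothetical training inputs standing in for the 208 x 60 sonar matrix
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(208, 60))

# At training time: compute and persist one mean and one std per input column
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)   # population std (ddof=0), which is also what StandardScaler uses
path = os.path.join(tempfile.gettempdir(), "scaling_params.npz")  # hypothetical filename
np.savez(path, mean=mean, std=std)

# At prediction time: reload the statistics and standardize a single raw sample
params = np.load(path)
x_new = rng.uniform(0.0, 1.0, size=60)                # shape (60,)
x_scaled = (x_new - params["mean"]) / params["std"]
x_batch = x_scaled.reshape(1, -1)                     # shape (1, 60), ready for model.predict()
```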
      
      Reply
Chris Cummins February 24, 2017 at 4:47 am #

Fantastic tutorial Jason, thank you. Here’s my Jupyter notebook of it: https://github.com/ChrisCummins/phd/blob/master/learn/keras/Sonar.ipynb

Reply
- Jason Brownlee February 24, 2017 at 10:12 am #
  
  Nice work Chris.
  
  Reply
Dmitri Levitin March 24, 2017 at 7:46 am #

I have a difficult question. I have Google weekly search-trend data for NASDAQ companies over a 2-year span, and I’m trying to classify whether the stock goes up or down after earnings based on the search trends, which gives 104 weeks, or features. I ran this data and received no signal: Results: 48.55% (4.48%).

However, in my non-machine-learning experiments I do see signal. I take the diffs (week n - week n+1), creating an array of 103 diffs. I then average all the stocks that went up and all the stocks that went down. When I predict a new stock for the same 2-year period, I compare, in a voting-like manner, week n of the new stock to week n of the stocks labeled up and labeled down. Whichever has more votes wins. With this simple method I do see signal.

Thoughts?

Reply
- Jason Brownlee March 24, 2017 at 8:02 am #
  
  Short term movements on the stock market are a random walk. The best you can do is a persistence forecast as far as I know.
  
  Reply
  - Dmitri Levitin March 24, 2017 at 8:15 am #
    
    But I’m not comparing movements of the stock; I’m comparing its tendency to have an upward or downward day after earnings, which is the labeled data, with the Google weekly search trends over the 2-year span essentially becoming the inputs to the neural network. So it becomes a classification problem.
    
    As described above in the 2nd paragraph, I see signal based on taking the average of the weeks that go up after earnings vs. those that go down, and comparing the new week to those 2 averages. I’m just not sure how to translate that into a neural network.
    
    BTW, awesome tutorial; I will follow all of your tutorials.
    
    Reply
    - Dmitri Levitin March 24, 2017 at 8:19 am #
      
      I meant to say that I take the average of each week for all the labeled companies that go up after earnings, creating an array of averages, and do the same for the companies that go down after earnings. I then compare the weeks of the new stock, over the same time period, to each of those arrays. And I do see signal, but how can I make that work with neural networks?
      
      Reply
Dmitri Levitin March 24, 2017 at 10:31 am #

Another question. Using this methodology but with a different dataset, I’m getting an accuracy improvement with each epoch. But in the end I get Results: 52.64% (15.74%). Any idea why? I thought the results were the average accuracy.

Epoch 1/10
0s – loss: 1.1388 – acc: 0.5130
Epoch 2/10
0s – loss: 0.6415 – acc: 0.6269
Epoch 3/10
0s – loss: 0.4489 – acc: 0.7565
Epoch 4/10
0s – loss: 0.3568 – acc: 0.8446
Epoch 5/10
0s – loss: 0.3007 – acc: 0.8808
Epoch 6/10
0s – loss: 0.2611 – acc: 0.9326
Epoch 7/10
0s – loss: 0.2260 – acc: 0.9430
Epoch 8/10
0s – loss: 0.1987 – acc: 0.9689
Epoch 9/10
0s – loss: 0.1771 – acc: 0.9741
Epoch 10/10
0s – loss: 0.1556 – acc: 0.9741

Results: 52.64% (15.74%)

Reply
- Jason Brownlee March 25, 2017 at 7:30 am #
  
  Perhaps the model is overfitting the training data?
  
  Consider slowing down learning with some regularization methods like dropout.
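A sketch of adding dropout to a baseline-style network, assuming TensorFlow 2.x; placing a Dropout layer after the input and after the hidden layer, and the 0.2 rate, are assumptions to tune rather than recommended values. Dropout only acts during training, so predictions in inference mode are deterministic.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(60,)),
    tf.keras.layers.Dropout(0.2),                   # randomly zeroes 20% of inputs per training step
    tf.keras.layers.Dense(60, activation="relu"),
    tf.keras.layers.Dropout(0.2),                   # and 20% of hidden activations
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# In inference mode (training=False) dropout is disabled, so repeated calls agree
x = np.random.default_rng(0).uniform(size=(4, 60)).astype("float32")
p1 = model(x, training=False).numpy()
p2 = model(x, training=False).numpy()
```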
  
  Reply
Michael April 21, 2017 at 6:05 am #

Hi Jason,

Can this type of classifier (the one described in this tutorial) be used for ordinal classification (with binary classification)?

Thanks,

Reply
- Jason Brownlee April 21, 2017 at 8:43 am #
  
  I would use the network as is, or phrase the problem as a regression problem and round the results.
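A sketch of the regression-and-round option, assuming the ordinal labels were encoded as integers 0 to 3 and the network has a single linear output; the raw outputs below are hypothetical. Rounding gives the nearest class and clipping keeps out-of-range predictions on a valid label.

```python
import numpy as np

n_classes = 4  # assumed ordinal labels 0, 1, 2, 3
# Hypothetical raw outputs of a single linear output unit trained on those integer labels
raw = np.array([-0.3, 0.4, 1.6, 2.49, 3.7])

# Round to the nearest integer, then clip into the valid label range
labels = np.clip(np.rint(raw), 0, n_classes - 1).astype(int)
print(labels)
```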
  
  Reply
Ahmad May 18, 2017 at 8:12 pm #

Hello Jason,

How can I save the pipelined model?
I mean in the past it was easy when we only implemented a model and we fit it …
but now how can I save this in order to load it and make predictions later on?

Reply
- Jason Brownlee May 19, 2017 at 8:17 am #
  
  I believe you cannot save the pipelined model.
  
  You must use the Keras API alone to save models to disk. At least as far as I know.
  
  Reply
  - Mik June 11, 2018 at 9:47 pm #
    
    Hi Jason,
    
    “You must use the Keras API alone to save models to disk” –> any chance you’d be willing to elaborate on what you mean by this, please? I’ve been trying to save the model from your example above using pickle, the JSON method you explained here: https://machinelearningmastery.com/save-load-keras-deep-learning-models/ , as well as the joblib method you explained here: https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/ . However, none of them work. Pickle gives the following error:
    
    _pickle.PicklingError: Can’t pickle : attribute lookup module on builtins failed
    
    Using json gives this error:
    
    AttributeError: ‘Pipeline’ object has no attribute ‘to_json’
    
    … and for the joblib approach I get the error message
    
    TypeError: can’t pickle SwigPyObject objects
    
    I have tried googling the SwigPyObject for more info, but haven’t found anything useful. Any advice you’d be able to offer would be great.
    
    Thanks in advance.
    
    Reply
    - Jason Brownlee June 12, 2018 at 6:42 am #
      
      As far as I know, we cannot save a sklearn wrapped keras model. We must use the Keras API directly to save/load the model.
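Following that advice, one workaround is to persist the two pipeline steps separately: the Keras network through the Keras save API, and the fitted StandardScaler, which is a plain scikit-learn object, with joblib. A minimal sketch of the scaler half, with hypothetical data and filename:

```python
import os
import tempfile
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.normal(size=(208, 60))   # hypothetical stand-in for the sonar inputs
scaler = StandardScaler().fit(X_train)

path = os.path.join(tempfile.gettempdir(), "sonar_scaler.joblib")  # hypothetical filename
joblib.dump(scaler, path)              # scikit-learn transformers pickle without trouble
restored = joblib.load(path)
```

At prediction time, new rows go through restored.transform() before being passed to the reloaded Keras model's predict().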
      
      Reply
Rob June 20, 2017 at 8:03 am #

Thanks a lot for this great post! I am trying to learn more about machine learning and your blog has been a huge help.

Any idea why I would be getting very different results if I train the model without k-fold cross validation? e.g. If I run

model = create_baseline()
model.fit(X, encoded_Y, epochs=100, batch_size=5, validation_split=0.3)

It outputs a val_acc of around 0.38, but if I run your code using k-fold I get an accuracy of around 75%.

Full code snippet is here https://gist.github.com/robianmcd/e94b4d393346b2d62f9ca2fcecb1cfdf

Any idea why this might be happening?

Reply
- Jason Brownlee June 21, 2017 at 8:07 am #
  
  Hi Rob, yes neural networks are stochastic. See this post:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  See here for how to get a more robust estimate of neural network model skill:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Reply
  - Rob June 22, 2017 at 7:15 am #
    
    I ran it many times and I was consistently getting around 75% accuracy with k-fold and 35% without it. It turns out I wasn’t shuffling the array when not using k-fold, so the validation targets were almost all 1s and the training set was mostly 0s. I added numpy.random.shuffle(dataset) and it’s all good now.
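The cause Rob found can be reproduced in a few lines. Keras documents that validation_split takes the validation rows from the end of the arrays, before any shuffling, so a file sorted by class yields a one-class validation set. The helper below is a hypothetical stand-in that approximates that split on toy data:

```python
import numpy as np

# Hypothetical dataset sorted by class (all 0s first), like a CSV grouped by label
X = np.arange(10, dtype=float).reshape(10, 1)
y = np.array([0] * 7 + [1] * 3, dtype=float)
dataset = np.hstack([X, y.reshape(-1, 1)])   # last column holds the label

def validation_rows(data, validation_split):
    # Approximates the documented Keras behaviour: the validation rows are the
    # *last* fraction of the data, selected before any shuffling takes place
    split_at = int(len(data) * (1.0 - validation_split))
    return data[split_at:]

val_before = validation_rows(dataset, 0.3)[:, -1].copy()  # copy: the slice is a view
print(val_before)                                         # a single class

rng = np.random.default_rng(0)
rng.shuffle(dataset)   # shuffles rows in place, like numpy.random.shuffle(dataset)
val_after = validation_rows(dataset, 0.3)[:, -1]
print(val_after)
```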
    
    Reply
    - Jason Brownlee June 23, 2017 at 6:37 am #
      
      I’m glad to hear you got to the bottom of it Rob!
      
      Reply
Michael June 21, 2017 at 11:33 pm #

Hi Jason,

In this post you mentioned the ability of hidden layers with fewer neurons than the previous layer to extract key features.
Is it possible to visualize or get a list of these selected key features in Keras (for example, for networks with a high number of features)?

Thanks,
Michael

Reply
- Jason Brownlee June 22, 2017 at 6:06 am #
  
  You may be able to; I am not aware of an example, sorry.
  
  Reply
joseph June 25, 2017 at 6:45 pm #

Hi, I would love to see an object localization/segmentation network for identifying object locations and labeling them.

Reply
- Jason Brownlee June 26, 2017 at 6:07 am #
  
  Thanks for the suggestion joseph.
  
  Reply
Parth July 19, 2017 at 1:58 am #

Hi Jason, how do we know which structure is best for a neural network? Any resources you could point me to?( I don’t mind going through the math)

Reply
- Jason Brownlee July 19, 2017 at 8:27 am #
  
  Nope. There is no good theory for this.
  
  Copy other designs, use trial and error. Design robust experiments to test many structures.
  
  Reply
Fan Feng August 5, 2017 at 7:43 pm #

Thanks for your sharing.

Reply
- Jason Brownlee August 6, 2017 at 7:37 am #
  
  I’m glad it helped!
  
  Reply
Alex Mikhalev August 28, 2017 at 4:37 am #

Thank you for sharing, but it needs now a bit more discussion –
see http://www.cloudypoint.com/Tutorials/discussion/python-solved-can-i-send-callbacks-to-a-kerasclassifier/

Reply
- Jason Brownlee August 28, 2017 at 6:51 am #
  
  Glad to hear it.
  
  Reply
Valentin September 19, 2017 at 7:15 pm #

Hi Jason! Thanks so much for this very concise and easy to follow tutorial! One question: if you call native Keras model.fit(X,y) you can also supply validation_data, such that validation score is printed during training (if verbose=1). Do you know how to switch this feature on in the pipeline? sklearn creates the split automatically within the cross_val_score step, but how to pass this on to the Keras fit method…?

Thanks a lot!

Reply
- Jason Brownlee September 20, 2017 at 5:55 am #
  
  No, and I would not recommend it. I think it would cause more problems.
  
  Reply
B G SINGH September 20, 2017 at 3:38 am #

Hi Jason,

Is there any way to use class_weight parameter in this code?

Thanks,
Biswa

Reply
- Jason Brownlee September 20, 2017 at 6:02 am #
  
  Yes, set class_weight in the fit() function.
  
  More help here:
  https://keras.io/models/sequential/
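As a sketch of how such a dictionary might be built, scikit-learn's compute_class_weight with the "balanced" option weights each class by n_samples / (n_classes * class_count). The labels below are hypothetical, and passing the result to model.fit() is shown only as a comment:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: three samples of class 0, one of class 1
y = np.array([0, 0, 0, 1])
classes = np.unique(y)

# "balanced" gives each class n_samples / (n_classes * class_count): 4/6 and 4/2 here
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)

# Keras then takes the dictionary directly, e.g.:
# model.fit(X, y, epochs=100, batch_size=5, class_weight=class_weight)
```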
  
  Reply
  - Don December 6, 2017 at 1:45 pm #
    
    Thanks for the great post Jason!
    
    How do I use class_weight when I use cross_val_score and don’t call fit(), as in this post?
    
    Thanks,
    Don
    
    Reply
    - Jason Brownlee December 7, 2017 at 7:50 am #
      
      Sorry, I don’t have examples of using weighted classes.
      
      Reply
Luis Ernesto October 19, 2017 at 2:53 pm #

Good day, interesting article. I am currently doing research: a comparative study of three types of artificial neural network algorithms, namely multilayer perceptrons, radial basis function networks and recurrent neural networks. I have already implemented the algorithms and I’m at the training stage. Everything is fine until I start this stage: unfortunately the network does not generalize. I tried changing parameters such as the learning rate and the number of iterations, but the result remains the same. The input data are binary, i.e. a pattern such as (1,0,0,1,1,0,0,1,0,1,1,1), where the last value is the desired output. I also noticed that when the weights converge and I use them in the validation stage, all the results are almost the same, as if there were no difference between the patterns. Now I am doing cross validation, hoping to solve this problem or find out what my error may be. I would appreciate your help or advice.

Reply
- Jason Brownlee October 19, 2017 at 4:03 pm #
  
  Generally, I would recommend this process for evaluating your model:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Perhaps you can calculate some diagnostics, like learning curves, on the training and validation datasets?
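Learning curves, the training and validation loss recorded at each epoch, are returned by fit() in its History object. A minimal sketch, assuming TensorFlow 2.x, with random binary placeholder data of 11 inputs standing in for the patterns described above; the network size and epoch count are arbitrary.

```python
import numpy as np
import tensorflow as tf

# Random binary placeholder data: 40 patterns of 11 inputs plus a 0/1 target
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(40, 11)).astype("float32")
y = rng.integers(0, 2, size=40).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# fit() returns a History object; validation_split holds out the last 25% of rows
history = model.fit(X, y, epochs=5, batch_size=8, validation_split=0.25, verbose=0)

# One value per epoch: training loss falling while validation loss rises suggests
# overfitting, and two flat curves suggest the model is not learning at all
for epoch, (tr, va) in enumerate(zip(history.history["loss"], history.history["val_loss"]), 1):
    print(f"epoch {epoch}: loss={tr:.3f} val_loss={va:.3f}")
```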
  
  Reply
Vivaldi Gut November 8, 2017 at 5:17 am #

Hi Jason or anyone active here:

could you please advise on what would be considered good performance for binary classification in terms of precision and recall? I have got:

class        precision  recall  f1-score  support

0                 0.88    0.94      0.91    32438
1                 0.80    0.66      0.72    11790

avg / total       0.86    0.86      0.86    44228

Accuracy: 0.864520213439

I wish to improve recall for class 1. I would appreciate it if anyone can provide hints.

Thanks in advance.

Reply
- Jason Brownlee November 8, 2017 at 9:30 am #
  
  A “good” result is really problem dependent and relative to other algorithm performance on your problem.
  
  Reply
masoumeh November 24, 2017 at 7:53 pm #

Thanks Jason,

Actually, I have a binary classification problem and I have written my code, but I can only see the accuracy of my model. If I want to see the output of my model, what should I add to my code? I mean, at the end I want it to show me whether each input is 1 or 0.

Reply
- Jason Brownlee November 25, 2017 at 10:17 am #
  
  You can make predictions with your final model as follows:
  
  X = ...
  yhat = model.predict(X)
  
  Does that help?
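For a single sigmoid output, the predicted values are probabilities between 0 and 1, so turning them into 0/1 answers needs a threshold; 0.5 is the conventional default rather than something the tutorial mandates. A small sketch with hypothetical model outputs, which also maps the labels back to the original class names when they were encoded with LabelEncoder:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sigmoid outputs from model.predict(X): shape (n_samples, 1), values in [0, 1]
yhat = np.array([[0.02], [0.49], [0.50], [0.93]])

# Class 1 when the probability is >= 0.5, otherwise class 0; ravel() flattens to (n_samples,)
labels = (yhat >= 0.5).astype(int).ravel()

# If string labels were integer-encoded with LabelEncoder (as with the sonar "M"/"R" column),
# the same fitted encoder maps the 0/1 predictions back to the original names
encoder = LabelEncoder().fit(["R", "M", "R"])
names = encoder.inverse_transform(labels)
print(labels, names)
```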
  
  Reply
rakashi December 4, 2017 at 5:14 pm #

Hi,

I am trying to classify an image. I have used softmax as the classifier and categorical_crossentropy as the loss. When I test the model I get probabilities, but they all add up to 1, and I want the probability of each class independently. I have tried sigmoid with binary_crossentropy as the loss; here I get 85% accuracy, but it does not give independent probabilities like the Clarifai website.

How can I achieve this? Can you please suggest something?

Reply
- Jason Brownlee December 5, 2017 at 5:41 am #
  
  Perhaps check-out this tutorial:
  https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/
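The difference described in the question comes from the output activation. A softmax layer spreads one unit of probability across mutually exclusive classes, so its outputs always sum to 1, while independent sigmoid outputs (usually trained with binary_crossentropy for multi-label problems) each lie between 0 and 1 with no joint constraint. A numpy illustration with hypothetical final-layer values:

```python
import numpy as np

logits = np.array([2.0, 1.0, -1.0])   # hypothetical final-layer values for 3 classes

def softmax(z):
    e = np.exp(z - z.max())           # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p_softmax = softmax(logits)           # mutually exclusive classes: values sum to 1
p_sigmoid = sigmoid(logits)           # independent per-class probabilities: no sum constraint
print(p_softmax, p_softmax.sum())
print(p_sigmoid, p_sigmoid.sum())
```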
  
  Reply
Cody December 13, 2017 at 8:47 am #

Thanks for the great tutorial. I wanted to mention that for some newer versions of Keras the above code didn’t work correctly (due to changes in the Keras API).

The most notable change, which took me a while to debug, is that “nb_epoch=100” has to be changed to “epochs=100”, or the cross-validation steps will only run for 1 epoch, resulting in poor overall model performance (~55% instead of 81%). It turns out that “nb_epoch” has been deprecated. Hope this comment helps someone.

Reply
- Jason Brownlee December 13, 2017 at 4:12 pm #
  
  It should work with Keras 2.1.2.
  
  Reply
  - Vincenzo February 7, 2018 at 12:36 am #
    
    @Cody is right, “nb_epoch” has to be changed to “epochs”, otherwise it will be ignored and training will run for just 1 epoch per fold (Keras 2.1.3)
    
    Reply
    - Jason Brownlee February 7, 2018 at 9:25 am #
      
      Fixed, thanks.
      
      Reply
Nandini February 8, 2018 at 5:19 pm #

My loss value stays constant; it does not decrease even after 4 epochs, and the accuracy does not increase either. Which parameters should I update to tune this RNN binary classification problem?

Please help with that.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(82, 1), activation='relu'))
#model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
model.add(Dense(80, activation='tanh'))
model.add(Dense(40, activation='tanh'))
model.add(Dense(20, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#print(model.summary())

# trainX has shape (n_samples, 82, 1); nb_epoch is the pre-Keras-2 name of epochs
model.fit(trainX, trainY, epochs=200, batch_size=4, verbose=2, shuffle=False)
Please advise me on this scenario.

Reply
- Jason Brownlee February 9, 2018 at 9:00 am #
  
  Perhaps try training for longer, 100s of epochs.
  
  Perhaps spot check on less data.
  
  Here are more ideas to try:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
  - Razvan July 8, 2022 at 2:24 pm #
    
    Another marvelous post Jason 🙂 Bravo!
    
    Reply
nandini February 9, 2018 at 6:35 pm #

Which optimizer is suitable for binary classification? I am using rmsprop.
I have only a small number of samples.
Can I train with more epochs and a smaller batch size, and is that suitable for increasing the accuracy of my model?

Reply
- Jason Brownlee February 10, 2018 at 8:55 am #
  
  Perhaps try it and see.
  
  Reply
David February 28, 2018 at 3:13 am #

Hi Jason, when testing new samples with a trained binary classification model, do the new samples need to be scaled before feeding into the model? What if there is only one sample? Thanks David

Reply
- Jason Brownlee February 28, 2018 at 6:09 am #
  
  Yes, data must be prepared in exactly the same way, even a single sample.
  
  Reply
Vatsal March 31, 2018 at 7:29 pm #

Hi Jason! It is really kind of you to contribute this article. However, how do I classify a new data set (60 features)? I think there is no code snippet for this, I mean one that actually uses the trained model.

Reply
- Jason Brownlee April 1, 2018 at 5:48 am #
  
  Once you train your final model you can make predictions by calling model.predict(X).
  
  Perhaps this post will make it clearer:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  Reply
Ciaran May 11, 2018 at 9:43 pm #

Thank you very much for this. This is an excellent introduction to Keras for me and I adapted this code in minutes without any problems. The explanation was perfect too. Much appreciated.

Reply
- Jason Brownlee May 12, 2018 at 6:32 am #
  
  I’m glad it helped.
  
  Reply
Yannis June 27, 2018 at 9:48 pm #

Great article, thanks!

Reply
- Jason Brownlee June 28, 2018 at 6:18 am #
  
  I’m glad it helped.
  
  Reply
youness mourtaji July 28, 2018 at 5:43 am #

Hi Jason,

Thank you very much for the great tutorial, it helps me a lot.

I have two questions:
1- I have a binary classification problem. Any idea how to choose the right neural network architecture: RNN, CNN or …?
2- Is there any way to use machine learning classifiers like K-Means or decision trees explicitly in your code above? You used KerasClassifier, but I don’t know which algorithm is used for classification.

Thank you again

Reply
- Jason Brownlee July 28, 2018 at 6:42 am #
  
  Use an MLP, more here:
  https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
  
  You can use sklearn to test a suite of other algorithms, more here:
  https://machinelearningmastery.com/spot-check-classification-machine-learning-algorithms-python-scikit-learn/
  
  Reply
youness mourtaji July 28, 2018 at 7:22 am #

Thank you very much again M.Jason.

Reply
- Jason Brownlee July 28, 2018 at 7:37 am #
  
  You’re welcome.
  
  Reply
Lovish Batheja August 13, 2018 at 10:19 pm #

This article was very helpful! 🙂

I have a question. In this article you have used all continuous variables to predict a binary variable. How to proceed if the inputs are a mix of categorical and continuous variables?

Reply
- Jason Brownlee August 14, 2018 at 6:20 am #
  
  Categorical inputs can be integer encoded, one hot encoded or some other encoding prior to modeling.
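One way to do that encoding, sketched with scikit-learn's ColumnTransformer: scale the continuous columns, one hot encode the categorical ones, and feed the combined dense array to the network. The column names and values are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical table mixing continuous and categorical inputs
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "income": [40.0, 52.0, 61.0, 75.0],
    "city": ["paris", "rome", "paris", "oslo"],
})

preprocess = ColumnTransformer(
    [("num", StandardScaler(), ["age", "income"]),                   # continuous: standardize
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"])],     # categorical: one hot
    sparse_threshold=0.0,   # always return a dense array, which Keras accepts directly
)
X = preprocess.fit_transform(df)   # 2 scaled columns + 3 one-hot columns
print(X.shape)
```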
  
  Reply
Charanraj Mohan August 27, 2018 at 5:00 am #

Hello,
I have a question. I am using the Keras functional API (with Dense layers) and have built a single fully connected NN. I see that the weight updates depend on several factors, like the optimization method, the activation function, etc. Suppose I am using real binary weights as my synapses, and I want to use a binary weight function to update the weights: in every iteration I check the weight update (delta w), and when it is positive I increase the weight, and when it is negative I decrease the weight. How can this be done using Keras?

I read that Keras is very limited for doing this. Is that true? FYI, I use Dense to define my layers and Input to define the inputs. Is it possible to add a binary weight-deciding function using Dense layers in Keras?

Thanks in advance 🙂

Avi September 23, 2018 at 12:19 am #

Hi Jason,

Thanks for the post. A couple of questions.

1) The data has 208 rows. If I look at the number of params in the deeper network, it is 6000+. Shouldn’t the number of rows be greater than the number of params?

2) How can we use the cross-validated model to predict. Do we just take the last model and predict ?

- Jason Brownlee September 23, 2018 at 6:40 am #
  
  No, we can over-specify the model and still achieve low generalization error. This is also true for statistical methods through the use of regularization.
  
  We do not use CV to predict. CV is only used to estimate the generalization error of the model. Learn more here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-make-predictions
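The two-step workflow Jason describes can be sketched with scikit-learn alone. This is a minimal sketch that assumes synthetic data from make_classification in place of sonar.csv and swaps scikit-learn's MLPClassifier in for the tutorial's KerasClassifier so it runs without TensorFlow; the same pattern would apply to the Keras estimator.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with the Sonar shape (208 rows, 60 inputs); not the real data.
X, y = make_classification(n_samples=208, n_features=60, random_state=7)
model = MLPClassifier(hidden_layer_sizes=(60,), max_iter=500, random_state=7)

# Step 1: k-fold CV only estimates skill. cross_val_score fits clones of `model`
# on each fold and discards them; `model` itself stays unfitted.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(model, X, y, cv=kfold)
print("Estimated accuracy: %.3f (%.3f)" % (scores.mean(), scores.std()))

# Step 2: fit one final model on all available data and use it for predictions.
model.fit(X, y)
new_rows = X[:3]  # placeholder for genuinely new, unlabeled rows
predictions = model.predict(new_rows)
print(predictions)
```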
  

Arjun.K October 3, 2018 at 2:44 am #

Hello Jason,
I am new to deep learning. My first deep learning program uses the Sonar data with Keras, and while fitting the model I got an error that I am unable to understand:

‘ValueError: Error when checking input: expected dense_13_input to have shape (20,) but got array with shape (60,)’

Could you please help me find where I made a mistake? Thank you, Jason. Here is my program code:

import keras
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense,Activation,Dropout
from keras.optimizers import SGD
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

sonar=pd.read_csv('C:\\Users\\sonar.all-data.csv',header=None)
sonar.head()

X=sonar.iloc[:,0:60]  #All the rows and 0 to 59 columns
Y=sonar.iloc[:,60]

Y=pd.get_dummies(Y)
Y.head()

#Split the dataset
x_train,y_train,x_test,y_test=train_test_split(X,Y,test_size=0.20,random_state=101)
print('x_train data: ',x_train.shape)
print('y_train data: ',y_train.shape)
print('x_test data: ',x_test.shape)
print('y_test data: ',y_test.shape)

#Building  the model
model = Sequential()

model.add(Dense(60,activation='relu',input_dim=20))
model.add(Dropout(0.20))

model.add(Dense(60,activation='relu'))
model.add(Dropout(0.20))

model.add(Dense(60,activation='relu'))
model.add(Dropout(0.20))

model.add(Dense(2,activation='softmax'))

# Compiling the  model
epochs = 10
lrate = 0.01
decay = lrate/epochs
sgd = SGD(lr=lrate, momentum=0.4, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())

model.fit(x_train, y_train,epochs=epochs, batch_size=10)


Jason Brownlee October 3, 2018 at 6:21 am #

The error suggests the expectations of the model and the actual data differ. You can change the model or change the data.
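Two things in the code above appear to cause the mismatch: train_test_split returns X_train, X_test, y_train, y_test in that order (the code unpacks them as x_train, y_train, x_test, y_test), and the first layer declares input_dim=20 while the data has 60 columns. A minimal sketch of the corrected split, using random data of the same shape as a stand-in for the CSV:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(101)
X = rng.random((208, 60))               # stand-in: 208 rows, 60 sonar-like features
labels = rng.integers(0, 2, size=208)
Y = np.eye(2)[labels]                   # one hot targets, like pd.get_dummies(Y)

# train_test_split returns the splits in this order: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=101)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (166, 60) (42, 60) (166, 2) (42, 2)

# The first layer must accept every input column, i.e. input_dim=60, not 20:
n_features = X_train.shape[1]
# e.g. model.add(Dense(60, activation='relu', input_dim=n_features))
```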


khalil ahmad December 2, 2018 at 5:19 am #

Hi sir,
could each line have a brief explanation, for example:
import numpy (NumPy is a library for scientific computing), etc.
That way I could easily understand what every line does.
Thanks.

- Jason Brownlee December 2, 2018 at 6:24 am #
  
  Thanks for the suggestion.
  
JG December 6, 2018 at 10:44 am #

Hola Jason:

Thank you. I have not had enough time to go through this tutorial, but from your other logistic regression (binary classification) tutorials, I have a general question:

1) In multi-class classification we put as many units in the output layer as there are classes. Could we likewise replace the single sigmoid unit in the output layer with two softmax units, and use categorical_crossentropy instead of binary_crossentropy as the loss in model.compile()?

1.1) If this is possible, is it more efficient than the “classical” approach with only one unit in the output layer?

many thanks

- Jason Brownlee December 6, 2018 at 1:45 pm #
  
  Yes, you can have 2 nodes with softmax for binary classification.
  
  It often does not make a difference and we have less complexity by using a single node.
  
  - Priyanshu Kumar November 24, 2019 at 7:42 pm #
    
    The activation function of the last layer differs between the two cases, so the gradients differ too. Is it possible that there is a striking difference between the performance of the two networks on a given dataset?
    
    - Jason Brownlee November 25, 2019 at 6:28 am #
      
      Possible.
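A small NumPy check of why the two output designs discussed in this thread can represent the same probabilities: softmax over the two logits (0, z) gives e^z / (1 + e^z), which is exactly sigmoid(z). In Keras terms the two variants would be Dense(1, activation='sigmoid') with binary_crossentropy, or Dense(2, activation='softmax') with categorical_crossentropy on one hot targets. The two-unit version carries a redundant extra set of output weights, so training behavior may still differ in practice, as the thread notes. The z values below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Subtracting the row max is for numerical stability; it does not change the result.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

z = np.linspace(-5.0, 5.0, 11)                        # logits for one sigmoid unit
two_logits = np.stack([np.zeros_like(z), z], axis=1)  # equivalent two-unit logits (0, z)
p_sigmoid = sigmoid(z)
p_softmax = softmax(two_logits)[:, 1]                 # softmax probability of class 1
print(np.allclose(p_sigmoid, p_softmax))              # True
```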
      
Pablo December 18, 2018 at 5:48 am #

Is ‘encoded_Y’ the same as ‘Y’? It is not defined before this point.

- Jason Brownlee December 18, 2018 at 6:06 am #
  
  It is defined in section 2 of the post.
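For reference, the encoding step in question turns the string class labels into the integers stored in encoded_Y. A minimal sketch, assuming a few hypothetical labels in place of the Sonar label column; LabelEncoder sorts the classes, so M maps to 0 and R to 1.

```python
from sklearn.preprocessing import LabelEncoder

Y = ["R", "M", "M", "R"]       # hypothetical labels in place of the CSV column
encoder = LabelEncoder()
encoder.fit(Y)                 # learns the sorted classes: ['M', 'R']
encoded_Y = encoder.transform(Y)
print(encoder.classes_.tolist(), encoded_Y)  # ['M', 'R'] [1 0 0 1]
```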
  
Igor December 19, 2018 at 11:50 am #

Hi Jason.

Thank you for an interesting and informative article.

I am new to ANN and am not a Python programmer, so cannot “look inside” those Keras functions you used. But I have a general (and I am sure very basic) question about your example.

If I understand correctly, you constructed a neural net with 60 nodes (same as the number of predictor variables) in the input layer, a single hidden layer and an output layer with just one node for the predicted binary variable. It seems to me then that you needed to train your net for each record in your dataset separately. And as a result obtain as many sets of optimal node weights as there are records in the dataset (208 total). How then can you integrate them into just one final set? If you do something like averaging all 208 weights for each node, how then can the resultant net perform well? And without it, how can the net be tested and later used for actual predictions? I am truly confused.

Do my questions make any sense?

Thank you,

Igor

- Jason Brownlee December 19, 2018 at 2:29 pm #
  
  Not really, a single set of weights is updated during training.
  
  Perhaps this will make things clearer:
  https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
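To make Jason's point concrete, here is a minimal NumPy sketch of a single sigmoid unit (a simplification of the tutorial's network, which also has a hidden layer) trained with per-record gradient updates on synthetic data. There is one weight vector of length 60; each of the 208 records nudges that same vector in turn, so no per-record weight sets are ever created or averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(208, 60))          # 208 synthetic records, 60 inputs each
true_w = rng.normal(size=60)
y = (X @ true_w > 0).astype(float)      # synthetic binary labels

w = np.zeros(60)  # the ONE weight vector shared by every record
b = 0.0           # the one bias
lr = 0.01         # learning rate

def mean_log_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

loss_before = mean_log_loss(w, b)
for epoch in range(20):
    for i in rng.permutation(len(X)):           # visit the records in a random order
        p_i = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
        error = p_i - y[i]                      # gradient of the log loss w.r.t. the logit
        w -= lr * error * X[i]                  # every record updates the SAME w
        b -= lr * error
loss_after = mean_log_loss(w, b)
print(w.shape, round(loss_before, 3), round(loss_after, 3))
```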
  
boukhari el mouatassem December 22, 2018 at 9:18 pm #

Thank you for the good explanation.
How can I save a model created with create_baseline()? Please answer me.

- Jason Brownlee December 23, 2018 at 6:05 am #
  
  Yes, this post shows you how to save a model:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
Nandini January 25, 2019 at 5:43 pm #

For binary classification, why do we have to use 1 unit in the output layer with a sigmoid activation function? Is there any particular reason?

- Jason Brownlee January 26, 2019 at 6:10 am #
  
  Yes, it can predict the probability directly. It’s efficient and effective.
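Jason's answer can be written out: the output node's weighted sum z is squashed by the sigmoid into (0, 1) and read as the probability of class 1, and binary_crossentropy scores that probability against the encoded label y ∈ {0, 1}. The standard per-sample form, which is averaged over the samples in a batch during training:

```latex
\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1), \qquad
\mathcal{L}(y, \hat{p}) = -\left[\, y \log \hat{p} + (1 - y) \log (1 - \hat{p}) \,\right]
```

A predicted class of 1 then corresponds to \hat{p} > 0.5, which is the same as z > 0.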
  
saba January 30, 2019 at 5:36 pm #

Hi Jason

Thanks for your excellent post.

I just want to get started with DNNs in Keras. Could you please point me to a practical tutorial using the Keras library, mostly for classification?

Cheers
S

- Jason Brownlee January 31, 2019 at 5:29 am #
  
  Yes, you can get started here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
Meriem February 27, 2019 at 2:48 am #

Hi Jason
Thanks for this tutorial, but what about the test phase?

- Jason Brownlee February 27, 2019 at 7:34 am #
  
  You can use model.evaluate() to estimate the performance of the model on unseen data.
  
  Does that help?
  

Marco Sabatini February 27, 2019 at 8:07 am #

Sir, the result from this code is around 55%, not 81%, without optimizing the NN.

Here is my code, to check for errors or anything else:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data"


dataframe = pandas.read_csv(url, header=None)
dataset = dataframe.values
dim = dataset.shape
raw = dim[0]
col = dim[1]
#len(dataset.columns)

# split into input (X) and output (Y) variables
X = dataset[:,0:col-1].astype(float)
Y = dataset[:,col-1]

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, init= "normal" , activation= "sigmoid" ))
    model.compile(loss= "binary_crossentropy" , optimizer= "adam" , metrics=[ "accuracy" ])
    return model
# evaluate model with standardized dataset
estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))


Jason Brownlee February 27, 2019 at 2:36 pm #

I expect normalizing the data first might help.
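Jason's suggestion corresponds to the tutorial's data-preparation step: put StandardScaler and the model in a Pipeline so the scaler is refit on the training folds only. The sketch below swaps scikit-learn's MLPClassifier in for KerasClassifier so it runs without TensorFlow, and uses synthetic data in place of the Sonar CSV. Separately, the code above uses Keras 1 argument names: in Keras 2, nb_epoch became epochs and init became kernel_initializer, so those arguments are worth updating too.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Sonar inputs (208 rows, 60 columns).
X, y = make_classification(n_samples=208, n_features=60, random_state=7)

pipeline = Pipeline([
    # Refit on the training folds only, so no test-fold statistics leak in.
    ("standardize", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(60,), max_iter=500, random_state=7)),
])
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(pipeline, X, y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))
```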

- Farah November 15, 2022 at 10:05 am #
  
  I want to ask about the input_shape of an LSTM for binary classification.
  Some LSTM tutorials for binary classification use (features, 1), while you used (features,). May I know why? Do you have an article on LSTM input shapes?
  
  - James Carmichael November 16, 2022 at 7:26 am #
    
    Hi Farah…The following resource may be of interest:
    
    https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
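The difference comes from the layer type rather than the task: Keras Dense layers (as in this tutorial's MLP) take 2D input of shape (samples, features), while LSTM layers expect 3D input of shape (samples, timesteps, features). A minimal NumPy sketch of the reshape, assuming each row's 60 values are treated as one sequence of 60 timesteps with one feature each:

```python
import numpy as np

# Synthetic stand-in for the Sonar inputs: 208 samples with 60 features each.
X = np.random.default_rng(0).random((208, 60))

# Dense/MLP input: 2D (samples, features), i.e. input_shape=(60,)
print(X.shape)  # (208, 60)

# LSTM input: 3D (samples, timesteps, features per step), i.e. input_shape=(60, 1)
X_lstm = X.reshape((X.shape[0], X.shape[1], 1))
print(X_lstm.shape)  # (208, 60, 1)
```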
    

Anista March 9, 2019 at 12:41 am #

Hi Jason,

I have a binary classification problem where classes are unbalanced. For example, 72000 records belongs to one class and 3000 records to the other. Is there a way to mark some kind of weights between classes in order to give more relevance to the less common class?
I’ve found class_weight, but it doesn’t work with 3D data.
Could you give me an idea of how to solve the problem?

Thank you in advance!

- Jason Brownlee March 9, 2019 at 6:29 am #
  
  Yes, I have some ideas here that might help:
  https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
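For ordinary 2D inputs with one label per sample, one common way to weight the rarer class is to compute "balanced" weights with scikit-learn and pass them as a dict to the class_weight argument of Keras model.fit(). A minimal sketch, assuming hypothetical labels with the 72000/3000 split from the question; it does not address the 3D (sequence) case raised above.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels matching the counts in the question: 72000 vs 3000.
y = np.concatenate([np.zeros(72000, dtype=int), np.ones(3000, dtype=int)])
classes = np.unique(y)

# "balanced" weight for each class = n_samples / (n_classes * count_of_that_class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
# In Keras this dict could then be passed as model.fit(X, y, class_weight=class_weight)
```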
  
  - Anista March 11, 2019 at 7:34 pm #
    
    Thank you Jason!
    
  - Anista March 13, 2019 at 8:15 pm #
    
    Hello Jason,
    I wonder if the options you mention in the above link can be used with time series, as some of them modify the content of the dataset. I’m not sure what to use.
    Thank you.
    
    - Jason Brownlee March 14, 2019 at 9:21 am #
      
      Not really, I expect you may need specialized methods for time series.
      
      - Anista March 14, 2019 at 8:59 pm #
        
        And what specialized methods can I use to solve the problem for time series?
      - Jason Brownlee March 15, 2019 at 5:29 am #
        
        Sorry, I don’t have many tutorials on time series classification, I do have a few here:
        https://machinelearningmastery.com/start-here/#deep_learning_time_series
Archit Gupta March 27, 2019 at 8:17 am #

Hello Jason, I enjoy learning ML from your tutorials and find them very helpful. I saw that in this post you used LabelEncoder. Doesn’t that mean that when integer values are assigned to categorical labels, the model interprets a meaning between the integers, i.e. that 0 < 1?

While reading elsewhere, I saw that when you have labels where the order of integers is unimportant, then you must use OneHotEncoder. I am not sure if it makes any difference here, please clarify if you are aware.

https://medium.com/@contactsunny/label-encoder-vs-one-hot-encoder-in-machine-learning-3fc273365621

- Jason Brownlee March 27, 2019 at 9:09 am #
  
  Yes, if the input is integer encoded then the model may infer an ordinal relationship between the values.
  
  If no such relationship is real, it is recommended to use a OHE.
  
  - Archit Gupta March 28, 2019 at 4:20 am #
    
    Thanks. 🙂
    
Jonathan April 4, 2019 at 8:15 pm #

Hi Jason,

Thank you for your nice explanations!

I have a question about the cross-validation part in your code, which gives us a good view of the generalization error.

Does the use of cross-validation enable us to select the right weights for the neural network? Is it like using CV for a logistic regression, which would select the right complexity of the model in order to reach bias-variance tradeoff? What is the CV doing precisely for your neural network?

About the process, I guess that the network trains itself on the whole training data. Then, the network can be validated on 10 randomly shuffled pieces of the training dataset (10-fold CV). Am I right? Then, I get the accuracy score of the classification performance of the model, as well as its standard deviation?

So, if I want to test my model on new data, then I can do what Aakash Nain and you have nicely proposed?

Thank you!

Best,

Jonathan

- Jason Brownlee April 5, 2019 at 6:16 am #
  
  Yes, we can use CV to estimate the performance of a specific model/config, as we do for other algorithms.
  
  You can learn how CV works here:
  https://machinelearningmastery.com/k-fold-cross-validation/
  
  If you want to make predictions, you must fit the model on all available data first:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  Then use that model to make predictions:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  - Jonathan April 5, 2019 at 7:04 pm #
    
    Thank you Jason.
    
    Another question: does it make sense to use about 75% of my data for training and CV, and the remaining 25% for testing my model?
    
    I am making an MLP for classification. In my case, CV would estimate the performance. However, a separate test set would be better if I want to evaluate the model on unseen data, right?
    
    Thanks!
    
    Jonathan
    
    - Jason Brownlee April 6, 2019 at 6:45 am #
      
      Perhaps. It really depends on the problem and how representative the 25% is of the broader problem. If it’s too small it might give misleading/optimistic results.
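The plan discussed here can be sketched as follows, assuming synthetic data and scikit-learn's MLPClassifier as a stand-in for the Keras model: hold out 25% first, run k-fold CV only on the remaining 75% to compare configurations, then refit on that 75% and score once on the held-out rows. As Jason notes, a small test set can give a noisy estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=208, n_features=60, random_state=7)

# Hold back 25% as a test set that is never used during model selection.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(60,), max_iter=500, random_state=7))

# k-fold CV on the development 75% only (this is where configs would be compared).
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=kfold)
print("CV accuracy: %.3f (%.3f)" % (cv_scores.mean(), cv_scores.std()))

# Refit the chosen config on all development data, then score once on the test set.
model.fit(X_dev, y_dev)
test_accuracy = model.score(X_test, y_test)
print("Held-out test accuracy: %.3f" % test_accuracy)
```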
      
Radek April 27, 2019 at 12:41 am #

Hi,

Why isn’t there a .fit() method used here?

- Jason Brownlee April 27, 2019 at 6:34 am #
  
  We are using the sklearn wrapper instead.
  
  - Radek April 27, 2019 at 9:52 am #
    
    Thank you :). One more question, because I may be missing something obvious. The only place I see the dataset linked to the model is the cross-validation call that takes X and encoded_Y. What am I missing here?
    
    R
    
    - Jason Brownlee April 28, 2019 at 6:53 am #
      
      A model learns to map input to output.
      
      Most models achieve this by taking input, making a prediction, and comparing the prediction to the expected values, then updating the model toward making predictions closer to the expected values. Repeat.
      
      Does that help?
      
Radek April 29, 2019 at 11:45 pm #

Jason,

It does indeed: the inner workings of this model are clear. I am wondering, when the model is created inside a function like this, how would you serialise it?

- Jason Brownlee April 30, 2019 at 6:59 am #
  
  You must use the Keras API directly in order to save the model:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
erik June 27, 2019 at 11:58 am #

I have a mixed dataset (categorical and numerical features). Will this method be suitable for such data?

- Jason Brownlee June 27, 2019 at 2:17 pm #
  
  Yes, although you may need to integer encode or one hot encode the categorical data first.
  
Tristan July 6, 2019 at 2:33 am #

Why do you use accuracy to evaluate the model in this dataset? Is it not an imbalanced dataset?

- Jason Brownlee July 6, 2019 at 8:43 am #
  
  It is imbalanced, but not severely. Accuracy is reasonable as long as it is compared to a baseline/naive result.
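Jason's point about comparing with a naive result can be made concrete with scikit-learn's DummyClassifier, which always predicts the majority class. A minimal sketch, assuming the class counts given in the UCI description of Sonar (111 mine, 97 rock) and LabelEncoder's M -> 0, R -> 1 mapping; any model's accuracy would then be judged against this figure.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Assumed Sonar class counts: 111 mine (M -> 0) and 97 rock (R -> 1).
y = np.array([0] * 111 + [1] * 97)
X = np.zeros((len(y), 60))  # inputs are ignored by the "most_frequent" strategy

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
naive_accuracy = baseline.score(X, y)
print("Naive accuracy: %.2f%%" % (naive_accuracy * 100))  # Naive accuracy: 53.37%
```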
  
erik July 9, 2019 at 12:02 pm #

How do I determine the number of neurons in a layer? Does it depend on the number of features? Say I have 40 features; what would be the optimal number of neurons?

- Jason Brownlee July 10, 2019 at 7:56 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
Anuj Bhambri July 23, 2019 at 9:24 am #

Hey

How can we train a neural network on 6 million rows of binary data with 128 columns?
I mean, what should the units, activation function, batch size and number of epochs be?

Thanks in advance

- Jason Brownlee July 23, 2019 at 2:40 pm #
  
  Start with a smaller sample of the dataset, more details here:
  https://machinelearningmastery.com/faq/single-faq/how-to-i-work-with-a-very-large-dataset
  
Olivier July 31, 2019 at 10:31 pm #

Hi Jason,

I’ve read many of your posts, which are all excellent, congrats!

I have a question regarding the output probabilities for binary classification with binary_crossentropy + sigmoid in Keras/TF.

I’ve read many times that this is the way to get real (calibrated) probabilities as output.

To verify this, I applied a calibration curve to my model, and the probabilities do not meet my expectations. In particular, I don’t understand why, even on the training data, this does not give a nearly perfect curve.

Pseudo code I use for calibration curve of training data:
model.fit(X, Y, epochs=nb_epochs, batch_size=5, verbose=2)
..
predictions = model.predict_classes(X)
…
calibration_curve(Y, predictions, n_bins=100)

The results (with calibration curve on test) to be found here:
https://we.tl/t-WwJKqXQFVB

I was wondering If you had any advice on this.

Thanks in advance

- Jason Brownlee August 1, 2019 at 6:51 am #
  
  Perhaps this tutorial will help in calibrating the predicted probabilities from your model:
  https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/
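One likely issue in the pseudocode above: predict_classes returns hard 0/1 labels, whereas calibration_curve is meant to receive predicted probabilities (for a sigmoid output, what model.predict() returns). With hard labels, at most two bins can be populated, so the curve cannot look smooth. The sketch below assumes synthetic data and uses scikit-learn's LogisticRegression as a stand-in for the Keras model to show the difference. Also, n_bins=100 leaves very few samples per bin on small datasets, so a smaller value such as 10 is usually easier to read. Note that predict_classes has been deprecated and removed in recent TensorFlow versions.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities: the analogue of a sigmoid output from Keras model.predict()
y_prob = model.predict_proba(X)[:, 1]
prob_true, prob_pred = calibration_curve(y, y_prob, n_bins=10)

# Hard 0/1 labels: the analogue of predict_classes; only two bins can be non-empty
y_label = model.predict(X)
label_true, label_pred = calibration_curve(y, y_label, n_bins=10)
print(len(prob_true), len(label_true))
```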
  
Hamed September 4, 2019 at 5:01 am #

Hi Jason, another great tutorial and thank you for that! I have 2 questions in this regards, though:

1) What if my output is a binary image of size 160×160 that contains facial landmarks? Can I use this model with an output layer of 160×160 = 25600 neurons rather than only one?

2) The paper says they used a shallow MLP with ReLU. This means their model doesn’t have any hidden layers. So, I just need to directly connect the input face features to the output layer to construct landmarks mask?

Thank you so much

- Jason Brownlee September 4, 2019 at 6:04 am #
  
  If you are predicting an image, you might want to use a different model, like a U-Net.
  
  I don’t know about the paper you’re referring to, perhaps contact the authors?
  
  - Hamed September 4, 2019 at 8:28 am #
    
    Thank you for the suggestion, dear Jason. I’ll look into it. This is the paper:
    
    “Synthesizing Normalized Faces from Facial Identity Features”
    
    They create facial landmarks for neutral faces using a MLP. Thank you!
    
thomson September 28, 2019 at 10:56 pm #

Is lstm classification adopted for look back concept?

- Jason Brownlee September 29, 2019 at 6:12 am #
  
  Sorry, I don’t understand, can you elaborate please?
  
krishna November 15, 2019 at 4:17 pm #

Hi Jason Brownlee
I have some doubts about calculating metrics with k-fold cross-validation.
In your code, the total accuracy is obtained using

results = cross_val_score(estimator, X, encoded_Y, cv=kfold)

print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Can I use the following formulas for calculating metrics like (total accuracy, misclassification rate, sensitivity, precision, and f1score)?

from sklearn import metrics
from sklearn.model_selection import cross_val_predict
y_pred = cross_val_predict(estimator, X, encoded_Y, cv=kfold)
totacu=round((metrics.accuracy_score(encoded_Y,y_pred)*100),3)
totMisacu=round((1-metrics.accuracy_score(encoded_Y,y_pred))*100,3)
sensitivityVal=round((metrics.recall_score(encoded_Y,y_pred))*100,3)
precision=round((metrics.precision_score(encoded_Y,y_pred))*100,3);
f1score=round(2*((sensitivityVal*precision)/(sensitivityVal+precision)),2)

- Jason Brownlee November 16, 2019 at 7:19 am #
  
  See this tutorial to get other metrics:
  https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
  
  - krishna November 18, 2019 at 4:35 pm #
    
    Hi Jason Brownlee.
    Thank you for your reply.
    In “https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/” you provided metrics for data split into train and test sets.
    
    For k-fold cross-validation,
    kfold = StratifiedKFold(n_splits=10, shuffle=True)
    results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
    how can we calculate metrics like precision, sensitivity and F1 score?
    
    I try to get using following syntaxes:
    from sklearn import metrics
    from sklearn.model_selection import cross_val_predict
    y_pred = cross_val_predict(estimator, X, encoded_Y, cv=kfold)
    totacu=round((metrics.accuracy_score(encoded_Y,y_pred)*100),3)
    totMisacu=round((1-metrics.accuracy_score(encoded_Y,y_pred))*100,3)
    sensitivityVal=round((metrics.recall_score(encoded_Y,y_pred))*100,3)
    precision=round((metrics.precision_score(encoded_Y,y_pred))*100,3);
    f1score=round(2*((sensitivityVal*precision)/(sensitivityVal+precision)),2)
    
    Is this valid?
    Please suggest the right way to calculate metrics for the cross-fold validation process
    
    - Jason Brownlee November 19, 2019 at 7:38 am #
      
      You can calculate the desired metric on the predictions from each fold, then report the average and standard deviation across all of the folds.
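Following that suggestion (score each fold, then report mean and standard deviation), scikit-learn's cross_validate can compute several metrics per fold in one call. A minimal sketch, assuming synthetic data and a LogisticRegression stand-in; with the tutorial's setup you would pass the KerasClassifier estimator, X and encoded_Y instead. By contrast, the cross_val_predict approach above pools all out-of-fold predictions and yields a single number per metric, with no spread across folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=208, n_features=60, random_state=7)
model = LogisticRegression(max_iter=1000)   # stand-in for the Keras estimator
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(model, X, y, cv=kfold, scoring=metrics)
for metric in metrics:
    fold_values = scores["test_" + metric]   # one value per fold
    print("%s: %.3f (%.3f)" % (metric, fold_values.mean(), fold_values.std()))
```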
      
Svetlana December 24, 2019 at 4:42 am #

Hi Jason,

Your tutorials are very helpful and informative and thanks for making all of them and getting it to us. Many thanks!!

I have specific questions:

1. Are stratified CV and 10-fold CV the same, or are they different? I know the definitions, but I always wonder how they differ. If they are different, how do we perform plain 10-fold CV for the same example?

2. How does one evaluate a deep learning trained model on an independent/external test dataset? Most of the time I have seen tutorials splitting the data randomly into 70% training and 30% testing.

Thanks.

- Jason Brownlee December 24, 2019 at 6:47 am #
  
  Thanks.
  
  Different. Stratified ensures that the class distribution in each fold is the same as the source dataset.
  
  You can use a train/test split for deep learning, or cross validation. The choice is yours.
  
  I have examples of both on the blog.
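To make the distinction concrete: "stratified" describes how the k folds are built, not an alternative to k-fold. Plain KFold cuts the rows into consecutive chunks (unless shuffle=True, as the tutorial uses), while StratifiedKFold keeps each fold's class proportions close to the whole dataset's. A minimal sketch, assuming sorted labels with the commonly reported Sonar counts (111 of one class, 97 of the other), as in a file where all rows of one class come first:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Sorted labels: all of one class first, then the other.
y = np.array([0] * 111 + [1] * 97)
X = np.zeros((len(y), 60))
overall = y.mean()

def class1_fraction_per_fold(splitter):
    # Fraction of class-1 rows in each test fold.
    return [y[test_idx].mean() for _, test_idx in splitter.split(X, y)]

plain = class1_fraction_per_fold(KFold(n_splits=10))            # consecutive chunks
strat = class1_fraction_per_fold(StratifiedKFold(n_splits=10))  # class-balanced folds
print("overall:", round(overall, 3))
print("KFold:", np.round(plain, 2))
print("StratifiedKFold:", np.round(strat, 2))
```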
  
Nancy January 2, 2020 at 3:38 am #

Hello Jason,

First of all, many thanks for such good tutorials. If I like anyone’s content, it’s Andrew Ng’s, Corey Schafer’s and yours. Really helpful and informative.

I have a question.
Suppose my problem is a binary classification problem and I have already tuned the hyperparameters (like the number of neurons in each layer, learning rate, dropout, etc.). Where do I put them in my code? Do you have any tutorial on this?

Thanks.

- Jason Brownlee January 2, 2020 at 6:45 am #
  
  Thanks!
  
  Dense objects are layers, the argument to Dense() is the number of nodes.
  
  Sounds like you’re asking about the basics of neural nets in Keras API, perhaps start here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
Madhur Dheer January 4, 2020 at 9:50 pm #

Hello Sir,

Can you tell me how to use this estimator model to evaluate output on a testing dataset?

- Jason Brownlee January 5, 2020 at 7:04 am #
  
  You can call:
  
  result = model.evaluate(testX, testy)
  
chris January 9, 2020 at 4:48 am #

How would you find what data had been misclassified?

- Jason Brownlee January 9, 2020 at 7:32 am #
  
  Compare predictions to expected outputs on a dataset where you have outputs – e.g. a test set – or on a dataset where you will get real outputs later.
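That comparison can be sketched with NumPy: threshold the sigmoid probabilities at 0.5 and list the rows where the prediction differs from the label. The arrays below are made-up values for illustration; in practice y_prob would come from model.predict(X_test) on a labeled test set (flattened with .ravel(), since a Dense(1) output has shape (n, 1)).

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0])               # hypothetical expected labels
y_prob = np.array([0.1, 0.8, 0.4, 0.6, 0.9, 0.2])   # hypothetical sigmoid outputs
y_pred = (y_prob > 0.5).astype(int)                 # class 1 when probability > 0.5

misclassified = np.flatnonzero(y_pred != y_true)    # row indices of the mistakes
print(misclassified)  # [2 3]
```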
  
Shashank February 20, 2020 at 6:44 pm #

encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(166, input_dim=166, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))

    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()  # model only exists inside this function
    return model

# evaluate model with standardized dataset
estimator = KerasClassifier(build_fn=create_baseline, epochs=10, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print(results)
print(kfold)
print(estimator)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

For the code above, I need to plot accuracy and loss graphs in a proper format (for both training and validation), plus the final performance measures of the model, including validation accuracy, loss, precision, recall and F1 score.

What should I use as X_train, X_test, y_train and y_test in this case?

thanks

- Jason Brownlee February 21, 2020 at 8:20 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
  
  And this:
  https://machinelearningmastery.com/custom-metrics-deep-learning-keras-python/
  
Shashank Yadav February 21, 2020 at 3:32 am #

Can you help me with TensorBoard as well, please?

from time import time
from tensorflow.keras.callbacks import TensorBoard
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
I used the above code, but I can’t get TensorBoard to run and can’t specify the path.

When I use model.save to save as H5, I get an error that model is not defined.

from keras.models import load_model
model.save_weights('model_weights.h5')
model.save('my_model.h5')
model = load_model('my_model.h5')

- Jason Brownlee February 21, 2020 at 8:28 am #
  
  Sorry, I don’t know about tensorboard.
  
  See this for saving a model:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
Shashank Yadav February 22, 2020 at 3:37 am #

@Jason Brownlee Thanks a lot.
Great to get a reply from you!! 🙂

- Jason Brownlee February 22, 2020 at 6:32 am #
  
  You’re welcome.
  
kiri June 28, 2020 at 6:17 am #

hi
Thank you for this tutorial
Is this dataset large enough to train a CNN?
How can I use the same data with a CNN? I searched your site but found nothing.

- Jason Brownlee June 29, 2020 at 6:21 am #
  
  This dataset is not appropriate for a CNN, see this:
  https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
  
Ferdinand Sonel October 24, 2020 at 6:24 am #

Hi, in this case the dataset is already sorted: the rows at the top are labeled R and the rows at the bottom are labeled M. What happens if the data are not sorted like that?

I’m sorry for my bad English.

- Jason Brownlee October 24, 2020 at 7:13 am #
  
  Data is shuffled before being split into train and test sets.
  
  - Ferdinand Sonel October 26, 2020 at 7:15 am #
    
    oh i see
    thx for answering my question
    
    - Jason Brownlee October 26, 2020 at 7:34 am #
      
      You’re welcome.
      
Kassaye Akanie May 16, 2021 at 5:49 pm #

Oh! Thanks a lot, this is very important.

- Jason Brownlee May 17, 2021 at 5:37 am #
  
  You’re welcome.
  
Vasilis December 10, 2022 at 1:58 am #

Hi Jason, thank you for the great tutorial. Unfortunately, it gives me an error and I cannot proceed. Can you help me?

ValueError:
All the 10 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

P.S. The imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
now need to be written as
from keras.models import Sequential
from keras.layers import Dense

- James Carmichael December 10, 2022 at 10:51 am #
  
  Hi Vasilis…You are very welcome! The following discussion may add clarity:
  
  https://stackoverflow.com/questions/69005052/how-do-i-fix-fitfailedwarning-estimator-fit-failed-the-score-on-this-train-t
  
sajjad October 20, 2023 at 8:27 pm #

Hi Jason,
thank you for the great tutorial.

I need to use a confusion matrix in my project. How can I add it to this code?

can you help me?

- James Carmichael October 21, 2023 at 9:19 am #
  
  Hi sajjad…The following resource may be of interest to you:
  
  https://machinelearningmastery.com/confusion-matrix-machine-learning/
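One way to get a confusion matrix with the tutorial's cross-validation setup is to collect out-of-fold predictions with cross_val_predict (the call used in an earlier comment, cross_val_predict(estimator, X, encoded_Y, cv=kfold)) and pass them to scikit-learn's confusion_matrix. A minimal sketch, assuming synthetic data and a LogisticRegression stand-in for the KerasClassifier estimator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=208, n_features=60, random_state=7)
model = LogisticRegression(max_iter=1000)   # stand-in for the KerasClassifier estimator
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

# Each row is predicted by the fold model that did not see it during training.
y_pred = cross_val_predict(model, X, y, cv=kfold)
cm = confusion_matrix(y, y_pred)   # rows: true class, columns: predicted class
print(cm)
```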
  


198 Responses to Binary Classification Tutorial with the Keras Deep Learning Library
