Multi-Class Classification Tutorial with the Keras Deep Learning Library

By Jason Brownlee on August 7, 2022 in Deep Learning 614

Keras is a Python library for deep learning that wraps the efficient numerical libraries Theano and TensorFlow.

In this tutorial, you will discover how to use Keras to develop and evaluate neural network models for multi-class classification problems.

After completing this step-by-step tutorial, you will know:

How to load data from CSV and make it available to Keras
How to prepare multi-class classification data for modeling with neural networks
How to evaluate Keras neural network models with scikit-learn

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Oct/2016: Updated for Keras 1.1.0 and scikit-learn v0.18
Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
Update Jun/2017: Updated to use softmax activation in output layer, larger hidden layer, default weight initialization
Update Aug/2019: Added complete working example for convenience, removed random seed
Update Sep/2019: Updated for Keras 2.2.5 API

Multi-class classification tutorial with the Keras deep learning library
Photo by houroumono, some rights reserved.

1. Problem Description

In this tutorial, you will use the standard machine learning problem called the iris flowers dataset.

This dataset is well studied and makes a good problem for practicing on neural networks because all four input variables are numeric and have the same scale in centimeters. Each instance describes the properties of an observed flower’s measurements, and the output variable is a specific iris species.

This is a multi-class classification problem, meaning that there are more than two classes to be predicted. In fact, there are three flower species. This is an important problem for practicing with neural networks because the three class values require specialized handling.

The iris flower dataset is a well-studied problem, and as such, you can expect to achieve a model accuracy in the range of 95% to 97%. This provides a good target to aim for when developing your models.

You can download the iris flowers dataset from the UCI Machine Learning repository and place it in your current working directory with the filename “iris.csv“.

Iris Flowers Dataset (iris.csv)

Need help with Deep Learning in Python?

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

2. Import Classes and Functions

You can begin by importing all the classes and functions you will need in this tutorial.

This includes both the functionality you require from Keras and the data loading from pandas, as well as data preparation and model evaluation from scikit-learn.

import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
...

import pandas

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from keras.utils import np_utils

from sklearn.model_selection import cross_val_score

from sklearn.model_selection import KFold

from sklearn.preprocessing import LabelEncoder

from sklearn.pipeline import Pipeline

...

3. Load the Dataset

The dataset can be loaded directly. Because the output variable contains strings, it is easiest to load the data using pandas. You can then split the attributes (columns) into input variables (X) and output variables (Y).

...
# load dataset
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]

...

# load dataset

dataframe = pandas.read_csv("iris.csv", header=None)

dataset = dataframe.values

X = dataset[:,0:4].astype(float)

Y = dataset[:,4]

4. Encode the Output Variable

The output variable contains three different string values.

When modeling multi-class classification problems using neural networks, it is good practice to reshape the output attribute from a vector that contains values for each class value to a matrix with a Boolean for each class value and whether a given instance has that class value or not.

This is called one-hot encoding or creating dummy variables from a categorical variable.

For example, in this problem, three class values are Iris-setosa, Iris-versicolor, and Iris-virginica. If you had the observations:

Iris-setosa
Iris-versicolor
Iris-virginica

Iris-setosa

Iris-versicolor

Iris-virginica

You can turn this into a one-hot encoded binary matrix for each data instance that would look like this:

Iris-setosa,	Iris-versicolor,	Iris-virginica
1,		0,			0
0,		1, 			0
0, 		0, 			1

Iris-setosa, Iris-versicolor, Iris-virginica

1, 0, 0

0, 1, 0

0, 0, 1

You can first encode the strings consistently to integers using the scikit-learn class LabelEncoder. Then convert the vector of integers to a one-hot encoding using the Keras function to_categorical().

...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

...

# encode class values as integers

encoder = LabelEncoder()

encoder.fit(Y)

encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)

dummy_y = np_utils.to_categorical(encoded_Y)

5. Define the Neural Network Model

If you are new to Keras or deep learning, see this helpful Keras tutorial.

The Keras library provides wrapper classes to allow you to use neural network models developed with Keras in scikit-learn.

There is a KerasClassifier class in Keras that can be used as an Estimator in scikit-learn, the base type of model in the library. The KerasClassifier takes the name of a function as an argument. This function must return the constructed neural network model, ready for training.

Below is a function that will create a baseline neural network for the iris classification problem. It creates a simple, fully connected network with one hidden layer that contains eight neurons.

The hidden layer uses a rectifier activation function which is a good practice. Because you used a one-hot encoding for your iris dataset, the output layer must create three output values, one for each class. The output value with the largest value will be taken as the class predicted by the model.

The network topology of this simple one-layer neural network can be summarized as follows:

4 inputs -> [8 hidden nodes] -> 3 outputs

1	4 inputs -> [8 hidden nodes] -> 3 outputs

Note that a “softmax” activation function was used in the output layer. This ensures the output values are in the range of 0 and 1 and may be used as predicted probabilities.

Finally, the network uses the efficient Adam gradient descent optimization algorithm with a logarithmic loss function, which is called “categorical_crossentropy” in Keras.

...
# define baseline model
def baseline_model():
	# create model
	model = Sequential()
	model.add(Dense(8, input_dim=4, activation='relu'))
	model.add(Dense(3, activation='softmax'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

...

# define baseline model

def baseline_model():

# create model

model = Sequential()

model.add(Dense(8, input_dim=4, activation='relu'))

model.add(Dense(3, activation='softmax'))

# Compile model

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

return model

You can now create your KerasClassifier for use in scikit-learn.

You can also pass arguments in the construction of the KerasClassifier class that will be passed on to the fit() function internally used to train the neural network. Here, you pass the number of epochs as 200 and batch size as 5 to use when training the model. Debugging is also turned off when training by setting verbose to 0.

...
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

1 2	... estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

6. Evaluate the Model with k-Fold Cross Validation

You can now evaluate the neural network model on our training data.

The scikit-learn has excellent capability to evaluate models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation.

First, define the model evaluation procedure. Here, you set the number of folds to 10 (an excellent default) and shuffle the data before partitioning it.

...
kfold = KFold(n_splits=10, shuffle=True)

1 2	... kfold = KFold(n_splits=10, shuffle=True)

Now, you can evaluate your model (estimator) on your dataset (X and dummy_y) using a 10-fold cross-validation procedure (k-fold).

Evaluating the model only takes approximately 10 seconds and returns an object that describes the evaluation of the ten constructed models for each of the splits of the dataset.

...
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

...

results = cross_val_score(estimator, X, dummy_y, cv=kfold)

print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

7. Complete Example

You can tie all of this together into a single program that you can save and run as a script:

# multi-class classification with Keras
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
# load dataset
dataframe = pandas.read_csv("iris.data", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

# define baseline model
def baseline_model():
	# create model
	model = Sequential()
	model.add(Dense(8, input_dim=4, activation='relu'))
	model.add(Dense(3, activation='softmax'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

# multi-class classification with Keras

import pandas

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from keras.utils import np_utils

from sklearn.model_selection import cross_val_score

from sklearn.model_selection import KFold

from sklearn.preprocessing import LabelEncoder

from sklearn.pipeline import Pipeline

# load dataset

dataframe = pandas.read_csv("iris.data", header=None)

dataset = dataframe.values

X = dataset[:,0:4].astype(float)

Y = dataset[:,4]

# encode class values as integers

encoder = LabelEncoder()

encoder.fit(Y)

encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)

dummy_y = np_utils.to_categorical(encoded_Y)

# define baseline model

def baseline_model():

# create model

model = Sequential()

model.add(Dense(8, input_dim=4, activation='relu'))

model.add(Dense(3, activation='softmax'))

# Compile model

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

kfold = KFold(n_splits=10, shuffle=True)

results = cross_val_score(estimator, X, dummy_y, cv=kfold)

print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

The results are summarized as both the mean and standard deviation of the model accuracy on the dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

This is a reasonable estimation of the performance of the model on unseen data. It is also within the realm of known top results for this problem.

Accuracy: 97.33% (4.42%)

1	Accuracy: 97.33% (4.42%)

Summary

In this post, you discovered how to develop and evaluate a neural network using the Keras Python library for deep learning.

By completing this tutorial, you learned:

How to load data and make it available to Keras
How to prepare multi-class classification data for modeling using one-hot encoding
How to use Keras neural network models with scikit-learn
How to define a neural network using Keras for multi-class classification
How to evaluate a Keras neural network model using scikit-learn with k-fold cross validation

Do you have any questions about deep learning with Keras or this post?

Ask your questions in the comments below, and I will do my best to answer them.

614 Responses to Multi-Class Classification Tutorial with the Keras Deep Learning Library

Jack June 19, 2016 at 3:12 pm #

Thanks for this cool tutorial! I have a question about the input data. If the datatypes of input variables are different (i.e. string and numeric). How to preprocess the train data to fit keras?

Reply
- Jason Brownlee June 20, 2016 at 5:41 am #
  
  Great question. Eventually, all of the data need to be turned into real values.
  
  With categorical variables, you can create dummy variables and use one-hot encoding. For string data, you can use word embeddings.
  
  Reply
  - Shraddha February 10, 2017 at 8:32 pm #
    
    Could you please let me know how to convert string data into word embeddings in large datasets?
    Would really appreciate it
    Thanks so much
    
    Reply
    - Jason Brownlee February 11, 2017 at 5:01 am #
      
      Hi Shraddha,
      
      First, convert the chars to vectors of integers. You can then pad all vectors to the same length. Then away you go.
      
      I hope that helps.
      
      Reply
      - Shraddha Sunil February 13, 2017 at 4:52 pm #
        
        Thanks so much Jason!
      - Jason Brownlee February 14, 2017 at 10:04 am #
        
        You’re welcome.
      - sasi August 5, 2017 at 7:51 pm #
        
        can you give an example for that..
      - Jason Brownlee August 6, 2017 at 7:38 am #
        
        I have many tutorials for encoding and padding sequences on the blog. Please use the search.
  - Chandan February 14, 2019 at 3:17 pm #
    
    query:
    
    which type of properties of an observed flower measurements is taken
    Told me what is the 4 attributes, you taken
    
    Reply
    - Jason Brownlee February 15, 2019 at 7:58 am #
      
      For more on the dataset, see this post:
      https://en.wikipedia.org/wiki/Iris_flower_data_set
      
      Reply
    - Manohar Nookala March 22, 2020 at 12:06 am #
      
      Class indices are 7. Then how manu output variables i need to mentions
      
      Reply
  - SJ July 16, 2020 at 3:44 pm #
    
    IF i choose to use Entity Embeddings for categorical data, can you please suggest how to feed them to a MLP. I am able to do that in pytorch by using your article on pytorch.
    Can you please suggest how to convert the below architecture into an MLP.
    
    (all_embeddings): ModuleList(
    (0): Embedding(24, 12)
    (1): Embedding(2, 1)
    (2): Embedding(7, 4)
    )
    (embedding_dropout): Dropout(p=0.4, inplace=False)
    (batch_norm_num): BatchNorm1d(7, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (layers): Sequential(
    (0): Linear(in_features=24, out_features=200, bias=True)
    (1): ReLU(inplace=True)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.4, inplace=False)
    (4): Linear(in_features=200, out_features=100, bias=True)
    (5): ReLU(inplace=True)
    (6): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.4, inplace=False)
    (8): Linear(in_features=100, out_features=1, bias=True)
    )
    )
    
    Reply
    - Jason Brownlee July 17, 2020 at 6:00 am #
      
      sorry, I don’t have an example for pytorch, but I have an example for keras that might help:
      https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/
      
      Reply
  - Lam October 27, 2021 at 1:07 am #
    
    A great tutorial. What should I do to preprocess mixed input data (data includes both numeric and categorical variables) for the fitting model? Thanks in advance.
    
    Reply
    - Adrian Tam October 27, 2021 at 3:29 am #
      
      Numeric usually just presented as-is, but sometimes we apply scaling to it too. Categorical is usually one-hot encoded.
      
      Reply
- Harale Vandana Rangrao April 21, 2018 at 5:55 pm #
  
  Thank you very much, sir, for sharing so much information, but sir I want to a dataset of greenhouse for tomato crop with climate variable like Temperature, Humidity, Soil Moisture, pH Scale, CO2, Light Intensity. Can you provide me this type dataset?
  
  Reply
  - Jason Brownlee April 22, 2018 at 5:58 am #
    
    I answer this question here:
    https://machinelearningmastery.com/faq/single-faq/where-can-i-get-a-dataset-on-___
    
    Reply
- Ramesh April 17, 2019 at 7:00 pm #
  
  Hey,
  Can we use this module for array of string
  
  Reply
  - Ramesh April 17, 2019 at 7:02 pm #
    
    Hey can we use this method for arrays
    
    array([[”, u’multios’, u’dsdsds’, u’DW_SAL_CANNOT_INITIALIZE’, u’av_sw’],
    [”, u’android-l’, u’dsssd’, u’SYS_SW’, u’syssw’],
    [”, u’gnu_linux-k4.9′, u’dssss’, u’USB_IO_Error’, u’syssw’],
    …,
    [”, u’android-p’, u’fddfdfdf’, u’mm_nvmedia_video_decoder_create’,
    u’multimedia’],
    [”, u’android-o’, u’sasa’, u’mm_log_tag’,
    u’multimedia’],
    [u’rel-32′, u’android-p’, u’dsdsd’,
    u’mm_parsevp9_incorrect_sync_code_for_vp9′, u’multimedia’]],
    dtype=object)
    
    Reply
    - Jason Brownlee April 18, 2019 at 8:25 am #
      
      I would recommend using a bag of words model when starting with text:
      https://machinelearningmastery.com/gentle-introduction-bag-words-model/
      
      Reply
- hieund198 September 7, 2019 at 12:57 am #
  
  Hi Mr Jason,
  What’s name the model you use to train?
  Sorry, I am newbie.
  Thanks
  
  Reply
  - Jason Brownlee September 7, 2019 at 5:35 am #
    
    The model in this tutorial a neural network or a multilayer neural network, often called an MLP or a fully connected network.
    
    Reply
    - hieund198 September 11, 2019 at 3:55 am #
      
      Dear Mr Jason,
      
      I run your example code I noticed that softmax in your tutorial has different result with softmax used in CNN model.
      I would like to confirm with you this is a behavior of CNN
      
      Exemple my code:
      
      model = Sequential()
      model.add(Conv1D(64, 3, activation=’relu’, input_shape=(8,1)))
      model.add(Conv1D(64, 3, activation=’relu’))
      model.add(Dropout(0.5))
      model.add(MaxPooling1D())
      model.add(Flatten())
      model.add(Dense(100, activation=’relu’))
      model.add(Dense(4, activation=’softmax’))
      model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
      
      X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
      model.fit(X_train, Y_train)
      predictions = model.predict(X_test)
      print(predictions)
      
      >> Out put
      
      [[0.5863281 0.11777738 0.16206734 0.13382716]
      [0.5863281 0.11777738 0.16206734 0.13382716]
      [0.39733416 0.19241211 0.2283105 0.1819432 ]
      [0.54646176 0.12707633 0.20596607 0.12049587]
      
      I think that softmax in CNN model will return % for each result need to classification.
      
      And your model will return value dummy_y prediction.
      
      Thank you
      
      Reply
      - Jason Brownlee September 11, 2019 at 5:43 am #
        
        The softmax is a standard implementation.
        
        Perhaps I don’t follow, what is the problem you have exactly?
      - hieund1994 September 11, 2019 at 12:41 pm #
        
        I got found solution from another article of you.
        Thanks
        
        “First, the raw 17-element prediction vector is printed. If we wish, we could pretty-print this vector and summarize the predicted confidence that the photo would be assigned each label.
        
        Next, the prediction is rounded and the vector indexes that contain a 1 value are reverse-mapped to their tag string values. The predicted tags are then printed. we can see that the model has correctly predicted the known tags for the provided photo.
        
        It might be interesting to repeat this test with an entirely new photo, such as a photo from the test dataset, after you have already manually suggested tags.
        
        [9.0940112e-01 3.6541668e-03 1.5959743e-02 6.8241461e-05 8.5694155e-05
        9.9828100e-01 7.4096164e-08 5.5998818e-05 3.6668104e-01 1.2538023e-01
        4.6371704e-04 3.7660234e-04 9.9999273e-01 1.9014676e-01 5.6060363e-04
        1.4613305e-03 9.5227945e-01]
        
        [‘agriculture’, ‘clear’, ‘primary’, ‘water’] ”
        
        https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
      - Jason Brownlee September 11, 2019 at 2:29 pm #
        
        Happy to hear that.
- Preetkaran January 17, 2020 at 6:42 am #
  
  Hi Jason
  
  I’m doing an image localization and classification task on Keras-FRCNN, on Theano Backend. I’m getting the following error:
  
  Traceback (most recent call last):
  File “train_frcnn.py”, line 208, in
  model_classifier.compile(optimizer=optimizer_classifier, loss=[losses.class_loss_cls, losses.class_loss_regr(len(classes_count)-1)], metrics={‘dense_class_{}’.format(len(classes_count)): ‘accuracy’})
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\engine\training.py”, line 229, in compile
  self.total_loss = self._prepare_total_loss(masks)
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\engine\training.py”, line 692, in _prepare_total_loss
  y_true, y_pred, sample_weight=sample_weight)
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\losses.py”, line 71, in __call__
  losses = self.call(y_true, y_pred)
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\losses.py”, line 132, in call
  return self.fn(y_true, y_pred, **self._fn_kwargs)
  File “F:\ML\keras-frcnn-moded\keras_frcnn\losses.py”, line 55, in class_loss_cls
  return lambda_cls_class*K.mean(categorical_crossentropy(y_true[0, :, :], y_pred[0, :, :]))
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\losses.py”, line 691, in categorical_crossentropy
  return K.categorical_crossentropy(y_true, y_pred, from_logits=from_logits)
  File “C:\Users\singh\Anaconda3\lib\site-packages\keras\backend\theano_backend.py”, line 1831, in categorical_crossentropy
  output_dimensions = list(range(len(int_shape(output))))
  TypeError: object of type ‘NoneType’ has no len()
  
  When I use Tensorflow backend, then I don’t face this error. So, I think it’s something related to Keras and Theano. Using tensorflow as keras backend serves useful but it’s quite slow for the model (takes days for training).
  
  Any clue/fix for the issue, will be very helpful…..
  
  Reply
  - Jason Brownlee January 17, 2020 at 1:48 pm #
    
    Perhaps post your code and error to stackoverflow?
    
    Reply
Aakash Nain July 4, 2016 at 2:25 pm #

Hello Jason,
It’s a very nice tutorial to learn. I implemented the same model but on my work station I achieved a score of 88.67% only. After modifying the number of hidden layers, I achieved an accuracy of 93.04%. But I am not able to achieve the score of 95% or above. Any particular reason behind it ?

Reply
- Jason Brownlee July 6, 2016 at 6:27 am #
  
  Interesting Aakash.
  
  I used the Theano backend. Are you using the same?
  
  Are all your libraries up to date? (Keras, Theano, NumPy, etc…)
  
  Reply
  - Aakash Nain July 7, 2016 at 12:03 am #
    
    Yes Jason . Backend is theano and all libraries are up to date.
    
    Reply
    - Jason Brownlee July 7, 2016 at 9:40 am #
      
      Interesting. Perhaps seeding the random number generator is not having the desired effect for reproducibility. It perhaps it has different effects on different platforms.
      
      Perhaps re-run the above code example a few times and see the spread of accuracy scores you achieve?
      
      Reply
- hieund1994 September 11, 2019 at 10:57 am #
  
  Because Label I use LabelEncoder() to endcoe label.
  
  I could not encoder.inverse_transform(predictions)
  
  Expected :output must be follow format:
  
  —
  [[1 0 0 0]
  [1 0 0 0]
  [1 0 0 0]
  —
  But current output is:
  
  [[0.5863281 0.11777738 0.16206734 0.13382716]
  [0.5863281 0.11777738 0.16206734 0.13382716]
  [0.39733416 0.19241211 0.2283105 0.1819432 ]
  
  So I can not encoder.inverse_transform(predictions)
  
  Are you have any suggest?
  
  Thank you
  
  Reply
  - Jason Brownlee September 11, 2019 at 2:29 pm #
    
    First, you must reverse the prediction to an integer via argmax, then integer to category via the inverse_transform.
    
    Reply
    - Mbonu Chinedu April 24, 2020 at 2:34 pm #
      
      lols, exactly !!!!!
      
      Reply
La Tuan Nghia July 6, 2016 at 1:29 am #

Hello Jason,

In chapter 10 of the book “Deep Learning With Python”, there is a fraction of code:

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
kfold = KFold(n=len(X), n_folds=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print(“Accuracy: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

How to save this model and weights to file, then how to load these file to predict a new input data?

Many thanks!

Reply
- Jason Brownlee July 6, 2016 at 6:26 am #
  
  Really good question.
  
  Keras does provide functions to save network weights to HDF5 and network structure to JSON or YAML. The problem is, once you wrap the network in a scikit-learn classifier, how do you access the model and save it. Or can you save the whole wrapped model.
  
  Perhaps a simple but inefficient place to start would be to try and simply pickle the whole classifier?
  https://docs.python.org/2/library/pickle.html
  
  Reply
  - Constantin Weisser July 30, 2016 at 4:30 am #
    
    I tried doing that. It works for a normal sklearn classifier, but apparently not for a Keras Classifier:
    
    import pickle
    with open(“name.p”,”wb”) as fw:
    pickle.dump(clf,fw)
    
    with open(name+”.p”,”rb”) as fr:
    clf_saved = pickle.load(fr)
    print(clf_saved)
    
    prob_pred=clf_saved.predict_proba(X_test)[:,1]
    
    This gives:
    
    theano.gof.fg.MissingInputError: An input of the graph, used to compute DimShuffle{x,x}(keras_learning_phase), was not provided and not given a value.Use the Theano flag exception_verbosity=’high’,for more information on this error.
    
    Backtrace when the variable is created:
    File “nn_systematics_I_evaluation_of_optimised_classifiers.py”, line 6, in
    import classifier_eval_simplified
    File “../../../../classifier_eval_simplified.py”, line 26, in
    from keras.utils import np_utils
    File “/usr/local/lib/python2.7/site-packages/keras/__init__.py”, line 2, in
    from . import backend
    File “/usr/local/lib/python2.7/site-packages/keras/backend/__init__.py”, line 56, in
    from .theano_backend import *
    File “/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py”, line 17, in
    _LEARNING_PHASE = T.scalar(dtype=’uint8′, name=’keras_learning_phase’) # 0 = test, 1 = train
    
    Reply
    - Jason Brownlee July 30, 2016 at 7:12 am #
      
      I provide examples of saving and loading Keras models here:
      https://machinelearningmastery.com/save-load-keras-deep-learning-models/
      
      Sorry, I don’t have any examples of saving/loading the wrapped Keras classifier. Perhaps the internal model can be seralized and later deserialized and put back inside the wrapper.
      
      Reply
Sally July 15, 2016 at 4:10 am #

Dear Dr. Jason,

Thanks very much for this great tutorial . I got extra benefit from it, but I need to calculate precision, recall and confusion matrix for such multi-class classification. I tried to did it but each time I got a different problem. could you please explain me how to do this

Reply
- Jason Brownlee July 15, 2016 at 9:04 am #
  
  Hi Sally, you could perhaps use the tools in scikit-learn to summarize the performance of your model.
  
  For example, you could use sklearn.metrics.confusion_matrix() to calculate the confusion matrix for predictions, etc.
  
  See the metrics package:
  http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
  
  Reply
  - Prabhat April 13, 2018 at 5:45 am #
    
    Could you tell how to use that in this code you have provided above? I am very new Keras.
    
    Thanks in Advance
    
    Reply
- olfa August 3, 2018 at 11:23 pm #
  
  please how we can implemente python code using recall and precision to evaluate prediction model
  
  Reply
  - Jason Brownlee August 4, 2018 at 6:11 am #
    
    You can use the sklearn library to calculate these scores:
    http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics
    
    Reply
    - Kaddy S. January 28, 2020 at 9:29 pm #
      
      Hi jason.. your tutorials are a great help.. i am a student working on deep learning for detection of diabetic retinopathy and its stages.. using the code u gave for multi class, for my dataset.. i am getting a very low baseline.. 23%..can help me on improving the accuracy.. also how to classify images using deep learning?
      
      Reply
      - Jason Brownlee January 29, 2020 at 6:35 am #
        
        Thanks!
        
        Yes, this will give you ideas:
        https://machinelearningmastery.com/start-here/#better

Fabian Leon July 31, 2016 at 4:12 am #

Hi jason, Reading the tutorial and the same example in your book, you still don’t tell us how can use the model to make predictions, you have only show us how to train and evaluate it but I would like to see you using this model to make predictions on at least one example of iris flowers data no matters if is dummy data.

I would like to see how can I load my own instance of an iris-flower and use the above model to predict what kind is the flower?

could you do that for us?

Jason Brownlee July 31, 2016 at 7:31 am #

Hi Fabian, no problem.

In the tutorial above, we are using the scikit-learn wrapper. That means we can use the standard model.predict() function to make predictions from scikit-learn.

For example, below is an an example adapted from the above where we split the dataset, train on 67% and make predictions on 33%. Remember that we have encoded the output class value as integers, so the predictions are integers. We can then use encoder.inverse_transform() to turn the predicted integers back into strings.

# Train model and make predictions
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
# define baseline model
def baseline_model():
	# create model
	model = Sequential()
	model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
	model.add(Dense(3, init='normal', activation='sigmoid'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
estimator.fit(X_train, Y_train)
predictions = estimator.predict(X_test)
print(predictions)
print(encoder.inverse_transform(predictions))

# Train model and make predictions

import numpy

import pandas

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from keras.utils import np_utils

from sklearn.cross_validation import train_test_split

from sklearn.preprocessing import LabelEncoder

# fix random seed for reproducibility

seed = 7

numpy.random.seed(seed)

# load dataset

dataframe = pandas.read_csv("iris.csv", header=None)

dataset = dataframe.values

X = dataset[:,0:4].astype(float)

Y = dataset[:,4]

# encode class values as integers

encoder = LabelEncoder()

encoder.fit(Y)

encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)

dummy_y = np_utils.to_categorical(encoded_Y)

# define baseline model

def baseline_model():

# create model

model = Sequential()

model.add(Dense(4, input_dim=4, init='normal', activation='relu'))

model.add(Dense(3, init='normal', activation='sigmoid'))

# Compile model

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)

X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)

estimator.fit(X_train, Y_train)

predictions = estimator.predict(X_test)

print(predictions)

print(encoder.inverse_transform(predictions))

Running this example prints:

[2 1 0 1 2 0 1 1 0 1 2 1 0 2 0 2 2 2 0 0 1 2 1 2 2 2 1 1 2 2 2 1 0 2 1 0 0
 0 0 2 2 1 2 2 1 0 1 1 2 0]
['Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor'
 'Iris-virginica' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor'
 'Iris-setosa' 'Iris-virginica' 'Iris-setosa' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
 'Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'
 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica'
 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-virginica' 'Iris-setosa']

[2 1 0 1 2 0 1 1 0 1 2 1 0 2 0 2 2 2 0 0 1 2 1 2 2 2 1 1 2 2 2 1 0 2 1 0 0

0 0 2 2 1 2 2 1 0 1 1 2 0]

['Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor'

'Iris-virginica' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'

'Iris-setosa' 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor'

'Iris-setosa' 'Iris-virginica' 'Iris-setosa' 'Iris-virginica'

'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-setosa'

'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'

'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'

'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'

'Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa'

'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'

'Iris-virginica' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica'

'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'

'Iris-virginica' 'Iris-setosa']

I hope that is clear and useful. Let me know if you have any more questions.

Devendra November 27, 2016 at 9:40 pm #

Hi Jason,

I was facing error while converting string to float and so I had to make a minor correction to my code
X = dataset[1:,0:4].astype(float)
Y = dataset[1:,4]

However, I am still unable to run since I am getting the following error for line

“—-> 1 results = cross_val_score(estimator, X, dummy_y, cv=kfold)”
……………….
“Exception: Error when checking model target: expected dense_4 to have shape (None, 3) but got array with shape (135L, 22L)”

I would appreciate your help. Thanks.

Reply
- Devendra November 28, 2016 at 5:41 am #
  
  I found the issue. It was with with the indexes.
  I had to take [1:,1:5] for X and [1:,5] for Y.
  
  I am using Jupyter notebook to run my code.
  The index range seems to be different in my case.
  
  Reply
  - Jason Brownlee November 28, 2016 at 8:47 am #
    
    I’m glad you worked it out Devendra.
Cristina March 24, 2017 at 2:23 am #

For some reason, when I run this example I get 0 as prediction value for all the samples. What could be happening?

I’ve the same problem on prediction with other code I’m executing, and decided to run yours to check if i could be doing something wrong?

I’m lost now, this is very strange.

Thanks a in advance!

Reply
- Cristina March 24, 2017 at 2:42 am #
  
  Hello again,
  
  This is happening with Keras 2.0, with Keras 1 works fine.
  
  Thanks,
  
  Cristina
  
  Reply
  - Jason Brownlee March 24, 2017 at 8:00 am #
    
    Thanks for the note.
  - Fawzi April 5, 2018 at 5:55 pm #
    
    Hi all,
    I faced the same problem it works well with keras 1 but gives all 0 with keras 2 !
    
    Thanks for this great tuto !
    
    Fawzi
  - Jason Brownlee April 6, 2018 at 6:21 am #
    
    Does this happen every time you train the model?
  - Tharindu Rangana December 27, 2018 at 4:38 am #
    
    Hello Cristina,
    I have faced the same problem with keras 2. And then I change keras to 1.2 and worked well. Thank you for the information
- Jason Brownlee March 24, 2017 at 7:57 am #
  
  Very strange.
  
  Maybe check that your data file is correct, that you have all of the code and that your environment is installed and is working correctly.
  
  Reply
  - Andrea December 12, 2017 at 7:17 am #
    
    Jason, I’m getting the same prediction (all zeroes) with Keras 2. If we could be able to nail the cause, it would be great. After all, as of now it’s more than likely that people will try to run your great examples with keras 2.
    
    Plus, a couple of questions:
    
    1. why did you use a sigmoid for the output layer instead of a softmax?
    
    2. why did you provide initialization even for the last layer?
    
    Thanks a lot.
  - Jason Brownlee December 12, 2017 at 4:02 pm #
    
    The example does use softmax, perhaps check that you have copied all of the code from the post?
- kristi January 18, 2018 at 3:17 pm #
  
  I’m having same issue. How did u resolve it? could you please help me
  
  Reply
  - Yousuf March 21, 2018 at 1:41 pm #
    
    Has anyone resolved the issue with the output being all zeros?
  - Jason Brownlee March 21, 2018 at 3:07 pm #
    
    Perhaps try re-train the model to see if the issue occurs again?
  - Jackson May 6, 2019 at 5:35 pm #
    
    I changed the seed=7 to seed= 0, which should make each random number different, and the result will no longer be all 0.
  - Yme August 15, 2019 at 12:47 am #
    
    Issue is still present! If I use keras >2.0, the model simply predicts the same class for every training example in the dataset.
    
    – Have tried varying loss functions
    – changing activation function from sigmoid to softmax in the output layer
    – using Theano/tensorflow backends
    – Changing the number of hidden neurons in the hidden layer
    
    And for all these fixes the error persists. Only thing that solves the issue, and makes me get similar results to the ones you’re getting in your tutorial, is downgrading to Keras <2.0 (In my case I downgraded to Keras 1.2.2.)
  - Jason Brownlee August 15, 2019 at 8:20 am #
    
    I can confirm the example works as stated with Keras 2.2.4, TensorFlow 1.14 and Python 3.6.
    
    Baseline: 98.00% (3.06%)
    
    1
    
    Baseline: 98.00% (3.06%)
    
    I believe there is an issue with your development environment. This may help:
    https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  - Yme August 16, 2019 at 4:55 am #
    
    Could you share with me the entire code you use? I don’t think its environment related, have tried with a fresh conda environment, and am able to reproduce the issue on 2 seperate machines.
  - Jason Brownlee August 16, 2019 at 8:04 am #
    
    The entire code listing is provided in the post, I updated it to provide it all together.
  - Yme August 16, 2019 at 7:34 pm #
    
    Managed to find the problem!!!
    
    In the code above, as well as in your book (Which I am following) we are using code that I think is written for keras1. The code carries over to keras2, apart from some warnings, but predicts poor. The reason for this is the nb_epoch parameter in the KerasClassifier class. When you leave that as is, the model predicts the same class for every training example. When you change it to “epochs” in keras2, everything is fine. I don’t know if this is Intented behavior or a bug.
  - Jason Brownlee August 17, 2019 at 5:38 am #
    
    No.
    
    The example in the post uses “epochs” for Keras 2.
    
    So does the most recent version of the book.
    
    I think you are not referring to the above tutorial and are in fact referring to a very old version of the book. You can contact me here to get the most recent version:
    https://machinelearningmastery.com/contact/
Tanvir. March 27, 2017 at 7:43 am #

Hi Jason,
Thanks for your awesome tutorials. I had a curious question:
As we are using KerasClassifier or KerasRegressor of Scikit-Learn wrapper, then how to save them as a file after fitting ?

For example, I am predicting regression or multiclass classification. I have to use KerasRegressor or KerasClassifier then. After fitting a large volume of data, I want to save the trained neural network model to use it for prediction purpose only. How to save them and how to restore them from saved files ? Your answer will help me a lot.

Reply
- Jason Brownlee March 27, 2017 at 8:00 am #
  
  Great question, I’m not sure you can easily do this. You might be better served fitting the Keras model directly then using the Keras API to save the model:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
  Reply
Reinier May 4, 2017 at 2:04 am #

Hi Jason, Thank your very much for those nice explainations.
I’m having some problems and I trying very hard to get it solved but it wont work..
If I simply copy-past your code from your comment on 31-july 2016 I keep getting the following Error:

Traceback (most recent call last): File “/Users/reinier/PycharmProjects/Test-IRIS/TESTIRIS.py”, line 43, in estimator.fit(X_train, Y_train) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/wrappers/scikit_learn.py”, line 206, in fit return super(KerasClassifier, self).fit(x, y, **kwargs) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/wrappers/scikit_learn.py”, line 149, in fit history = self.model.fit(x, y, **fit_args) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/models.py”, line 856, in fit initial_epoch=initial_epoch) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/engine/training.py”, line 1429, in fit batch_size=batch_size) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/engine/training.py”, line 1309, in _standardize_user_data exception_prefix=’target’) File “/Users/reinier/Library/Python/3.6/lib/python/site-packages/keras/engine/training.py”, line 139, in _standardize_input_data str(array.shape)) ValueError: Error when checking target: expected dense_2 to have shape (None, 3) but got array with shape (67, 40)

It seems like something is wrong with the fit function. Is this the cause of a new Keras version? Thanks you very much in advance,

Reinier

Reply
- Jason Brownlee May 4, 2017 at 8:09 am #
  
  Sorry, it is not clear what is going on.
  
  Does the example in the blog post work as expected?
  
  Reply
Priyesh July 12, 2017 at 3:02 am #

Hello Jason,

Thank you for such a wonderful and detailed explanation. Please can guide me on how to plot the graphs for clustering for this data set and code (both for training and predictions).

Thanks.

Reply
- Jason Brownlee July 12, 2017 at 9:50 am #
  
  Sorry, I do not have examples of clustering.
  
  Reply
Priyesh July 12, 2017 at 5:12 am #

Hi Jason,

Thank you so much for such an elegant and detailed explanation. I wanted to learn on how to plot graphs for the same. I went through the comments and you said we can’t plot accuracy but I wish to plot the graphs for input data sets and predictions to show like a cluster (as we show K-means like a scattered plot). Please can you guide me with the same.

Thank you.

Reply
- Jason Brownlee July 12, 2017 at 9:53 am #
  
  Sorry I do not have any examples for clustering.
  
  Reply
Budi January 19, 2018 at 2:58 am #

Woahh,, it’s work’s again…
it’s nice result,

btw, how, it we want make just own sentences, not use test data?

Reply
- Jason Brownlee January 19, 2018 at 6:36 am #
  
  This is called NLP, learn more here:
  https://machinelearningmastery.com/start-here/#nlp
  
  Reply
Bonobo June 24, 2018 at 12:06 am #

I think the line

model = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)

must be

model = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

for newer Keras versions.

Reply
- Jason Brownlee June 24, 2018 at 7:34 am #
  
  Correct.
  
  Reply
Prakhar July 10, 2018 at 5:36 am #

hello Sir,
i used the following code in keras backend, but when using categorical_crossentropy
all the rows of a columns have same predictions,but when i use binary_crossentropy the predictions are correct.Can u plz explain why?
And my predictions are also in the form of HotEncoding an and not like 2,1,0,2. Kindly help me out in this.
Thank you

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

train=pd.read_csv(‘iris_train.csv’)
test=pd.read_csv(‘iris_test.csv’)

xtrain=train.iloc[:,0:4].values
ytrain=train.iloc[:,4].values
xtest=test.iloc[:,0:4].values
ytest=test.iloc[:,4].values

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

from sklearn.preprocessing import LabelEncoder,OneHotEncoder
ytrain2=ytrain.reshape(len(ytrain),1)
encoder1=LabelEncoder()
ytrain2[:,0]=encoder1.fit_transform(ytrain2[:,0])
encoder=OneHotEncoder(categorical_features=[0])
ytrain2=encoder.fit_transform(ytrain2).toarray()

classifier=Sequential()
classifier.add(Dense(output_dim=4,init=’uniform’,activation=’relu’,input_dim=4))
classifier.add(Dense(output_dim=4,init=’uniform’,activation=’relu’))
classifier.add(Dense(output_dim=3,init=’uniform’,activation=’sigmoid’))

classifier.compile(optimizer=’adam’,loss=’categorical_crossentropy’,metrics=[‘accuracy’])
classifier.fit(xtrain,ytrain2,batch_size=5,epochs=300)

y_pred=classifier.predict(xtest)

Reply
- Jason Brownlee July 10, 2018 at 6:54 am #
  
  Sorry, I do not have the capacity to debug your code. Perhaps post to stackoverflow.
  
  Reply
Shooter August 10, 2018 at 7:15 pm #

Hi Jason, this code gives the accuracy of 98%. But when i add k-fold cross validation code, accuracy decreases to 75%.

Reply
- Jason Brownlee August 11, 2018 at 6:07 am #
  
  Perhaps try tuning the model further?
  
  Reply
Titus November 9, 2020 at 6:27 am #

Hello Jason,

This code does not work form me. I am using the exact same code but I get error with estimator.fit(). The error looks like that:

—————————————————————————
TypeError Traceback (most recent call last)
in
34 estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
35 X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
—> 36 estimator.fit(X_train, Y_train)
37 predictions = estimator.predict(X_test)
38 print(predictions)

Reply
- Jason Brownlee November 9, 2020 at 7:51 am #
  
  I can confirm that the code works with the latest version of scikit-learn, tensorflow and keras.
  
  Perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Titus November 9, 2020 at 6:33 pm #
    
    Thanks Jason,
    
    I have resolved the issue. I don’t know why but the problem is from the model.add() function.
    
    model.add(Dense(3, init=’normal’, activation=’sigmoid’))
    
    If I remove the argument init = ‘normal’ from model.add() I get the correct result but if I add it then I get error with the estimator.fit() function. I don’t know what the reason maybe but simply removing init = ‘normal’ from model.add() resolves the error.
    
    Thanks.
  - Jason Brownlee November 10, 2020 at 6:39 am #
    
    Nice work!

Prash August 14, 2016 at 9:15 pm #

Jason, boss you are too good! You have really helped me out especially in implementation of Deep learning part. I was rattled and lost and was desperately looking for some technology and came across your blogs. thanks a lot.

Reply
- Jason Brownlee August 15, 2016 at 12:38 pm #
  
  I’m glad I have helped in some small way Prash.
  
  Reply
Harsha August 18, 2016 at 7:03 pm #

It is a great tutorial Dr. Jason. Very clear and crispy. I am a beginner in Keras. I have a small doubt.

Is it necessary to use scikit-learn. Can we solve the same problem using basic keras?

Reply
- Jason Brownlee August 19, 2016 at 5:25 am #
  
  You can use basic Keras, but scikit-learn make Keras better. They work very well together.
  
  Reply
  - Harsha August 19, 2016 at 11:06 pm #
    
    Thank You Jason for your prompt reply
    
    Reply
    - Jason Brownlee August 20, 2016 at 6:05 am #
      
      You’re welcome Harsha.
      
      Reply
  - jokla January 12, 2017 at 7:30 am #
    
    Hi Jason, nice tutorial!
    
    I have a question. You mentioned that scikit-learn make Keras better, why?
    
    Thanks!
    
    Reply
    - Jason Brownlee January 12, 2017 at 9:40 am #
      
      Hi jokla, great question.
      
      The reason is that we can access all of sklearn’s features using the Keras Wrapper classes. Tools like grid searching, cross validation, ensembles, and more.
      
      Reply
moeyzf August 21, 2016 at 10:17 am #

Hi Jason,

I’m a CS student currently studying sentiment analysis and was wondering how to use keras for multi classification of text, ideally I would like the functionality of the TFidvectoriser from sklearn so a one hot vector representation against a given vocabulary is used, within a neural net to determine the final classification.

I am having trouble understanding the initial steps in transforming and feeding word data into vector representations. Can you help me out with some basic code examples of this first step in the sense that say I have a text file with 5000 words for example, which also include emoji (to use as the vocabulary), how can I feed in a training file in csv format text,sentiment and convert each text into a one hot representation then feed it into the neural net, for a final output vector of size e.g 1×7 to denote the various class labels.

I have tried to find help online and most of the solutions use helper methods to load in text data such as imdb, while others use word2vec which isnt what i need.

Hope you can help, I would really appreciate it!

Cheers,

Mo

Reply
Qichang September 12, 2016 at 3:01 pm #

Hi Jason,

Thanks for the great tutorial!

Just one question regarding the output variable encoding. You mentioned that it is a good practice to convert the output variable to one hot encoding matrix. Is this a necessary step? If the output varible consists of discrete integters, say 1, 2, 3, do we still need to to_categorical() to perform one hot encoding?

I check some example codes in keras github, it seems this is required. Can you please kindly shed some lights on it?

Thanks in advance.

Reply
- Jason Brownlee September 13, 2016 at 8:09 am #
  
  Hi Qichang, great question.
  
  A one hot encoding is not required, you can train the network to predict an integer, it is just a MUCH harder problem.
  
  By using a one hot encoding, you greatly simplify the prediction problem making it easier to train for and achieve better performance.
  
  Try it and compare the results.
  
  Reply
Pedro A. Castillo September 16, 2016 at 12:31 am #

Hello,
I have followed your tutorial and I get an error in the following line:

results = cross_val_score(estimator, X, dummy_y, cv=kfold)

Traceback (most recent call last):
File “k.py”, line 84, in
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/cross_validation.py”, line 1433, in cross_val_score
for train, test in cv)
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/parallel.py”, line 800, in __call__
while self.dispatch_one_batch(iterator):
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/parallel.py”, line 658, in dispatch_one_batch
self._dispatch(tasks)
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/parallel.py”, line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/parallel.py”, line 180, in __init__
self.results = batch()
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/parallel.py”, line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/cross_validation.py”, line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File “/Library/Python/2.7/site-packages/keras/wrappers/scikit_learn.py”, line 135, in fit
**self.filter_sk_params(self.build_fn.__call__))
TypeError: __call__() takes at least 2 arguments (1 given)

Do you have received this error before? do you have an idea how to fix that?

Reply
- Jason Brownlee September 16, 2016 at 9:07 am #
  
  I have not seen this before Pedro.
  
  Perhaps it is something simple like a copy-paste error from the tutorial?
  
  Are you able to double check the code matches the tutorial exactly?
  
  Reply
  - Victor October 8, 2016 at 10:15 pm #
    
    I have exactly the same problem.
    Double checked the code,
    have all the versions of keras etc, updated.
    🙁
    
    Reply
    - Jason Brownlee October 9, 2016 at 6:50 am #
      
      Hi Victor, are you able to share your version of Keras, scikit-learn, TensorFlow/Theano?
      
      Reply
Yunita September 25, 2016 at 12:17 am #

Hi Jason,

Thanks for the great tutorial.
But I have a question, why did you use sigmoid activation function together with categorical_crossentropy loss function?
Usually, for multiclass classification problem, I found implementations always using softmax activation function with categorical_cross entropy.
In addition, does one-hot encoding in the output make it as binary classification instead of multiclass classification? Could you please give some explanations on it?

Reply
- Jason Brownlee September 25, 2016 at 8:04 am #
  
  Yes, you could use a softmax instead of sigmoid. Try it and see.
  
  The one hot encoding creates 3 binary output features. This too would be required with the softmax activation function.
  
  Reply
  - Preston September 12, 2017 at 11:14 pm #
    
    Jason,
    
    Great site, great resource. Is it possible to see the old example with the one hot encoding output? I’m interested in creating a network with multiple binary outputs and have been searching around for an example.
    
    Many thanks.
    
    Reply
    - Jason Brownlee September 13, 2017 at 12:31 pm #
      
      I have many examples on the blog of categorical outputs from LSTMs, try the search.
      
      Reply
      - Preston September 14, 2017 at 5:40 am #
        
        Thank you.
Marcus September 26, 2016 at 6:49 am #

For Text classification or to basically assign them a category based on the text. How would the baseline_model change????

I’m trying to have an inner layer of 24 nodes and an output of 17 categories but the input_dim=4 as specified in the tutorial wouldn’t be right cause the text length will change depending on the number of words.

I’m a little confused. Your help would be much appreciated.

model.add(Dense(24, init=’normal’, activation=’relu’))

def baseline_model():
# create model
model = Sequential()
model.add(Dense(24, init=’normal’, activation=’relu’))
model.add(Dense(17, init=’normal’, activation=’sigmoid’))
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

Reply
- Jason Brownlee September 26, 2016 at 7:01 am #
  
  You will need to use padding on the input vectors of encoded words.
  
  See this post for an example of working with text:
  https://machinelearningmastery.com/predict-sentiment-movie-reviews-using-deep-learning/
  
  Reply
Vishnu October 19, 2016 at 9:07 pm #

Hi Jason,

Thank you for your tutorial. I was really interested in Deep Learning and was looking for a place to start, this helped a lot.

But while I was running the code, I came across two errors. The first one was, that while loading the data through pandas, just like your code i set “header= None” but in the next line when we convert the value to float i got the following error message.

“ValueError: could not convert string to float: ‘Petal.Length'”.

This problem went away after I took the header=None condition off.

The second one came at the end, during the Kfold validation. during the one hot encoding it’s binning the values into 22 categories and not 3. which is causing this error:

“Exception: Error when checking model target: expected dense_2 to have shape (None, 3) but got array with shape (135, 22)”

I haven’t been able to get around this. Any suggestion would be appreciated.

Reply
- Jason Brownlee October 20, 2016 at 8:36 am #
  
  That is quite strange Vishnu, I think perhaps you have the wrong dataset.
  
  You can download the CSV here:
  http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
  
  Reply
Homagni Saha October 20, 2016 at 10:39 am #

Hello, I tried to use the exact same code for another dataset , the only difference being the dataset had 78 columns and 100000 rows . I had to predict the last column taking the remaining 77 columns as features . I must also say that the last column has 23 different classes.(types basically) and the 23 different classes are all integers not strings like you have used.

model = Sequential()
model.add(Dense(77, input_dim=77, init=’normal’, activation=’relu’))
model.add(Dense(10, init=’normal’, activation=’relu’))
model.add(Dense(23, init=’normal’, activation=’sigmoid’))

also I used nb_epoch=20 and batch_size=1000

also in estimator I changed the verbose to 1, and now the accuracy is a dismal of 0.52% at the end. Also while running I saw strange outputs in the verbose as :

93807/93807 [==============================] – 0s – loss: nan – acc: 0.0052

why is the loss always as loss: nan ??

Can you please tell me how to modify the code to make it run correctly for my dataset?(remaining everything in the code is unchanged)

Reply
Jason Brownlee October 21, 2016 at 8:30 am #

Hi Homagni,

That is a lot of classes for 100K records. If you can reduce that by splitting up the problem, that might be good.

Your batch size is probably too big and your number of epochs is way too small. Dramatically increase the number of epochs bu 2-3 orders of magnitude.

Start there and let me know how you go.

Reply
AbuZekry October 30, 2016 at 12:02 am #

Hi Jason,

I’ve edited the first layer’s activation to ‘softplus’ instead of ‘relu’ and number of neurons to 8 instead of 4
Then I edited the second layer’s activation to ‘softmax’ instead of sigmoid and I got 97.33% (4.42%) performance. Do you have an explanation to this enhancement in performance ?

Reply
- Jason Brownlee October 30, 2016 at 8:55 am #
  
  Well done AbuZekry.
  
  Neural nets are infinitely configurable.
  
  Reply
Panand November 7, 2016 at 3:58 am #

Hello Jason,

Is there a error in your code? You said the network has 4 input neurons , 4 hidden neurons and 3 output neurons.But in the code you haven’t added the hidden neurons.You just specified only the input and output neurons… Will it effect the output in anyway?

Reply
- Jason Brownlee November 7, 2016 at 7:18 am #
  
  Hi Panand,
  
  The network structure is as follows:
  
  4 inputs -> [4 hidden nodes] -> 3 outputs
  
  1
  
  4 inputs -> [4 hidden nodes] -> 3 outputs
  
  Line 5 of the code in section 6 adds both the input and hidden layer:
  
  model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
  
  1
  
  model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
  
  The input_dim argument defines the shape of the input.
  
  Reply
JD November 13, 2016 at 5:28 pm #

Hi Jason,
I have a set of categorical features and continuous features, I have this model:
model = Sequential()
model.add(Dense(117, input_dim=117, init=’normal’, activation=’relu’))
model.add(Dense(10, activation=’softmax’))

I am getting a dismal : (‘Test accuracy:’, 0.43541752685249119) :
Details:
Total records 45k, 10 classes to predict
batch_size=1000, nb_epoch=25

Any improvements also I would like to put LSTM how to go about doing that as I am getting errors if I add
model.add(Dense(117, input_dim=117, init=’normal’, activation=’relu’))
model.add(LSTM(117,dropout_W=0.2, dropout_U=0.2, return_sequences=True))
model.add(Dense(10, activation=’softmax’))
Error:
Exception: Input 0 is incompatible with layer lstm_6: expected ndim=3, found ndim=2

Reply
- Jason Brownlee November 14, 2016 at 7:41 am #
  
  Hi JD,
  
  Here is a long list of ideas to improve the skill of your deep learning model:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Not sure about the exception, you may need to double check the input dimensions of your data and confirm that your model definition matches.
  
  Reply
YA November 17, 2016 at 7:00 pm #

Hi Jason,

I have a set of categorical features(events) from a real system, and i am trying to build a deep learning model for event prediction.
The event’s are not appears equally in the training set and one of them is relatively rare compared to the others.
event count in training set
1 22000
2 6000
3 13000
4 12000
5 26000

Should i continue with this training set? or should i restructure the training set?
What is your recommendation?

Reply
- Jason Brownlee November 18, 2016 at 8:20 am #
  
  Hi YA, I would try as many different “views” on your problem as you can think of and see which best exposes the problem to the learning algorithms (gets the best performance when everything else is held constant).
  
  Reply
Tom December 9, 2016 at 12:13 am #

Hello Jason,
Great work on your website and tuturials! I was wondering if you could show a multi hot encoding, I think you can call it al multi label classification.
Now you have (only one option on and the rest off)
[1,0,0]
[0,1,0]
[0,0,1]

And do like (each classification has the option on or off)
[0,0,0]
[0,1,1]
[1,0,1]
[1,1,0]
[1,1,1]
etc..

This would really help for me
Thanks!!

Reply
- Tom December 9, 2016 at 1:07 am #
  
  Extra side note, with k-Fold Cross Validation. I got it working with binary_crossentropy with quite bad results. Therefore I wanted to optimize the model and add cross validation which unfortunately didn’t work.
  
  Reply
Martin December 26, 2016 at 6:02 pm #

Hi, Jason: Regarding this, I have 2 questions:
1) You said this is a “simple one-layer neural network”. However, I feel it’s still 3-layer network: input layer, hidden layer and output layer.

4 inputs -> [4 hidden nodes] -> 3 outputs

2) However, in your model definition:
model.add(Dense(4, input_dim=4, init=’normal’, activation=’relu’))
model.add(Dense(3, init=’normal’, activation=’sigmoid’))

Seems that only two layers, input and output, there is no hidden layer. So this is actually a 2-layer network. Is this right?

Reply
- Jason Brownlee December 27, 2016 at 5:24 am #
  
  Hi Martin, yes. One hidden layer. I take the input and output layers as assumed, the work happens in the hidden layer.
  
  The first line defines the number of inputs (input_dim=4) AND the number of nodes in the hidden layer:
  
  model.add(Dense(4, input_dim=4, init=’normal’, activation=’relu’))
  
  1
  
  model.add(Dense(4, input_dim=4, init=’normal’, activation=’relu’))
  
  I hope that helps.
  
  Reply
Seun January 16, 2017 at 3:58 pm #

Hi, Jason: I ran this same code but got this error:

Traceback (most recent call last):

File “”, line 1, in
runfile(‘C:/Users/USER/Documents/keras-master/examples/iris_val.py’, wdir=’C:/Users/USER/Documents/keras-master/examples’)

File “C:\Users\USER\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 866, in runfile
execfile(filename, namespace)

File “C:\Users\USER\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 87, in execfile
exec(compile(scripttext, filename, ‘exec’), glob, loc)

File “C:/Users/USER/Documents/keras-master/examples/iris_val.py”, line 46, in
results = cross_val_score(estimator, X, dummy_y, cv=kfold)

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py”, line 140, in cross_val_score
for train, test in cv_iter)

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 758, in __call__
while self.dispatch_one_batch(iterator):

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 603, in dispatch_one_batch
tasks = BatchedCalls(itertools.islice(iterator, batch_size))

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 127, in __init__
self.items = list(iterator_slice)

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py”, line 140, in
for train, test in cv_iter)

File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\base.py”, line 67, in clone
new_object_params = estimator.get_params(deep=False)

TypeError: get_params() got an unexpected keyword argument ‘deep’

Please, I need your help on how to resolve this.

Reply
- Jason Brownlee January 17, 2017 at 7:35 am #
  
  Hi Seun, it is not clear what is going on here.
  
  You may have added an additional line or whitespace or perhaps your environment has a problem?
  
  Reply
- David January 25, 2017 at 3:07 am #
  
  Hello Seun, perhaps this could help you: http://stackoverflow.com/questions/41796618/python-keras-cross-val-score-error/41832675#41832675
  
  Reply
  - Jason Brownlee January 25, 2017 at 10:58 am #
    
    I have reproduced the fault and understand the cause.
    
    The error is caused by a bug in Keras 1.2.1 and I have two candidate fixes for the issue.
    
    I have written up the problem and fixes here:
    http://stackoverflow.com/a/41841066/78453
    
    Reply
shazz January 25, 2017 at 7:36 am #

I have the same issue….
File “/usr/local/lib/python3.5/dist-packages/sklearn/base.py”, line 67, in clone
new_object_params = estimator.get_params(deep=False)
TypeError: get_params() got an unexpected keyword argument ‘deep’

Looks to be an old issue fixed last year so I don’t understand which lib is in the wrong version…
https://github.com/fchollet/keras/issues/1385

Reply
- Jason Brownlee January 25, 2017 at 10:58 am #
  
  Hi shazz,
  
  I have reproduced the fault and understand the cause.
  
  The error is caused by a bug in Keras 1.2.1 and I have two candidate fixes for the issue.
  
  I have written up the problem and fixes here:
  http://stackoverflow.com/a/41841066/78453
  
  Reply
Seun January 25, 2017 at 10:13 pm #

Hi Jasson,
Thanks so much. The second fix worked for me.

Reply
- Jason Brownlee January 26, 2017 at 4:45 am #
  
  Glad to hear it Seun.
  
  Reply
Sulthan January 31, 2017 at 3:08 am #

Dear Jason,

With the help of your example i am trying to use the same for handwritten digits pixel data to classify the no input is 5000rows with example 20*20 pixels so totally x matrix is (5000,400) and Y is (5000,1), i am not able to successfully run the model getting error as below in the end of the code.

#importing the needed libraries
import scipy.io
import numpy
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

In [158]:

#Intializing random no for reproductiblity
seed = 7
numpy.random.seed(seed)

In [159]:

#loading the dataset from mat file
mat = scipy.io.loadmat(‘C:\\Users\\Sulthan\\Desktop\\NeuralNet\\ex3data1.mat’)
print(mat)

{‘X’: array([[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
…,
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.]]), ‘__header__’: b’MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011′, ‘__version__’: ‘1.0’, ‘y’: array([[10],
[10],
[10],
…,
[ 9],
[ 9],
[ 9]], dtype=uint8), ‘__globals__’: []}

Type Markdown and LaTeX:
α
2
α2
In [ ]:

In [ ]:

In [160]:

#Splitting of X and Y of DATA
X_train = mat[‘X’]

In [161]:

X_train

Out[161]:
array([[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
…,
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.]])
In [162]:

Y_train = mat[‘y’]

In [163]:

Y_train

Out[163]:
array([[10],
[10],
[10],
…,
[ 9],
[ 9],
[ 9]], dtype=uint8)
In [164]:

X_train.shape

Out[164]:
(5000, 400)
In [165]:

Y_train.shape

Out[165]:
(5000, 1)
In [166]:

data_trainX = X_train[2500:,0:400]

In [167]:

data_trainX

Out[167]:
array([[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
…,
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.],
[ 0., 0., 0., …, 0., 0., 0.]])
In [168]:

data_trainX.shape

Out[168]:
(2500, 400)
In [256]:

data_trainY = Y_train[:2500,:].reshape(-1)

In [257]:

data_trainY
data_trainY.shape

Out[257]:
(2500,)
In [284]:

#enocode class values as integers
encoder = LabelEncoder()
encoder.fit(data_trainY)
encoded_Y = encoder.transform(data_trainY)
# convert integers to dummy variables
dummy_Y= np_utils.to_categorical(encoded_Y)

In [285]:

dummy_Y

Out[285]:
array([[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 1.],
…,
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0.]])
In [298]:

newy = dummy_Y.reshape(-1,1)

In [300]:

newy

Out[300]:
array([[ 0.],
[ 0.],
[ 0.],
…,
[ 0.],
[ 1.],
[ 0.]])
In [293]:

#define baseline model
def baseline_model():
#create model
model = Sequential()
model.add(Dense(15,input_dim=400,init=’normal’,activation=’relu’))
model.add(Dense(10,init=’normal’,activation=’sigmoid’))
#compilemodel
model.compile(loss=’categorical_crossentropy’,optimizer=’adam’,metrics=[‘accuracy’])
return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200,batch_size=5,verbose=0)
print(estimator)

In [295]:

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

results = cross_val_score(estimator, data_trainX, newy, cv=kfold)
print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

—————————————————————————
ValueError Traceback (most recent call last)
in ()
—-> 1 results = cross_val_score(estimator, data_trainX, newy, cv=kfold)
2 print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
126
127 “””
–> 128 X, y, groups = indexable(X, y, groups)
129
130 cv = check_cv(cv, y, classifier=is_classifier(estimator))

C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
204 else:
205 result.append(np.array(X))
–> 206 check_consistent_length(*result)
207 return result
208

C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
179 if len(uniques) > 1:
180 raise ValueError(“Found input variables with inconsistent numbers of”
–> 181 ” samples: %r” % [int(l) for l in lengths])
182
183

ValueError: Found input variables with inconsistent numbers of samples: [2500, 12500]

Reply
- Jason Brownlee February 1, 2017 at 10:26 am #
  
  Hi Sulthan, the trace is a little hard to read.
  
  Sorry, I have no off the cuff ideas.
  
  Perhaps try cutting your example back to the minimum to help isolate the fault?
  
  Reply
Linmu February 3, 2017 at 2:13 am #

Hi Jason,

Thanks for your tutorial!

Just one question regarding the output. In this problem, we got three classes (setosa, versicolor and virginica), and since each data instance should be classified into only one category, the problem is more specifically “single-lable, multi-class classification”. What if each data instance belonged to multiple categories. Then we are facing “multi-lable, multi-class classification”. In our case, each flower belongs to at least two species (Let’s just forget the biology 🙂 ).

My solution is to modify the output variable (Y) with mutiple ‘1’ in it, i.e. [1 1 0], [0 1 1], [1 1 1 ]……. This is definitely not one-hot encoding any more (maybe two or three-hot?)

Will my method work out? If not, how do you think the problem of “multi-lable, multi-class classification” should be solved?

Thanks in advance

Reply
- Jason Brownlee February 3, 2017 at 10:07 am #
  
  Your method sounds very reasonable.
  
  You may also want to use sigmoid activation functions on the output layer to allow binary class membership to each available class.
  
  Reply
solarenqu February 19, 2017 at 9:28 pm #

Hello, how can I use the model to create predictions?

if i try this: print(‘predict: ‘,estimator.predict([[5.7,4.4,1.5,0.4]])) i got this exception:

AttributeError: ‘KerasClassifier’ object has no attribute ‘model’
Exception ignored in: <bound method BaseSession.__del__ of >
Traceback (most recent call last):
File “/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 581, in __del__
AttributeError: ‘NoneType’ object has no attribute ‘TF_DeleteStatus’

Reply
- Jason Brownlee February 20, 2017 at 9:29 am #
  
  I have not seen this error before.
  
  What versions of Keras/TF/sklearn/Python are you using?
  
  Reply
Suvam March 1, 2017 at 7:34 am #

Hi,
Thanks for the great tutorial.
It would be great if you could outline what changes would be necessary if I want to do a multi-class classification with text data: the training data assigns scores to different lines of text, and the problem is to infer the score for a new line of text. It seems that the estimator above cannot handle strings. What would be the fix for this?

Thanks in advance for the help.

Reply
- Jason Brownlee March 1, 2017 at 8:47 am #
  
  Consider encoding your words as integers, using a word embedding and a fixed sequence length.
  
  See this tutorial:
  https://machinelearningmastery.com/predict-sentiment-movie-reviews-using-deep-learning/
  
  Reply
Sweta March 1, 2017 at 9:10 pm #

This was a great tutorial to enhance the skills in deep learning. My question: is it possible to use this same dataset for LSTM? Can you please help with this how to solve in LSTM?

Reply
- Jason Brownlee March 2, 2017 at 8:15 am #
  
  Hi Sweta,
  
  You could use an LSTM, but it would not be appropriate because LSTMs are intended for sequence prediction problems and this is not a sequence prediction problem.
  
  Reply
Akash March 22, 2017 at 5:47 pm #

Hi Jason,

I have this problem where I have 1500 features as input to my DNN and 2 output classes, can you explain how do I decide the size of neurons in my hidden layer and how many hidden layers I need to process such high features with accuracy.

Reply
- Jason Brownlee March 23, 2017 at 8:47 am #
  
  Lots of trial and error.
  
  Start with a small network and keep adding neurons and layers and epochs until no more benefit is seen.
  
  Reply
Ananya Mohapatra March 24, 2017 at 9:39 pm #

sir, the following code is showing an error message.. could you help me figure it out. i am trying to do a multi class classification with 5 datasets combined in one( 4 non epileptic patients and 1 epileptic) …500 x 25 dataset and the 26th column is the class.

# Train model and make predictions
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv(“DemoNSO.csv”, header=None)
dataset = dataframe.values
X = dataset[:,0:25].astype(float)
Y = dataset[:,25]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
# define baseline model
def baseline_model():
# create model
model = Sequential()
model.add(Dense(700, input_dim=25, init=’normal’, activation=’relu’))
model.add(Dense(2, init=’normal’, activation=’sigmoid’))

# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=50, batch_size=20)

kfold = KFold(n_splits=5, shuffle=True, random_state=seed)

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.55, random_state=seed)
estimator.fit(X_train, Y_train)
predictions = estimator.predict(X_test)

print(predictions)
print(encoder.inverse_transform(predictions))

error message:
str(array.shape))
ValueError: Error when checking model target: expected dense_56 to have shape (None, 2) but got array with shape (240, 3)

Reply
- Jason Brownlee March 25, 2017 at 7:36 am #
  
  Confirm the size of your output (y) matches the dimension of your output layer.
  
  Reply
Alican March 28, 2017 at 4:05 am #

Hello Jason,

I got your model to work using Python 2.7.13, Keras 2.0.2, Theano 0.9.0.dev…, by copying the codes exactly, however the results that I get are not only very bad (59.33%, 48.67%, 38.00% on different trials), but they are also different.

I was under the impression that using a fixed seed would allow us to reproduce the same results.

Do you have any idea what could have caused such bad results?

Thanks

Reply
- Alican March 28, 2017 at 4:28 am #
  
  edit: I was re-executing only the results=cross_val_score(…) line to get different results I listed above.
  
  Running the whole script over and over generates the same result: “Baseline: 59.33% (21.59%)”
  
  Reply
  - Jason Brownlee March 28, 2017 at 8:26 am #
    
    Glad to hear it.
    
    Reply
- Jason Brownlee March 28, 2017 at 8:25 am #
  
  Not sure why the results are so bad. I’ll take a look.
  
  The fixed seed does not seem to have an effect on the Theano or TensorFlow backends. Try running examples multiple times and take the average performance.
  
  Reply
  - Alican April 2, 2017 at 2:30 am #
    
    Did you have time to look into this?
    
    I had my colleague run this script on Theano 1.0.1, and it gave the expected performance of 95.33%. I then installed Theano 1.0.1, and got the same result again.
    
    However, using Theano 2.0.2 I was getting 59.33% with seed=7, and similar performances with different seeds. Is it possible the developers made some crucial changes with the new version?
    
    Reply
    - Jason Brownlee April 2, 2017 at 6:30 am #
      
      The most recent version of Theano is 0.9:
      https://github.com/Theano/Theano/releases
      
      Do you mean Keras versions?
      
      It may not be the Keras version causing the difference in the run. The fixed random seed may not be having an effect in general, or may not be having when a Theano backend is being used.
      
      Neural networks are stochastic algorithms and will produce a different result each run:
      https://machinelearningmastery.com/randomness-in-machine-learning/
      
      Reply
      - Alican April 2, 2017 at 6:59 am #
        
        Yes I meant Keras, sorry.
        
        There is no issue with the seed, I’m getting the same result with you on multiple computers using Keras 1.1.1. But with Keras 2.0.2, the results are absymally bad.
      - Jonathan July 11, 2017 at 4:28 am #
        
        not sure if this was every resolved, but I’m getting the same thing with most recent versions of Theano and Keras
        
        59.33% with seed=7
      - Jason Brownlee July 11, 2017 at 10:33 am #
        
        Try running the example a few times with different seeds.
        
        Neural networks are stochastic:
        https://machinelearningmastery.com/randomness-in-machine-learning/
Nalini March 29, 2017 at 3:13 am #

Hi Jason

in this code for multiclass classification can u suggest me how to plot graph to display the accuracy and also what should be the axis represent

Reply
- Jason Brownlee March 29, 2017 at 9:10 am #
  
  No, we normally do not graph accuracy, unless you want to graph it over training epochs?
  
  Reply
  - Sebastian September 4, 2019 at 3:18 am #
    
    Hi Jason,
    
    First of all, I’d like to thank you for your blog. I’m currently trying to build a multiclass classifier just as the one you have explained above. I was wondering: how could I plot the history of loss and accuracy for training and validation per epoch as it is done using the historry=model.fit()?.
    
    Many thanks in advance for your help.
    
    Reply
    - Jason Brownlee September 4, 2019 at 6:03 am #
      
      Yes, see this post:
      https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
      
      Reply
Nalini March 31, 2017 at 1:42 am #

thanks

Reply
- Jason Brownlee March 31, 2017 at 5:55 am #
  
  You’re welcome.
  
  Reply
Frank April 6, 2017 at 8:47 pm #

Dear Jason,
I have found this tutorial very interesting and helpful.
What I wanted to ask is, I am currently trying to classify poker hands as this kaggle competition: https://www.kaggle.com/c/poker-rule-induction (For a school project) I wish to create a neural network as you have created above. What do you suggest for me to start this?
Your help would be greatly appreciated!
Thanks.

Reply
- Jason Brownlee April 9, 2017 at 2:39 pm #
  
  This process will help you work through your modeling problem:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
shiva April 8, 2017 at 12:28 pm #

Hi Jason,
Its an awesome tutorial. It would be great if you can come up with a blog post on multiclass medical image classification with Keras Deep Learning library. It would serve as a great asset for researchers like me, working with medical image classification. Looking forward.

Reply
- Jason Brownlee April 9, 2017 at 2:56 pm #
  
  Thanks for the suggestion.
  
  Reply
Toby April 9, 2017 at 4:38 am #

Thanks for the great tutorial!
I duplicated the result using Theano as backend.
However, using Tensorflow yield a worse accuracy, 88.67%.
Any explanation?
Thanks!

Reply
- Jason Brownlee April 9, 2017 at 3:00 pm #
  
  It may be related to the stochastic nature of neural nets and the difficulty of making results with the TF backend reproducible.
  
  You can learn more about the stochastic nature of machine learning algorithms here:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
Anupam April 11, 2017 at 6:11 pm #

Hi Jason, How to find the Precision, Recall and f1 score of your example?

Case-1 I have used like :

model.compile(loss=’categorical_crossentropy’, optimizer=’Nadam’, metrics=[‘acc’, ‘fmeasure’, ‘precision’, ‘recall’])

Case-2 and also used :

def score(yh, pr):
coords = [np.where(yhh > 0)[0][0] for yhh in yh]
yh = [yhh[co:] for yhh, co in zip(yh, coords)]
ypr = [prr[co:] for prr, co in zip(pr, coords)]
fyh = [c for row in yh for c in row]
fpr = [c for row in ypr for c in row]
return fyh, fpr

pr = model.predict_classes(X_train)
yh = y_train.argmax(2)
fyh, fpr = score(yh, pr)
print ‘Training accuracy:’, accuracy_score(fyh, fpr)
print ‘Training confusion matrix:’
print confusion_matrix(fyh, fpr)
precision_recall_fscore_support(fyh, fpr)

pr = model.predict_classes(X_test)
yh = y_test.argmax(2)
fyh, fpr = score(yh, pr)
print ‘Testing accuracy:’, accuracy_score(fyh, fpr)
print ‘Testing confusion matrix:’
print confusion_matrix(fyh, fpr)
precision_recall_fscore_support(fyh, fpr)

What I have observed is that, accuracy of case-1 and case-2 are different?

Any solution?

Reply
- Jason Brownlee April 12, 2017 at 7:52 am #
  
  You can make predictions on your test data and use the tools from sklearn:
  http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
  
  Reply
Raynier van Egmond April 15, 2017 at 12:19 pm #

Hi Jason,

Like a student earlier in the comments my accuracy results are exactly the same as his:

********** Baseline: 88.67% (21.09%)

and I think this is related to having Tensorflow as the backend rather than the Theano backend.

I am working this through in a Jupyter notebook

I went through your earlier tutorials on setting up the environment:

scipy: 0.18.1
numpy: 1.11.3
matplotlib: 2.0.0
pandas: 0.19.2
statsmodels: 0.6.1
sklearn: 0.18.1
theano: 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291
tensorflow: 1.0.1
Using TensorFlow backend.
keras: 2.0.3

The Tensorflow is a Python3.6 recompile picked up from the web at:

http://www.lfd.uci.edu/~gohlke/pythonlibs/#tensorflow

Do you know have I can force the Keras library to take Theano as a backend rather than the Tensorflow library?

Thanks for the great work on your tutorials… for beginners it is such in invaluable thing to have tutorials that actually work !!!

Looking forward to get more of your books

Rene

Reply
- Raynier van Egmond April 15, 2017 at 12:42 pm #
  
  Changing to the Theano backend doesn’t change the results:
  
  Managed to change to a Theano backend by setting the Keras config file:
  {
  “image_data_format”: “channels_last”,
  “epsilon”: 1e-07,
  “floatx”: “float32”,
  “backend”: “theano”
  }
  
  as instructed at: https://keras.io/backend/#keras-backends
  
  The notebook no longer reports it is using Tensorflow so I guess the switch worked but the results are still:
  
  ****** Baseline: 88.67% (21.09%)
  
  Will need to look a little deeper and play with the actual architecture a bit.
  
  All the same great material to get started with
  
  Thanks again
  
  Rene
  
  Reply
  - Raynier van Egmond April 15, 2017 at 1:26 pm #
    
    Confirmed that changes to the model as someone above mentioned
    
    model.add(Dense(8, input_dim=4, kernel_initializer=’normal’, activation=’relu’))
    model.add(Dense(3, kernel_initializer=’normal’, activation=’softmax’))
    
    nodes makes a substantial difference:
    
    **** Baseline: 96.67% (4.47%)
    
    but there is no difference between the Tensorflow and Theano backend results. I guess that’s as far as I can take this for now.
    
    Take care,
    
    Rene
    
    Reply
    - Jason Brownlee April 16, 2017 at 9:27 am #
      
      Nice.
      
      Also, note that MLPs are stochastic. This means that if you don’t fix the random seed, you will get different results for each run of the algorithm.
      
      Ideally, you should take the average performance of the algorithm across multiple runs to evaluate its performance.
      
      See this post:
      https://machinelearningmastery.com/randomness-in-machine-learning/
      
      Reply
- Jason Brownlee April 16, 2017 at 9:22 am #
  
  You can change the back-end used by Keras in the Kersas config file. See this post:
  https://machinelearningmastery.com/introduction-python-deep-learning-library-keras/
  
  Reply
Tursun April 16, 2017 at 9:18 pm #

Jason,
Thank you very much first. These tutorials are excellent. They are very practical. Your are an excellent educator.
I want classify my data into multiple classes of 25-30. Your IRIS example is nearest classification. They DL4J previously has IRIS classification with DBN; but disappeared in new community version.
I have following issues:
1.>
It takes so long. My laptop is TOSHIBA L745, 4GB RAM, i3 processor. it has CUDA.
My classification problem is solved with SVM in very short time. I’d say in split second.
Do you think speed would increase if we use DBN or CNN something ?
2.>
My result :
Baseline: 88.67% (21.09%),
Once I have installed Docker (tensorflow in it),then run IRIS classification. It shows 96%.
I wish similar or better accuracy. How to reach that level ?

Thank you

Reply
- Jason Brownlee April 17, 2017 at 5:13 am #
  
  MLP is the right algorithm for multi-class classification algorithms.
  
  If it is slow, consider running it on AWS:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
  There are many things you can do to lift performance, see this post:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Chris April 17, 2017 at 5:13 am #

Hello Jason,
first of all, your tutorials are really well done when you start working with keras.

I have a question about the epochs and batch_size in this tutorial. I think I haven’t understood it correctly.

I loaded the record and it contains 150 entries.

You choose 200 epochs and batch_size=5. So you use 5*200=1000 examples for training. So does keras use the same entries multiple times or does it stop automatically?

Thanks!

Reply
- Jason Brownlee April 18, 2017 at 8:23 am #
  
  One epoch involves exposing each pattern in the training dataset to the model.
  
  One epoch is comprised of one or more batches.
  
  One batch involves showing a subset of the patterns in the training data to the model and updating weights.
  
  The number of patterns in the dataset for one epoch must be a factor of the batch size (e.g. divide evenly).
  
  Does that help?
  
  Reply
  - Chris April 22, 2017 at 3:43 am #
    
    Hi,
    thank you for the explanation.
    The explanation helped me, and in the meantime I have read and tried several LSTM tutorials from you and it became much clearer to me.
    greetings, Chris
    
    Reply
    - Jason Brownlee April 22, 2017 at 9:28 am #
      
      I’m glad to hear that Chris.
      
      Reply
Abhilash Menon April 17, 2017 at 1:27 pm #

Hey Jason,

I have been following your tutorials and they have been very very helpful!. Especially, the most useful section being the comments where people like me get to ask you questions and some of them are the same ones I had in my mind.

Although, I have one that I think hasn’t been asked before, at least on this page!

What changes should I make to the regular program you illustrated with the “pima_indians_diabetes.csv” in order to take a dataset that has 5 categorical inputs and 1 binary output.

This would be a huge help! Thanks in advance!

Reply
- Jason Brownlee April 18, 2017 at 8:30 am #
  
  Great question.
  
  Consider using an integer encoding followed by a binary encoding of the categorical inputs.
  
  This post will show you how:
  https://machinelearningmastery.com/data-preparation-gradient-boosting-xgboost-python/
  
  Reply
  - Abhilash Menon July 18, 2017 at 12:47 pm #
    
    Hello Dr. Brownlee,
    
    The link that you shared was very helpful and I have been able to one hot encode and use the data set but at this point of time I am not able to find relevant information regarding what the perfect batch size and no. of epochs should be. My data has 5 categorical inputs and 1 binary output (2800 instances). Could you tell me what factors I should take into consideration before arriving at a perfect batch size and epoch number? The following are the configuration details of my neural net:
    
    model.add(Dense(28, input_dim=43, init=’uniform’, activation=’relu’))
    model.add(Dense(28, init=’uniform’, activation=’relu’))
    model.add(Dense(1, init=’uniform’, activation=’sigmoid’))
    model.compile(loss=’binary_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
    
    Reply
    - Jason Brownlee July 18, 2017 at 5:01 pm #
      
      I recommend testing a suite of different batch sizes.
      
      I have a post this friday with advice on tuning the batch size, watch out for it.
      
      Reply
Tuba April 18, 2017 at 8:43 am #

Hi Jason,

First of all, your tutorials are really very interesting.

I was facing error this when i run it . I’m work with python 3 and the same file input .

Error :
ImportError: Traceback (most recent call last):
File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py”, line 61, in
from tensorflow.python import pywrap_tensorflow
File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py”, line 28, in
_pywrap_tensorflow = swig_import_helper()
File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py”, line 24, in swig_import_helper
_mod = imp.load_module(‘_pywrap_tensorflow’, fp, pathname, description)
File “/home/indatacore/anaconda3/lib/python3.5/imp.py”, line 242, in load_module
return load_dynamic(name, filename, file)
File “/home/indatacore/anaconda3/lib/python3.5/imp.py”, line 342, in load_dynamic
return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Reply
- Jason Brownlee April 19, 2017 at 7:44 am #
  
  Ouch. I have not seen this error before.
  
  Consider trying the Theano backend and see if that makes a difference.
  
  Reply
Tursun April 21, 2017 at 2:17 am #

Jason,
Thank you. I got your notion: there is no key which opens all doors.

Here, I have multi class classification problem.
My data can be downloaded from here:
https://www.dropbox.com/s/w2en6ewdsed69pc/tursun_deep_p6.csv?dl=0

size of my data set : 512*16, last column is 21 classes, they are digits 1-21
note: number of samples (rows in my data) for each class is different. mostly 20 rows, but sometimes 17 or 31 rows
my network has:
first layer (input) has 15 neurons
second layer (hidden) has 30 neurons
last layer (output) has 21 neurons
in last layer I used “softmax” based on this recommendation from
https://github.com/fchollet/keras/issues/1013
“The softmax function transforms your hidden units into probability scores of the class labels you have; and thus is more suited to classification problems ”
error message:
alueError: Error when checking model target: expected dense_8 to have shape (None, 21) but got array with shape (512, 1)

I would be thankful if you can help me to run this code.

I modified this code from yours:
———–keras code start ———–
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt(“tursun_deep_p6.csv”, delimiter=”,”)
# split into input (X) and output (Y) variables
X = dataset[:,0:15]
Y = dataset[:,15]

# create model
model = Sequential()
model.add(Dense(30, input_dim=15, activation=’relu’)) # not sure if 30 too much. not sure #about lower and upper limits
#model.add(Dense(25, activation=’relu’)) # think about to add one more hidden layer
model.add(Dense(21, activation=’softmax’)) # they say softmax at last L does classification
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=5)
# evaluate the model
scores = model.evaluate(X, Y)
print(“\n%s: %.2f%%” % (model.metrics_names[1], scores[1]*100))

———–keras code start ———–

Reply
- Jason Brownlee April 21, 2017 at 8:40 am #
  
  I see the problem, your output layer expects 8 columns and you only have 1.
  
  You need to transform your output variable int 8 variables. You can do this using a one hot encoding.
  
  Reply
Shiva April 23, 2017 at 5:54 am #

Hi jason, I am following your book deep learning with python and i have an issue with the script. I have succesfully read my .csv datafile through pandas and trying to adopt a decay based learning rate as discussed in the book. I define the initial lrate, drop, epochs_drop and the formula for lrate update as said in the book. I then created the model like this (works best for my problem) and started creating a pipeline in contrary to the model fitting strategy used by you in the book:

def baseline_model():
# create model
model = Sequential()
model.add(Dense(50, input_dim=15, kernel_initializer=’normal’, activation=’relu’))
model.add(Dense(3, kernel_initializer=’normal’, activation=’sigmoid’))
sgd = SGD(lr=0.0, momentum=0.9, decay=0, nesterov=False)
model.compile(loss=’categorical_crossentropy’, optimizer=sgd, metrics=[‘accuracy’])
return model
#learning schedule callback
lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]

estimators = []
estimators.append((‘standardize’, StandardScaler()))
estimators.append((‘mlp’, KerasClassifier(build_fn=baseline_model, epochs=100,
batch_size=5, callbacks=[lrate], verbose=1)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)

I’m getting the error “Cannot clone object , as the constructor does not seem to set parameter callbacks”. According to keras documentation, I can see that i can pass callbacks to the kerasclassifier wrapper. kindly suggest what to do in this occasion. Looking forward.

Reply
- Jason Brownlee April 24, 2017 at 5:29 am #
  
  I have not tried to use callbacks with the sklearn wrapper sorry.
  
  Perhaps it is a limitation that you can’t? Though, I’d be surprised.
  
  you may have to use the keras API directly.
  
  Reply
Shiva April 25, 2017 at 6:23 am #

Hi Jason,
I’m trying to apply the image augmentation techniques discussed in your book to the data I have stored in my system under C:\images\train and C:\images\test. Could you help me with the syntax on how to load my own data with a modification to the syntax available in the book:

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Thanks in advance.

Reply
- Jason Brownlee April 25, 2017 at 7:52 am #
  
  Sorry, I don’t have an example of how to load image data from disk, I hope to cover it in the future.
  
  This post may help as a start:
  https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
  
  Reply
Michael Ng April 28, 2017 at 12:49 am #

Hi,

By implementing neural network in Keras, how can we get the associated probabilities for each predicted class?’

Many Thanks!
Michael Ng

Reply
- Jason Brownlee April 28, 2017 at 7:47 am #
  
  Review the outputs from the softmax, although not strictly probabilities, they can be used as such.
  
  Also see the keras function model.predict_proba() for predicting probabilities directly.
  https://keras.io/models/sequential/
  
  Reply
  - Michael Ng April 30, 2017 at 11:55 am #
    
    Hi Jason,
    
    ‘Note that we use a sigmoid activation function in the output layer. This is to ensure the output values are in the range of 0 and 1 and may be used as predicted probabilities.’
    
    Instead of using softmax function, how do I review the sigmoidal outputs (as per the tutorial) for each of 3 output nodes? Mind to share the code to list the sigmoidal outputs?
    
    Regards,
    Michael Ng
    
    Reply
    - Jason Brownlee May 1, 2017 at 5:52 am #
      
      I would recommend softmax for multi-class classification.
      
      You can learn more about sigmoid here:
      https://en.wikipedia.org/wiki/Logistic_function
      
      Reply
  - Andrea December 12, 2017 at 7:59 am #
    
    Jason,
    
    may you elaborate further (or provide a link) about “the outputs from the softmax, although not strictly probabilities”?
    
    I thought they were probabilities even in the most formal sense.
    
    Thanks!
    
    Reply
    - Jason Brownlee December 12, 2017 at 4:04 pm #
      
      No, they are normalized to look like probabilities.
      
      This might be a good place to start:
      https://en.wikipedia.org/wiki/Softmax_function
      
      Reply
Ann April 28, 2017 at 2:08 am #

Hi, Jason! I’m exactly newbie to Keras, and I want to figure out confusion matrix by using sklearn.confusion_matrix(y_test, predict). But I was facing error this when i run it .

—————————————————————————
ValueError Traceback (most recent call last)
in ()
—-> 1 confusion_matrix(y_test, predict)

C:\Users\Ann\Anaconda3\envs\py27\lib\site-packages\sklearn\metrics\classification.pyc in confusion_matrix(y_true, y_pred, labels, sample_weight)
240 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
241 if y_type not in (“binary”, “multiclass”):
–> 242 raise ValueError(“%s is not supported” % y_type)
243
244 if labels is None:

ValueError: multilabel-indicator is not supported

I’ve checked that y_test and predict have same shape (231L, 2L).
Any solution?
Your help would be greatly appreciated!
Thanks.

Reply
- Jason Brownlee April 28, 2017 at 7:50 am #
  
  Consider checking the dimensionality of both y and yhat to ensure they are the same (e.g. print the shape of them).
  
  Reply
Mohammed Zahran April 30, 2017 at 4:49 am #

can we use the same approach to classify MNIST in (0,1…) and the same time classify the numbers to even and odd numbers ?

Reply
- Jason Brownlee April 30, 2017 at 5:35 am #
  
  Machine learning is not needed to check for odd and even numbers, just a little math.
  
  Reply
  - TAM.G April 30, 2017 at 4:46 pm #
    
    but if we too it as a simple try to learn about multi-labeling ,, how could we do this
    
    Reply
    - Moh May 1, 2017 at 10:45 am #
      
      @Jason Brownlee I totally agree with you. We are using this problem as proxy for more complex problems like classifying a scene with multiple cars and we want to classify the models of these cars. The same approach is needed in tackling neurological images
      
      Reply
TAM.G April 30, 2017 at 3:22 pm #

first this is a great tutorial , but , am confused a little ,, am i loading my training files and labeling files or what ??
as i tried to apply this tutorial to my case ,, I’ve about 10 folder each has its own images these images are related together for one class ,, but i need to make multi labeling for each folder of them for example folder number 1 has about 1500 .png imgs of owl bird , here i need to make a multi label for this to train it as a bird and owl , and here comes the problem ,, as i’m seraching for a tool to make labeling for all images in each folder and label them as [ owl, bird] together … any idea about how to build my own multi label classifier ?

Reply
- Jason Brownlee May 1, 2017 at 5:53 am #
  
  I would recommend using a CNN instead of an MLP for image classification, see this post:
  https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/
  
  Reply
Ik.O May 14, 2017 at 10:58 pm #

I implemented the same code on my system and achieved a score of 88.67% at seed = 7 and 96.00% at seed = 4. Any particular reason for this?

Reply
- Jason Brownlee May 15, 2017 at 5:52 am #
  
  Nice work!
  
  Yes, deep learning algorithms are stochastic:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
Anupam May 18, 2017 at 4:58 pm #

Hi Jason, Just gone through your blog https://machinelearningmastery.com/ .Just to know as a beginner in Deep learning, can you give any hint to do the task sequence learning for word language identification problem.
Here each word is a variable sequence of characters and the id of each word must be classified with a language tag.
Like, Suppose if we have a dataset like:

hello/L1 bahiya/L2 hain/L2 brother/L1 ,/L3 :)/L4

where L1,L2,L3 and L4 are the Language-tag

Reply
- Jason Brownlee May 19, 2017 at 8:14 am #
  
  Hi Anupam, that sounds like a great problem.
  
  I would suggest starting with a high-quality dataset, then consider modeling the problem using a seq2seq architecture.
  
  Reply
A.Malathi May 19, 2017 at 7:30 pm #

Hi Jason,

Your tutorials are great and very helpful to me. Have you written any article on Autoencoder.
I have constructed an autoencoder network for a dataset with labels. The output is a vector of
errors(Euclidean Distance). From that errors, classification or prediction on the test set is possible since labels are given??

Reply
- Jason Brownlee May 20, 2017 at 5:37 am #
  
  Sorry, I don’t currently have any material on autoencoders.
  
  Reply
J. A. Gildea May 22, 2017 at 2:57 am #

Hi Jason, thank you so much for your helpful tutorials.
I have one question regarding one-hot encoding:
I am working on using a CNN for sentiment analysis and I have a total of six labels for my output variable, string values (P+, P, NONE, NEU, N, N+) representing sentiments.
I one-hot encoded my output variable the same way as you showed in this tutorial, but the shape after one-hot encoding appears to be (, 7). Shouln’t it be 6 instead of 7? Any idea what might be going on? I checked for issues in my dataset such as null values in a certain row, and got rid of all of them yet this persists.
Thanks!

Reply
- Jason Brownlee May 22, 2017 at 7:54 am #
  
  It should be 7.
  
  Consider loading your data in Python and printing the set of values in the column to get an idea of what is in your data.
  
  Reply
  - J. A. Gildea May 22, 2017 at 6:39 pm #
    
    I checked my data a bit deeper and it seems it had a couple of null values that I removed.
    I am however getting very poor results, could this be due to the fact that my data is a bit unbalanced? Some of the classes appear twice as others, so I imagine I would have to change the metrics in my compile function (using accuracy at the moment).
    Can a slight imbalance in the dataset yield such poor results (under 40% validation accuracy)?
    
    Thanks.
    
    Reply
    - Jason Brownlee May 23, 2017 at 7:50 am #
      
      With multiple classes, it might be better to use another metric like log loss (cross entropy) or AUC.
      
      Accuracy will not capture the true performance of the model.
      
      Also, imbalanced classes can be a problem. You could look at removing some classes or rebalancing the data:
      https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
      
      Reply
Nalini May 24, 2017 at 6:10 pm #

Hi Jason!
I can’t seem to add more layers in my code.
model.add(Dense(12, input_dim=25, init=’normal’, activation=’relu’))
model.add(Dense(5, init=’normal’, activation=’sigmoid’))
This is a part of the existing code. if i try to add more layers along with them i get a warning for indentation fault.
can you please specify which one of the above layers is the input layer and which one is hidden….

Reply
- Jason Brownlee June 2, 2017 at 11:32 am #
  
  This is a Python issue. Ensure you understand the role of whitespace in Python:
  http://www.diveintopython.net/getting_to_know_python/indenting_code.html
  
  Reply
Michael May 28, 2017 at 4:01 am #

Hi Jason,
I have two questions:

1. I didn’t see the code in this post calling the fit method. Is the fitting process executed in KerasClassifier?

2. I have only one dataset as training set (No dedicated test set).
Is the KFold method using this single dataset for evaluation in the KerasClassifier class?
Or should I use the “validation_split parameter in the fit method?

Thank’s

Reply
- Jason Brownlee June 2, 2017 at 12:06 pm #
  
  Hi Michael,
  
  Yes, we use the sklearn infrastructure to fit and evaluate the model.
  
  You can try both methods. The best evaluation test harness is really problem dependent. k-fold cross validation generally gives a less biased estimate of performance and is often recommended.
  
  Reply
Nimesh May 29, 2017 at 4:20 pm #

I am classifying mp3s into 7 genre classes. I have 1200 mp3 files dataset with 7 features as input. I got basic Neural network as your example shows and it gives nearly 60% of accuracy. Any suggestions on how to improve accuracy? your suggestions will be very helpful for me.

Reply
- Jason Brownlee June 2, 2017 at 12:22 pm #
  
  Yes, see this post:
  https://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/
  
  And this post:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
J. A. Gildea June 9, 2017 at 3:35 am #

Hello Jason,

I posted here a while back and I’m back for more wisdom!

I have my own model and dataset for text classification (6 labels representing sentiment of tweets). I am not sure on how to evaluate it, I have tried using k fold just as in your example and it yields 100% accuracy which I assume is not the reality.
Just using model.fit() I obtain a result of 99%, which also makes me think I am not evaluating my model correctly.
I have been looking for a way to do this and apparently a good approach is to use a confusion matrix. Is this necessary to evaluate a multiclass model for text classification, or will other methods suffice?

Thanks

Reply
- Jason Brownlee June 9, 2017 at 6:30 am #
  
  Generally, I would recommend this process to work through your problem systematically:
  https://machinelearningmastery.com/start-here/#process
  
  I would recommend this post to get a robust estimate of the skill of a deep learning model on unseen data:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  For multi-class classification, I would recommend a confusion matrix, but also measures like logloss.
  
  Reply
zakaria June 11, 2017 at 3:47 am #

Hi Jason, I need your help I use tensorflow and keras to classify cifar10 images. My question is how to make prediction (make prediction for only one image)

Reply
- Jason Brownlee June 11, 2017 at 8:26 am #
  
  Like this:
  
  yhat = model.predict(X)
  
  1
  
  yhat = model.predict(X)
  
  Reply
zakaria June 12, 2017 at 6:35 pm #

Hi Jason,
To make the prediction I used this function Y_pred = model.predict (x_test)
print (Y_pred)
Y_pred = np.argmax (Y_pred, axis = 1)
print (y_pred)

And I got these results
[[0, 0, …, 0, 0, 0]]
[0, 1, 0, …, 0, 0, 0]
[1. 0. 0. …, 0. 0. 0.]
…
[0, 0, 0, …, 0, 0, 0]]
[1. 0. 0. …, 0. 0. 0.]
[0. 0. 0. …, 1. 0. 0.]]
[0 1 0 …, 5 0 7]
What these results mean
And how to display for example the first 10 images of the test database to see if the model works well

Reply
- Jason Brownlee June 13, 2017 at 8:18 am #
  
  The prediction result may be an outcome (probability-like value) for each class.
  
  You can take an argmax() of each vector to find the selected class.
  
  Alternately, you can call predict_classes() to predict the class directly.
  
  Reply
Huong June 12, 2017 at 11:55 pm #

Dear @Jason,
Thank you for your useful post. I have a issues.
My dataset have 3 columns (features) for output data. Each column has multi-classes. So how can I process in this case?
Thanks.

Reply
- Jason Brownlee June 13, 2017 at 8:22 am #
  
  I don’t have a great answer for you off the cuff. I would suggest doing a little research to see how this type of problem has been handled in the literature.
  
  Maybe you can model each class separately?
  
  Maybe you can one-hot encode each output variable and use a neural network to output everyone directly.
  
  Let me know how you go.
  
  Reply
Anastasios June 17, 2017 at 10:05 pm #

Hello Jason,

great post on multiclass classification. I am trying to do a gridsearch on a multiclass dataset i created, but I get an error when calling the fit function on the gridsearch. Can we apply gridsearch on a multiclass dataset ?

My code looks like: https://pastebin.com/eB35aJmW

And the error I get is: https://pastebin.com/C1ch7709

Reply
- Jason Brownlee June 18, 2017 at 6:31 am #
  
  Yes, I believe you can grid search a multi-class classification problem.
  
  Sorry, it is not clear to me what the cause of the error might be. You will need to cut your example back to a minimum case that still produces the error.
  
  Reply
Anupam Samanta June 29, 2017 at 3:42 am #

Hi Jason,

Excellent tutorials! I have been able to learn a lot reading your articles.

I ran into some problem while implementing this program
My accuracy was around Accuracy: 70.67% (12.00%)
I dont know why the accuracy is so dismal!
I tried changing some parameters, mostly that are mentioned in the comments, such as removing kernel_initializer, changing activation function, also the number of hidden nodes. But the best I was able to achieve was 70 %

Any reason something is going wrong here in my code?!

# Modules
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from keras import backend as K
import os

def set_keras_backend(backend):
if K.backend() != backend:
os.environ[‘KERAS_BACKEND’] = backend
reload(K)
assert K.backend() == backend

set_keras_backend(“theano”)
# seed
seed = 7
numpy.random.seed(seed)

# load dataset
dataFrame = pandas.read_csv(“iris.csv”, header=None)
dataset = dataFrame.values

X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]

# encode class values
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

dummy_Y = np_utils.to_categorical(encoded_Y)

# baseline model
def baseline_model():
# create model
model = Sequential()
model.add(Dense(8, input_dim=4, kernel_initializer=’normal’, activation=’softplus’))
model.add(Dense(3, kernel_initializer=’normal’, activation=’softmax’))
# compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

results = cross_val_score(estimator, X, dummy_Y, cv=kfold)

print(“Accuracy: %.2f%% (%.2f%%)” % (results.mean() * 100, results.std() * 100))

Reply
- Anupam Samanta June 29, 2017 at 3:45 am #
  
  I added my code here: https://pastebin.com/3Kr7P6Kw
  Its better formatted here!
  
  Reply
- Jason Brownlee June 29, 2017 at 6:39 am #
  
  There are more ideas here:
  https://machinelearningmastery.com/deploy-machine-learning-model-to-production/
  
  Reply
  - Anupam Samanta June 30, 2017 at 3:36 am #
    
    But isnt it strange, that when I use the same code as yours, my program in my machine returns such bad results!
    Is there anything I am doing wrong in my code?!
    
    Reply
    - Jason Brownlee June 30, 2017 at 8:14 am #
      
      No. Try running the example a few times. Neural networks are stochastic and give different results each time they are run.
      
      See this post on why:
      https://machinelearningmastery.com/randomness-in-machine-learning/
      
      See this post on how to address it and get a robust estimate of model performance:
      https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
      
      Reply
- Zefeng Wu June 30, 2017 at 11:05 pm #
  
  Hi, my codes is as followings, but keras gave a extremebad results,
  import numpy
  import pandas
  from keras.models import Sequential
  from keras.layers import Dense
  from keras.wrappers.scikit_learn import KerasClassifier
  from keras.utils import np_utils
  from sklearn.model_selection import cross_val_score
  from sklearn.model_selection import KFold
  from sklearn.preprocessing import LabelEncoder
  from sklearn.pipeline import Pipeline
  # fix random seed for reproducibility
  seed = 7
  numpy.random.seed(seed)
  # load dataset
  dataframe = pandas.read_csv(“iris.csv”, header=None)
  dataset = dataframe.values
  X = dataset[:,0:4].astype(float)
  Y = dataset[:,4]
  # encode class values as integers
  encoder = LabelEncoder()
  encoder.fit(Y)
  encoded_Y = encoder.transform(Y)
  # convert integers to dummy variables (i.e. one hot encoded)
  dummy_y = np_utils.to_categorical(encoded_Y)
  # define baseline model
  def baseline_model():
  # create model
  model = Sequential()
  model.add(Dense(8, input_dim=4 , activation= “relu” ))
  model.add(Dense(3, activation= “softmax” ))
  # Compile model
  model.compile(loss= “categorical_crossentropy” , optimizer= “adam” , metrics=[“accuracy”])
  return model
  estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
  kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
  results = cross_val_score(estimator, X, dummy_y, cv=kfold)
  print(“Accuracy: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))
  
  Using Theano backend.
  Accuracy: 64.67% (15.22%)
  
  Reply
Nunu July 4, 2017 at 12:13 am #

Dear Jason,
How can I increase the accuracy while training ? I am always getting an accuracy arround 68% and 70%!! even if i am chanching the optimizer, the loss function and the learning rate.
(I am using keras and CNN)

Reply
- Jason Brownlee July 6, 2017 at 10:02 am #
  
  Here are many ideas:
  https://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/
  
  Reply
- Nunu July 8, 2017 at 12:06 am #
  
  Thanks a lot it is very useful 🙂
  
  Reply
  - Jason Brownlee July 9, 2017 at 10:47 am #
    
    Glad to hear it.
    
    Reply
    - Nunu July 12, 2017 at 7:27 pm #
      
      Dear Jason,
      I have a question: my model should classify every image in one of the 4 classes that I have, should I use “categorical cross entropy” or I can use instead the “Binary cross entropy” ? because I read a lot that when there is n classes it is better to use categorical cross entropy, but also the binary one is used for the same cases. I am too much confused 🙁 can you help me in understanding this issue better!!
      Thanks in advance,
      Nunu
      
      Reply
      - Jason Brownlee July 13, 2017 at 9:53 am #
        
        When you have more than 2 classes, use categorical cross entropy.
      - Nunu July 19, 2017 at 12:47 am #
        
        oh ok thanks a lot 🙂 I have another question : I used Rmsprop with different learning rates such that 0.0001, 0.001 and 0.01 and with softmax in the last dense layer everything was good so far. Then i changed from softmax to sigmoid and i tried to excuted the same program with the same learning rates used in the cas of softmax, and here i got the problem : using learning rate 0.001 i got loss and val loss NAN after 24 epochs !! In your opinion what is the reason of getting such values??
        Thanks in advance,
        have a nice day,
        Nunu
      - Jason Brownlee July 19, 2017 at 8:27 am #
        
        Ensure you have scaled your input/output data to the bounds of the input/output activation functions.
      - Nunu July 19, 2017 at 5:49 pm #
        
        Thanksssss 🙂
Sriram July 5, 2017 at 5:12 pm #

HI Jason,

Thanks for the awesome tutorial. I have a question regarding your first hidden layer which has 8 neurons. Correct me if I’m wrong, but shouldn’t the number of neurons in a hidden layer be upperbounded by the number of inputs? (in this case 4).

Thanks,
Sriram

Reply
- Jason Brownlee July 6, 2017 at 10:24 am #
  
  No. There are no rules for the number of neurons in the hidden layer. Try different configurations and go with whatever robustly gives the best results on your problem.
  
  Reply
- Nunu July 13, 2017 at 8:17 pm #
  
  ok thanks a lot,
  
  have a nice day 🙂
  
  Reply
riya July 5, 2017 at 10:33 pm #

i ran the above program and got error
Import error: bad magic numbers in ‘keras’:b’\xf3\r\n’

Reply
- Jason Brownlee July 6, 2017 at 10:25 am #
  
  You may have a copy-paste example. Check your code file.
  
  Reply
riya July 6, 2017 at 9:43 pm #

actually a pyc file was created in the same directory due to which this error occoured.After deleting the file,error was solved

Reply
- Jason Brownlee July 9, 2017 at 10:29 am #
  
  Glad to hear it.
  
  Reply
riya July 7, 2017 at 9:44 pm #

Hello jason,
how is the error calculated to adjust weights in neural network?does the classifier uses backpropgation or anything else for error correction and weight adjustment?

Reply
- Jason Brownlee July 9, 2017 at 10:44 am #
  
  Yes, the backpropgation algorithm is used.
  
  Reply
riya July 9, 2017 at 7:15 pm #

Thanks jason

Reply
- Jason Brownlee July 11, 2017 at 10:15 am #
  
  You’re welcome.
  
  Reply
Nunu July 19, 2017 at 6:27 pm #

Dear Jason,
In my classifier I have 4 classes and as I know the last Dense layer should also have 4 outputs correct me please if i am wrong :). Now I want to change the number of classes from 4 to 2 !! my dataset is labeled as follows :
1) BirdYES_TreeNo
2) BirdNo_TreeNo
3)BirdYES_TreeYES
4)BirdNo_TreeYES
At the begining my output vector that i did was [0,0,0,0] in such a way that it can take 1 in the first place and all the rest are zeros if the image labeled as BirdYES_TreeNo and it can take 1 in the second place if it is labeled as BirdNo_TreeNo and so on…

Can you give me any hint inorder to convert these 4 classes into only 2 ( is there a function in Python that can do this ?) class Bird and class Tree in which every class takes 2 values 1 and 0 ( 1 indicates the exsistance of a Bird/Tree and 0 indicates that there is no Bird/Tree). I hope that my explanation is clear.
I will appreciate so much any answer from your side.
Thanks in advance,
have a nice day,
Nunu

Reply
- Jason Brownlee July 20, 2017 at 6:18 am #
  
  Yes, the number of nodes in the output layer should match the number of classes.
  
  Unless the number of classes is 2, in which case you can use a sigmoid activation function with a single neuron. Remember to change loss to binary_crossentropy.
  
  Reply
Nunu July 20, 2017 at 6:07 pm #

Thanks a lot for your help i will try it.

Have a nice day,
Nunu

Reply
- Jason Brownlee July 21, 2017 at 9:32 am #
  
  Good luck!
  
  Reply
- Quang Huy Chu June 7, 2020 at 9:46 am #
  
  Hi Jason.
  
  Can we use this baseline model to predict new data?
  
  If yes, we use the function model.evaluate() or model.predict() ?
  
  Thank you very much.
  
  Reply
  - Jason Brownlee June 7, 2020 at 1:13 pm #
    
    Yes, you can fit the model on all available data and use the predict() function from scikit-learn API.
    
    If this is new to you, see this tutorial:
    https://machinelearningmastery.com/make-predictions-scikit-learn/
    
    Reply
Prathm July 26, 2017 at 8:32 am #

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

This line is giving me follwing error:

File “C:\Users\pratmerc\AppData\Local\Continuum\Anaconda3\lib\site-
packages\pandas\core\indexing.py”, line 1231, in _convert_to_indexer raise KeyError(‘%s
not in index’ % objarr[mask])

KeyError: ‘[41421 7755 11349 16135 36853] not in index’

Can you please help ?

Reply
- Jason Brownlee July 26, 2017 at 3:58 pm #
  
  I’m sorry to hear that, perhaps check the data that you have loaded?
  
  Reply
Q. I. August 5, 2017 at 5:20 am #

Hi,

Thanks for a great site. New visitor. I have a question. In line 38 in your code above, which is “print(encoder.inverse_transform(predictions))”, don’t you have to do un-one-hot-encoded or reverse one-hot-encoded first to do encoder.inverse_transform(predictions)?

Thanks.

Reply
- Jason Brownlee August 5, 2017 at 5:49 am #
  
  Normally yes, here I would guess that the learn wrapper predicted integers directly (I don’t recall the specifics off hand).
  
  Try printing the outcome of predict() to confirm.
  
  Reply
Hernando Salas August 11, 2017 at 5:16 am #

Hi Jason,

I really enjoy your tutorials awesome at presenting the material. I’m a little bit puzzled by the results of this project as I get %44 rather than %95 which is a huge difference. I have used your code as follows in ipython notebook online:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.cross_validation import cross_val_score, KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load dataset
dataframe = pandas.read_csv(“iris.csv”, header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]

#encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

# define baseline model
def baseline_model():
# create model
model = Sequential()
model.add(Dense(4, input_dim=4, init=’normal’, activation=’relu’))
model.add(Dense(3, init=’normal’, activation=’sigmoid’))
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])

return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
kfold = KFold(n=len(X), n_folds=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)

print(“Accuracy: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

Reply
- Jason Brownlee August 11, 2017 at 6:46 am #
  
  The algorithm is stochastic, so you will get different results each time it is run, try running it multiple times and take the average.
  
  More about the stochastic nature of the algorithms here:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
  - Hernando Salas August 15, 2017 at 5:41 am #
    
    Hi Jason,
    
    Thanks for the reply. Run several times and got the same result. Any ideas?
    
    Reply
    - Hernando Salas August 15, 2017 at 5:43 am #
      
      https://notebooks.azure.com/hernandosalas/libraries/deeplearning/html/main.ipynb
      
      Reply
    - Jason Brownlee August 15, 2017 at 6:45 am #
      
      You could try varying the configuration of the network to see if that has an effect?
      
      Reply
Hernando Salas August 16, 2017 at 5:02 am #

If I set it to:

# create model
model = Sequential()
model.add(Dense(4, input_dim=4, init=’normal’, activation=’relu’))
model.add(Dense(3, init=’normal’, activation=’sigmoid’))

I get Accuracy: 44.00% (17.44%) everytime

If I set it to:

# create model
model = Sequential()
model.add(Dense(8, input_dim=4, init=’normal’, activation=’relu’))
model.add(Dense(3, init=’normal’, activation=’softmax’))

I get Accuracy: 64.00% (10.83%) everytime

Reply
- Jason Brownlee August 16, 2017 at 6:40 am #
  
  Interesting. Thanks for sharing.
  
  Reply
Akash August 22, 2017 at 12:42 am #

Hi Jason,

Thank you for your wonderful tutorial and it was really helpful. I just want to ask if we can perform grid search cv also the similar way because I am not able to do it right now?

Reply
- Jason Brownlee August 22, 2017 at 6:44 am #
  
  Yes, see this post:
  https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/
  
  Reply
Alexander September 9, 2017 at 6:56 am #

Hi, Jason. Thank you for beautiful work.
Help me please.
Where (in which folder, directory) should i save file “iris.csv” to use this code? Now system doesn’t see this file, when I write “dataframe=pandas.read_csv….”

4. Load The Dataset
The dataset can be loaded directly. Because the output variable contains strings, it is easiest to load the data using pandas. We can then split the attributes (columns) into input variables (X) and output variables (Y).
# load dataset
dataframe = pandas.read_csv(“iris.csv”, header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]

Reply
- Jason Brownlee September 9, 2017 at 12:01 pm #
  
  Download it and place it in the same directory as your Python code file.
  
  Reply
Alexander September 9, 2017 at 5:59 pm #

Thank you, Jason. I’ll try.

Reply
Tran Minh September 18, 2017 at 4:29 pm #

Hi Jason, thank you for your great instruction
I follow your code but unfortunately, I get only 68%~70% accuracy rate.
I use Tensorflow backend and modified seed as well as the number of hidden units but I still can’t reach to 90% of accuracy rate.

Do you have any idea how to improve it

Reply
- Jason Brownlee September 19, 2017 at 7:32 am #
  
  Perhaps try running the example a few times, see this post:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
Greg September 21, 2017 at 8:23 am #

Jason,

First thanks so much for a great post.

I cut and pasted the code above and got the following run times with a GTX 1060

real 2m49.436s
user 4m46.852s
sys 0m21.944s

and running without the GPU

124.93 user 25.74 system 1:04.90 elapsed 232% CPU

Is this reasonable? It seems slow for a toy problem.

Reply
- Jason Brownlee September 21, 2017 at 4:19 pm #
  
  Thanks for sharing.
  
  Yes, LSTMs are slower than MLPs generally.
  
  Reply
Bee September 27, 2017 at 1:20 am #

Hi Dr. Jason,

It’s a great tutorial. Do you have any similar tutorials for Unsupervised classification too?

Thanks,
Bee

Reply
- Jason Brownlee September 27, 2017 at 5:43 am #
  
  Unsupervised methods cannot be used for classification, only supervised methods.
  
  Reply
  - Bee October 2, 2017 at 5:08 am #
    
    Sorry, it was my poor choice of words. What I meant was clustering data using unsupervised methods when I don’t have labels. Is that possible with Keras?
    
    Thanks,
    Bee
    
    Reply
    - Jason Brownlee October 2, 2017 at 9:40 am #
      
      It may be, but I do not have examples of working with unsupervised methods, sorry.
      
      Reply
Miqueias October 3, 2017 at 8:48 am #

Hi Jason,

Thanks for you work describing in a very nice way how to use Keras! I’ve a question about the performance of categorical classification versus the binary one. Suppose you have a class for something you call your signal and, then, many other classes which you would call background. In that case, which way is more efficient to work on Keras: merging the different background classes and considering all of them as just one background class and then use binary classification or use a categorical one to account all the classes? In other words, is one way more sensible than the other for keras learn well the features from all the classes?

Reply
- Jason Brownlee October 3, 2017 at 3:44 pm #
  
  Great question.
  
  It really depends on the specific data. I would recommend designing some experiments to see what works best.
  
  Reply
  - Miqueias October 3, 2017 at 10:52 pm #
    
    Thanks for fast replay Jason!
    I’ll try that to see what I get.
    I’m wondering if in categorical classification Keras can build up independent functions inside it. Because, since the background classes may exist in different phase space regions (what would be more truthfully described by separated functions), training the net with all of them together for binary classification may not extract all the features from each one. In principle, that could be done with a single net but, it would probably require more neurons (which increases the over-fitting issue).
    By the way, what do you think about training different nets for signal vs. each background? Could they be combined in the end?
    
    Reply
    - Jason Brownlee October 4, 2017 at 5:45 am #
      
      If the classes are separable I would encourage you to model them as separate problems.
      
      Nevertheless, the best advice is always to test each idea and see what works best on your problem.
      
      Reply
Dave October 11, 2017 at 5:22 am #

Hi Jason! I have a question about multi classification

I would like to classify the 3 class of sleep disordered breathing.

I designed the LSTM network. but it works like under the table.

What is this situation?

Train matrix: precision recall f1-score support

0 0.00 0.00 0.00 1749
1 0.46 1.00 0.63 2979
2 0.00 0.00 0.00 1760

avg / total 0.21 0.46 0.29 6488

Train matrix: precision recall f1-score support

0 0.00 0.00 0.00 441
1 0.46 1.00 0.63 750
2 0.00 0.00 0.00 431

avg / total 0.21 0.46 0.29 1622

Reply
sasi October 13, 2017 at 10:48 pm #

Hi Jason,
Does this topic will match for this tutorial??
“Deep learning based multiclass classification tutorial”

Reply
- Jason Brownlee October 14, 2017 at 5:46 am #
  
  Yes.
  
  Reply
zaheer October 16, 2017 at 11:48 pm #

This tutorial is awsom. thanks for your time.
My data is
404. instances
2. class label. A/B
20. attribute columns.

i have tried the this example gives me 58% acc.
model = Sequential()
model.add(Dense(200, input_dim=20, activation=’relu’))
model.add(Dense(2, activation=’softmax’))
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

#Classifier invoking
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

what should i do, how to increase the acc of the system

Reply
- Jason Brownlee October 17, 2017 at 5:47 am #
  
  See this post for a ton of ideas:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Curious_Kid October 24, 2017 at 1:00 am #

Hi Jason,
My training data consists of lines of characters with each line corresponding to a label.

E.g. afhafajkfajkfhafkahfahk 6

fafafafafaftuiowjtwojdfafanfa 8

dakworfwajanfnafjajfahifqnfqqfnq 4

Here, 6,8 and 4 are labels for each line of the training data.
……………………………………………………..

I have first done the integer encoding for each character and then done the one hot encoding. To keep the integer encoding consistent, I first looked for the unique letters in all the rows and then did the integer encoding. e.g. that’s why letter h will always be encoded as 7 in all the lines.

For a better understanding, consider a simple example where my training data has 3 lines(each line has some label):
af
fa
nf

It will be one hot encoded as:

0 [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
1 [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
2 [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

I wanted to do the classification for the unseen data(which label does the new line belong to) by training a neural network on this one hot encoded training data.

I am not able to understand how my model should look like as I want the model to learn from each one hot encoded character for each line. Could you please suggest me something in this case? Please let me know if you need more information to understand the problem.

Reply
- Jason Brownlee October 24, 2017 at 5:33 am #
  
  This is a sequence classification task.
  
  Perhaps this post will give you a template to get started:
  https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
  
  Reply
  - Curious_Kid October 24, 2017 at 6:10 am #
    
    Thanks Jason for the reply.
    
    However, I am not dealing with words. I just have characters in a line and I am doing one hot encoding for each character in a single line as I explained above. What I am confused with is the shapes that I have to give to the layers of my network.
    
    Reply
    - Jason Brownlee October 24, 2017 at 3:56 pm #
      
      I see, perhaps this post will help with reshaping your data:
      https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
      
      Reply
- Nrithya Muniswamy January 26, 2018 at 1:29 am #
  
  @Curious_Kid : did you find a workaround, I am dealing with same problem
  
  Reply
philippe November 6, 2017 at 9:10 pm #

Hello Jason,

very clear tutorial. one quick question, how do you decide on the number of hidden neurons (in classification case). it seems to follow (Hidden neurons = input * 2) , how about * 1 or *3 is there a rule. same goes for epoch ; how do you choose nbr of iterations;

thanks.

Reply
- Jason Brownlee November 7, 2017 at 9:49 am #
  
  There are no good rules, use trial and error or a robust test harness and a grid search.
  
  Reply

Niklas Wilke November 13, 2017 at 11:26 pm #

Hi Jason,

great tutorial!
I’ve got a multi class classification problem. I try to classify different kind of bills into categorys (that are given !! no clustering!!) , like flight, train/bus, meal, hotels and so on.
I got a couple files in PDF which i transform in PNG to make it processable by MC Computer Vision using OCR.
After that i come out with a .txt or .csv file of the plain text.
Now i used skelarns vectorizers to create a bag of words and fit the single bills/documents.
Ending up with numpy-arrays looking like this (sample data i used to craete the code while i was gathering data):

[[3 0 1 1 0 0 0 0 2 0 2 2 1 3 1 1 0 3 0 0 3 2 1 0 1 3 1 0 0 5 0 0 1 1 0 1 0
0 1 1 1 1 0 1 0 1 0 1 0 2 0 2 1 0 1 0 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 1 1 1
0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 1 1 2 1 0 0 0 0 0 0 0 2 1 0 0 0 2 1 0 1 0 1
0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 0 1
0 0 1 1 0 0 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 2 0 0
0 1 4 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 2 3 0 1 0 0 0 0 0 1 0 3 0 1 0
1 1 0 0 0 0 0 0 1 2 0 0 0 3 0 0 0 1 0 0 0 1 1 0 2 0 0 0 0 1 0 1 1 0 0 1 0
1 1 0 1 0 0 1 0 0 0 0 1 0]]

How do i categoryze or transform this to something like the iris dataset ?
Isn’t it basically the same ? Just with way more numbers and bigger arrrays ?
I tried to iterate through the array to print every single number in a .csv-file and then just append the category at the back with some for loops but sadly you can’t iterate through numpy-arrays … + i can’t imagine that’s the intended way of labeling data …

Thanks for reading through this way too long comment , help is highly apreciated.

Jason Brownlee November 14, 2017 at 10:18 am #

Yes, the vectorized documents become input to ML algorithms.

I’d love to hear how you go, post your results!

Niklas Wilke November 30, 2017 at 1:27 am #

Finally solved all my preprocessing problems and today i was able to perform my first training trial runns with my actual dataset. (Btw : buffer_y = dummy_y)

def createModell():
    #8137 words = input shape
    #14 categorys = output shape

    model = Sequential()
    model.add(Dense(8137, input_dim = 8137, activation='relu'))
    model.add(Dense(2250,  activation='relu'))
    #model.add(Dense(581,  activation='relu'))
    model.add(Dense(14,  activation='softmax'))
    
    model.compile(loss= 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])
    return model

estimator = KerasClassifier(build_fn = createModell, epochs = 10, batch_size = 6)
crossvalidation_data = KFold(n_splits = 39, shuffle = True)
results = cross_val_score(estimator, X, buffer_y, cv= crossvalidation_data)

def createModell():

#8137 words = input shape

#14 categorys = output shape

model = Sequential()

model.add(Dense(8137, input_dim = 8137, activation='relu'))

model.add(Dense(2250, activation='relu'))

#model.add(Dense(581, activation='relu'))

model.add(Dense(14, activation='softmax'))

model.compile(loss= 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])

return model

estimator = KerasClassifier(build_fn = createModell, epochs = 10, batch_size = 6)

crossvalidation_data = KFold(n_splits = 39, shuffle = True)

results = cross_val_score(estimator, X, buffer_y, cv= crossvalidation_data)

And hell am i overfitting.
0.98 acuraccy , which can’t be because my dataset is horribly unbalanced. (maybe thats the issue?)

Anyhow, i enabled the print option and for me it only displays 564/564 sample files for every epoche even though my dataset contains 579 … i check for you example and it also only displays 140/140 even though the iris dataset is 150 files big.

Are the splits to high ?
and what is a good amount of nodes for such a high input shape :/ tried to split it up to multiple layers so its not 8139 -> 4000-> 14

Cheers
Niklas

Jason Brownlee November 30, 2017 at 8:20 am #

Well done!

Consider the options in this post for imbalanced data:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/

The count is wrong because you are using cross-validation (e.g. not all samples for each run).

You must use trial and error to explore alternative configurations, here are some ideas:
https://machinelearningmastery.com/improve-deep-learning-performance/

I hope that helps as a start.

Reply
- Niklas Wilke November 30, 2017 at 6:52 pm #
  
  Ah ok , good point. When i create 10 splits it only uses 521 files => 90% of 579
  
  Will look into it and post my hopefully sucessfull results here.
  
  Given that i had no issue with the imbalance of my dataset, is the general amount of nodes or layers alright ? I have literally no clue because all the tipps ive found so far refer to way smaller input shapes like 4 or 8.
- Jason Brownlee December 1, 2017 at 7:28 am #
  
  There are no good rules of thumb, I recommend testing a suite of configurations to see what works best for your problem.
- Niklas Wilke November 30, 2017 at 7:34 pm #
  
  I read you mentioned other classifiers like decision trees performing well on imbalanced datasets.
  
  Is there some way i can use other classifiers INSIDE of my NN ?
  
  for example could i implement naive bayes into my NN ?
- Jason Brownlee December 1, 2017 at 7:29 am #
  
  Not that I am aware.
  
  You could combine the predictions from multiple models into an ensemble though.

Niklas Wilke November 30, 2017 at 6:59 pm #

Btw, even though i tell it to run 10 epoches , after the 10 epoches it just starts again with slightly different values. In your example it doesnt.

Epoch 1/10
521/521 [==============================] – 12s – loss: 2.0381 – acc: 0.4952
Epoch 2/10
521/521 [==============================] – 10s – loss: 0.3139 – acc: 0.9443
Epoch 3/10
521/521 [==============================] – 10s – loss: 0.0748 – acc: 0.9866
Epoch 4/10
521/521 [==============================] – 11s – loss: 0.0578 – acc: 0.9942
Epoch 5/10
521/521 [==============================] – 11s – loss: 0.0434 – acc: 0.9962
Epoch 6/10
521/521 [==============================] – 11s – loss: 0.0352 – acc: 0.9962
Epoch 7/10
521/521 [==============================] – 11s – loss: 0.0321 – acc: 0.9981
Epoch 8/10
521/521 [==============================] – 11s – loss: 0.0314 – acc: 0.9981
Epoch 9/10
521/521 [==============================] – 11s – loss: 0.0312 – acc: 0.9981
Epoch 10/10
521/521 [==============================] – 11s – loss: 0.0311 – acc: 0.9981
58/58 [==============================] – 0s
Epoch 1/10
521/521 [==============================] – 13s – loss: 1.9028 – acc: 0.4722
Epoch 2/10
521/521 [==============================] – 11s – loss: 0.2883 – acc: 0.9463
Epoch 3/10
521/521 [==============================] – 11s – loss: 0.1044 – acc: 0.9770
Epoch 4/10
521/521 [==============================] – 11s – loss: 0.0543 – acc: 0.9942

Reply
Niklas Wilke November 30, 2017 at 11:38 pm #

Hi Jason,
could you please comment on this blog entry :
http://www.alfredo.motta.name/cross-validation-done-wrong/

Sounds pretty logical to me and isnt that exactly what we are doing here ?
If we ignore the feature selection part, we also split the data first and afterwards train the model ….

Thanks in advance

Reply
Summer Cassidy December 16, 2017 at 12:12 am #

Hello Jason Brownlee,
When I run the code I get an error. I have checked multiple times whether I have copied the code correctly. I am unable to trace why the error is occurring. Can you please help me out?
The error is:

Traceback (most recent call last):
File “F:/7th semester/machine language/thesis work/python/iris2.py”, line 36, in
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\model_selection\_validation.py”, line 342, in cross_val_score
pre_dispatch=pre_dispatch)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\model_selection\_validation.py”, line 206, in cross_validate
for train, test in cv.split(X, y, groups))
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 779, in __call__
while self.dispatch_one_batch(iterator):
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 625, in dispatch_one_batch
self._dispatch(tasks)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py”, line 111, in apply_async
result = ImmediateResult(func)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py”, line 332, in __init__
self.results = batch()
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py”, line 131, in
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\model_selection\_validation.py”, line 458, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\wrappers\scikit_learn.py”, line 203, in fit
return super(KerasClassifier, self).fit(x, y, **kwargs)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\wrappers\scikit_learn.py”, line 147, in fit
history = self.model.fit(x, y, **fit_args)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\models.py”, line 960, in fit
validation_steps=validation_steps)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\training.py”, line 1581, in fit
batch_size=batch_size)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\training.py”, line 1418, in _standardize_user_data
exception_prefix=’target’)
File “C:\Users\ratul\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\training.py”, line 153, in _standardize_input_data
str(array.shape))
ValueError: Error when checking target: expected dense_2 to have shape (None, 3) but got array with shape (90, 40)

Reply
- Jason Brownlee December 16, 2017 at 5:32 am #
  
  Looks like you might be using different data.
  
  Reply
  - Summer Cassidy December 16, 2017 at 6:30 am #
    
    Thanks for looking into the problem. I downloaded the iris flower dataset but from a different source. Changing the source to UCI Machine Learning repository solved my problem.
    
    Reply
    - Jason Brownlee December 16, 2017 at 9:21 am #
      
      Glad to hear it!
      
      Reply
Pubudu January 1, 2018 at 4:52 pm #

Hey Jason:

Thanks for the tute. BTW, how do you planning to void dummy variable trap. You don’t need all three types. Can you explain why you didn’t use train_test_split method?

Reply
- Jason Brownlee January 2, 2018 at 5:34 am #
  
  The example uses k-fold cross validation instead of a train/test split.
  
  The results are less biased with this method and I recommend it for smaller models.
  
  Reply
Hieu January 7, 2018 at 7:38 pm #

Dear Jason,
Thank you for your sharing.
I run your source code, now I want to replace “activation=’softmax'” – (model.add(Dense(3, activation=’softmax’)) with multi-class SVM to classify. How can I do it?
Coul you please help me? Thank you so much!

Reply
- Jason Brownlee January 8, 2018 at 5:42 am #
  
  This is a neural network example, not SVM. Perhaps I don’t understand your question. Can you restate it?
  
  Reply
Hieu January 9, 2018 at 8:39 pm #

Dear Jaso,
Thank you for your reply.
Because your example uses “Softmax regression” method to classify, Now I want to use “multi-class SVM” method to add to the neural network to classify. When using SVM method, the accuracy of training data doesn’t change in each iteration and I only got 9.5% after training.
This is my code
……
model.add(Dense(1000, activation=’relu’))

#=======for softmax============
# model.add(Dense(10, activation=’softmax’))
# model.compile(loss=keras.losses.categorical_crossentropy,
# optimizer=keras.optimizers.Adam(),
# metrics=[‘accuracy’])

#========for SVM ==============
model.add(Dense(10, kernel_regularizer=regularizers.l2(0.01), activity_regularizer=regularizers.l1(0.01)))
model.add(Activation(‘linear’))
model.compile(loss=’hinge’,
optimizer=’sgd’,
metrics=[‘accuracy’])

Thank you!

Reply
- Jason Brownlee January 10, 2018 at 5:24 am #
  
  Here are some ideas to try:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
  - Hieu January 11, 2018 at 6:13 am #
    
    Dear Jason,
    Thank you for your help! I will read and try it.
    
    Have a nice day.
    Trung Hieu
    
    Reply
Arjun January 10, 2018 at 9:26 am #

Hi Jason,

Thanks for the content.
Could you tell me how we could do grid search for a multi class classification problem?

I tried doing:
# create model
model = KerasClassifier(build_fn=neural, verbose=0)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X_train, Y_train)
# summarize results
print(“Best: %f using %s” % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_[‘mean_test_score’]
stds = grid_result.cv_results_[‘std_test_score’]
params = grid_result.cv_results_[‘params’]
for mean, stdev, param in zip(means, stds, params):
print(“%f (%f) with: %r” % (mean, stdev, param))

but its giving me an error saying :
ValueError: Invalid shape for y: ()

I had one hot encoded the Y variable( having 3 classes)

Reply
- Jason Brownlee January 10, 2018 at 3:41 pm #
  
  Looks like you might need to one hot encode your output data.
  
  Reply
Budi January 15, 2018 at 4:03 am #

Another nice result..

Using TensorFlow backend.
2018-01-15 00:01:58.609360: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Baseline: 97.33% (4.42%)

but, could you explain what the meaning of my CPU support instruction..

thanks alot..

Reply
- Jason Brownlee January 15, 2018 at 7:01 am #
  
  Well done.
  
  You can ignore that warning.
  
  Reply
kristi January 18, 2018 at 3:14 pm #

I’m getting accuracy 0f 33.3% only.I’m using keras2

Reply
- Jason Brownlee January 19, 2018 at 6:27 am #
  
  Perhaps try running the example again?
  
  Reply
Shivang January 19, 2018 at 1:20 am #

Hey Jason,
How would you handle the dummy variable trap? In this case, we have 3 categories by applying One hot encoding we get three columns but we can work with only two of them to avoid this dummy variable trap.
Please tell how is it handled here?

Reply
- Jason Brownlee January 19, 2018 at 6:34 am #
  
  What trap are you referring to?
  
  Reply
  - Shivang January 19, 2018 at 5:56 pm #
    
    Please refer this:
    http://www.algosome.com/articles/dummy-variable-trap-regression.html
    
    Reply
    - Jason Brownlee January 20, 2018 at 8:18 am #
      
      This is for inputs not outputs and is for linear models not non-linear models.
      
      Reply
Pradeep February 2, 2018 at 3:23 pm #

Hello Jason !! Thanx for explaining in such a nice way.
I am using the similar dataset, having multiple classes. But at the end, model give the accuracy.
How can I visualize the individual class accuracy in terms of Precision and Recall?

Reply
- Jason Brownlee February 3, 2018 at 8:33 am #
  
  You could collect the prediction in an array and compare them to the expected values using tools in sklearn:
  http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics
  
  Reply
Rahul February 2, 2018 at 9:13 pm #

I want to plot confusion metrics to see the distribution of data in different classes. We got the value in the range of 0-1 for every data instances by using the softmax function.
Out[30]:
array([[ 0.2284117 , 0.03548411, 0.0659482 , 0.63993007, 0.03022591],
[ 0.10440681, 0.11356669, 0.09002439, 0.63514292, 0.05685928],
[ 0.40078917, 0.11887287, 0.1319678 , 0.30179501, 0.04657512],
…,
[ 0.38920838, 0.09161357, 0.10990805, 0.37070984, 0.03856021],
[ 0.14154498, 0.53637242, 0.11574779, 0.18590394, 0.02043088],
[ 0.17462374, 0.02110649, 0.03105714, 0.6064955 , 0.16671705]], dtype=float32)

I want to the result in only 0 and 1 format as the hight value is replaced by 1 and others are 0. How can I do this? For example, the above array should be converted into
[0,0,0,1,0] and so on for different data. Please help

Reply
- Jason Brownlee February 3, 2018 at 8:37 am #
  
  Perhaps apply the round() function?
  
  Reply
CHIRANJEEVI February 12, 2018 at 6:07 pm #

how can we predict output for new input values after validation ?

Reply
- Jason Brownlee February 13, 2018 at 8:00 am #
  
  See this post:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  Once you have a final model you can call:
  
  yhat = model.predict(X)
  
  Reply
Meroua Daoudi February 19, 2018 at 12:12 am #

Hi jason,

in my problem i have multi class and one data object can belong to multiple class at time

Do you know of any reference to this kind of problem

Reply
- Jason Brownlee February 19, 2018 at 9:07 am #
  
  This is called multi-label classification:
  https://en.wikipedia.org/wiki/Multi-label_classification
  
  Reply
Madhav Bhattarai February 23, 2018 at 4:25 pm #

Hi Jason, as elegant as always. I am trying to solve the multiclass classification problem similar to this tutorial with the different dataset, where all my inputs are categorical. However, the accuracy of my model converges after achieving the accuracy of 57% and loss also converges after some point. My model doesn’t learn thereafter. Does this tutorial work for the dataset where all inputs are categorical? Is there some way to visualize and diagnose the issue?

Reply
- Jason Brownlee February 24, 2018 at 9:10 am #
  
  This post should give you some good ideas to try:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
  - Madhav Bhattarai February 26, 2018 at 4:42 pm #
    
    Thank you so much.
    
    Reply
Yodish February 26, 2018 at 11:14 pm #

Is there a way I can print all the training epochs?

Reply
- Jason Brownlee February 27, 2018 at 6:29 am #
  
  Yes, you can set the verbose=1 when calling fit().
  
  Reply
ankitha February 27, 2018 at 4:42 pm #

HI Jason
Is it possible to train a classifier dynamically ?
if yes how can we implement that

Reply
- Jason Brownlee February 28, 2018 at 6:01 am #
  
  Yes, it is called online learning where the model is updated after each pattern.
  
  You can achieve this directly in Keras by setting the batch size to 1.
  
  Reply
Varoons March 2, 2018 at 5:00 am #

Thanks for these great tutorials Jason.

I had a question on multi label classification where the labels are one-hot encoded.

When predicting new data, how do you map the one-hot encoded outputs to the actual class labels?

Thanks!

Reply
- Jason Brownlee March 2, 2018 at 5:38 am #
  
  You can use argmax() on the vector to get the index with the highest probability.
  
  Also Keras has a predict_classes() function on the model that does the same thing.
  
  Reply
Gledson Melotti March 4, 2018 at 5:16 am #

Hi, how are you? I really enjoyed your example over sorting using iris dataset. I have some doubts. I use anaconda with python 3.6. I installed keras. In my algorithm and I would like to assign (include) more hidden layers. How should I do it?
For example:
4 inputs -> [8 hidden nodes] -> [8 hidden nodes -> [12 hidden nodes] -> 3 outputs

Then you provided, as a response to a comment, a new prediction algorithm (where we split the dataset, train on 67% and make predictions on 33%). However, you included in the network model the following command: init = ‘normal’ (line 28). Why did you do this? When you’ve split the set into training and testing, you no longer use cross-validation. Could you use cross-validation together with the training and test set division?

Other questions: How to save the training template to use in the future with other test data? How to generate the ROC curves?

Thank you very much for your attention.

Reply
- Jason Brownlee March 4, 2018 at 6:08 am #
  
  To add new lauyers, just add lines to the code as follows:
  
  model.add(...)
  
  1
  
  model.add(...)
  
  And replace … with the type of layer you want to add.
  
  I used ‘normal’ to initialize the weights. I found it gave better skill with some trial and error.
  
  Sorry, Id on’t have an example of generating roc curves for keras models.
  
  Reply
Gledson Melotti March 4, 2018 at 5:41 am #

Hi, how are you? I’m using python by spider-anaconda. I use your iris dataset example for sorting. However, when I use the following commands:
import matplotlib.pyplot as plt
import keras.backend as K
from keras import preprocessing
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline

I get the following message: imported but unused.
What should I do to not receive this message?

Thank you very much for your attention.

Reply
- Jason Brownlee March 4, 2018 at 6:08 am #
  
  You can ignore it.
  
  Reply
Mohannad Rateb March 5, 2018 at 12:12 am #

HI jason ,

Excellent tutorial.

i have a question concerning on the number of hidden nodes , on which basis do we know it’s value .
Thanks

Reply
- Jason Brownlee March 5, 2018 at 6:25 am #
  
  Use experimentation to estimate the number of hidden nodes that results in a model with the best skill on your dataset.
  
  Reply

Mo March 5, 2018 at 6:15 am #

Hi Jason,
So after building the neural network from the training data, I want to test the network with the new set of test data. How can I do that?

# import neural network libs
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import train_test_split

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
trainset = numpy.loadtxt("optdigits.tra", delimiter=",")
testset = numpy.loadtxt("optdigits.tes", delimiter=",")

# split into input (data) and output (labels) variables
data = trainset[:,0:64]
labels = trainset[:,64]

data_testset = testset[:,0:64]
labels_testset = testset[:,64]

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(labels)
encoded_labels = encoder.transform(labels)
# convert integers to OneHot variables (i.e. one hot encoded)
OneHot_labels = np_utils.to_categorical(encoded_labels)

# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(36, input_dim=64, init="uniform", activation='relu'))
    model.add(Dense(10, activation='softmax'))
    
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Fit the model
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=10, verbose=0)

# evaluate the model using kFold cross validation with 20% of the data for testing and 80% for training
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(estimator, data, OneHot_labels, cv=kfold)
print("\nOverall Validation accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))


# build the neural network from all the training set
estimator.fit(data, labels)
predictions = estimator.predict(data_testset)
print("\nPredeiction: \n" , predictions)
print("\nThe actual labels of the test set:\n" , labels_testset)

# build the confusion matrix after classifing the test data
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(labels_testset, predictions)
print ("\nThe confusion matrix when apply the test set on the trained nerual network:\n" , cm)

# import neural network libs

import numpy

import pandas

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from keras.utils import np_utils

from sklearn.model_selection import cross_val_score

from sklearn.model_selection import KFold

from sklearn.preprocessing import LabelEncoder

from sklearn.pipeline import Pipeline

from sklearn.cross_validation import train_test_split

# fix random seed for reproducibility

seed = 7

numpy.random.seed(seed)

# load pima indians dataset

trainset = numpy.loadtxt("optdigits.tra", delimiter=",")

testset = numpy.loadtxt("optdigits.tes", delimiter=",")

# split into input (data) and output (labels) variables

data = trainset[:,0:64]

labels = trainset[:,64]

data_testset = testset[:,0:64]

labels_testset = testset[:,64]

# encode class values as integers

encoder = LabelEncoder()

encoder.fit(labels)

encoded_labels = encoder.transform(labels)

# convert integers to OneHot variables (i.e. one hot encoded)

OneHot_labels = np_utils.to_categorical(encoded_labels)

# define baseline model

def baseline_model():

# create model

model = Sequential()

model.add(Dense(36, input_dim=64, init="uniform", activation='relu'))

model.add(Dense(10, activation='softmax'))

# Compile model

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

return model

# Fit the model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=10, verbose=0)

# evaluate the model using kFold cross validation with 20% of the data for testing and 80% for training

kfold = KFold(n_splits=5, shuffle=True, random_state=seed)

results = cross_val_score(estimator, data, OneHot_labels, cv=kfold)

print("\nOverall Validation accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

# build the neural network from all the training set

estimator.fit(data, labels)

predictions = estimator.predict(data_testset)

print("\nPredeiction: \n" , predictions)

print("\nThe actual labels of the test set:\n" , labels_testset)

# build the confusion matrix after classifing the test data

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(labels_testset, predictions)

print ("\nThe confusion matrix when apply the test set on the trained nerual network:\n" , cm)

Jason Brownlee March 5, 2018 at 6:27 am #

You must fit a final model.

This post will make the concept clear:
https://machinelearningmastery.com/train-final-machine-learning-model/

Reply
- Partha Shankar Nayak February 19, 2019 at 3:52 pm #
  
  Mo has done that in line 57 I believe.
  
  Reply

Kashyap Raiyani March 21, 2018 at 9:24 pm #

Hello Jason,

I did undergo the page and all the posts. I am having trouble with encoding label list. Please find the details as follows:

Problem:
Input data set file contain 3 columns in the following format unique_id,text,aggression-level

The columns are separated by the comma and follow a minimal quoting pattern (such that only those columns are quoted which are in multiple lines or contain quotes in the text).

column 1: unique_id facebook id
column 2: post/text
column 3: aggression-level: OAG, CAG, and NAG

There are 12000 records

Code as follows:

texts = [] # list of text samples
labels = [] # list of label ids
csvfile = pd.read_csv(‘agr_en_train.csv’,names=[‘id’, ‘post’, ‘label’])
texts = csvfile[‘post’]
labels = csvfile[‘label’]
print(‘Found %s texts.’ % len(texts))

#label_encoding
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
print(‘Shape of label tensor:’, dummy_y.shape)

After Training model:

model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
reduce_lr = ReduceLROnPlateau(monitor=’val_loss’, factor=0.5, patience=2, min_lr=0.000001)
print(model.summary())
model.fit(x_train, y_train, batch_size=256, epochs=25,validation_data=(x_val, y_val), shuffle=True, callbacks=[reduce_lr])

Below lines are giving Eorros:

predict = model.predict(x_val)
print(encoder.inverse_transform(predict))

Traceback (most recent call last):
  File "aggression_analysis_on_facebookv2.py", line 161, in 
    print(encoder.inverse_transform(predict))
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 151, in inverse_transform
    if diff:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

predict = model.predict(x_val)

print(encoder.inverse_transform(predict))

Traceback (most recent call last):

File "aggression_analysis_on_facebookv2.py", line 161, in

print(encoder.inverse_transform(predict))

File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 151, in inverse_transform

if diff:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I am getting the predictions in np array but I am not able to convert back to the 3 classes (OAG, CAG, NAG) for test data.

Can you please have look at it?

Many thanks in advance.

Esteban Vargas April 5, 2018 at 12:54 pm #

Jason this tutorial is just amazing! Thank you so much.

I want to ask you, how can this model be adapted for variables that measure different things? For example mixing lenghts, weights, etc.

Reply
- Jason Brownlee April 5, 2018 at 3:15 pm #
  
  Thanks.
  
  Provide all the variables to the model, but rescale all variables to the range 0-1 prior to modeling.
  
  Reply
Stuart Black April 7, 2018 at 6:32 am #

Hi Jason,

Thanks for the really helpful tutorial!

Can you recommend a good way to normalise the data prior to feeding it into the model? Half of my columns have data values in the thousands and others have values no greater than 10.

Thanks

S

Reply
- Jason Brownlee April 7, 2018 at 6:41 am #
  
  Yes, use the sklearn MinMaxScaler. I have many tutorials on the topic:
  https://machinelearningmastery.com/?s=MinMaxScaler&submit=Search
  
  Reply
Nikunj April 19, 2018 at 8:20 am #

Hi Jason! Thanks for the tutorial!
However I’m facing this problem –

Here is the code:

def baseline_model():
model = Sequential()
model.add(Dense(256, input_dim=90, activation=’relu’))
model.add(Dense(9, activation=’softmax’))
# learning rate is specified
keras.optimizers.Adam(lr=0.001)
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=50, batch_size=500, verbose=1)
estimator.fit(X, dummy_y)

Now, the output is :

150000/150000 [==============================] – 2s 12us/step – loss: 11.4893 – acc: 0.2870
Epoch 2/50
150000/150000 [==============================] – 2s 11us/step – loss: 11.4329 – acc: 0.2907
Epoch 3/50
150000/150000 [==============================] – 2s 10us/step – loss: 11.4329 – acc: 0.2907
Epoch 4/50
150000/150000 [==============================] – 2s 11us/step – loss: 11.4329 – acc: 0.2907
Epoch 5/50
150000/150000 [==============================] – 2s 11us/step – loss: 11.4329 – acc: 0.2907
Epoch 6/50
150000/150000 [==============================] – 2s 11us/step – loss: 11.4329 – acc: 0.2907
………………..
……………….

The loss and acc remain the same for the remaining epochs.
The no. of layers and activation type are specified.
Why is the loss remaining constant?

Reply
- Jason Brownlee April 19, 2018 at 2:46 pm #
  
  You may need to tune the model for your problem.
  
  Reply
  - Nikunj April 20, 2018 at 5:34 am #
    
    How can I do that Jason?
    
    Reply
    - Jason Brownlee April 20, 2018 at 6:01 am #
      
      I provide a long list of ideas here:
      https://machinelearningmastery.com/improve-deep-learning-performance/
      
      Reply
anand May 10, 2018 at 3:27 am #

hi jason thanks for this tutorial

when iam trying this tutorial iam getting an error message of

Using TensorFlow backend.
Traceback (most recent call last):
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\keras example1.py”, line 29, in
model = KerasClassifier(built_fn = baseline_model,epochs=200, batch_size=5,verbose=0)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\wrappers\scikit_learn.py”, line 61, in __init__
self.check_params(sk_params)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\wrappers\scikit_learn.py”, line 75, in check_params
legal_params_fns.append(self.__call__)
AttributeError: ‘KerasClassifier’ object has no attribute ‘__call__’

and second what if i use numpy to load the dataset. “numpy.loadtxt(x.csv)”
and how to encode the labels

Reply
- Jason Brownlee May 10, 2018 at 6:35 am #
  
  I’m sorry to hear that, here are some ideas:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Voodoomonkey May 15, 2018 at 4:04 am #

Hello, Jason.
Been looking through some of your topics on deep learning with python.
They are very useful and give us a lot of information about using python with NN.
Thank you!

I’ve been trying to create a multi class classifier using your example but i can’t get it to work properly.

You see, i have approximately 20-80 classes and using your example i only get a really small accuracy rate.

My code looks like this (basically your code ) :

seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv(“csv1.csv”, header=None)
dataset = dataframe.values
X = dataset[:,0:8]
Y = dataset[:,8:9]
print(X.shape)
print(Y.shape)
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
# define baseline model
def baseline_model():
# create model
model = Sequential()
model.add(Dense(8, input_dim=8, activation=’relu’))
model.add(Dense(56, activation=’softmax’))
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

and my csv is :
https://drive.google.com/open?id=1KmTpLHHd8apXrqOK8UcJfr3MbqWMe9ok

Looking forward for your answer. This is very important for me and my future.

Reply
- Jason Brownlee May 15, 2018 at 8:07 am #
  
  Sorry, I cannot review your code, what problem are you having exactly?
  
  Reply
  - Voodoomonkey May 15, 2018 at 11:39 pm #
    
    Would it be easier to review like this ?
    https://pastebin.com/hYa2cpmW
    
    The problem i’m having is that using the code you provided with my dataset i get
    Baseline: 4.00% (6.63%)
    Which is really low, and i don’t see any ways to fix that.
    I’m trying to train it on 100 rows of data with 38 classes.
    
    If i try to use it with more data, the baseline drops even more.
    
    Is there a way to increase the percentage ? Maybe i’m doing something wrong ?
    It always come’s down to – every example you provide works, but when i try my own data – it doesn’t work.
    
    Can you please take a look at code and data, maybe ?
    
    Reply
    - Jason Brownlee May 16, 2018 at 6:05 am #
      
      Here are some suggestions to lift model skill:
      https://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/
      
      Reply
Shyam May 19, 2018 at 4:38 am #

I am running the code with the dependencies installed, but I am receiving this as an output.

C:\Users\shyam\Anaconda3\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.

Shouldn’t it be printing more than just “using TensorFlow backend”? Any help would be greatly appreciated

Reply
- Jason Brownlee May 19, 2018 at 7:45 am #
  
  You can ignore this warning.
  
  Reply
Chrisa May 22, 2018 at 4:40 am #

Hello and thanks for this excellent tutorial.

I have a dataset with 150 attributes per entry. If an attribute is unknown for an entry, then in the csv file it is represented with a “?”. I suppose this will be a problem in the training phase. Can you suggest a way to handle his?

Reply
- Jason Brownlee May 22, 2018 at 6:30 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-handle-missing-data
  
  Reply
  - Chrisa May 22, 2018 at 7:30 am #
    
    Thank you very much
    
    Reply
Chrisa June 26, 2018 at 8:02 pm #

Hello again. I finally narowed down which of the 150 attributes I need to use, but now there is another problem. The attributes I need are in specific columns and of different datatype. I tried working with numpy.loadtxt and numpy.genfromtxt but the format of the resulting arrays is not the right one. I get the mistake:
ValueError: Error when checking input: expected dense_1_input to have shape (5,) but got array with shape (1,)
where 5 are the attributes I m using
Can you help me?

Reply
- Chrisa June 26, 2018 at 11:30 pm #
  
  I figured it out using:
  dataframe = pandas.read_csv(“IrisDataset.csv”, header=None, usecols = [0,1,2,3,5], dtype ={0:np.float32, 1:np.float32, 2:np.float32, 3:np.float32, 5: np.str })
  where the fifth column is one I added in order to check the string attributes.
  Now there is problem of how can I have strings as input for the neural
  
  Reply
  - Jason Brownlee June 27, 2018 at 8:19 am #
    
    Strings must be encoded, see this:
    https://machinelearningmastery.com/faq/single-faq/how-to-handle-categorical-data-with-string-values
    
    Reply
- Jason Brownlee June 27, 2018 at 8:18 am #
  
  Perhaps this post will help you load your data:
  https://machinelearningmastery.com/load-machine-learning-data-python/
  
  Reply
Kushagra July 2, 2018 at 12:14 pm #

Hey Jason!
Thank you for such awesome posts. Do you have tutorials or recommendations for classifying raw time series data using RNN GRU or LSTM?

Reply
- Jason Brownlee July 2, 2018 at 2:59 pm #
  
  1D CNNs are very effective for time series classification in my experience.
  
  Reply
Sanjeev Ranjan July 9, 2018 at 3:46 pm #

Please help:
Error when checking target: expected dense_6 to have shape (10,) but got array with shape (1,)

I have to do a multi-class classification to predict value ranging between 1 to 5
there are total of 46 columns. All columns have numerical values only.

model = Sequential()
model.add(Dense(64, activation=’relu’, input_dim=46)) #there are 46 feature in my dataset to be trained
model.add(Dropout(0.5))
model.add(Dense(64, activation=’relu’))
model.add(Dropout(0.5))
model.add(Dense(10, activation=’softmax’))

model.compile(optimizer=’rmsprop’, loss=’categorical_crossentropy’, metrics=[‘accuracy’])

model.fit(X_train, Y_train, epochs=20, batch_size=128)

I got error in last line.

Reply
- Jason Brownlee July 10, 2018 at 6:42 am #
  
  This might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Chrisa July 10, 2018 at 11:58 am #

I tried adding this block of code in the end in order to test the model on new data,

estimator.fit(X, dummy_y)
predictions=estimator.predict(X)
correct=0

for i in range(np.size(X,0)):
if predictions[i].argmax()==dummy_y[i].argmax():
print (“%d well predicted\n” %i)
correct+=1
print (“Correct predicted: %d” %correct)

In fact, there is no new data. The test array X is the same as the training one, so I expected a very big number of corrects.. However the corrects are 50. After printing the predictions, I realized that all indexes are predicted as “Iris-setosa” which is the first label, so the rate is approximately 33.3%. Am I doing something wrong?

Reply
- Jason Brownlee July 10, 2018 at 2:29 pm #
  
  I explain how to make predictions on new data here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Alex July 11, 2018 at 1:13 am #

Thanks for the awesome tutorial

One question, now that I have the model, how can I predict new data.

Imagine I have now this scenario

1. flowers.csv with 4 rows of collected data (without the labels)

Now I want to feed the csv to the model to have the predictions for every data

Reply
- Jason Brownlee July 11, 2018 at 5:59 am #
  
  This post explains more on how to make predictions:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Alex July 11, 2018 at 1:36 am #

I tried this for predictions
# load dataset
dataframe2 = pandas.read_csv(“flowers-pred.csv”, header=None)
dataset2 = dataframe.values
# new instance where we do not know the answer
Xnew = dataset2[:,0:4].astype(float)
# make a prediction
ynew = model.predict_classes(Xnew)
# show the inputs and predicted outputs
print(“X=%s, Predicted=%s” % (Xnew[0], ynew[0]))

And I get the result
X=[4.6 3.1 1.5 0.2], Predicted=1

Sometimes the values of X does not correspond to the real values of the file and always the prediction is 1.

Because is one hot encoding I supouse the prediccion should be 0 0 1 or 1 0 0 or 0 1 0

Reply
- Jason Brownlee July 11, 2018 at 6:00 am #
  
  All models have error. You can try improving the performance of the model.
  
  Reply
Alex July 11, 2018 at 1:49 am #

I found what I was doing wrong,

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

model = baseline_model()

# load dataset
dataframe2 = pandas.read_csv(“flores-pred.csv”, header=None)
dataset2 = dataframe.values
# new instance where we do not know the answer
Xnew = dataset2[:,0:4].astype(float)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
print(“X=%s, Predicted=%s” % (Xnew[2], ynew[2]))

Now this works, but all the predictions are almost the same
X=[4.7 3.2 1.3 0.2], Predicted=[0.13254479 0.7711002 0.09635501]

NO matter wich flower is in the row, I always gets 0 1 0

Reply
- Jason Brownlee July 11, 2018 at 6:00 am #
  
  Perhaps there’s a bug in the way you are making predictions?
  
  Reply
Ansh July 22, 2018 at 12:30 am #

Hi Jason,

I was just wondering. rather than one hot encoding 3 categories as shown below.

Iris-setosa, Iris-versicolor, Iris-virginica
1, 0, 0
0, 1, 0
0, 0, 1

Can’t we change the three categories.
Y Y1
Iris-setosa 0 0
Iris-versicolor 0 1
Iris-virginica 1 0

and if we could what will be the core difference in training the models using the above two mentioned ways.

Reply
- Jason Brownlee July 22, 2018 at 6:25 am #
  
  I don’t follow, what would a model predict?
  
  Reply
Felipe July 25, 2018 at 12:20 am #

Hi Jason, great tutorial, thanks.
Do you know some path to use ontology (OWL or RDF) like input data to improve a best analise?

Reply
- Jason Brownlee July 25, 2018 at 6:19 am #
  
  I don’t sorry.
  
  Reply
Shooter August 9, 2018 at 7:35 pm #

Hi Jason, what if X data contains numbers as well as multiple classes?

Thanks in advance.

Reply
- Jason Brownlee August 10, 2018 at 6:12 am #
  
  X is the input only, y contains the output or the classes.
  
  Reply
Shooter August 10, 2018 at 3:14 pm #

I mean what if X contains multiple labels like “High and Low”? We need to use one hot encoding on that X data too and continue other steps in the same way?

Reply
- Jason Brownlee August 11, 2018 at 6:05 am #
  
  If you are working with categorical inputs, you will need to encode them in some way.
  
  Reply
Shooter August 10, 2018 at 7:00 pm #

Hi jason, It seems you have already answered my question in one of the comments. I need to convert the categorical value into one hot encoding then create dummy variable and then input it. Thanks.

Reply
- Jason Brownlee August 11, 2018 at 6:07 am #
  
  Yes.
  
  Reply
Shooter August 10, 2018 at 8:23 pm #

Hi, I wanted to ask again that using K-fold validation like this

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)

or using train/test split and validation data like this

x_train,x_test,y_train,y_test=train_test_split(X,dummy_y,test_size=0.33,random_state=seed)

estimator.fit(x_train,y_train,validation_data=(x_test,y_test))

These are just sampling techniques, we can use any one of them according to the availability and size of data right?

Reply
- Jason Brownlee August 11, 2018 at 6:08 am #
  
  Yes.
  
  Reply
  - Shooter August 17, 2018 at 6:17 pm #
    
    Thanks.
    
    Reply
Shooter August 17, 2018 at 10:03 pm #

Can u please provide one example of multilabel multi-class classification too?

Reply
- Jason Brownlee August 18, 2018 at 5:38 am #
  
  Thanks for the suggestion.
  
  Reply
Shooter August 18, 2018 at 3:48 pm #

All examples i have seen so far in LSTM are related to classifiying imdb datasets or vocabulary like that. There are no simple examples to describe classification using LSTM. Can u please provide one example doing the same above iris classification using LSTM so that we can have a general idea.

Thanks in advance.

Reply
- Jason Brownlee August 19, 2018 at 6:17 am #
  
  LSTMs are for sequence data. For classification, this means sequence classification or time series classification.
  
  Does that help?
  
  You cannot use LSTMs on the Iris flowers dataset for example. Learn more here:
  https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
  
  Reply
  - Shooter August 24, 2018 at 5:49 pm #
    
    Thanks Jason. I have another question. I have total of 1950 data. Will it be enough if i train/test split into 90:10 ratio i.e 1560 data for training,195 for validation and 195 for testing. If i decrease training data, accuracy starts decreasing.
    
    Reply
    - Jason Brownlee August 25, 2018 at 5:47 am #
      
      It is impossible for me to say, try it and see.
      
      Reply
Shooter August 25, 2018 at 7:08 pm #

Ok thanks, I’ll try it. Another question, How can i calculate accuracy of the model using sum of squared errors. I need to compare a model that gives sum of squared errors in regression with my model that gives output in accuracy that is a classification problem.

Reply
- Jason Brownlee August 26, 2018 at 6:26 am #
  
  Sum squared errors is for regression, not classification.
  
  For metrics, you can use sklearn to calculate anything you wish:
  http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
  
  Reply
Jet September 4, 2018 at 5:41 pm #

Hi Jason,
awesome page & tutorials!

Is there a way to do stratified k-fold cross-validation on multi-label classification, or at least k-fold cross-validation?

Reply
- Jason Brownlee September 5, 2018 at 6:30 am #
  
  There may be, I don’t have any multi-label examples though, sorry.
  
  Reply
- Manuel Gonçalves November 23, 2018 at 6:39 am #
  
  Use k-Fold on your Y and put the indexes on your one-hot-encodered. Something like this:
  
  df = pandas.read_csv, slice, blah blah blah
  X = slice df etc..etc..
  y = slice df etc..etc..
  
  dum_y = np_utils.to_categorical(y) #from keras
  
  #now you have y and dum_y that is one-hot-encodered
  
  skfold = StratifiedKFold(n_splits=10, random_state=0) #create a stratified Kfold
  for train, test in skfold.split(X, y): #note that you are spliting the y without one-hot just to get indexes
  model = Sequential()
  model.add(Dense(blah blah blah)
  …
  #compile
  model.compile(blah blah blah)
  #now the magic, use indexes on one-hot-encodered, since the indexes are the same
  model.fit(X[train], dum_y[train], validation_data=(X[test], dum_y[test]), epochs=250, batch_size=50,verbose=False)
  #do the rest of your code
  #the model will be created and fitted 10 times
  
  Reply
sathvik September 21, 2018 at 2:24 pm #

That was really an excellent article..
can i implement CNN for feature Extraction from images then save the extracted features and apply SVM or XG boost for binary classification..please share the code to serve the purpose..thanks a lot..

Reply
- Jason Brownlee September 22, 2018 at 6:24 am #
  
  Yes! I show how to use a VGG model to extraction features for describing the contents of photos. For example, the last part of this tutorial:
  https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
  
  Reply
George October 22, 2018 at 10:07 pm #

Hi Jason and many thanks for your helpful posts.

I haven’t find any multilabel classification post, so I am posting on this.

I have a problem to interpret the results of multi label classification.

Let’s say I have this problem.I have images with structures (ex building)

structure: 0 is there is no structure , 1 if it is
type: 3 different types of structures (1,2,3)
nb of structure

So, I have data:

labels = np.array([[0,’nan’, ‘nan’],
[1, 2, 2],
[1, 3, 1],
[1, 1, 1]])

When I have no structure all rest values are nan.

The second line means I have a structure of type 2 and also have 2 structures.
The third line means, I have a structure of type 3 and it is just one.
The fourth means I have a structure of type 1, just one.

I am applying the mlb:

mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(labels)

and the mlb classes is :

array([‘0’, ‘1’, ‘2’, ‘3’, ‘nan’], dtype=object)

My test data is for example: [1, 2, 2] // 1: there is a structure, 2: of type 2, 2: there are 2 structures in the image

And my result predict array is : [20,10,2,4,50]

The problem is what 20 means.Is there a structure or not?Test data has the value 1 which means there is structure.So,
I 20% means possibility to have structure?

The 10 means that we have 10% possibility to be of type 1, then 2% to be of type 2 and 4% to be of type 3.
The 50% means that there is a possibility 50% to have how number of faces??? Two faces?as the test data says?

If there is no structure, the test array will be ([0, ‘nan’, ‘nan’])
So, the same prediction : [20,10,2,4,50]

What 20% means?There is , or there is not a structure?
The 10,2,4 are the possibilities of type 1,2,3
The 50% is for the number of structures.But, is 50% for no structures , or for some number?

So, I have problem with first and last indices.

Thank you very much!

George

Reply
- Jason Brownlee October 23, 2018 at 6:24 am #
  
  Sorry, I don’t have material on multi-label classification, so I can’t give useful off the cuff advice on the topic. I hope to cover it in the future.
  
  Reply
  - George October 23, 2018 at 7:50 am #
    
    Ok, thanks maybe I’ll post on stackoverflow if someone can help.Thanks.
    
    Reply
chris December 22, 2018 at 4:18 am #

hi Jason ,thanks for this amazing article.I want to predict the number of passengers in diferent airports.i am given the date ,airport departure,airport arrival , city ,longitude etc.I want to use neural network since the problem is not linear but i am having dificulty finding the right model.Everything that i uses gives me acc 0.42 max.Any suggestions?

Reply
- Jason Brownlee December 22, 2018 at 6:07 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
DeeB December 29, 2018 at 12:16 pm #

Dr J,

Thanks for all your hard work and contribution. They are immensely useful.

One quick question, how to cross plot y_pred (which is a vector) and dummy_y (which is a tuple etc,) to test how good the prediction is? It gives obvious error msg of size mismatch.

Thanks,

Reply
- Jason Brownlee December 30, 2018 at 5:34 am #
  
  Perhaps change both pieces of data to have the same dimensionality first?
  
  Reply
FlávioJFPereira December 31, 2018 at 7:19 am #

Hey Jason! What a nice tutorial! I’m needing some advice for an academic project. Instead of classification between 3 classes, like in your problem, I got 5 classes and my target has a probability of belonging to each of these 5 classes!

What are you advices for my network implementation? I mean, how should my output layer be to return the probabilities?

Thanks

Reply
- Jason Brownlee December 31, 2018 at 11:13 am #
  
  Use a softmax activation function on the output layer.
  
  Reply
  - FlávioJFPereira December 31, 2018 at 12:04 pm #
    
    still using categorical_crossentropy as loss function? or something like mse?
    
    Reply
    - Jason Brownlee January 1, 2019 at 6:11 am #
      
      Yes, categorical cross entropy loss is used for multi-class classification.
      
      Reply
Sajad January 20, 2019 at 12:54 am #

Thanks for your great explanation
Is there any code for getting back from ‘dummy y’ one hot matrix to the actual ‘y’ vector?

Reply
- Jason Brownlee January 20, 2019 at 5:40 am #
  
  Yes, you can use the argmax() function.
  
  Reply
EmmenTrap February 7, 2019 at 12:38 am #

HI , Thanks for your great tutorial sir, I have used this code for my project to classification of rise seed varieties, the classifier has 15 classes and i have received the 90% accuracy. now i need to get prediction with the trained model, so can you help me that ho to get the prediction with unknown data for multi-class classification
thank you

Reply
- Jason Brownlee February 7, 2019 at 6:40 am #
  
  Yes, I explain how here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
rio suneth February 26, 2019 at 2:36 pm #

is there an example of a classification model for networking traffic to detect botnets on a computer network package, thanks

Reply
- Jason Brownlee February 27, 2019 at 7:24 am #
  
  There might be, I’m not aware of it sorry. Perhaps try a search on scholar.google.com.
  
  Reply
ZAINAB SHEERIN M S February 27, 2019 at 12:10 am #

Hey!!!
I’m working on medical data, with the same model done here.
I have data in 3 different files that is normal, bacterial pneumonia and viral pneumonia with images in it.
instead of using csv file in the directory, how can I do it with my data.
kindly do the needful.

Reply
- ZAINAB SHEERIN M S February 27, 2019 at 12:14 am #
  
  That 3 different files is in train,test and validation categories
  each.
  
  Reply
  - Jason Brownlee February 27, 2019 at 7:31 am #
    
    Yes, you can use the Keras flow_from_directory() function:
    https://keras.io/preprocessing/image/
    
    I hope to have an example of this very soon.
    
    Reply
pablo March 5, 2019 at 12:02 am #

Jason

Run perfectly¡…thank you very much for you time and interesting for helping us¡.

Reply
- Jason Brownlee March 5, 2019 at 6:40 am #
  
  Well done!
  
  Reply
Andrew March 9, 2019 at 2:51 am #

Hey Jason,

Your guides have been a tremendous help to me. Unfortunately, I’m coming from an applied science background and don’t quite fully understand LSTMs. I’ve run a Random Forest classifier on my data and already gotten a 92% accuracy, but my accuracy is absolutely awful with my LSTM (~11%, 9 classes so basically random chance). My data is 4500 trials of triaxial data at 3 joints (9 inputs), time series data, padded with 0s to match sequence length.

This is my code:
model.add(Masking(mask_value=0., input_shape=(366,9)))
model.add(LSTM(10,input_shape=(366,9),return_sequences=True, activation=’tanh’))
model.add(LSTM(10,return_sequences=False,activation=’tanh’))
model.add(Dense(units=9,activation=’softmax’))
model.compile(loss=’categorical_crossentropy’, optimizer = ‘adam’, metrics=[‘accuracy’])
history = model.fit(xtrain_nots,ytrain, epochs=400, batch_size=100)

This is what my training accuracy looks like:
https://i.imgur.com/tCZUlNi.png

Is it possible that I just don’t have enough data? Would greatly appreciate some help on figuring out how to improve accuracy.

Thanks

Reply
- Jason Brownlee March 9, 2019 at 6:31 am #
  
  It is possible that the LSTM is just not a good fit for your data.
  
  Compare an MLP and CNN, as well as hybrids like CNN-LSTM and ConvLSTM. You can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
ismetb March 9, 2019 at 6:51 am #

Hi Jason. First of all, thanks for all the great effort you put in ML. I’ve learnt a great deal of things from you.

My question is, after using LabelEncoder to assign integers to our target instead of String, do we have to use OHE? I mean, after “encoded_Y = encoder.transform(Y)” code, I have a target of single column and 3 classes all of which are integer. Why do we go further and make the target 3 columns?

Is there any difference between; a) using single column as target and using 1 neuron at output layer along with softmax and b) using 3 columns as target and using 3 neurons at output layer along with softmax.

I know OHE is mainly used for String labels but if my target is labeled with integers only (such as 1 for flower_1, 2 for flower_2 and 3 for flower_3), I should be able to use it as is, am I wrong?

Regards

Reply
- Jason Brownlee March 10, 2019 at 8:09 am #
  
  The idea of a OHE is to treat the labels separately, rather than a linear continuum on one variable (which might not make sense, e.g. what is 1.5?).
  
  You don’t have to OHE, try it and see if it improves performance.
  
  Reply
  - ismetb March 11, 2019 at 7:11 pm #
    
    Thanks Jason for the reply. I’ve always thought, predicting 1.5 was equal to [0, 0.5, 0.5] categorical prediction which means 50-50 chance for classes 1 and 2.
    
    Then what about binary classification (BC)? Is there any difference between 0 and 1 labelling (linear conitnuum of one variable) and categorical labelling? I have never seen anyone try categorical labelling for BC (and I intend to try) but I would like to learn your thought on this.
    
    And for BC, would you suggest [0, 1] or [-1, 1] for labels? Would it make any difference?
    
    Regards
    
    Reply
    - Jason Brownlee March 12, 2019 at 6:48 am #
      
      Typically, a one hot encoding for binary classification is equivalent to predicting a probability 0-1.
      
      Reply
Tim March 18, 2019 at 3:56 am #

Hello, Jason!

How can I do step-by-step debugging for functions (Kfold, KerasClassifier, hidden layer) to see intermediate values?

Reply
- Jason Brownlee March 18, 2019 at 6:08 am #
  
  Good question.
  
  It might be easier to use the Keras API and the KFold class directly so that you can see what is happening.
  
  Reply
MtimIL March 25, 2019 at 9:37 pm #

Hello, Jason.

1) After learning the neural network I get the following weights:

[[-0.04067891 -0.01663 0.01646814 -0.07344743]
[ 0.02537021 -0.03948928 0.00033538 -0.1734132 ]
[ 0.06725066 0.07520587 0.04672117 0.03763839]
[ 0.02950417 0.02176755 -0.023499 0.05072991]] [0. 0. 0. 0.]

[[ 0.00432587 -0.04444616 0.02091608]
[ 0.01232713 -0.02063667 -0.07363331]
[ 0.04093491 -0.0216442 -0.05544085]
[ 0.08577123 -0.03977689 0.02796889]] [0. 0. 0.]

Why is bias zero and the weights values are very small ?

Code:
# define baseline model
def baseline_model():
# create model
model = Sequential()
model.add(Dense(4, input_dim=4, kernel_initializer=’normal’, activation=’relu’))
model.add(Dense(3, kernel_initializer=’normal’, activation=’sigmoid’))
# Compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

model = baseline_model()

results = cross_val_score(estimator, X, dummy_y, cv=kfold)

print(model.layers[0].get_weights()[0], model.layers[0].get_weights()[1])
print(model.layers[1].get_weights()[0], model.layers[1].get_weights()[1])

print(“Accuracy: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

2) How can I get (output on screen) the values as a result of the activation function for the hidden and output layer ?

Reply
- Jason Brownlee March 26, 2019 at 8:05 am #
  
  Why? That is what the model learned – that’s the best we can say.
  
  You can make each layer an output layer via the functional API, then collect all of the activations. This might help as a start:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
  - MtimIL March 27, 2019 at 4:04 am #
    
    Thank You very much!
    
    Reply
James Lee May 23, 2019 at 3:47 am #

Thank you for the excellent tutorial as always!
Do you mind clarifying what output activation and loss function should be used for multilabel problems?
For example, tagging movie genres with comedy, thriller, crime, scifi. They are not mutually exclusive. A movie can be tagged with all 4.
Then I could hot encode like [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0] [1, 0, 1, 0], and so on.
What would be the best combination in this case: activation (softmax vs sigmoid) and loss (binary_crossentropy vs categorical_crossentropy)?
What makes sense most to me is sigmoid activation (not exclusive) + binary_crossentropy (treat each output neuron as binary problem), but I’ve read multiple stackoverflow and other articles suggesting conflicting informations.
Thank you!

Reply
- Jason Brownlee May 23, 2019 at 6:07 am #
  
  Yes, I given an example of multi-label classification here:
  https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
  
  Reply
Somaye Hamedi Bazaz June 15, 2019 at 4:57 am #

very well Thanks a lot

Reply
- Jason Brownlee June 15, 2019 at 6:39 am #
  
  Thanks, I’m happy to hear that.
  
  Reply
Emma June 17, 2019 at 9:46 pm #

Hi Jason, how can I add “none of the above” class in neural network? I searched on the net but didn’t find anything useful.

Reply
- Jason Brownlee June 18, 2019 at 6:39 am #
  
  It would be one more class, e.g.: apple, orange, none.
  
  You would then need to add examples of this new “none” class.
  
  Reply
Esra Karasu July 10, 2019 at 6:10 am #

Hi, I have a question for you. For this study, I wrote code of performance measures such as confusion matrix, precision, recall and f-score. But it gave me the following error. I’d be very happy if you could help.

Code:
confusion matrix
Y_pred = baseline_model.predict(X)
Y_pred_classes=np.argmax(Y_pred, axis=1)
Y_true= np.argmax(Y, axis=1)
confusion_mtx= confusion_matrix (Y_true, Y_pred_classes)
fig,ax= plt.subplots(figsize=(8,8))
sns.heatmap(confusion_mtx, annot=True, linewidths=0.01, cmap=’Greens’, linecolor=’gray’, fmt=’.1f’, ax=ax)
plt.ylabel(‘Gerçek Sınıf’)
plt.xlabel(‘Tahmin Edilen Sınıf’)

# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(Y_true, Y_pred_classes)
print(‘Accuracy: %f’ % accuracy)
# precision tp / (tp + fp)
precision = precision_score(Y_true, Y_pred_classes, average=”macro”)
print(‘Precision: %f’ % precision)
# recall: tp / (tp + fn)
recall = recall_score(Y_true, Y_pred_classes, average=”macro”)
print(‘Recall: %f’ % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(Y_true, Y_pred_classes, average=”macro”)
print(‘F1 score: %f’ % f1)
plt.show()

ERROR:
AttributeError: ‘function’ object has no attribute ‘predict’

Reply
- Jason Brownlee July 10, 2019 at 8:20 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
  - Esra Karasu July 10, 2019 at 5:56 pm #
    
    Thank you. Code is running. It gives accuracy. But it doesn’t give the confusion matrix. I can’t find my mistake.
    
    Y_pred = baseline_model.predict(X)
    Y_pred_classes=np.argmax(Y_pred, axis=1)
    Y_true= np.argmax(Y, axis=1)
    
    There’s an error here.
    
    Reply
    - Jason Brownlee July 11, 2019 at 9:45 am #
      
      Perhaps use the sklearn function:
      https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
      
      Reply
Jiawei Zhang July 12, 2019 at 12:18 pm #

Hi Jason!

Much thanks to your tutorials (I finished my first fully functional lstm classification project)

I have a simple question about keras LSTM binary classification, it might sounds stupid but I am stuck.
My train_y and test_y are now values of {0,1,2,4}. I want to set the binary output label 0 if{0,1} 1 if {2,4}. Could you give me some advice on how to do the data preprocessing please ?

Thank you so much!

Reply
- Jason Brownlee July 13, 2019 at 6:51 am #
  
  Perhaps try defining your data manually?
  Perhaps try defining your data programatically?
  Perhaps try defining your data in excel?
  
  Reply
anes ouadou August 6, 2019 at 12:49 pm #

Hi Jason,
thank you for your posts, I learned a lot from them. I have a multi class classification problem with three classes. I am currently trying to create the data for the training. the problem is the items belonging to each class are very close to each other to the point that when I extract one element that belongs to class 1 I will have a part of another element that belongs to class 2 or class 3. my question is, is it okay to have a part of an element that belong to one class appear in an instance that belongs to another class.

Reply
- Jason Brownlee August 6, 2019 at 2:06 pm #
  
  Perhaps you can locate or devise additional features that help to separate the instances/samples?
  
  Reply
  - anes ouadou August 7, 2019 at 12:39 am #
    
    the instances are extracted from a 3-D density map. Each instance is a type of atom that are located close to each other. I am trying to create a model to detect each atom (3 atoms) so that I can later find the optimal path between them (distance between atoms matters). I am treating the problem as multi-class classification. what do you recommend I do
    
    Reply
    - Jason Brownlee August 7, 2019 at 7:59 am #
      
      Intersting. Not sure what you’re trying to achieve exactly, optimal paths in n-dimensional space (e.g. 3d) sounds like a spanning tree or kd tree or similar would be more appropriate.
      
      Reply
      - anes ouadou August 7, 2019 at 11:01 am #
        
        The problem is for protein tertiary structure prediction. a protein is a series of amino acids. There are these three atoms that appear in each amino acid. So I am trying to detect them so that later on I can find the optimal path. Predicting the correct location of these atoms facilitate the building of the path.
      - Jason Brownlee August 7, 2019 at 2:21 pm #
        
        Perhaps distance between points, e.g predict membership of new point based on a distance measure, like euclidean distance?
Doron August 6, 2019 at 6:57 pm #

Hi Jason,

I love reading your posts. Extremely helpful and well detailed.

I am currently working on a multiclass-multivariate-multistep time series forecasting project using LSTM’s and its other variations using Keras with Tensorflow backend. I was wondering perhaps you posted an article about it/something similar that I can use as a reference.

The closest one I have found (over the internet) was a post by you:

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

However that did not include this specific problem statement. Any advice?

Much appreciated!

Reply
- Jason Brownlee August 7, 2019 at 7:44 am #
  
  I do have examples of multi step, multivariate and time series classification, but not all together.
  
  You can draw together the elements needed from the tutorials here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
JG August 6, 2019 at 9:54 pm #

Hi Jason,

Great (2016 old Tutorial), that I used to explore new facts. Let me share with you.

1) When I used KerasClassifier (within cross_val_score for Kfold partition) I repeat your results of 97.3% Acc and 4.4 for sigma (std deviation), but I also train a model (manually and I obtain Acc = 100%. So it is clear the effect of Kfold statistical partition that average results of many cases. do you agree?

2) I changed the module ‘keras.utils.np.utils.to_categorical’ to more direct ‘keras.utils.to_categorical’.same results. And using now ‘model Api keras’ instead of ‘sequential’ for more versatility.

3) I applied the Pipeline module to include ‘standardize’ options such as MinMaxScaler, StandardScaler, for Iris Input X data preprocessing. But I always get a little be worst results (96% Acc and 5.3 Sigma)…I am surprised about it! any idea why?

4) The most sensitive analysis I perform in comparison with your results is when apply ‘validation-split’ e.g. 0.2 instead of your default of 0.0 as argument of KerasClassifier…in that case Acc Kfol d(average) get down to 94.7% . I guess subtracting sample from training to allocate unsee validation sample must be the cause…do you agree?

5) I also confirme that if instead of using binary matrix of Iris Output (‘onehotencoding’) I use integer class values of Iris for training…I get worse results, as you anticipated it (i get down from 97% Acc to 88.7% Acc). OK.

6) I also implement ‘GaussianNoise’ function of keras layer to get better performance (some kind of data augmentation that simulate more sample data of Iris)…But always get ‘little’ be worst results or equal as maximum in some cases…any explanation?

Jason one more time thank you for your ‘scriplet’ fully codes that are inside any tutorial, as case study, that could be explore right away, numerically and conceptually, in many ways.

JG

Reply
- Jason Brownlee August 7, 2019 at 7:56 am #
  
  Good questions!
  
  I would go with the k-fold result, in practice data samples are noisy, you want a robust score to reflect that.
  
  Scaling is not a silver bullet, always good to check with and without, especially when using relu activations.
  
  Changing the form of the output would require a change to loss function as well. categorical cross entropy for categorical distribution is a gold standard for a reason – it works really well.
  
  Try shrinking the amount of noise down so that the samples don’t overlap too much across classes.
  
  Reply
  - JG August 14, 2019 at 7:59 pm #
    
    wise answers Jason I appreciate your continuous engagement to share and give support to these tutorials…
    
    Reply
joker August 13, 2019 at 5:06 am #

This site was… how do you say it? Relevant!! Finally I’ve found something that helped me.
Many thanks!

Reply
- Jason Brownlee August 13, 2019 at 6:14 am #
  
  Thanks, I’m glad it helps.
  
  Reply
Pooja August 19, 2019 at 1:22 am #

hi,
I m doing work on EMG classification where I have 3 different types of EMG time series data named as myopathy, neuropathy, healthy data. my task is to build a model that classifies different EMG.

so my question is can I classify my data without attribute of data .if no then please let me how I can find the different attribute of my data and feed to network

Reply
- Jason Brownlee August 19, 2019 at 6:11 am #
  
  To train a supervised learning model, you must have input data and a label or real value as output.
  
  If you are working with time series classification data, you can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Layne August 30, 2019 at 1:24 pm #

Thank you so much! I love it.

10 is a lot of cv folds for such a small dataset. Feels like the folds would be too small to get 10 good chunks that represent the data. I went with 3 and got Baseline: 98.00% (1.63%).

Have you written other more advanced keras classification tutorials?

Reply
- Jason Brownlee August 30, 2019 at 2:17 pm #
  
  Thanks. Yes, you could be right, 15 examples per fold is small.
  
  Yes, some of the computer vision examples are more advanced:
  https://machinelearningmastery.com/start-here/#dlfcv
  
  The time series examples are more advanced as well:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  What are you looking for exactly?
  
  Reply
  - Layne August 30, 2019 at 2:29 pm #
    
    Thanks Jason! Well, I am gearing up for a project to automatically classify DNA mutations (MB of labeled data, not GB). There are 4 categories of the impact column with subcategories of each
    https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html
    
    So I am looking to learn things like “how many layers and nodes should i have” and “what are other important feature engineering tools aside from StandardScaler().”
    
    Here is a slice of the data (not the real dataset)
    https://www.kaggle.com/kevinarvai/clinvar-conflicting
    
    Reply
    - Jason Brownlee August 31, 2019 at 6:00 am #
      
      This might help:
      https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
      
      Reply
- Layne Sadler August 30, 2019 at 2:20 pm #
  
  A second run with the same settings 98.67% (0.94%)
  
  Reply
Mehrab September 9, 2019 at 4:59 pm #

in this model, how i can generate classification report like precision & recall value

Reply
- Jason Brownlee September 10, 2019 at 5:36 am #
  
  See this post:
  https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
  
  Reply
wan September 24, 2019 at 5:26 am #

Hi Jason,

Sorry I’m new to this. I have 2 question.

Is it important for the dataset in CSV file?
If i have set of dataset image in .png, how to modify the coding?

Thank you =)

Reply
- Jason Brownlee September 24, 2019 at 7:55 am #
  
  Yes, this tutorial will show you how to load images:
  https://machinelearningmastery.com/how-to-load-convert-and-save-images-with-the-keras-api/
  
  And this:
  https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/
  
  Reply
majimomi October 25, 2019 at 12:25 am #

What is a point for introducing scikit-learn here? We could just stick to Keras to train our model using Keras?

# Define loss function and optimization technique
model.compile(
optimizer=’adam’,
loss=’categorical_crossentropy’,
metrics=[‘accuracy’],
)

# Train the model
history = model.fit(X,dummy_Y,epochs=200, batch_size=5, verbose=0)

# evaluate the keras model
_, accuracy = model.evaluate(X, dummy_Y)
print(‘Accuracy: %.2f’ % (accuracy*100))

Reply
- Jason Brownlee October 25, 2019 at 6:40 am #
  
  Yes, you can use Keras directly.
  
  The wrapper helps if you want to use a pipeline or cross validation.
  
  Reply
nani November 26, 2019 at 7:55 pm #

i have a data in 40001 rows and 8 columns in that how to take input layer size and hidden layer layers
i’m taking
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation=’relu’))
model.add(Dense(8, activation=’relu’))
model.add(Dense(1, activation=’sigmoid’))
this is correct t worng?
how?
i did n’t understanding neural network?
plz help me?

Reply
- Jason Brownlee November 27, 2019 at 6:03 am #
  
  Sounds like a good start, perhaps then try tuning the model in order to get the most out of it.
  
  There are some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
nani November 27, 2019 at 5:19 pm #

thank you for valuable time….

Reply
- Jason Brownlee November 28, 2019 at 6:32 am #
  
  You’re welcome.
  
  Reply
nani November 28, 2019 at 3:51 pm #

i have a data training data 40001 rows and 8 columns and testing data 40001 x 8 how to take input layer size and hidden layer layers
i did n’t understanding neural network?
how to classify the one class neural network
send me neural network programming code??

Reply
- Jason Brownlee November 29, 2019 at 6:43 am #
  
  Perhaps start with this tutorial to better understand how to develop a small neural network:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
Rana Saleem December 6, 2019 at 4:26 am #

Dear
Please how can i handle output desecrate value 0,25,50,75,100 and the data also in numeric form. you have any example code please share the link. and guide me.does any need of classification? how can i handle?

Reply
- Jason Brownlee December 6, 2019 at 5:27 am #
  
  Perhaps you can post-process the predictions?
  
  Perhaps you can map the discrete values to an ordinal, e.g. 1,2,3?
  
  Reply
zaheer Ullah Khan January 3, 2020 at 8:10 pm #

Hello, Jason, Your articles and post are really awesome, would you please a post about multi-class multi-label problem. and brief about some evaluation metrics used in measuring the model output.
would be very thankful.

Reply
- Jason Brownlee January 4, 2020 at 8:29 am #
  
  See this tutorial:
  https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
  
  Reply
Shiva Ram Dam February 8, 2020 at 4:05 am #

Hi Jason,
Thanks for the great post. I am taking reference from your post for my masters thesis.

How can we print the individual confusion matrix for each fold of cross validation set (here 10 folds in your tutorial). And also the confusion matrix for overall validation set.

Looking forward for your prompt response.

Reply
- Jason Brownlee February 8, 2020 at 7:17 am #
  
  No, confusion matrix is used for one test set only.
  
  Use a different metric across the folds.
  
  Reply
Cameron Wilson February 13, 2020 at 11:18 am #

Hi Jason

I am trying to implement a CNN for classifying images. I have been following bits of a couple of different tutorials on how to do each section.

I have a convolutional model I think I am happy with, however, my problem arises that I want to do k-fold validation as shown in your tutorial here. The other tutorial I have been following uses ImageDataGenerator().flow_from_directory() but I see no way to use this and then perform k-fold validation on the data.

My data set is a total of 50,000 images split into 24 respective folders of each class of image.

Any pointers would be appreciated.

Thanks

Cameron

Reply
- Jason Brownlee February 13, 2020 at 1:24 pm #
  
  Yes, perhaps enumerate the k-fold manually, this shows you how:
  https://machinelearningmastery.com/k-fold-cross-validation/
  
  Reply
adam February 13, 2020 at 3:47 pm #

Hi Jason,
Appreciate your hard work on these tutorials.It really helps.
I have a question at high level:

I’ve done multiple multi-class classification projects. Some of them I can transfer the problem to be building multiple binomial classification model.Some are not.

For a multi-class classification problem with let’s say 100 classes. It is usually very hard for the model to make prediction. I’ve been trying to build tree-based models, but the accuracy or confusion metrics dont seem good enough.
My question is: Is neural network (deep learning) models a better fit for this problem? How should we approach classification problem with a large number of classes?

Reply
- Jason Brownlee February 14, 2020 at 6:26 am #
  
  Thanks.
  
  It really depends on the specifics of the data. I recommend testing a suite of different algorithms in order to discover what works best for your dataset.
  
  Reply
Alex Ramirez March 16, 2020 at 7:05 am #

Hello! Amazing explanaition. I have a question.

In the example where you add the following code:

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

My question is If I add

seed = 7
numpy.random.seed(seed) ; numpy.random.rand(4)

to restart the random seed, do you think its a good idea?

If so, what number would you use for this example?

Reply
- Jason Brownlee March 16, 2020 at 10:22 am #
  
  Thanks.
  
  I would recommend removing random seed stuff these days and use repeated cross-validation to evaluate your model:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  Reply
Alex Ramirez March 16, 2020 at 7:12 am #

I forgot to ask. How many baseline scores would you consider as minimum to obtain the average?

Reply
Mbonu Chinedu April 24, 2020 at 2:37 pm #

Thank you very much for this topic jason.
it really helped me in solving a huge problem for Multi Label classification.

Thanks J…..

Reply
- Jason Brownlee April 25, 2020 at 6:38 am #
  
  You’re welcome, I’m happy to hear that!
  
  Reply
Sankar Raj April 30, 2020 at 2:44 am #

Hi Jason
How to find the number of neurons for hidden layer(s)? Is there any specific method or approach?

Thanks in advance!!
Sankar R

Reply
- Jason Brownlee April 30, 2020 at 6:51 am #
  
  Good question, I answer it here:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
  Reply
Achintha May 16, 2020 at 1:54 am #

Hi Jason,
my project have 3 inputs and 1 output this output I mean predicted value. so my question is this tutorial can I use my situation??

Reply
- Jason Brownlee May 16, 2020 at 6:17 am #
  
  If the output is a class label and there are more than 2 labels, this might be a useful tutorial for your problem.
  
  Reply
Achintha May 21, 2020 at 1:22 pm #

Thanks a lot,

Hi. Jason,
ValueError: Error when checking input: expected dense_3_input to have shape (4,) but got array with shape (2,) – when I input the last two lines in this tutorial come up this error. why error like this??

X = dataset[:,0:4].astype(float)
Y = dataset[:,4] these your code lines I changed like this,

X = dataset[:,1:3].astype(float)
Y = dataset[:,4]

Thank you.
Achintha

Reply
- Jason Brownlee May 21, 2020 at 1:42 pm #
  
  Sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
sana May 22, 2020 at 4:02 pm #

how can i convert image dataset to csv file and how can I differentiate species of fruit fly

Reply
- Jason Brownlee May 23, 2020 at 6:13 am #
  
  We do not convert images to CVS, we load them directly as numpy arrays:
  https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/
  
  Reply
QUANG HUY CHU May 26, 2020 at 12:40 pm #

Hi Jason, I have run the model for several time and noticed that as my dataset (which is 5 input, 3 classes) I got standard deviation result about over 40%.

Can you have any suggestions how we can optimize this value or it is come from my dataset value?

Thank you vary much

Reply
- Jason Brownlee May 26, 2020 at 1:22 pm #
  
  Yes, the tutorials here will help you lift the performance of your deep learning model:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
QUANG HUY CHU June 5, 2020 at 12:56 am #

Hi Jason, as I see your code I have noticed this line:

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)

Also in another post I also see you use this code:

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

What is different aim of those 2 code line since the model is constructed in the same way.

The Baseline is the same with Accuracy ?

Reply
- Jason Brownlee June 5, 2020 at 8:16 am #
  
  The first line defines the model then evaluates it using cross-validation.
  
  The second fits the model on a train dataset and evaluates it each epoch using a validation dataset.
  
  Reply
  - QUANG HUY CHU June 5, 2020 at 10:46 am #
    
    So as I understand the First model is used when we want to check how good the model with Training dataset with KFold Cross-Validation
    
    The Seccond Model is used when we check how good the model with the validation data (which is split from the train data), also the training data of this model just trained one time only and use the parameter from that train and predict the validation data (Its like one time Kfold validation if k=1).
    
    Sorry if what i am saying confused you, I am new to Keras and also Deep Learning, I am read many your post and figuring how the difference when we want to build a model and test the model from the beginning.
    
    Reply
    - Jason Brownlee June 5, 2020 at 1:41 pm #
      
      You can do it that way if you like. Whatever gives you confidence in evaluating the models performance in making predictions on new data.
      
      Reply
Aastha June 8, 2020 at 7:56 pm #

Could you please let me know what would be the best approach for image classification in case we have an extremely large number of Labels and there might be overlapping in some labels i.e. not all are extremely distinguishable.

Reply
- Jason Brownlee June 9, 2020 at 6:01 am #
  
  Perhaps try using transfer learning and tune a model to your dataset.
  
  This may help as a starting point that you can adapt to your problem:
  https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/
  
  Reply
Kaushik June 20, 2020 at 5:06 am #

Hey Jason, Does this classification work if there are let’s say 10 classes and all 9 classes are integers and one class is a string. Does the encoding work in this case?

Reply
- Jason Brownlee June 20, 2020 at 6:19 am #
  
  All classes must be encoded as numbers first.
  
  Reply
MD MAHMUDUL HASAN June 24, 2020 at 10:36 pm #

Hi Jason Brownlee, Thanks. Very helpful tutorial. How can I find the sensitivity & specificity in the case of 10 fold cross-validation instead of scores?

Reply
- Jason Brownlee June 25, 2020 at 6:18 am #
  
  Good question, this library implements sensitivity and specificity:
  https://imbalanced-learn.readthedocs.io/en/stable/api.html#module-imblearn.metrics
  
  Reply
Nicolas August 12, 2020 at 1:59 am #

Hi Jason, sorry I have a question, if I want to use this model to predict the categorical class of some new data, lets say:

import numpy as np
new_data = np.array([[5.7, 2.5, 5. , 2. ]])

How can I do that? since result that the baseline_model () function returns does not have the .predict() function.

Thanks

Reply
- Jason Brownlee August 12, 2020 at 6:11 am #
  
  This will show you how to make a single prediction:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Sergio August 27, 2020 at 1:55 am #

Hey Jason, I followed up and got similar results regarding the Iris multi-class problem, but then I tried to implement a similar solution to another multiclassification problem of my own and I’m getting less than 50% accuracy in the crossvalidation, I have already tried plenty of batch sizes, epochs and also added extra hiddien layers or change the number of neurons and I got from 30% to 50%, but I can’t seem to get any higher, can you please tell me what should I try, or why can this be happening? Or the way that I should troubleshoot it?

PD: I have also changed the sized of the input data and its features, to see if that was maybe the problem but it remains the same.

Thanks for your time, I’ll be waiting for a response.

Reply
- Jason Brownlee August 27, 2020 at 6:20 am #
  
  Debugging and tuning neural nets is a big topic, you can get started here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Bob August 29, 2020 at 10:14 am #

How would you setup a 2, 3, 4 classification model? For instance if you have an NLP multi classification problem, where you have 4 labels [agree, disagree, discuss, unrelated], where related = [agree, disagree, discuss] this is also true so that: [related, unrelated].

How would you do:

1st. Model
[related, unrelated] — (classification model, but only grab the things classified as related) –>

2nd. Model ( gree = [agree, disagree] )
[gree, unrelated] –( classification model, but only grabs the gree)->

3rd Model
[agree, disagree) –(classification model, that now classifies only these two) –> output would be all 4 original classifications without ‘related’. So it would be [agree, disagree, discuss, unrelated]

Really, I just don’t know how to divert the Keras results to a different model. Would I make multiple Y-columns that are one-hot encode like
[agree| disagree| discuss| unrelated| related]
0 1 0 0 1

Reply
- Jason Brownlee August 29, 2020 at 1:02 pm #
  
  Probably start off treating the labels as nominal, one hot encoding, 4 nodes in the output layer.
  
  Then perhaps try encoding them in the range 0-1, try modeling as a regression problem and see if the ordinal relationship can be harnessed.
  
  Yes, to get started with one hot encoding, see this:
  https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
  
  Keras has the to_categorical() function to make things very easy:
  https://keras.io/api/utils/python_utils/#to_categorical-function
  
  Reply
  - Vonka September 1, 2020 at 5:05 pm #
    
    Hi Jason, thank you for this wonderful article. I dit not see where to post a comment, I only see the reply button, so I post my comment here.
    I would like to know how I could get the confusion matrix from this Multi-Class Classification model. Thank you in advance.
    
    Reply
    - Jason Brownlee September 2, 2020 at 6:24 am #
      
      You’re welcome.
      
      Here is an example:
      https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
      
      Reply
Andres September 20, 2020 at 9:30 pm #

Hi Jason, very good article.
I have a question, on this website https://unipython.com/clasificacion-multiclase-de-especies-de-flores/
They use your article, have they asked your permission? because I think they charge money because it is within a more general course…
A greeting

Reply
- Jason Brownlee September 21, 2020 at 8:09 am #
  
  They do not have permission!
  
  Reply
K D October 27, 2020 at 3:30 am #

The code did not run. I got the following message:
No module named ‘scipy.sparse’

Reply
- Jason Brownlee October 27, 2020 at 6:48 am #
  
  Perhaps you need to update your version of scipy.
  
  Reply
A A December 16, 2020 at 5:57 am #

Do I also have to one-hot encode the class labels even if I use the loss parameter sparse_categorical_crossentropy as an argument to model.compile function?

Reply
- Jason Brownlee December 16, 2020 at 7:54 am #
  
  No.
  
  Reply
Strivathsav Ashwin Ramamoorthy December 28, 2020 at 7:10 pm #

Hi Jason,

I need your opinion on two questions which I have.

1) First one is that, I have been trying to implement a MLP model for multi-classification based on your post “Multi-class classification tutorial with keras deep learning library”. The input dimension is [34000,33] and output is [34000,64] where 64 is the total number of classes. I have defined an architecture as follows:
model = Sequential()
model.add(Dense(100, input_dim = 33, activation = ‘relu’))
model.add(Dense(64, activation = ‘softmax’))
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’,metrics = [‘accuracy’])

I think I have defined one input layer, one hidden layer and one output layer. Could you validate the python lines which I have written?

2) Finally, after training the model how could we use the model to predict on some examples after training?

Reply
- Jason Brownlee December 29, 2020 at 5:12 am #
  
  Looks like a good start.
  
  You can make predictions by calling model.predict(), here are some examples:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
  - Strivathsav Ashwin Ramamoorthy January 1, 2021 at 3:02 am #
    
    Thanks Jason for the response.
    
    Reply
ahmed ben mohamed January 1, 2021 at 9:13 am #

how to download file csv this project ?

Reply
- Jason Brownlee January 1, 2021 at 9:22 am #
  
  Here is the direct link:
  http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
  
  Reply
ahmed ben mohamed January 3, 2021 at 9:50 pm #

ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow

Reply
- Jason Brownlee January 4, 2021 at 6:07 am #
  
  The error suggest you need to update your version of the tensorflow library.
  
  Reply
Muhammad Usama Zahid January 9, 2021 at 8:59 pm #

I really love your tutorials. Very neatly explained.Kudos to u sir!

Reply
- Jason Brownlee January 10, 2021 at 5:40 am #
  
  Thanks!
  
  Reply
Katia lagha January 24, 2021 at 9:56 pm #

First of all, thank you for this tutorial. I really appreciate it.
Then i have a question if you can help me !

I build a model that will predict 3 outputs. But if I want to predict on another dataset that contains just 2 of these values I can’t use the previous model since this one will have just 2 outputs, this is my problem!

To be more clearer, I’ll explain again.
We suppose that the IRIS database is divided into two datasets:
dataset1: Iris-setosa, Iris-versicolor, Iris-virginica
dataset2: Iris-setosa, Iris-versicolor

If I do my training on the 1st one. Then, if I want to reload the model and improve it but this time I do the training on the 2nd dataset. This is where I block because the number of outputs is not the same.

Reply
- Jason Brownlee January 25, 2021 at 5:50 am #
  
  You’re welcome.
  
  A model for dataset1 can be used to make predictions for dataset2 directly without change.
  
  It could be fined tuned on dataset2, perhaps with a small learning rate.
  
  Reply
Arjun Satish February 17, 2021 at 6:07 am #

How can I display learning curves in the above python code? I am a novice.

Reply
- Jason Brownlee February 17, 2021 at 7:49 am #
  
  This tutorial will show you how:
  https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
  
  Reply
  - Arjun Satish February 18, 2021 at 6:43 pm #
    
    Is there a way to classify a single data point into more than one class? For, example, I have data about a flower and I need the model to predicts its presence in more than one class like it comes under plant, green-leafy, red-coloured.
    So after we encode it, its may look like [1,1,0,1,0]. Hope you get the idea of what I am trying to project.
    
    Reply
    - Jason Brownlee February 19, 2021 at 5:57 am #
      
      Yes, predict() returns probabilities for each class.
      
      Reply
Farhat February 28, 2021 at 7:50 am #

Hi Jason,

thanks for your awesome contents.

I wanted to know, what if I have multiple columns as outputs and all of them are categorical?

I understand that for one categorical output column I have to use n_outputs for output layer with softmax activation.

But how to modify the model when I have suppose 5 columns with categorical values and I have say 3 categories in each.

Do i have to use 5 output layers with n_output=3 and softmax activation or is there any way where i can do this in one layer?

Thanks in advance

Reply
- Jason Brownlee February 28, 2021 at 1:54 pm #
  
  You’re welcome.
  
  One approach would be to use the functional API and define 3 output models, each outputs a vector with softmax activation.
  
  This will give you ideas:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
Shikhar April 1, 2021 at 10:52 pm #

If i am using only label encoder then my y_train data will only contain3 different values and it would be of the shape ( -1 , 1) . Then can you please tell me what would be the last dense layer shape and what loss would be used

Reply
- Jason Brownlee April 2, 2021 at 5:39 am #
  
  When using a one hot encoding, the shape of y should be the number of samples (rows) and the number of classes (columns).
  
  Reply
Mohammadreza April 21, 2021 at 2:26 pm #

Hi

Don’t we need a DenseFeatures layer as the first layer for multi-class classification?

Reply
- Jason Brownlee April 22, 2021 at 5:36 am #
  
  Sure, we do.
  
  Reply
Mohammadreza April 22, 2021 at 9:54 am #

Thanks! I noticed you don’t have it in your code and the code still works. How would you add this layer to your code? Does it bring any advantages?

Reply
- Jason Brownlee April 23, 2021 at 4:55 am #
  
  If you’re referring to Dense MLP layers, these can be created with the Dense() layer object.
  https://keras.io/api/layers/core_layers/dense/
  
  Reply
Sowmya Krishnan April 28, 2021 at 9:00 pm #

Thanks for this helpful tutorial on multi-class classification! I’m working on a similar problem with a large number of classes (100+) and my metrics deteriorate with larger batch sizes (above 5). I noticed that you have used a batch size of 5 in the tutorial. Does the batch size depend in any way on the number of classes we have to predict at the end? How can I find the reason for my model metrics deteriorating for larger batch sizes?

For the model I have used a 1D-CNN which takes a string as input and predicts the classes as output. Any suggestions to troubleshoot this problem will be really helpful. And finally a naive question – Are smaller batch sizes such as 1 or 5 acceptable for publications? I’m really new to machine learning and I’m not aware if there is a general trend for batch sizes in publications.

Reply
- Jason Brownlee April 29, 2021 at 6:26 am #
  
  Batch size may have to be tuned for your model and dataset, this may help:
  https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size/
  
  Reply
Abraham Lin April 29, 2021 at 12:53 am #

Hi, Jason, Thank for for the tutorial. I am a beginner of deep learning. This model worked well on my computer. I assumed this model was “trained” by running this model existing iris data. My question is how to test the performance of this “trained” model on new datasets? Do you have codes to perform that?

Reply
- Jason Brownlee April 29, 2021 at 6:30 am #
  
  We estimate the performance a model on new data using k-fold cross-validation:
  https://machinelearningmastery.com/k-fold-cross-validation/
  
  We then choose a model/config (based on k-fold cross-validation estiamtes), train it on all data and use it to start making predictions on new data:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Arjun Satish May 29, 2021 at 8:29 pm #

Is this a feedback or feedforward algorithm?

Reply
- Jason Brownlee May 30, 2021 at 5:50 am #
  
  Not sure what you mean exactly?
  
  MLPs are a feed-forward neural network once trained. Their trained using backprop under SGD.
  
  Reply
Maite June 25, 2021 at 11:08 pm #

I’m working on a similar problem of multi-class clasification. How can I deal with adding “None of the above” in Image Classification? Is there any resourse I can check?

You website has always help me so much. Thank you!

Reply
- Jason Brownlee June 26, 2021 at 4:55 am #
  
  None of the above would be all zeros (e.g. no class).
  
  Reply
sinfer June 27, 2021 at 10:08 pm #

Hi Jason,

A pretty interesting learning !
I have a question if i use this MLP model in scenario like fault detection and diagnosis for a time series telemetry data with labeled as fault and normal data can i predict the possible class label based on the multiple input samples.

as per your model implemented here it will predict [5,3,4,5] –>>> iris setosa

I want to have something like this [[5,3,4,5],[4.9,3,4,5.2][4.8,2.9,4,5.1]] –>>> iris setosa

Thank you

Reply
- Jason Brownlee June 28, 2021 at 7:58 am #
  
  Thanks!
  
  Perhaps explore an RNN like an lstm:
  https://machinelearningmastery.com/start-here/#lstm
  
  Reply
  - sinfer June 28, 2021 at 12:43 pm #
    
    Thanks 😉
    
    Reply
    - Jason Brownlee June 29, 2021 at 4:45 am #
      
      You’re welcome.
      
      Reply
sinfer June 28, 2021 at 2:37 am #

One more thing Jason,
Have you got any multi class classification done with a Deep Neural Network, since this is a MLP implemented here

Reply
- Jason Brownlee June 28, 2021 at 7:59 am #
  
  Yes, you can see some LSTM examples for HAR here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  There are also many CNN examples for multi-class classification on the blog.
  
  Reply
  - sinfer June 28, 2021 at 11:38 am #
    
    Cool.I saw that somewhere it had been mentioned, applying LSTM on time series data for multi class classification is not doing that good if my memory is correct. Is there any truth about that?
    
    Reply
    - Jason Brownlee June 29, 2021 at 4:44 am #
      
      It really depends on the specifics of the data. Perhaps try it and compare the results to other methods.
      
      Reply
sinfer June 28, 2021 at 2:47 am #

Hi Jason,
Is there any way to convert to this MLP to DNN model by adding back propagation? if yes how.

Thanks a lot

Reply
- Jason Brownlee June 28, 2021 at 7:59 am #
  
  MLPs are trained using backprop in keras.
  
  Reply
  - sinfer June 28, 2021 at 11:51 am #
    
    Oh thanks Jason, didn’t know that earlier.
    Can you please check my following code i have added for my KERAS model to predict the label out of 8 classes for time series classification.
    
    def create_dnn_model():
    #create sequential model
    model = Sequential()
    model.add(Dense(64,input_dim=10,activation=’relu’))
    model.add(Dense(32,activation=’relu’))
    model.add(Dense(16,activation=’relu’))
    model.add(Dense(8,activation=’softmax’))
    # Compile model
    model.compile(loss=’categorical_crossentropy’,optimizer=’adam’, metrics=[‘accuracy’])
    return model
    
    estimator = KerasClassifier(build_fn=create_dnn_model, epochs=200, batch_size=5, verbose =1)
    estimator.fit(X_train,y_train)
    
    There are like more than 100,000 records from all the 8 class labels.
    So this is a MLP model. and I followed this IRIS tutorial to adhere the model into my implementation.
    
    And more specially I have performed Standardization on input train data and input test data. And this model gives me a train accuracy of 99.05 % and test accuracy of 97%. but it hardly predicts a label correctly and i find this a bit confusing since my accuracy takes a high value.
    
    and i got the train accuracy based on my standardized train data and test accuracy based on standardized test data like in the following
    
    estimator.score(Scaled _X_train,y_train)
    estimator.score(Scaled_X_test,y_test)
    
    when i call the predict function i don’t have to call the exact same standardize function i have called on the input data on my sample inputs riight
    
    Reply
    - Jason Brownlee June 29, 2021 at 4:45 am #
      
      I recommend testing a suite of model configurations in order to discover what works well or best for your specific dataset.
      
      Yes, you must prepare any new data in an identical manner as the training set, e.g. the same data prep object.
      
      Reply
      - sinfer June 29, 2021 at 3:49 pm #
        
        Can i call my model a deep neural network since i have used (more than one) hidden layers instead of calling this a MLP? And also since i have used an keras classifier this does back propagation as well right
      - Jason Brownlee June 30, 2021 at 5:17 am #
        
        Sure. It’s just marketing.
        
        Yes, all neural nets are fit with backprop.
Amanda July 5, 2021 at 11:44 pm #

Hi Jason
Thank you in advance for the material that has been submitted

I want to ask for classes on cnn, if there are 213 classes, can I use a confusion matrix?
or is there anything other than confusion matrix?
thank you

Reply
- Jason Brownlee July 6, 2021 at 5:48 am #
  
  Yes, but it may not be useful – e.g. too large to read effectively.
  
  Reply
Will July 28, 2021 at 2:03 am #

Hi Jason.

Thanks for the above article.

I have a query regarding the OHE aspect of the above.

Should one hot encoding (OHE) always be performed on the output variables as good practice? Even, for example, on a multi-output regression problem? (I have taken a look here: https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/ but still unsure).

In the example I’m looking at, my dataframe/array has 8 separate columns which are the output variables the model is trying to predict. Should OHE be performed on these?

If so, this transforms the shape of my input/output data from:

Input (200664, 8)
Output (200664, 8)

To:

Input (200664, 8)
Output (200664, 8, 8)

Would this then mean I have to reshape my input variables? I appreciate it’s hard without visibility of the data/code, but any guidance would be more than welcomed.

Thanks again.

Reply
- Jason Brownlee July 28, 2021 at 5:28 am #
  
  OHE is only needed for the output variable if there are more than two classes.
  
  Reply
Lena August 19, 2021 at 10:06 pm #

Hi Jason,

Thank you for the great tutorial. I am wondering if there is a way to label the outputs here like in a multi output Keras Functional API model. I am hoping to use KerasClassifier for 100+ categories (possible chronic conditions in medicare data), and having them labeled throughout would limit the possibility of mixing them up along the way.

Thanks again!
Lena

Reply
- Adrian Tam August 20, 2021 at 1:28 am #
  
  Are you looking for the argmax() function in numpy?
  
  Reply
sama samaan August 31, 2021 at 6:28 am #

Hello
I have a dataset with 4000 rows. The deep learning model work well with 14 to 20 rows. when I put all the 4000 rows it takes too long time in execution and no results occur. what is the reason behind this?

Reply
- Adrian Tam September 1, 2021 at 8:31 am #
  
  Because you train with entire dataset in each iteration? You can use SGD and train with only a small subset of the rows in each iteration, but sampling from the entire 4000 rows each time.
  
  Reply
Kevin September 1, 2021 at 3:16 am #

Hi Jason,

When I have created this model, how do I see which features were the strongest predictors in the model? I want to present this model to stakeholders, but how do I interpret the model? In a logistic regression I would look at the p-values of the model.

Many thanks

Reply
- Adrian Tam September 1, 2021 at 9:00 am #
  
  Deep learning is difficult to see this, at least difficult to see from the model. One way to heuristically verify, however, is to do the prediction over and over with each feature replaced (e.g., zero out all features, or replace with random number) and expect the prediction went bad. An unimportant feature would not change the prediction much.
  
  Reply
ben February 6, 2022 at 4:19 pm #

Came across this article looking for ways to address class imbalance for a Multi-Class problem as this will affect the scores.

If the classes are say 90/40/20 accuracy will not be 97%.

Reply
Saif Sohel April 8, 2022 at 8:18 pm #

Hi jason
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

Um facing below problem

—————————————————————————
NameError Traceback (most recent call last)
in ()
1 # encode class values as integers
—-> 2 encoder = LabelEncoder()
3 encoder.fit(Y)
4 encoded_Y = encoder.transform(Y)
5 # convert integers to dummy variables (i.e. one hot encoded)

NameError: name ‘LabelEncoder’ is not defined

could you plz help me to resolve. For your kind information um a beginner

Reply
- James Carmichael April 9, 2022 at 8:44 am #
  
  Hi Saif…Without seeing a complete code listing, it is not clear how you defined all of the variables and functions.
  
  Please provide a complete listing so that we can better assist you.
  
  Reply
Jannadi Khemais October 4, 2022 at 7:38 pm #

Thank you, much appreciated , sharply and clearly explained, it helps me to make the implementation of neural networks easy for multiclass or multinomial classification.

Question: Could you please elaborate more the L2 and L1 regularization techniques and what is the difference between L1 and L2 regularization for linear regression? Please share tutorial link ,if possible _ Thanks

Reply
Maryam March 30, 2023 at 11:02 pm #

Hi Jason,

Thanks for this great tutorial. Could you kindly let me know how to get the confusion matrix and all metrics reports (precision, recall, f1) for all features/predictors using your exact code? I really appreciate it.

Reply
- James Carmichael March 31, 2023 at 7:09 am #
  
  Hi Maryam…The following resource may be of interest:
  
  https://machinelearningmastery.com/how-to-calculate-precision-recall-f1-and-more-for-deep-learning-models/
  
  Reply

Navigation

Multi-Class Classification Tutorial with the Keras Deep Learning Library

1. Problem Description

Need help with Deep Learning in Python?

2. Import Classes and Functions

3. Load the Dataset

4. Encode the Output Variable

5. Define the Neural Network Model

6. Evaluate the Model with k-Fold Cross Validation

7. Complete Example

Summary

More On This Topic

614 Responses to Multi-Class Classification Tutorial with the Keras Deep Learning Library

Leave a Reply Click here to cancel reply.