Multi-Class Classification Tutorial with the Keras Deep Learning Library

Keras is a Python library for deep learning that wraps the efficient numerical libraries Theano and TensorFlow.

In this post you will discover how you can use Keras to develop and evaluate neural network models for multi-class classification problems.

After completing this step-by-step tutorial, you will know:

  • How to load data from CSV and make it available to Keras.
  • How to prepare multi-class classification data for modeling with neural networks.
  • How to evaluate Keras neural network models with scikit-learn.

Let’s get started.

  • Update Oct/2016: Updated examples for Keras 1.1.0 and scikit-learn v0.18.
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.

Photo by houroumono, some rights reserved.

1. Problem Description

In this tutorial we will use the standard machine learning problem called the iris flowers dataset.

This dataset is well studied and is a good problem for practicing on neural networks because all four input variables are numeric and share the same scale, centimeters. Each instance describes the measurements of an observed flower, and the output variable is the specific iris species.

This is a multi-class classification problem, meaning that there are more than two classes to be predicted; in fact, there are three flower species. This is an important type of problem on which to practice with neural networks because the three class values require specialized handling.

The iris flower dataset is a well-studied problem, and as such we can expect to achieve a model accuracy in the range of 95% to 97%. This provides a good target to aim for when developing our models.

You can download the iris flowers dataset from the UCI Machine Learning repository and place it in your current working directory with the filename iris.csv.


2. Import Classes and Functions

We can begin by importing all of the classes and functions we will need in this tutorial.

This includes the functionality we require from Keras, as well as data loading from pandas and data preparation and model evaluation from scikit-learn.

3. Initialize Random Number Generator

Next we need to initialize the random number generator to a constant value (7).

This is important to ensure that the results we achieve from this model can be achieved again precisely. It ensures that the stochastic process of training a neural network model can be reproduced.
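As a minimal sketch of the seeding step, assuming NumPy's global random number generator is the one being fixed, the snippet below seeds it with 7 and checks that re-seeding reproduces the same draws. The names first_draw and second_draw are illustrative only.

```python
import numpy as np

# Fix the random seed so that stochastic steps can be repeated.
seed = 7
np.random.seed(seed)
first_draw = np.random.rand(3)

# Re-seeding with the same value replays the same sequence of numbers.
np.random.seed(seed)
second_draw = np.random.rand(3)

print(np.array_equal(first_draw, second_draw))
```

Seeding NumPy alone may not make every backend fully deterministic, so some run-to-run variation in accuracy can still occur.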

4. Load The Dataset

The dataset can be loaded directly. Because the output variable contains strings, it is easiest to load the data using pandas. We can then split the attributes (columns) into input variables (X) and output variables (Y).
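The loading step might look like the sketch below. To keep it self-contained it reads a three-row in-memory sample laid out like iris.csv (four numeric columns followed by the species name, no header row); with the real file you would pass the filename "iris.csv" to read_csv instead. The variable sample_csv is an assumption made for illustration.

```python
import io

import pandas as pd

# Three rows in the same layout as iris.csv: four measurements, then the species.
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# header=None because the file has no header row.
dataframe = pd.read_csv(sample_csv, header=None)
dataset = dataframe.values

# Split into input variables (X) and the output variable (Y).
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]

print(X.shape, list(Y))
```

If your copy of the file does include a header row or an index column, the column slices need to be adjusted accordingly.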

5. Encode The Output Variable

The output variable contains three different string values.

When modeling multi-class classification problems using neural networks, it is good practice to reshape the output attribute from a vector that contains one class value per instance into a matrix with one boolean column per class, indicating whether or not a given instance has that class value.

This is called one hot encoding or creating dummy variables from a categorical variable.

For example, in this problem the three class values are Iris-setosa, Iris-versicolor and Iris-virginica. If we had the observations:

Iris-setosa
Iris-versicolor
Iris-virginica

We can turn this into a one-hot encoded binary matrix for each data instance that would look as follows:

Iris-setosa,  Iris-versicolor,  Iris-virginica
1,            0,                0
0,            1,                0
0,            0,                1

We can do this by first encoding the strings consistently to integers using the scikit-learn class LabelEncoder. We can then convert the vector of integers to a one hot encoding using the Keras function to_categorical().
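To show what the two-step encoding produces, here is a NumPy-only sketch of the same transformation. It is a stand-in for LabelEncoder and to_categorical(), not those functions themselves: np.unique assigns integers to the sorted class names, and indexing an identity matrix turns each integer into a one-hot row. The name encoded_Y is illustrative.

```python
import numpy as np

# A few example labels, as they would appear in the output column.
Y = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

# Step 1: map each string to an integer (classes are sorted alphabetically).
classes, encoded_Y = np.unique(Y, return_inverse=True)

# Step 2: turn each integer into a one-hot row with one column per class.
dummy_y = np.eye(len(classes))[encoded_Y]

# Going back: the column holding the 1 identifies the class name.
decoded = classes[np.argmax(dummy_y, axis=1)]

print(encoded_Y)
print(dummy_y)
```

The last step is the same idea used later to turn predicted integers back into species names.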

6. Define The Neural Network Model

The Keras library provides wrapper classes to allow you to use neural network models developed with Keras in scikit-learn.

There is a KerasClassifier class in Keras that can be used as an Estimator in scikit-learn, the base type of model in the library. The KerasClassifier takes the name of a function as an argument. This function must return the constructed neural network model, ready for training.

Below is a function that will create a baseline neural network for the iris classification problem. It creates a simple fully connected network with one hidden layer that contains 4 neurons, the same number as there are inputs (it could be any number of neurons).

The hidden layer uses a rectifier activation function which is a good practice. Because we used a one-hot encoding for our iris dataset, the output layer must create 3 output values, one for each class. The output value with the largest value will be taken as the class predicted by the model.

The network topology of this simple one-layer neural network can be summarized as:

4 inputs -> [4 hidden nodes] -> 3 outputs

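Note that we use a sigmoid activation function in the output layer. This ensures that each output value lies between 0 and 1, so it can be read as a per-class score. Sigmoid outputs are computed independently and need not sum to 1 across the three classes; a softmax activation is the more common pairing with categorical_crossentropy when the outputs should form a probability distribution.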
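To make the layer arithmetic concrete, the following NumPy sketch pushes one flower through a network with 4 inputs, 4 hidden neurons and 3 outputs. The weights are random draws, not trained values, so the predicted class is meaningless; the point is only to show the shapes, the rectifier and sigmoid steps, and how the largest output is picked. Softmax is computed alongside for comparison. All names here are illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Random (untrained) weights: 4 inputs -> 4 hidden neurons -> 3 outputs.
rng = np.random.RandomState(7)
W1, b1 = rng.normal(scale=0.5, size=(4, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 3)), np.zeros(3)

x = np.array([[5.1, 3.5, 1.4, 0.2]])  # one flower, four measurements

hidden = relu(x @ W1 + b1)
logits = hidden @ W2 + b2

sigmoid_out = sigmoid(logits)  # each value in (0, 1), sum is unconstrained
softmax_out = softmax(logits)  # values in (0, 1) that sum to 1

# The predicted class is the index of the largest output.
predicted_class = int(np.argmax(sigmoid_out, axis=1)[0])

print(sigmoid_out, sigmoid_out.sum())
print(softmax_out, softmax_out.sum())
print(predicted_class)
```

Because both activations preserve the ordering of the raw outputs, the predicted class is the same either way; only the interpretation of the values differs.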

Finally, the network uses the efficient ADAM gradient descent optimization algorithm with a logarithmic loss function, which is called categorical_crossentropy in Keras.

We can now create our KerasClassifier for use in scikit-learn.

We can also pass arguments in the construction of the KerasClassifier class that will be passed on to the fit() function internally used to train the neural network. Here, we pass the number of epochs as 200 and batch size as 5 to use when training the model. Verbose output is also turned off during training by setting verbose to 0.
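Putting this section together, the model function and its wrapper might look like the sketch below. It is written against the Keras 2.0-era API mentioned in the update notes, and several details are assumptions: the helper name make_estimator is hypothetical, the epochs keyword is the Keras 2 spelling (Keras 1 used nb_epoch), and the keras.wrappers.scikit_learn import path may not exist in newer Keras releases. Imports are placed inside the functions only so that the file can be loaded without Keras installed.

```python
# Training settings named in the text above.
EPOCHS = 200
BATCH_SIZE = 5


def baseline_model():
    """Build and compile the 4 inputs -> [4 hidden] -> 3 outputs network."""
    # Imported here so the file loads without Keras; a script would import at the top.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    # input_dim=4 declares the four measurements; this Dense layer is the hidden layer.
    model.add(Dense(4, input_dim=4, activation="relu"))
    # One output per class, with the sigmoid activation described in this section.
    model.add(Dense(3, activation="sigmoid"))
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def make_estimator():
    """Wrap baseline_model so scikit-learn can treat it as an estimator."""
    # Assumption: this import path is specific to Keras 2.0-era releases.
    from keras.wrappers.scikit_learn import KerasClassifier

    return KerasClassifier(build_fn=baseline_model, epochs=EPOCHS,
                           batch_size=BATCH_SIZE, verbose=0)
```

Calling make_estimator() returns an object that can be handed to scikit-learn tools such as cross_val_score in place of a native estimator.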

7. Evaluate The Model with k-Fold Cross Validation

We can now evaluate the neural network model on our training data.

scikit-learn has excellent capability to evaluate models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation.

First we can define the model evaluation procedure. Here, we set the number of folds to be 10 (an excellent default) and to shuffle the data before partitioning it.
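With scikit-learn 0.18 or later this procedure is typically defined as KFold(n_splits=10, shuffle=True, random_state=seed). The hand-rolled NumPy sketch below is not that class; it only illustrates what shuffled 10-fold partitioning does to the 150 iris rows. The function name shuffled_kfold_indices is hypothetical.

```python
import numpy as np


def shuffled_kfold_indices(n_samples, n_folds=10, seed=7):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross validation."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(n_samples)       # shuffle before partitioning
    folds = np.array_split(indices, n_folds)   # k roughly equal parts
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx


splits = list(shuffled_kfold_indices(150))

for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each of the 10 models is trained on 135 rows and scored on the 15 rows held out for that fold, and every row is held out exactly once.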

Now we can evaluate our model (estimator) on our dataset (X and dummy_y) using a 10-fold cross validation procedure (kfold).

Evaluating the model only takes approximately 10 seconds and returns an object that describes the evaluation of the 10 constructed models for each of the splits of the dataset.

The results are summarized as both the mean and standard deviation of the model accuracy on the dataset. This is a reasonable estimation of the performance of the model on unseen data. It is also within the realm of known top results for this problem.
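The summary line can be produced with a single format string. In the sketch below, the array of fold scores is a hypothetical placeholder chosen for easy arithmetic, not output from this tutorial; in practice results would be the array of per-fold accuracies returned by the cross validation run.

```python
import numpy as np

# Hypothetical placeholder fold accuracies, NOT results from this tutorial.
results = np.array([1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9])

# Report mean accuracy with the standard deviation in parentheses.
summary = "Accuracy: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100)
print(summary)
```

The standard deviation gives a sense of how much the accuracy varies from fold to fold.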


8. Summary

In this post you discovered how to develop and evaluate a neural network using the Keras Python library for deep learning.

By completing this tutorial, you learned:

  • How to load data and make it available to Keras.
  • How to prepare multi-class classification data for modeling using one hot encoding.
  • How to use Keras neural network models with scikit-learn.
  • How to define a neural network using Keras for multi-class classification.
  • How to evaluate a Keras neural network model using scikit-learn with k-fold cross validation.

Do you have any questions about deep learning with Keras or this post?

Ask your questions in the comments below and I will do my best to answer them.

128 Responses to Multi-Class Classification Tutorial with the Keras Deep Learning Library

  1. Jack June 19, 2016 at 3:12 pm #

    Thanks for this cool tutorial! I have a question about the input data. If the datatypes of input variables are different (i.e. string and numeric). How to preprocess the train data to fit keras?

    • Jason Brownlee June 20, 2016 at 5:41 am #

      Great question. Eventually, all of the data need to be turned into real values.

      With categorical variables, you can create dummy variables and use one-hot encoding. For string data, you can use word embeddings.

      • Shraddha February 10, 2017 at 8:32 pm #

        Could you please let me know how to convert string data into word embeddings in large datasets?
        Would really appreciate it
        Thanks so much

        • Jason Brownlee February 11, 2017 at 5:01 am #

          Hi Shraddha,

          First, convert the chars to vectors of integers. You can then pad all vectors to the same length. Then away you go.

          I hope that helps.

          • Shraddha Sunil February 13, 2017 at 4:52 pm #

            Thanks so much Jason!

          • Jason Brownlee February 14, 2017 at 10:04 am #

            You’re welcome.

  2. Aakash Nain July 4, 2016 at 2:25 pm #

    Hello Jason,
    It’s a very nice tutorial to learn. I implemented the same model but on my work station I achieved a score of 88.67% only. After modifying the number of hidden layers, I achieved an accuracy of 93.04%. But I am not able to achieve the score of 95% or above. Any particular reason behind it ?

    • Jason Brownlee July 6, 2016 at 6:27 am #

      Interesting Aakash.

      I used the Theano backend. Are you using the same?

      Are all your libraries up to date? (Keras, Theano, NumPy, etc…)

      • Aakash Nain July 7, 2016 at 12:03 am #

        Yes Jason . Backend is theano and all libraries are up to date.

        • Jason Brownlee July 7, 2016 at 9:40 am #

          Interesting. Perhaps seeding the random number generator is not having the desired effect for reproducibility. Perhaps it has different effects on different platforms.

          Perhaps re-run the above code example a few times and see the spread of accuracy scores you achieve?

  3. La Tuan Nghia July 6, 2016 at 1:29 am #

    Hello Jason,

    In chapter 10 of the book “Deep Learning With Python”, there is a fraction of code:

    estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
    kfold = KFold(n=len(X), n_folds=10, shuffle=True, random_state=seed)
    results = cross_val_score(estimator, X, dummy_y, cv=kfold)
    print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

    How to save this model and weights to file, then how to load these file to predict a new input data?

    Many thanks!

    • Jason Brownlee July 6, 2016 at 6:26 am #

      Really good question.

      Keras does provide functions to save network weights to HDF5 and network structure to JSON or YAML. The problem is, once you wrap the network in a scikit-learn classifier, how do you access the model and save it. Or can you save the whole wrapped model.

      Perhaps a simple but inefficient place to start would be to try and simply pickle the whole classifier?

      • Constantin Weisser July 30, 2016 at 4:30 am #

        I tried doing that. It works for a normal sklearn classifier, but apparently not for a Keras Classifier:

        import pickle
        with open(“name.p”,”wb”) as fw:

        with open(name+”.p”,”rb”) as fr:
        clf_saved = pickle.load(fr)


        This gives:

        theano.gof.fg.MissingInputError: An input of the graph, used to compute DimShuffle{x,x}(keras_learning_phase), was not provided and not given a value.Use the Theano flag exception_verbosity=’high’,for more information on this error.

        Backtrace when the variable is created:
        File “”, line 6, in
        import classifier_eval_simplified
        File “../../../../”, line 26, in
        from keras.utils import np_utils
        File “/usr/local/lib/python2.7/site-packages/keras/”, line 2, in
        from . import backend
        File “/usr/local/lib/python2.7/site-packages/keras/backend/”, line 56, in
        from .theano_backend import *
        File “/usr/local/lib/python2.7/site-packages/keras/backend/”, line 17, in
        _LEARNING_PHASE = T.scalar(dtype=’uint8′, name=’keras_learning_phase’) # 0 = test, 1 = train

  4. Sally July 15, 2016 at 4:10 am #

    Dear Dr. Jason,

    Thanks very much for this great tutorial . I got extra benefit from it, but I need to calculate precision, recall and confusion matrix for such multi-class classification. I tried to did it but each time I got a different problem. could you please explain me how to do this

  5. Fabian Leon July 31, 2016 at 4:12 am #

    Hi jason, Reading the tutorial and the same example in your book, you still don’t tell us how can use the model to make predictions, you have only show us how to train and evaluate it but I would like to see you using this model to make predictions on at least one example of iris flowers data no matters if is dummy data.

    I would like to see how can I load my own instance of an iris-flower and use the above model to predict what kind is the flower?

    could you do that for us?

    • Jason Brownlee July 31, 2016 at 7:31 am #

      Hi Fabian, no problem.

      In the tutorial above, we are using the scikit-learn wrapper. That means we can use the standard model.predict() function to make predictions from scikit-learn.

      For example, below is an example adapted from the above where we split the dataset, train on 67% and make predictions on 33%. Remember that we have encoded the output class value as integers, so the predictions are integers. We can then use encoder.inverse_transform() to turn the predicted integers back into strings.

      Running this example prints:

      I hope that is clear and useful. Let me know if you have any more questions.

      • Devendra November 27, 2016 at 9:40 pm #

        Hi Jason,

        I was facing error while converting string to float and so I had to make a minor correction to my code
        X = dataset[1:,0:4].astype(float)
        Y = dataset[1:,4]

        However, I am still unable to run since I am getting the following error for line

        “—-> 1 results = cross_val_score(estimator, X, dummy_y, cv=kfold)”
        “Exception: Error when checking model target: expected dense_4 to have shape (None, 3) but got array with shape (135L, 22L)”

        I would appreciate your help. Thanks.

        • Devendra November 28, 2016 at 5:41 am #

          I found the issue. It was with the indexes.
          I had to take [1:,1:5] for X and [1:,5] for Y.

          I am using Jupyter notebook to run my code.
          The index range seems to be different in my case.

          • Jason Brownlee November 28, 2016 at 8:47 am #

            I’m glad you worked it out Devendra.

      • Cristina March 24, 2017 at 2:23 am #

        For some reason, when I run this example I get 0 as prediction value for all the samples. What could be happening?

        I’ve the same problem on prediction with other code I’m executing, and decided to run yours to check if i could be doing something wrong?

        I’m lost now, this is very strange.

        Thanks in advance!

        • Cristina March 24, 2017 at 2:42 am #

          Hello again,

          This is happening with Keras 2.0, with Keras 1 works fine.



        • Jason Brownlee March 24, 2017 at 7:57 am #

          Very strange.

          Maybe check that your data file is correct, that you have all of the code and that your environment is installed and is working correctly.

      • Tanvir. March 27, 2017 at 7:43 am #

        Hi Jason,
        Thanks for your awesome tutorials. I had a curious question:
        As we are using KerasClassifier or KerasRegressor of Scikit-Learn wrapper, then how to save them as a file after fitting ?

        For example, I am predicting regression or multiclass classification. I have to use KerasRegressor or KerasClassifier then. After fitting a large volume of data, I want to save the trained neural network model to use it for prediction purpose only. How to save them and how to restore them from saved files ? Your answer will help me a lot.

  6. Prash August 14, 2016 at 9:15 pm #

    Jason, boss you are too good! You have really helped me out especially in implementation of Deep learning part. I was rattled and lost and was desperately looking for some technology and came across your blogs. thanks a lot.

    • Jason Brownlee August 15, 2016 at 12:38 pm #

      I’m glad I have helped in some small way Prash.

  7. Harsha August 18, 2016 at 7:03 pm #

    It is a great tutorial Dr. Jason. Very clear and crispy. I am a beginner in Keras. I have a small doubt.

    Is it necessary to use scikit-learn. Can we solve the same problem using basic keras?

    • Jason Brownlee August 19, 2016 at 5:25 am #

      You can use basic Keras, but scikit-learn make Keras better. They work very well together.

      • Harsha August 19, 2016 at 11:06 pm #

        Thank You Jason for your prompt reply

      • jokla January 12, 2017 at 7:30 am #

        Hi Jason, nice tutorial!

        I have a question. You mentioned that scikit-learn make Keras better, why?


        • Jason Brownlee January 12, 2017 at 9:40 am #

          Hi jokla, great question.

          The reason is that we can access all of sklearn’s features using the Keras Wrapper classes. Tools like grid searching, cross validation, ensembles, and more.

  8. moeyzf August 21, 2016 at 10:17 am #

    Hi Jason,

    I’m a CS student currently studying sentiment analysis and was wondering how to use keras for multi classification of text, ideally I would like the functionality of the TFidvectoriser from sklearn so a one hot vector representation against a given vocabulary is used, within a neural net to determine the final classification.

    I am having trouble understanding the initial steps in transforming and feeding word data into vector representations. Can you help me out with some basic code examples of this first step in the sense that say I have a text file with 5000 words for example, which also include emoji (to use as the vocabulary), how can I feed in a training file in csv format text,sentiment and convert each text into a one hot representation then feed it into the neural net, for a final output vector of size e.g 1×7 to denote the various class labels.

    I have tried to find help online and most of the solutions use helper methods to load in text data such as imdb, while others use word2vec which isnt what i need.

    Hope you can help, I would really appreciate it!



  9. Qichang September 12, 2016 at 3:01 pm #

    Hi Jason,

    Thanks for the great tutorial!

    Just one question regarding the output variable encoding. You mentioned that it is a good practice to convert the output variable to one hot encoding matrix. Is this a necessary step? If the output varible consists of discrete integters, say 1, 2, 3, do we still need to to_categorical() to perform one hot encoding?

    I check some example codes in keras github, it seems this is required. Can you please kindly shed some lights on it?

    Thanks in advance.

    • Jason Brownlee September 13, 2016 at 8:09 am #

      Hi Qichang, great question.

      A one hot encoding is not required, you can train the network to predict an integer, it is just a MUCH harder problem.

      By using a one hot encoding, you greatly simplify the prediction problem making it easier to train for and achieve better performance.

      Try it and compare the results.

  10. Pedro A. Castillo September 16, 2016 at 12:31 am #

    I have followed your tutorial and I get an error in the following line:

    results = cross_val_score(estimator, X, dummy_y, cv=kfold)

    Traceback (most recent call last):
    File “”, line 84, in
    results = cross_val_score(estimator, X, dummy_y, cv=kfold)
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/”, line 1433, in cross_val_score
    for train, test in cv)
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/”, line 800, in __call__
    while self.dispatch_one_batch(iterator):
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/”, line 658, in dispatch_one_batch
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/”, line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/”, line 180, in __init__
    self.results = batch()
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/externals/joblib/”, line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
    File “/Library/Python/2.7/site-packages/scikit_learn-0.17.1-py2.7-macosx-10.9-intel.egg/sklearn/”, line 1531, in _fit_and_score, y_train, **fit_params)
    File “/Library/Python/2.7/site-packages/keras/wrappers/”, line 135, in fit
    TypeError: __call__() takes at least 2 arguments (1 given)

    Do you have received this error before? do you have an idea how to fix that?

    • Jason Brownlee September 16, 2016 at 9:07 am #

      I have not seen this before Pedro.

      Perhaps it is something simple like a copy-paste error from the tutorial?

      Are you able to double check the code matches the tutorial exactly?

      • Victor October 8, 2016 at 10:15 pm #

        I have exactly the same problem.
        Double checked the code,
        have all the versions of keras etc, updated.

        • Jason Brownlee October 9, 2016 at 6:50 am #

          Hi Victor, are you able to share your version of Keras, scikit-learn, TensorFlow/Theano?

  11. Yunita September 25, 2016 at 12:17 am #

    Hi Jason,

    Thanks for the great tutorial.
    But I have a question, why did you use sigmoid activation function together with categorical_crossentropy loss function?
    Usually, for multiclass classification problem, I found implementations always using softmax activation function with categorical_cross entropy.
    In addition, does one-hot encoding in the output make it as binary classification instead of multiclass classification? Could you please give some explanations on it?

    • Jason Brownlee September 25, 2016 at 8:04 am #

      Yes, you could use a softmax instead of sigmoid. Try it and see.

      The one hot encoding creates 3 binary output features. This too would be required with the softmax activation function.

  12. Marcus September 26, 2016 at 6:49 am #

    For Text classification or to basically assign them a category based on the text. How would the baseline_model change????

    I’m trying to have an inner layer of 24 nodes and an output of 17 categories but the input_dim=4 as specified in the tutorial wouldn’t be right cause the text length will change depending on the number of words.

    I’m a little confused. Your help would be much appreciated.

    model.add(Dense(24, init='normal', activation='relu'))

    def baseline_model():
        # create model
        model = Sequential()
        model.add(Dense(24, init='normal', activation='relu'))
        model.add(Dense(17, init='normal', activation='sigmoid'))
        # Compile model
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

  13. Vishnu October 19, 2016 at 9:07 pm #

    Hi Jason,

    Thank you for your tutorial. I was really interested in Deep Learning and was looking for a place to start, this helped a lot.

    But while I was running the code, I came across two errors. The first one was, that while loading the data through pandas, just like your code i set “header= None” but in the next line when we convert the value to float i got the following error message.

    “ValueError: could not convert string to float: ‘Petal.Length'”.

    This problem went away after I took the header=None condition off.

    The second one came at the end, during the Kfold validation. during the one hot encoding it’s binning the values into 22 categories and not 3. which is causing this error:

    “Exception: Error when checking model target: expected dense_2 to have shape (None, 3) but got array with shape (135, 22)”

    I haven’t been able to get around this. Any suggestion would be appreciated.

  14. Homagni Saha October 20, 2016 at 10:39 am #

    Hello, I tried to use the exact same code for another dataset , the only difference being the dataset had 78 columns and 100000 rows . I had to predict the last column taking the remaining 77 columns as features . I must also say that the last column has 23 different classes.(types basically) and the 23 different classes are all integers not strings like you have used.

    model = Sequential()
    model.add(Dense(77, input_dim=77, init='normal', activation='relu'))
    model.add(Dense(10, init='normal', activation='relu'))
    model.add(Dense(23, init='normal', activation='sigmoid'))

    also I used nb_epoch=20 and batch_size=1000

    also in estimator I changed the verbose to 1, and now the accuracy is a dismal of 0.52% at the end. Also while running I saw strange outputs in the verbose as :

    93807/93807 [==============================] – 0s – loss: nan – acc: 0.0052

    why is the loss always as loss: nan ??

    Can you please tell me how to modify the code to make it run correctly for my dataset?(remaining everything in the code is unchanged)

  15. Jason Brownlee October 21, 2016 at 8:30 am #

    Hi Homagni,

    That is a lot of classes for 100K records. If you can reduce that by splitting up the problem, that might be good.

    Your batch size is probably too big and your number of epochs is way too small. Dramatically increase the number of epochs by 2-3 orders of magnitude.

    Start there and let me know how you go.

  16. AbuZekry October 30, 2016 at 12:02 am #

    Hi Jason,

    I’ve edited the first layer’s activation to ‘softplus’ instead of ‘relu’ and number of neurons to 8 instead of 4
    Then I edited the second layer’s activation to ‘softmax’ instead of sigmoid and I got 97.33% (4.42%) performance. Do you have an explanation to this enhancement in performance ?

    • Jason Brownlee October 30, 2016 at 8:55 am #

      Well done AbuZekry.

      Neural nets are infinitely configurable.

  17. Panand November 7, 2016 at 3:58 am #

    Hello Jason,

    Is there a error in your code? You said the network has 4 input neurons , 4 hidden neurons and 3 output neurons.But in the code you haven’t added the hidden neurons.You just specified only the input and output neurons… Will it effect the output in anyway?

    • Jason Brownlee November 7, 2016 at 7:18 am #

      Hi Panand,

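      The network structure is as follows:

      4 inputs -> [4 hidden nodes] -> 3 outputs

      Line 5 of the code in section 6 adds both the input and hidden layer:

      model.add(Dense(4, input_dim=4, init='normal', activation='relu'))

      The input_dim argument defines the shape of the input.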

  18. JD November 13, 2016 at 5:28 pm #

    Hi Jason,
    I have a set of categorical features and continuous features, I have this model:
    model = Sequential()
    model.add(Dense(117, input_dim=117, init='normal', activation='relu'))
    model.add(Dense(10, activation='softmax'))

    I am getting a dismal : (‘Test accuracy:’, 0.43541752685249119) :
    Total records 45k, 10 classes to predict
    batch_size=1000, nb_epoch=25

    Any improvements also I would like to put LSTM how to go about doing that as I am getting errors if I add
    model.add(Dense(117, input_dim=117, init='normal', activation='relu'))
    model.add(LSTM(117,dropout_W=0.2, dropout_U=0.2, return_sequences=True))
    model.add(Dense(10, activation='softmax'))
    Exception: Input 0 is incompatible with layer lstm_6: expected ndim=3, found ndim=2

  19. YA November 17, 2016 at 7:00 pm #

    Hi Jason,

    I have a set of categorical features(events) from a real system, and i am trying to build a deep learning model for event prediction.
    The event’s are not appears equally in the training set and one of them is relatively rare compared to the others.
    event count in training set
    1 22000
    2 6000
    3 13000
    4 12000
    5 26000

    Should i continue with this training set? or should i restructure the training set?
    What is your recommendation?

    • Jason Brownlee November 18, 2016 at 8:20 am #

      Hi YA, I would try as many different “views” on your problem as you can think of and see which best exposes the problem to the learning algorithms (gets the best performance when everything else is held constant).

  20. Tom December 9, 2016 at 12:13 am #

    Hello Jason,
    Great work on your website and tuturials! I was wondering if you could show a multi hot encoding, I think you can call it al multi label classification.
    Now you have (only one option on and the rest off)

    And do like (each classification has the option on or off)

    This would really help for me

    • Tom December 9, 2016 at 1:07 am #

      Extra side note, with k-Fold Cross Validation. I got it working with binary_crossentropy with quite bad results. Therefore I wanted to optimize the model and add cross validation which unfortunately didn’t work.

  21. Martin December 26, 2016 at 6:02 pm #

    Hi, Jason: Regarding this, I have 2 questions:
    1) You said this is a “simple one-layer neural network”. However, I feel it’s still 3-layer network: input layer, hidden layer and output layer.

    4 inputs -> [4 hidden nodes] -> 3 outputs

    2) However, in your model definition:
    model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='sigmoid'))

    Seems that only two layers, input and output, there is no hidden layer. So this is actually a 2-layer network. Is this right?

    • Jason Brownlee December 27, 2016 at 5:24 am #

      Hi Martin, yes. One hidden layer. I take the input and output layers as assumed, the work happens in the hidden layer.

      The first line defines the number of inputs (input_dim=4) AND the number of nodes in the hidden layer:

      model.add(Dense(4, input_dim=4, init='normal', activation='relu'))

      I hope that helps.

  22. Seun January 16, 2017 at 3:58 pm #

    Hi, Jason: I ran this same code but got this error:

    Traceback (most recent call last):

    File “”, line 1, in
    runfile(‘C:/Users/USER/Documents/keras-master/examples/’, wdir=’C:/Users/USER/Documents/keras-master/examples’)

    File “C:\Users\USER\Anaconda2\lib\site-packages\spyder\utils\site\”, line 866, in runfile
    execfile(filename, namespace)

    File “C:\Users\USER\Anaconda2\lib\site-packages\spyder\utils\site\”, line 87, in execfile
    exec(compile(scripttext, filename, ‘exec’), glob, loc)

    File “C:/Users/USER/Documents/keras-master/examples/”, line 46, in
    results = cross_val_score(estimator, X, dummy_y, cv=kfold)

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\model_selection\”, line 140, in cross_val_score
    for train, test in cv_iter)

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\”, line 758, in __call__
    while self.dispatch_one_batch(iterator):

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\”, line 603, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\externals\joblib\”, line 127, in __init__
    self.items = list(iterator_slice)

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\model_selection\”, line 140, in
    for train, test in cv_iter)

    File “C:\Users\USER\Anaconda2\lib\site-packages\sklearn\”, line 67, in clone
    new_object_params = estimator.get_params(deep=False)

    TypeError: get_params() got an unexpected keyword argument ‘deep’

    Please, I need your help on how to resolve this.

  23. shazz January 25, 2017 at 7:36 am #

    I have the same issue….
    File “/usr/local/lib/python3.5/dist-packages/sklearn/”, line 67, in clone
    new_object_params = estimator.get_params(deep=False)
    TypeError: get_params() got an unexpected keyword argument ‘deep’

    Looks to be an old issue fixed last year so I don’t understand which lib is in the wrong version…

  24. Seun January 25, 2017 at 10:13 pm #

    Hi Jason,
    Thanks so much. The second fix worked for me.

  25. Sulthan January 31, 2017 at 3:08 am #

    Dear Jason,

    With the help of your example I am trying to do the same for handwritten digit pixel data. The input is 5000 rows, each example being 20*20 pixels, so the X matrix is (5000,400) and Y is (5000,1). I am not able to run the model successfully and get the error shown at the end of the code below.

    #importing the needed libraries
    import numpy
    from sklearn.preprocessing import LabelEncoder
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.utils import np_utils
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import LabelEncoder
    from sklearn.pipeline import Pipeline

    In [158]:

    #Initializing random number for reproducibility
    seed = 7

    In [159]:

    #loading the dataset from mat file
    mat =‘C:\\Users\\Sulthan\\Desktop\\NeuralNet\\ex3data1.mat’)

    {‘X’: array([[ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.]]), ‘__header__’: b’MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011′, ‘__version__’: ‘1.0’, ‘y’: array([[10],
    [ 9],
    [ 9],
    [ 9]], dtype=uint8), ‘__globals__’: []}

    In [160]:

    #Splitting of X and Y of DATA
    X_train = mat[‘X’]

    In [161]:


    array([[ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.]])
    In [162]:

    Y_train = mat[‘y’]

    In [163]:


    [ 9],
    [ 9],
    [ 9]], dtype=uint8)
    In [164]:


    (5000, 400)
    In [165]:


    (5000, 1)
    In [166]:

    data_trainX = X_train[2500:,0:400]

    In [167]:


    array([[ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.],
    [ 0., 0., 0., …, 0., 0., 0.]])
    In [168]:


    (2500, 400)
    In [256]:

    data_trainY = Y_train[:2500,:].reshape(-1)

    In [257]:


    In [284]:

    # encode class values as integers
    encoder = LabelEncoder()
    encoder.fit(data_trainY)
    encoded_Y = encoder.transform(data_trainY)
    # convert integers to dummy variables
    dummy_Y = np_utils.to_categorical(encoded_Y)

    In [285]:


    array([[ 0., 0., 0., 0., 1.],
    [ 0., 0., 0., 0., 1.],
    [ 0., 0., 0., 0., 1.],
    [ 0., 0., 0., 1., 0.],
    [ 0., 0., 0., 1., 0.],
    [ 0., 0., 0., 1., 0.]])
    In [298]:

    newy = dummy_Y.reshape(-1,1)

    In [300]:


    array([[ 0.],
    [ 0.],
    [ 0.],
    [ 0.],
    [ 1.],
    [ 0.]])
    In [293]:

    #define baseline model
    def baseline_model():
        #create model
        model = Sequential()
        return model

    estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200,batch_size=5,verbose=0)

    In [295]:

    kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

    results = cross_val_score(estimator, data_trainX, newy, cv=kfold)
    print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

    ValueError Traceback (most recent call last)
    in ()
    —-> 1 results = cross_val_score(estimator, data_trainX, newy, cv=kfold)
    2 print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

    C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\model_selection\ in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    127 “””
    –> 128 X, y, groups = indexable(X, y, groups)
    130 cv = check_cv(cv, y, classifier=is_classifier(estimator))

    C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\utils\ in indexable(*iterables)
    204 else:
    205 result.append(np.array(X))
    –> 206 check_consistent_length(*result)
    207 return result

    C:\Users\Sulthan\Anaconda3\lib\site-packages\sklearn\utils\ in check_consistent_length(*arrays)
    179 if len(uniques) > 1:
    180 raise ValueError(“Found input variables with inconsistent numbers of”
    –> 181 ” samples: %r” % [int(l) for l in lengths])

    ValueError: Found input variables with inconsistent numbers of samples: [2500, 12500]

    • Jason Brownlee February 1, 2017 at 10:26 am #

      Hi Sulthan, the trace is a little hard to read.

      Sorry, I have no off the cuff ideas.

      Perhaps try cutting your example back to the minimum to help isolate the fault?
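
      The two sample counts in the error, 2500 and 12500, differ by a factor of five, which matches the five columns in the one-hot output shown above. One possible reading, offered as a guess rather than a confirmed diagnosis, is that the reshape(-1,1) call turns every one-hot value into its own row. A minimal pure-Python sketch with made-up labels (the one_hot helper is hypothetical):

```python
def one_hot(labels, n_classes):
    """Return one row of length n_classes per label, with a single 1.0 per row."""
    return [[1.0 if k == label else 0.0 for k in range(n_classes)] for label in labels]

labels = [4, 4, 3, 3, 0]            # toy integer-encoded classes (made up)
dummy_y = one_hot(labels, 5)        # one row per sample, five columns

# Similar in effect to dummy_Y.reshape(-1, 1): every value gets its own row
flattened = [[value] for row in dummy_y for value in row]

print(len(labels), len(dummy_y), len(flattened))
```

      If that is what happened, passing the one-hot matrix itself, with one row per sample, would keep the row counts of X and y aligned.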

  26. Linmu February 3, 2017 at 2:13 am #

    Hi Jason,

    Thanks for your tutorial!

    Just one question regarding the output. In this problem, we have three classes (setosa, versicolor and virginica), and since each data instance should be classified into only one category, the problem is more specifically “single-label, multi-class classification”. What if each data instance belonged to multiple categories? Then we would be facing “multi-label, multi-class classification”. In our case, suppose each flower belongs to at least two species (let’s just forget the biology 🙂 ).

    My solution is to modify the output variable (Y) to contain multiple ‘1’ values, i.e. [1 1 0], [0 1 1], [1 1 1 ]……. This is definitely not one-hot encoding any more (maybe two- or three-hot?)

    Will my method work? If not, how do you think the problem of “multi-label, multi-class classification” should be solved?

    Thanks in advance

    • Jason Brownlee February 3, 2017 at 10:07 am #

      Your method sounds very reasonable.

      You may also want to use sigmoid activation functions on the output layer to allow binary class membership to each available class.
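
      To make the idea concrete, here is a small sketch of how sigmoid outputs could be turned into a multi-hot vector. The raw scores are invented, and the 0.5 threshold is an assumption rather than something fixed by Keras.

```python
import math

def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(raw_scores, threshold=0.5):
    """Turn raw output scores into a multi-hot vector such as [1, 1, 0]."""
    return [1 if sigmoid(z) >= threshold else 0 for z in raw_scores]

raw_scores = [2.0, 0.3, -1.5]   # made-up output-layer activations for one flower
print(predict_labels(raw_scores))
```

      Because each sigmoid is evaluated independently, any number of classes can be switched on for the same instance.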

  27. solarenqu February 19, 2017 at 9:28 pm #

    Hello, how can I use the model to create predictions?

    If I try this: print('predict: ', estimator.predict([[5.7,4.4,1.5,0.4]])) I get this exception:

    AttributeError: ‘KerasClassifier’ object has no attribute ‘model’
    Exception ignored in: <bound method BaseSession.__del__ of >
    Traceback (most recent call last):
    File “/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/client/”, line 581, in __del__
    AttributeError: ‘NoneType’ object has no attribute ‘TF_DeleteStatus’

    • Jason Brownlee February 20, 2017 at 9:29 am #

      I have not seen this error before.

      What versions of Keras/TF/sklearn/Python are you using?

  28. Suvam March 1, 2017 at 7:34 am #

    Thanks for the great tutorial.
    It would be great if you could outline what changes would be necessary if I want to do a multi-class classification with text data: the training data assigns scores to different lines of text, and the problem is to infer the score for a new line of text. It seems that the estimator above cannot handle strings. What would be the fix for this?

    Thanks in advance for the help.

  29. Sweta March 1, 2017 at 9:10 pm #

    This was a great tutorial for improving my deep learning skills. My question: is it possible to use this same dataset with an LSTM? Can you please help with how to solve this with an LSTM?

    • Jason Brownlee March 2, 2017 at 8:15 am #

      Hi Sweta,

      You could use an LSTM, but it would not be appropriate because LSTMs are intended for sequence prediction problems and this is not a sequence prediction problem.

  30. Akash March 22, 2017 at 5:47 pm #

    Hi Jason,

    I have a problem with 1500 features as input to my DNN and 2 output classes. Can you explain how I should decide the number of neurons in my hidden layers, and how many hidden layers I need to handle such a large number of features accurately?

    • Jason Brownlee March 23, 2017 at 8:47 am #

      Lots of trial and error.

      Start with a small network and keep adding neurons and layers and epochs until no more benefit is seen.

  31. Ananya Mohapatra March 24, 2017 at 9:39 pm #

    Sir, the following code is showing an error message. Could you help me figure it out? I am trying to do a multi-class classification with 5 datasets combined into one (4 non-epileptic patients and 1 epileptic): a 500 x 25 dataset where the 26th column is the class.

    # Train model and make predictions
    import numpy
    import pandas
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.utils import np_utils
    from sklearn.model_selection import cross_val_score
    from sklearn.cross_validation import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import KFold

    # fix random seed for reproducibility
    seed = 7
    # load dataset
    dataframe = pandas.read_csv(“DemoNSO.csv”, header=None)
    dataset = dataframe.values
    X = dataset[:,0:25].astype(float)
    Y = dataset[:,25]
    # encode class values as integers
    encoder = LabelEncoder()
    encoder.fit(Y)
    encoded_Y = encoder.transform(Y)
    # convert integers to dummy variables (i.e. one hot encoded)
    dummy_y = np_utils.to_categorical(encoded_Y)
    # define baseline model
    def baseline_model():
        # create model
        model = Sequential()
        model.add(Dense(700, input_dim=25, init='normal', activation='relu'))
        model.add(Dense(2, init='normal', activation='sigmoid'))

        # Compile model
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=50, batch_size=20)

    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)

    results = cross_val_score(estimator, X, dummy_y, cv=kfold)
    print(“Baseline: %.2f%% (%.2f%%)” % (results.mean()*100, results.std()*100))

    X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.55, random_state=seed)
    estimator.fit(X_train, Y_train)
    predictions = estimator.predict(X_test)


    error message:
    ValueError: Error when checking model target: expected dense_56 to have shape (None, 2) but got array with shape (240, 3)

    • Jason Brownlee March 25, 2017 at 7:36 am #

      Confirm the size of your output (y) matches the dimension of your output layer.
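
      In the error above, the target array has 3 columns while the output layer has 2 nodes. A check along the following lines, run before training, can surface that kind of mismatch early. check_output_width is a hypothetical helper written for this illustration, not a Keras or scikit-learn function.

```python
def check_output_width(one_hot_targets, n_output_nodes):
    """Raise a readable error if the target width and output layer size differ."""
    width = len(one_hot_targets[0])
    if width != n_output_nodes:
        raise ValueError(
            "targets have %d columns but the output layer has %d nodes"
            % (width, n_output_nodes)
        )
    return width

targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # toy one-hot rows for 3 classes
print(check_output_width(targets, 3))
```

      With three distinct class values in the data, the last Dense layer would need 3 nodes rather than 2.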

  32. Alican March 28, 2017 at 4:05 am #

    Hello Jason,

    I got your model to work using Python 2.7.13, Keras 2.0.2, Theano…, by copying the code exactly. However, the results that I get are not only very bad (59.33%, 48.67%, 38.00% on different trials), but they also differ from trial to trial.

    I was under the impression that using a fixed seed would allow us to reproduce the same results.

    Do you have any idea what could have caused such bad results?


    • Alican March 28, 2017 at 4:28 am #

      edit: I was re-executing only the results=cross_val_score(…) line to get different results I listed above.

      Running the whole script over and over generates the same result: “Baseline: 59.33% (21.59%)”

    • Jason Brownlee March 28, 2017 at 8:25 am #

      Not sure why the results are so bad. I’ll take a look.

      The fixed seed does not seem to have an effect on the Theano or TensorFlow backends. Try running examples multiple times and take the average performance.
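
      A sketch of what repeating and averaging could look like is shown below. The evaluate_once function is a stand-in that returns invented accuracies; in practice it would wrap one full train-and-evaluate cycle of the model.

```python
import random
import statistics

def evaluate_once(seed):
    """Stand-in for one full train/evaluate cycle; returns a made-up accuracy."""
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.05, 0.05)

def repeated_evaluation(n_repeats):
    """Run the experiment several times and summarize the spread."""
    scores = [evaluate_once(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)

mean_acc, std_acc = repeated_evaluation(10)
print("Accuracy: %.2f%% (+/- %.2f%%)" % (mean_acc * 100, std_acc * 100))
```

      Reporting the mean together with the standard deviation gives a fairer picture of a stochastic model than any single run.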

      • Alican April 2, 2017 at 2:30 am #

        Did you have time to look into this?

        I had my colleague run this script on Theano 1.0.1, and it gave the expected performance of 95.33%. I then installed Theano 1.0.1, and got the same result again.

        However, using Theano 2.0.2 I was getting 59.33% with seed=7, and similar performances with different seeds. Is it possible the developers made some crucial changes with the new version?

        • Jason Brownlee April 2, 2017 at 6:30 am #

          The most recent version of Theano is 0.9:

          Do you mean Keras versions?

          It may not be the Keras version causing the difference in the run. The fixed random seed may not be having an effect in general, or may not be having an effect when a Theano backend is being used.

          Neural networks are stochastic algorithms and will produce a different result each run:

          • Alican April 2, 2017 at 6:59 am #

            Yes I meant Keras, sorry.

            There is no issue with the seed; I’m getting the same result as you on multiple computers using Keras 1.1.1. But with Keras 2.0.2, the results are abysmally bad.

  33. Nalini March 29, 2017 at 3:13 am #

    Hi Jason

    In this code for multi-class classification, can you suggest how to plot a graph to display the accuracy, and what the axes should represent?

    • Jason Brownlee March 29, 2017 at 9:10 am #

      No, we normally do not graph accuracy, unless you want to graph it over training epochs?

  34. Nalini March 31, 2017 at 1:42 am #


  35. Frank April 6, 2017 at 8:47 pm #

    Dear Jason,
    I have found this tutorial very interesting and helpful.
    What I wanted to ask is this: I am currently trying to classify poker hands, as in this Kaggle competition (for a school project), and I wish to create a neural network like the one you created above. What do you suggest as a starting point?
    Your help would be greatly appreciated!

  36. shiva April 8, 2017 at 12:28 pm #

    Hi Jason,
    It’s an awesome tutorial. It would be great if you could come up with a blog post on multi-class medical image classification with the Keras deep learning library. It would be a great asset for researchers like me who work with medical image classification. Looking forward to it.

  37. Toby April 9, 2017 at 4:38 am #

    Thanks for the great tutorial!
    I duplicated the result using Theano as backend.
    However, using Tensorflow yields a worse accuracy, 88.67%.
    Any explanation?

  38. Anupam April 11, 2017 at 6:11 pm #

    Hi Jason, How to find the Precision, Recall and f1 score of your example?

    Case-1 I have used like :

    model.compile(loss=’categorical_crossentropy’, optimizer=’Nadam’, metrics=[‘acc’, ‘fmeasure’, ‘precision’, ‘recall’])

    Case-2 and also used :

    def score(yh, pr):
        coords = [np.where(yhh > 0)[0][0] for yhh in yh]
        yh = [yhh[co:] for yhh, co in zip(yh, coords)]
        ypr = [prr[co:] for prr, co in zip(pr, coords)]
        fyh = [c for row in yh for c in row]
        fpr = [c for row in ypr for c in row]
        return fyh, fpr

    pr = model.predict_classes(X_train)
    yh = y_train.argmax(2)
    fyh, fpr = score(yh, pr)
    print ‘Training accuracy:’, accuracy_score(fyh, fpr)
    print ‘Training confusion matrix:’
    print confusion_matrix(fyh, fpr)
    precision_recall_fscore_support(fyh, fpr)

    pr = model.predict_classes(X_test)
    yh = y_test.argmax(2)
    fyh, fpr = score(yh, pr)
    print ‘Testing accuracy:’, accuracy_score(fyh, fpr)
    print ‘Testing confusion matrix:’
    print confusion_matrix(fyh, fpr)
    precision_recall_fscore_support(fyh, fpr)

    What I have observed is that the accuracies of case 1 and case 2 are different.

    Any solution?
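
    One way to cross-check the two cases is to compute the metrics by hand from integer class labels. The sketch below assumes Python 3 and uses made-up labels; precision_recall_f1 is a hypothetical helper, and it makes no claim about why the two cases above disagree.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F1 from two lists of integer labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0, 0, 1, 1, 2, 2]   # made-up class indices
y_pred = [0, 1, 1, 1, 2, 0]
for cls in (0, 1, 2):
    print(cls, precision_recall_f1(y_true, y_pred, cls))
```

    Each class is treated in turn as the positive class, so the result is one precision, recall and F1 triple per class.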

  39. Raynier van Egmond April 15, 2017 at 12:19 pm #

    Hi Jason,

    Like a student earlier in the comments my accuracy results are exactly the same as his:

    ********** Baseline: 88.67% (21.09%)

    and I think this is related to having Tensorflow as the backend rather than the Theano backend.

    I am working this through in a Jupyter notebook

    I went through your earlier tutorials on setting up the environment:

    scipy: 0.18.1
    numpy: 1.11.3
    matplotlib: 2.0.0
    pandas: 0.19.2
    statsmodels: 0.6.1
    sklearn: 0.18.1
    tensorflow: 1.0.1
    Using TensorFlow backend.
    keras: 2.0.3

    The Tensorflow is a Python3.6 recompile picked up from the web at:

    Do you know how I can force the Keras library to take Theano as a backend rather than the Tensorflow library?

    Thanks for the great work on your tutorials… for beginners it is such an invaluable thing to have tutorials that actually work !!!

    Looking forward to get more of your books


    • Raynier van Egmond April 15, 2017 at 12:42 pm #

      Changing to the Theano backend doesn’t change the results:

      Managed to change to a Theano backend by setting the Keras config file:
      “image_data_format”: “channels_last”,
      “epsilon”: 1e-07,
      “floatx”: “float32”,
      “backend”: “theano”

      as instructed at:

      The notebook no longer reports it is using Tensorflow so I guess the switch worked but the results are still:

      ****** Baseline: 88.67% (21.09%)

      Will need to look a little deeper and play with the actual architecture a bit.

      All the same great material to get started with

      Thanks again


      • Raynier van Egmond April 15, 2017 at 1:26 pm #

        Confirmed that changing the model, as someone above mentioned, to

        model.add(Dense(8, input_dim=4, kernel_initializer='normal', activation='relu'))
        model.add(Dense(3, kernel_initializer='normal', activation='softmax'))

        with 8 hidden nodes makes a substantial difference:

        **** Baseline: 96.67% (4.47%)

        but there is no difference between the Tensorflow and Theano backend results. I guess that’s as far as I can take this for now.

        Take care,


    • Jason Brownlee April 16, 2017 at 9:22 am #

      You can change the back-end used by Keras in the Keras config file. See this post:

  40. Tursun April 16, 2017 at 9:18 pm #

    Thank you very much first. These tutorials are excellent and very practical. You are an excellent educator.
    I want to classify my data into 25-30 classes. Your iris example is the nearest classification example I have found. DL4J previously had an iris classification example with a DBN, but it disappeared in the new community version.
    I have the following issues:
    It takes very long. My laptop is a TOSHIBA L745 with 4GB RAM and an i3 processor. It has CUDA.
    My classification problem is solved with an SVM in a very short time, I’d say in a split second.
    Do you think speed would increase if we used a DBN or a CNN or something similar?
    My result:
    Baseline: 88.67% (21.09%),
    Once I installed Docker (with TensorFlow in it) and then ran the iris classification, it showed 96%.
    I would like similar or better accuracy. How can I reach that level?

    Thank you

  41. Chris April 17, 2017 at 5:13 am #

    Hello Jason,
    first of all, your tutorials are really well done when you start working with keras.

    I have a question about the epochs and batch_size in this tutorial. I think I haven’t understood it correctly.

    I loaded the record and it contains 150 entries.

    You choose 200 epochs and batch_size=5. So you use 5*200=1000 examples for training. So does keras use the same entries multiple times or does it stop automatically?


    • Jason Brownlee April 18, 2017 at 8:23 am #

      One epoch involves exposing each pattern in the training dataset to the model.

      One epoch is comprised of one or more batches.

      One batch involves showing a subset of the patterns in the training data to the model and updating weights.

      Ideally the batch size is a factor of the number of patterns in the dataset (i.e. it divides evenly); otherwise the last batch in each epoch will contain fewer patterns.

      Does that help?
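
      The arithmetic can be sketched as follows for the numbers in the question, ignoring the rows that cross-validation holds out of each fold. training_updates is a hypothetical helper; the ceiling simply counts a final partial batch, if there is one.

```python
import math

def training_updates(n_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates) for a training run."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

# 150 rows, batch_size=5 and 200 epochs, as in the question above
per_epoch, total = training_updates(150, 5, 200)
print(per_epoch, total)
```

      So every row is seen once per epoch, in 30 batches of 5, and the same rows are reused in each of the 200 epochs.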

      • Chris April 22, 2017 at 3:43 am #

        thank you for the explanation.
        The explanation helped me, and in the meantime I have read and tried several LSTM tutorials from you and it became much clearer to me.
        greetings, Chris

  42. Abhilash Menon April 17, 2017 at 1:27 pm #

    Hey Jason,

    I have been following your tutorials and they have been very, very helpful! The most useful section is the comments, where people like me get to ask you questions, and some of them are the same ones I had in mind.

    Although, I have one that I think hasn’t been asked before, at least on this page!

    What changes should I make to the regular program you illustrated with the “pima_indians_diabetes.csv” in order to take a dataset that has 5 categorical inputs and 1 binary output?

    This would be a huge help! Thanks in advance!

  43. Tuba April 18, 2017 at 8:43 am #

    Hi Jason,

    First of all, your tutorials are really very interesting.

    I got this error when I ran it. I’m working with Python 3 and the same input file.

    Error :
    ImportError: Traceback (most recent call last):
    File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/”, line 61, in
    from tensorflow.python import pywrap_tensorflow
    File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/”, line 28, in
    _pywrap_tensorflow = swig_import_helper()
    File “/home/indatacore/anaconda3/lib/python3.5/site-packages/tensorflow/python/”, line 24, in swig_import_helper
    _mod = imp.load_module(‘_pywrap_tensorflow’, fp, pathname, description)
    File “/home/indatacore/anaconda3/lib/python3.5/”, line 242, in load_module
    return load_dynamic(name, filename, file)
    File “/home/indatacore/anaconda3/lib/python3.5/”, line 342, in load_dynamic
    return _load(spec)
    ImportError: cannot open shared object file: No such file or directory

    Failed to load the native TensorFlow runtime.


    for some common reasons and solutions. Include the entire stack trace
    above this error message when asking for help.

    • Jason Brownlee April 19, 2017 at 7:44 am #

      Ouch. I have not seen this error before.

      Consider trying the Theano backend and see if that makes a difference.

  44. Tursun April 21, 2017 at 2:17 am #

    Thank you. I got your notion: there is no key which opens all doors.

    Here, I have multi class classification problem.
    My data can be downloaded from here:

    size of my data set : 512*16, last column is 21 classes, they are digits 1-21
    note: number of samples (rows in my data) for each class is different. mostly 20 rows, but sometimes 17 or 31 rows
    my network has:
    first layer (input) has 15 neurons
    second layer (hidden) has 30 neurons
    last layer (output) has 21 neurons
    in last layer I used “softmax” based on this recommendation from
    “The softmax function transforms your hidden units into probability scores of the class labels you have; and thus is more suited to classification problems ”
    error message:
    ValueError: Error when checking model target: expected dense_8 to have shape (None, 21) but got array with shape (512, 1)

    I would be thankful if you can help me to run this code.

    I modified this code from yours:
    ———–keras code start ———–
    from keras.models import Sequential
    from keras.layers import Dense
    import numpy
    # fix random seed for reproducibility
    # load pima indians dataset
    dataset = numpy.loadtxt(“tursun_deep_p6.csv”, delimiter=”,”)
    # split into input (X) and output (Y) variables
    X = dataset[:,0:15]
    Y = dataset[:,15]

    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=15, activation=’relu’)) # not sure if 30 too much. not sure #about lower and upper limits
    #model.add(Dense(25, activation=’relu’)) # think about to add one more hidden layer
    model.add(Dense(21, activation=’softmax’)) # they say softmax at last L does classification
    # Compile model
    model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
    # Fit the model
    model.fit(X, Y, epochs=150, batch_size=5)
    # evaluate the model
    scores = model.evaluate(X, Y)
    print(“\n%s: %.2f%%” % (model.metrics_names[1], scores[1]*100))

    ———–keras code start ———–

    • Jason Brownlee April 21, 2017 at 8:40 am #

      I see the problem: your output layer expects 21 columns and you only have 1.

      You need to transform your output variable into 21 variables. You can do this using a one hot encoding.
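
      The tutorial does this with LabelEncoder and np_utils.to_categorical. To show the shape change without those libraries, here is a pure-Python sketch with a few made-up class values; one_hot_encode is a hypothetical helper.

```python
def one_hot_encode(labels):
    """Map each distinct label to a column and return (rows, sorted classes)."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        rows.append(row)
    return rows, classes

labels = [1, 21, 3, 1]            # made-up sample of class values
rows, classes = one_hot_encode(labels)
print(classes)
print(rows)
```

      With all 21 class values present in the data, each row would have 21 columns, matching the 21 nodes in the softmax layer.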

  45. Shiva April 23, 2017 at 5:54 am #

    Hi Jason, I am following your book Deep Learning With Python and I have an issue with the script. I have successfully read my .csv data file through pandas and am trying to adopt a decay-based learning rate as discussed in the book. I define the initial lrate, drop, epochs_drop and the formula for the lrate update as described in the book. I then created the model like this (it works best for my problem) and started creating a pipeline, in contrast to the model fitting strategy you use in the book:

    def baseline_model():
        # create model
        model = Sequential()
        model.add(Dense(50, input_dim=15, kernel_initializer='normal', activation='relu'))
        model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
        sgd = SGD(lr=0.0, momentum=0.9, decay=0, nesterov=False)
        model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
        return model
    #learning schedule callback
    lrate = LearningRateScheduler(step_decay)
    callbacks_list = [lrate]

    estimators = []
    estimators.append((‘standardize’, StandardScaler()))
    estimators.append((‘mlp’, KerasClassifier(build_fn=baseline_model, epochs=100,
    batch_size=5, callbacks=[lrate], verbose=1)))
    pipeline = Pipeline(estimators)
    kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)

    I’m getting the error “Cannot clone object , as the constructor does not seem to set parameter callbacks”. According to keras documentation, I can see that i can pass callbacks to the kerasclassifier wrapper. kindly suggest what to do in this occasion. Looking forward.

    • Jason Brownlee April 24, 2017 at 5:29 am #

      I have not tried to use callbacks with the sklearn wrapper, sorry.

      Perhaps it is a limitation that you can’t? Though, I’d be surprised.

      You may have to use the Keras API directly.
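
      For readers unfamiliar with the schedule mentioned in the question, a step decay function commonly looks like the sketch below. The default values are assumptions chosen for illustration and may differ from the ones in the book. The sketch only shows the schedule itself and does not address the cloning error.

```python
import math

def step_decay(epoch, initial_lrate=0.1, drop=0.5, epochs_drop=10.0):
    """Learning rate for a given epoch under a step-wise drop schedule."""
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

for epoch in (0, 9, 19, 29):
    print(epoch, step_decay(epoch))
```

      With these assumed values the rate is halved every 10 epochs, and a function of this form is what gets handed to LearningRateScheduler.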

  46. Shiva April 25, 2017 at 6:23 am #

    Hi Jason,
    I’m trying to apply the image augmentation techniques discussed in your book to the data I have stored in my system under C:\images\train and C:\images\test. Could you help me with the syntax on how to load my own data with a modification to the syntax available in the book:

    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    Thanks in advance.

  47. Michael Ng April 28, 2017 at 12:49 am #


    By implementing a neural network in Keras, how can we get the associated probabilities for each predicted class?

    Many Thanks!
    Michael Ng

    • Jason Brownlee April 28, 2017 at 7:47 am #

      Review the outputs from the softmax; although not strictly probabilities, they can be used as such.

      Also see the keras function model.predict_proba() for predicting probabilities directly.
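
      As a rough picture of what the softmax outputs look like, the sketch below normalizes three made-up raw scores in plain Python. It illustrates the function itself, not Keras internals.

```python
import math

def softmax(raw_scores):
    """Normalize raw scores into positive values that sum to 1."""
    shifted = [z - max(raw_scores) for z in raw_scores]   # for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # made-up output-layer scores for 3 classes
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

      The largest value marks the predicted class, and the remaining values show how much weight the model gives to the alternatives.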

  48. Ann April 28, 2017 at 2:08 am #

    Hi, Jason! I’m a complete newbie to Keras, and I want to compute a confusion matrix using sklearn’s confusion_matrix(y_test, predict), but I got this error when I ran it.

    ValueError Traceback (most recent call last)
    in ()
    —-> 1 confusion_matrix(y_test, predict)

    C:\Users\Ann\Anaconda3\envs\py27\lib\site-packages\sklearn\metrics\classification.pyc in confusion_matrix(y_true, y_pred, labels, sample_weight)
    240 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    241 if y_type not in (“binary”, “multiclass”):
    –> 242 raise ValueError(“%s is not supported” % y_type)
    244 if labels is None:

    ValueError: multilabel-indicator is not supported

    I’ve checked that y_test and predict have same shape (231L, 2L).
    Any solution?
    Your help would be greatly appreciated!

    • Jason Brownlee April 28, 2017 at 7:50 am #

      Consider checking the dimensionality of both y and yhat to ensure they are the same (e.g. print the shape of them).
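
      The “multilabel-indicator” message, together with the (231, 2) shapes, suggests that both arrays are two-column one-hot or probability matrices rather than vectors of class labels. If so, one option is to convert each row to a class index first. The sketch below uses made-up values and hypothetical helper names in plain Python.

```python
def argmax_rows(matrix):
    """Convert one-hot or probability rows into integer class indices."""
    return [row.index(max(row)) for row in matrix]

def confusion_matrix_from_labels(y_true, y_pred, n_classes):
    """Count (true class, predicted class) pairs; rows are true classes."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

y_test = [[1, 0], [0, 1], [0, 1], [1, 0]]                   # made-up one-hot targets
predict = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]  # made-up model outputs
print(confusion_matrix_from_labels(argmax_rows(y_test), argmax_rows(predict), 2))
```

      The same row-wise argmax applied to NumPy arrays would give label vectors of the kind that confusion_matrix accepts.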
