
Object Classification with CNNs Using the Keras Deep Learning Library

Keras is a Python library for deep learning that wraps the powerful numerical libraries Theano and TensorFlow.

Object recognition, where a model identifies the objects in images, is a difficult problem on which traditional neural networks fall down.

In this post, you will discover how to develop and evaluate deep learning models for object recognition in Keras. After completing this tutorial, you will know:

  • About the CIFAR-10 object classification dataset and how to load and use it in Keras
  • How to create a simple Convolutional Neural Network for object recognition
  • How to lift performance by creating deeper Convolutional Neural Networks


Let’s get started.

  • Jul/2016: First published
  • Update Oct/2016: Updated for Keras 1.1.0 and TensorFlow 0.10.0.
  • Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
  • Update Sep/2019: Updated for Keras 2.2.5 API.
  • Update Jul/2022: Updated for TensorFlow 2.x API

For an extended tutorial on developing a CNN for CIFAR-10, see the post:

The CIFAR-10 Problem Description

The problem of automatically classifying photographs of objects is difficult because of the nearly infinite number of permutations of objects, positions, lighting, and so on. It’s a tough problem.

This is a well-studied problem in computer vision and, more recently, an important demonstration of the capability of deep learning. A standard computer vision and deep learning dataset for this problem was developed by the Canadian Institute for Advanced Research (CIFAR).

The CIFAR-10 dataset consists of 60,000 photos divided into 10 classes (hence the name CIFAR-10). Classes include common objects such as airplanes, automobiles, birds, cats, and so on. The dataset is split in a standard way, where 50,000 images are used for training a model and the remaining 10,000 for evaluating its performance.

The photos are in color with red, green, and blue components but are small, measuring 32×32 pixels.

State-of-the-art results are achieved using very large convolutional neural networks. You can learn about state-of-the-art results on CIFAR-10 on Rodrigo Benenson’s webpage. Model performance is reported in classification accuracy, with very good performance above 90%, with human performance on the problem at 94% and state-of-the-art results at 96% at the time of writing.

There is a Kaggle competition that makes use of the CIFAR-10 dataset. It is a good place to join the discussion of developing new models for the problem and picking up models and scripts as a starting point.


Loading The CIFAR-10 Dataset in Keras

The CIFAR-10 dataset can easily be loaded in Keras.

Keras has the facility to automatically download standard datasets like CIFAR-10 and store them in the ~/.keras/datasets directory using the cifar10.load_data() function. This dataset is large at 163 megabytes, so it may take a few minutes to download.

Once downloaded, subsequent calls to the function will load the dataset ready for use.

The dataset is stored as pickled training and test sets, ready for use in Keras. Each image is represented as a three-dimensional array, with dimensions for height, width, and the red, green, and blue color channels. You can plot images directly using matplotlib.
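A minimal sketch of such a plot is given here. The helper name plot_image_grid is hypothetical, and random pixels stand in for the real photographs so the sketch runs without the download; the commented lines show how the real data could be loaded with cifar10.load_data() instead.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_image_grid(images, rows=3, cols=3):
    """Draw the first rows*cols images in a grid and return the figure."""
    fig, axes = plt.subplots(rows, cols, figsize=(6, 6))
    for ax, image in zip(axes.ravel(), images):
        ax.imshow(image)  # expects an array of shape (height, width, 3)
        ax.axis("off")
    return fig


# With Keras installed, the real data could be loaded like this
# (the first call downloads the dataset):
#   from tensorflow.keras.datasets import cifar10
#   (X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Random pixels stand in here (an assumption made for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(9, 32, 32, 3), dtype=np.uint8)

fig = plot_image_grid(X_demo)
# plt.show()  # uncomment to display the grid interactively
```

Passing X_train[:9] in place of X_demo would show the first nine training photographs.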

Running the code creates a 3×3 plot of photographs. The images have been scaled up from their small 32×32 size, but you can clearly see trucks, horses, and cars. You can also see some distortion in some images that have been forced to the square aspect ratio.

Small Sample of CIFAR-10 Images


Simple Convolutional Neural Network for CIFAR-10

The CIFAR-10 problem is best solved using a convolutional neural network (CNN).

You can quickly start by defining all the classes and functions you will need in this example.

Next, you can load the CIFAR-10 dataset.

The pixel values range from 0 to 255 for each of the red, green, and blue channels.

It is good practice to work with normalized data. Because the input values are well understood, you can easily normalize to the range 0 to 1 by dividing each value by the maximum observation, which is 255.

Note that the data is loaded as integers, so you must cast it to floating point values in order to perform the division.
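The cast and division can be sketched with NumPy alone. The small random arrays below are stand-ins for the arrays returned by cifar10.load_data() (an assumption so the sketch runs without the download); the two normalization lines are the part that matters.

```python
import numpy as np

# Stand-ins for the uint8 image arrays returned by cifar10.load_data().
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
X_test = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Cast to floating point first, then scale from 0..255 down to 0..1.
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0

print(X_train.dtype, X_train.min(), X_train.max())
```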

The output variables are defined as a vector of integers from 0 to 9, one integer per class.

You can use a one-hot encoding to transform them into a binary matrix to best model the classification problem. There are ten classes for this problem, so you can expect the binary matrix to have a width of 10.
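As a sketch of what the encoding does, the hypothetical helper one_hot below builds the binary matrix with NumPy. Keras ships a to_categorical utility in tensorflow.keras.utils for the same purpose; the NumPy version is used here only to make the transformation visible.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into a (n_samples, num_classes) binary matrix."""
    labels = np.asarray(labels).reshape(-1)  # flatten (n, 1) labels to (n,)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # one 1 per row, at the label index
    return encoded


# A few example labels in the (n, 1) shape that cifar10.load_data() returns.
y_train = np.array([[6], [9], [9], [4], [1]])
y_train_onehot = one_hot(y_train, num_classes=10)

print(y_train_onehot.shape)
```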

Let’s start by defining a simple CNN structure as a baseline and evaluate how well it performs on the problem.

You will use a structure with two convolutional layers followed by max pooling and a flattening out of the network to fully connected layers to make predictions.

The baseline network structure can be summarized as follows:

  1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function, and a weight constraint of max norm set to 3
  2. Dropout set to 20%
  3. Convolutional layer, 32 feature maps with a size of 3×3, a rectifier activation function, and a weight constraint of max norm set to 3
  4. Max Pool layer with size 2×2
  5. Flatten layer
  6. Fully connected layer with 512 units and a rectifier activation function
  7. Dropout set to 50%
  8. Fully connected output layer with 10 units and a softmax activation function
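The layer list above can be sketched with the TensorFlow 2.x Keras API roughly as follows. The function name build_baseline_model, the channels-last input shape of (32, 32, 3), and the "same" padding on the convolutions are assumptions not stated in the list.

```python
from tensorflow.keras import Input
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential


def build_baseline_model(input_shape=(32, 32, 3), num_classes=10):
    """Baseline CNN following the eight-step layer list."""
    return Sequential([
        Input(shape=input_shape),
        # 1. Convolutional input layer with a max norm weight constraint of 3
        Conv2D(32, (3, 3), padding="same", activation="relu",
               kernel_constraint=MaxNorm(3)),
        # 2. Dropout at 20%
        Dropout(0.2),
        # 3. Second convolutional layer, same configuration
        Conv2D(32, (3, 3), padding="same", activation="relu",
               kernel_constraint=MaxNorm(3)),
        # 4. Max pooling with a 2x2 window
        MaxPooling2D(pool_size=(2, 2)),
        # 5. Flatten the feature maps into a vector
        Flatten(),
        # 6. Fully connected layer with 512 units
        Dense(512, activation="relu"),
        # 7. Dropout at 50%
        Dropout(0.5),
        # 8. Softmax output, one unit per class
        Dense(num_classes, activation="softmax"),
    ])


model = build_baseline_model()
model.summary()
```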

A logarithmic loss function is used with the stochastic gradient descent optimization algorithm, configured with a large momentum and weight decay and starting with a learning rate of 0.01.
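One way to sketch this optimizer in the TensorFlow 2.x API is shown here. It rests on several assumptions: the decay is treated as a decay of the learning rate over training steps (expressed with an InverseTimeDecay schedule), the decay rate is set to the learning rate divided by the number of epochs, and 0.9 is taken as the "large" momentum. None of these values are confirmed by the text beyond the 0.01 starting rate.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

epochs = 25
initial_lr = 0.01                 # starting learning rate from the text
decay_rate = initial_lr / epochs  # assumed decay rate, not given in the text

# Learning rate at a given step: initial_lr / (1 + decay_rate * step)
lr_schedule = InverseTimeDecay(
    initial_learning_rate=initial_lr,
    decay_steps=1,
    decay_rate=decay_rate,
)

# 0.9 is an assumed value for the "large momentum".
sgd = SGD(learning_rate=lr_schedule, momentum=0.9)

# A model would then be compiled with the logarithmic (cross-entropy) loss:
#   model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
```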

You can fit this model with 25 epochs and a batch size of 32.

A small number of epochs was chosen to help keep this tutorial moving. Usually, the number of epochs would be one or two orders of magnitude larger for this problem.

Once the model is fit, you evaluate it on the test dataset and print out the classification accuracy.
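The fit-then-evaluate step might look like the sketch below. The helper name fit_and_evaluate is hypothetical, and a tiny stand-in model trained for one epoch on random data is used only so the sketch runs quickly without the CIFAR-10 download; with the real data you would pass the compiled CNN and the normalized, one-hot encoded arrays.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential


def fit_and_evaluate(model, X_train, y_train, X_test, y_test,
                     epochs=25, batch_size=32):
    """Fit a compiled model, then return its accuracy on the test data."""
    model.fit(X_train, y_train, validation_data=(X_test, y_test),
              epochs=epochs, batch_size=batch_size, verbose=0)
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    return accuracy


# Random stand-in data shaped like normalized CIFAR-10 images and one-hot labels.
rng = np.random.default_rng(0)
X = rng.random((64, 32, 32, 3), dtype=np.float32)
y = np.eye(10, dtype="float32")[rng.integers(0, 10, size=64)]

# Tiny stand-in model (not the tutorial's CNN) to keep the smoke run fast.
tiny_model = Sequential([
    Input(shape=(32, 32, 3)),
    Flatten(),
    Dense(10, activation="softmax"),
])
tiny_model.compile(loss="categorical_crossentropy", optimizer="sgd",
                   metrics=["accuracy"])

accuracy = fit_and_evaluate(tiny_model, X[:48], y[:48], X[48:], y[48:], epochs=1)
print("Accuracy: %.2f%%" % (accuracy * 100))
```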

Tying this all together, the complete example is listed below.

Running this example provides the results below. First, the network structure is summarized, which confirms the design was implemented correctly.

The classification accuracy and loss are printed after each epoch on both the training and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The model is evaluated on the test set and achieves an accuracy of 70.5%, which is not excellent.

You can improve the accuracy significantly by creating a much deeper network. This is what you will look at in the next section.

Larger Convolutional Neural Network for CIFAR-10

You have seen that a simple CNN performs poorly on this complex problem. In this section, you will look at scaling up the size and complexity of your model.

Let’s design a deep version of the simple CNN above. You can introduce an additional round of convolutions with many more feature maps. You will use the same pattern of Convolutional, Dropout, Convolutional, and Max Pooling layers.

This pattern will be repeated three times with 32, 64, and 128 feature maps. The effect is an increasing number of feature maps with a smaller and smaller size given the max pooling layers. Finally, an additional and larger Dense layer will be used at the output end of the network in an attempt to better translate the large number of feature maps to class values.

A summary of the new network architecture is as follows:

  • Convolutional input layer, 32 feature maps with a size of 3×3, and a rectifier activation function
  • Dropout layer at 20%
  • Convolutional layer, 32 feature maps with a size of 3×3, and a rectifier activation function
  • Max Pool layer with size 2×2
  • Convolutional layer, 64 feature maps with a size of 3×3, and a rectifier activation function
  • Dropout layer at 20%
  • Convolutional layer, 64 feature maps with a size of 3×3, and a rectifier activation function
  • Max Pool layer with size 2×2
  • Convolutional layer, 128 feature maps with a size of 3×3, and a rectifier activation function
  • Dropout layer at 20%
  • Convolutional layer, 128 feature maps with a size of 3×3, and a rectifier activation function
  • Max Pool layer with size 2×2
  • Flatten layer
  • Dropout layer at 20%
  • Fully connected layer with 1024 units and a rectifier activation function
  • Dropout layer at 20%
  • Fully connected layer with 512 units and a rectifier activation function
  • Dropout layer at 20%
  • Fully connected output layer with 10 units and a softmax activation function

You can very easily define this network topology in Keras as follows:
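A sketch of the topology using the TensorFlow 2.x Keras API follows. The function name build_larger_model, the channels-last input shape, and the "same" padding are assumptions; the repeated Convolutional, Dropout, Convolutional, Max Pooling pattern is written as a loop over the three feature map counts.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential


def build_larger_model(input_shape=(32, 32, 3), num_classes=10):
    """Deeper CNN: three conv blocks (32, 64, 128 maps), then two dense layers."""
    model = Sequential()
    model.add(Input(shape=input_shape))
    # Convolutional, Dropout, Convolutional, Max Pooling, repeated three times.
    for filters in (32, 64, 128):
        model.add(Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(Dropout(0.2))
        model.add(Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
    # Classifier head with dropout before each fully connected layer.
    model.add(Flatten())
    model.add(Dropout(0.2))
    model.add(Dense(1024, activation="relu"))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation="relu"))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation="softmax"))
    return model


larger_model = build_larger_model()
larger_model.summary()
```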

You can fit and evaluate this model using the same procedure from above and the same number of epochs but a larger batch size of 64, found through some minor experimentation.

Tying this all together, the complete example is listed below.

Running this example prints the classification accuracy and loss on the training and test datasets for each epoch.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The estimate of classification accuracy for the final model is 79.5%, which is nine points better than the simpler model.

Extensions to Improve Model Performance

You have achieved good results on this very difficult problem, but you are still a good way from achieving world-class results.

Below are some ideas that you can try to extend upon the models and improve model performance.

  • Train for More Epochs. Each model was trained for a very small number of epochs, 25. It is common to train large convolutional neural networks for hundreds or thousands of epochs. You should expect performance gains can be achieved by significantly raising the number of training epochs.
  • Image Data Augmentation. The objects in the image vary in their position. Another boost in model performance can likely be achieved by using some data augmentation. Methods such as standardization, random shifts, or horizontal image flips may be beneficial.
  • Deeper Network Topology. The larger network presented is deep, but larger networks could be designed for the problem. This may involve more feature maps closer to the input and perhaps less aggressive pooling. Additionally, standard convolutional network topologies that have been shown useful may be adopted and evaluated on the problem.
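The image data augmentation idea from the list above can be sketched with NumPy alone. The helper name augment_batch and the 4-pixel maximum shift are assumptions chosen for illustration; each image is randomly shifted (with zero padding at the borders) and randomly flipped left to right.

```python
import numpy as np


def augment_batch(images, rng, max_shift=4):
    """Randomly shift and horizontally flip a batch of (n, h, w, c) images."""
    n, height, width, _ = images.shape
    # Zero-pad the borders so a shifted crop never reads outside the array.
    padded = np.pad(
        images,
        ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0)),
    )
    out = np.empty_like(images)
    for i in range(n):
        # Pick a crop offset; max_shift means "no shift" in either direction.
        top = rng.integers(0, 2 * max_shift + 1)
        left = rng.integers(0, 2 * max_shift + 1)
        crop = padded[i, top:top + height, left:left + width]
        if rng.random() < 0.5:
            crop = crop[:, ::-1]  # horizontal flip
        out[i] = crop
    return out


rng = np.random.default_rng(0)
batch = rng.random((16, 32, 32, 3), dtype=np.float32)  # stand-in for real images
augmented = augment_batch(batch, rng)
print(augmented.shape)
```

Applying a function like this to each training batch means the network rarely sees exactly the same image twice, which is the pressure toward generalization that augmentation is meant to provide.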


In this post, you discovered how to create deep learning models in Keras for object recognition in photographs.

After working through this tutorial, you learned:

  • About the CIFAR-10 dataset and how to load it in Keras and plot ad hoc examples from the dataset
  • How to train and evaluate a simple Convolutional Neural Network on the problem
  • How to expand a simple Convolutional Neural Network into a deep Convolutional Neural Network in order to boost performance on the difficult problem
  • How to use data augmentation to get a further boost on the difficult object recognition problem

Do you have any questions about object recognition or this post? Ask your question in the comments, and I will do my best to answer.

188 Responses to Object Classification with CNNs Using the Keras Deep Learning Library

  1. Aakash Nain July 24, 2016 at 9:53 pm

    Hello Jason,
    What is the use of maxnorm in context of deep learning ?

  2. Aqsa July 31, 2016 at 4:46 pm

    Hi Jason
    I am doing detection of road signs in real time. The size of my images is 800*1360. The size of the road sign varies from 16*16 to 256*256. How can I use a convolutional neural network for this purpose to get good detection accuracy in real time?

    • Jason Brownlee August 1, 2016 at 6:28 am

      Consider how you frame the problem Aqsa. Two options are:

      1) You could rescale all images to the same size.
      2) You could zero-pad all images.

      As for the specifics of the network for this problem, you will have to design and test different structures. Perhaps you can leverage an existing well performing structure like VGG or inception.

  3. Jack September 1, 2016 at 12:54 pm

    What should be the input dimensions for 3D dataset of pcd format or off format.?
    Btw, I find your tutorials very helpful. :-)

    • Jason Brownlee September 2, 2016 at 8:04 am

      Sorry, I don’t know what those formats are Jack.

      • Jack September 2, 2016 at 2:48 pm

        Point cloud data(PCD) contains the x,y and z coordinates of the object….. I want to build a neural network for 3D object classification… The problem I am facing is I don’t know what shd be the input to my network… For a neural network that classifies images you pass the pixel values (0-255), but a pcd file just has the coordinates…Is it wise to pass the coordinates as the inputs?..
        I can extract some features of the object ( from pcd file)… Can I pass those features as input ??
        I am new to this field, so I having difficulty understanding things…

        • Jason Brownlee September 3, 2016 at 6:55 am

          I wonder if you can rescale the coordinates to all have the range 0-to-1. Then provide them directly.

          From there, you will have a baseline and can start to explore other transforms of your coords, such as perhaps projections into 2D.

        • Sanketh February 18, 2018 at 4:41 am

          Did you figure out a way to do that. I am currently facing a similar situation. Can you tell me how you went on to solve that problem

  4. shudhan September 2, 2016 at 4:49 pm

    May i know how to extract features from the images?

    • Jason Brownlee September 3, 2016 at 6:58 am

      We no longer need to extract features when using deep learning methods as we are performing automatic feature learning. A great benefit of the approach.

  5. Walid Ahmed September 10, 2016 at 4:52 am

    Thanks a lot.

    I have one question

    In Keras, How can I extract the exact location of the detected object (or objects) within image that includes a background?
    I assume it uses sliding window for object detection

    • Jason Brownlee September 10, 2016 at 7:12 am

      Great question Walid.

      This is called object identification in an image. I do not have an example at the moment, but I will prepare one in the future.

  6. Vinay September 12, 2016 at 5:04 am

    Hi…could you please give the same example for the Pima diabetes or airline passenger data set? My question is how to apply a CNN to direct numeric features. You could give any simple example.

  7. Walid Ahmed September 14, 2016 at 3:22 am

    Thanks Jason, I will look forward to your example.

  8. NotMikeJones September 29, 2016 at 11:31 am

    Where in Keras are you specifying the input dimension for your first convolution layer? I’d like to try a convolution NN with time series for event detection, and am having issues with keras 1d convolution working. Let’s say each of my samples are a timeseries represented by a 1 x 100 vector, and within the vector, I expect three types of events to occur somewhere in those time frames (unclear what the length of the event would be, but lets say roughly 10 time points across). Would I use three feature maps, and then use a ‘window’ of, say 10 time points, that map onto a single neuron in the convolution later? So I would have a convolution layer of 3 x 10?


    • Jason Brownlee September 30, 2016 at 7:47 am

      LSTM is the network for dealing with sequences rather than CNN.

      CNN is good at spatial structure such as in images or text.

      This tutorial on time series with LSTMs might be what you’re looking for:

      • NotMikeJones October 1, 2016 at 12:12 am

        Thanks for that! I actually think I misspoke – I don’t want to forecast values at future time points, but instead identify if a current set of timepoints fit a pattern that indicate a certain event.

        For example, let’s say we have fitness tracker data with various types of sensors (heart rate, pedometer, accelerometer), and we know a person does three types of activity: yoga, running, cooking. I want to train a model to identify these activities based on sensor data, and then be able to pull real-time data and classify what they are currently doing.

        I was thinking a CNN with a windowed-approach would be the best bet, but I might be completely off-base.

        • Jason Brownlee October 1, 2016 at 8:02 am

          It does sound like an anomaly detection or change detection problem, you may have benefit in framing the problem this way.

  9. Rafi October 18, 2016 at 6:34 pm

    Hi Jason,

    I’m running the exact same code in this page which produced a 71.82% accuracy on test data.

    The only difference is I’m using a validation data set split by 70-30%. I’m getting less than 11% validation and test accuracy. What could be my mistake? Have you faced such results? Please help.


    35000/35000 [==============================] – 94s – loss: 2.2974 – acc: 0.1158 – val_loss: 2.3033 – val_acc: 0.0991
    Epoch 24/25
    35000/35000 [==============================] – 94s – loss: 2.2961 – acc: 0.1170 – val_loss: 2.3035 – val_acc: 0.1022
    Epoch 25/25
    35000/35000 [==============================] – 93s – loss: 2.2954 – acc: 0.1213 – val_loss: 2.3036 – val_acc: 0.0987

  10. Walid Ahmed November 2, 2016 at 2:30 am

    Dear Jason

    I appreciate if you can illustrate In Keras : How can I extract the exact location of the detected object (or objects) within image that includes a background?

    • Jason Brownlee November 2, 2016 at 9:08 am

      Great question Walid,

      Sorry, I don’t have an example of object localization with Keras yet. It is on the TODO list though.

  11. Augusto Aguirre November 2, 2016 at 2:28 pm

    Hello! Very good explanation!
    I am using your model to classify images containing either 0 or 1.
    To resolve this, I am able to pleanteas using initially with a variant in the last hidden layer:

    model.add(Dense(1, activation='sigmoid'))

    For me return a result between 0 or 1.

    When I want to adjust the model gives me the following error:
    ‘Error when checking input model: convolution2d_input_20 expected to have 4 dimensions, but got array With shape (8000, 3072)’
    My dataset are 32×32 RGB images. Therefore, contains 3072 columns, but one with a 0 or 1.

    • Jason Brownlee November 3, 2016 at 7:50 am

      Hi Augusto, sorry to hear about the error on your own data.

      It is not clear what the cause could be, sorry. Perhaps you are able to experiment and discover the root cause. Try simplifying your example to the minimum required and see if that helps to flush it out.

  12. Walid Ahmed November 11, 2016 at 12:49 am

    Why would I need to apply a dropout layer before a convolutional layer?

    It just make sense for me when applied to input layer or any other layers in the fully connected layers.

    • Jason Brownlee November 11, 2016 at 10:03 am

      Hi Walid,

      It’s all about putting pressure on the network to force it to generalize. It may or may not be a good pressure point to force this type of learning. Try and see on your problem.

  13. William Amador November 11, 2016 at 3:51 am

    Hi,Jason I have a question, how is the procedure for training if in my initial layer I do not handle a single image but a sequence of 30 images. For a specific case a sequence of movements of a person; Do you have any examples of training for a CNN for this case

    Thank you

    • Jason Brownlee November 11, 2016 at 10:05 am

      Hi William, great question. Sorry, I don’t have any worked examples of working with sequences of images at the moment.

  14. Walid Ahmed November 15, 2016 at 2:47 am

    Hi Jason

    When I removed the dropout layers before any convolutional layer, my results improved.
    especially when the size of dataset is large.

  15. Walid Ahmed November 16, 2016 at 6:06 am

    Hi Jason.
    Another question , I know that keras comes with different optimizers, in your code you used sgd,others may use another optimizer like adam.
    Any advice or recommendation about the type of optimizer?

    • Jason Brownlee November 16, 2016 at 9:34 am

      Hi Walid,

      I find optimizers generally make minor differences to the results – move the needle less than the network topology.

      SGD is well understood and a great place to start. ADAM is fast and gives good results and I often use it in practice. I don’t really have much opinions beyond that.

  16. Bharath Paturi November 16, 2016 at 5:54 pm

    Hi Jason,

    I have very large images to analyze. Probably each image size will be around 500 MB to 1 GB.
    I wanted to apply segmentation to the images. Can we use convolution NN to do unsupervised learning.

    • Jason Brownlee November 17, 2016 at 9:52 am

      Ouch Bharath, they are massive images.

      It is possible, but you’re going to run out of memory really fast!

      I don’t have good advice, sorry. I have not researched this specific problem.

  17. Michael November 17, 2016 at 8:40 am


    I saw the line where you add a layer has a typo. It should read

    model.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu',

    • Jason Brownlee November 17, 2016 at 9:58 am

      Are you sure Michael? It all looks good to me and the example runs with Theano and TensorFlow backends.

      Maybe I’m missing something?

      • Sean July 4, 2017 at 8:35 am

        I think Michael is right. It throws me an error with input_shape=(3,32,32). input_shape=(32,32,3) should be the correct one

    • Maxim December 19, 2016 at 1:44 am

      Michael, for solving your problem put the line: K.set_image_dim_ordering('th')

      ABOVE the

      (X_train, y_train), (X_test, y_test)= cifar10.load_data()

  18. Shristi Baral November 21, 2016 at 10:15 pm

    How do I deal with the error? Got this while running script under “Loading The CIFAR-10 Dataset in Keras”. I tried altering the script of cifar10.py to figure out what the error is. But I couldnot.
    UnicodeDecodeError Traceback (most recent call last)
    in ()
    4 from scipy.misc import toimage
    5 # load data
    —-> 6 (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    7 # create a grid of 3×3 images
    8 for i in range(0, 9):

    /home/kdc/anaconda3/lib/python3.5/site-packages/keras/datasets/cifar10.py in load_data()
    18 for i in range(1, 6):
    19 fpath = os.path.join(path, ‘data_batch_’ + str(i))
    —> 20 data, labels = load_batch(fpath)
    21 X_train[(i-1)*10000:i*10000, :, :, :] = data
    22 y_train[(i-1)*10000:i*10000] = labels

    /home/kdc/anaconda3/lib/python3.5/site-packages/keras/datasets/cifar.py in load_batch(fpath, label_key)
    10 d = cPickle.load(f)
    11 else:
    —> 12 d = cPickle.load(f, encoding=”bytes”)
    13 # decode utf8
    14 for k, v in d.items():

    UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x80 in position 3031: ordinal not in range(128)

    • Jason Brownlee November 22, 2016 at 7:05 am

      I have not seen this error before, perhaps post to stack overflow or the Keras list?

  19. Aquib Javed Khan November 22, 2016 at 8:35 pm

    Hi, Thanks for the awesome tutorial, you’r codes and explanations helps me lot in understanding the classification task.

    I actually want to feed new images want to get the return the label matches to it, How can I do it like I am doing like this:

    import keras
    from keras.models import load_model
    from keras.models import Sequential
    import cv2
    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
    model = Sequential()

    model = load_model('firstmodel.h5')

    img = cv2.imread('cat.jpg', 0)
    img = cv2.resize(img,(150,150))

    classes = model.predict_classes(img, batch_size=32)
    print classes

    I’m getting error:
    Exception: Error when checking : expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150, 150)

    How to fix it?

    • Jason Brownlee November 23, 2016 at 8:58 am

      My suggestion would be to ensure that your loaded data matches the expected input dimensions exactly. You may have to resize or pad new data to make it match.

  20. s December 1, 2016 at 1:23 pm

    This is NOT object detection, it’s classification

  21. Walid December 2, 2016 at 4:40 am

    Dear Jason

    I am so eager for an example of object localization with Keras yet, I hope you can come with one soon

    • Jason Brownlee December 2, 2016 at 8:18 am

      I would like to prepare one soon Walid, hopefully in the new year.

  22. Ahmed Desoky December 7, 2016 at 4:55 am

    Hello Jason,

    Your tutorials are so helpful and awesome.

    I am working on a classification problem using Keras on kitti dataset. I found that kitti is not supported yet in https://keras.io/datasets/ .

    Is there any advice or an introductory point to work out with my problem ?

    Thanks for help

  23. Thanawin December 7, 2016 at 1:31 pm

    HI Jason

    I have seen many tutorials, but this is the best.
    I am wondering if “scores = model.evaluate(X_test, y_test, verbose=0)” will take the last model from “model.fit()”. I would really appreciate if you can suggest me how can I evaluate my test data using the best model from “model.fit()”

  24. TSchecker December 14, 2016 at 7:35 pm

    Hi Jason,

    just in case facing the error

    “AttributeError: ‘module’ object has no attribute ‘control_flow_ops'”

    found here:

    import tensorflow as tf
    tf.python.control_flow_ops = tf

  25. yask December 15, 2016 at 3:04 am

    Hi Jason
    I really like your article
    I would like to extract objects such as ice, water etc. from photograph.
    is this library useful for that? How can I define training areas? Do I need to provide sample image of ice, water as a training area to the classifier? Which classifier servers best here? how about Artificial Neural Network ?

  26. pranoy December 19, 2016 at 7:58 pm

    which function is used for predicting..

    if i give an image of a horse and i want to predict the output. which function i have to use

    • Jason Brownlee December 20, 2016 at 7:30 am

      You can use model.predict() to make a prediction on new data.

      The new data must have the same shape as the data used to train the network.

      • Philip L. December 3, 2019 at 8:30 am

        Hello, I’m a pure newbie to all this but I’m learning all thanks to your amazing tutorials. But I still don’t understand how to use model.predict() especially on this example. Do you have an example of this project with model.predict()?? I really don’t have any idea yet how am I able to code that will load my own dataset to be predicted, rescale it to 32×32 dimension (is this the size of data?? not sure though) and then use that image as X_test. there’s so much I don’t know yet..
        Any response will be of so much significance to me.. Thank you

        • Jason Brownlee December 3, 2019 at 1:32 pm


          You can predict for a single image by calling model.predict(), it will take an array with the shape [samples, rows, cols, channels], where you probably have 1 sample, 32×32 rows and cols and 3 channels.

          Does that help?

  27. Milos December 29, 2016 at 10:34 pm

    Hi Jason,

    great article. I have a question. In Dense part you have specified 512 neurons. Can you tell me how did you determine number of neurons?


    • Jason Brownlee December 30, 2016 at 5:50 am

      Hi Milos, I used trial and error. Selecting the number and size of layer is an art – test a lot of configs.

      • Milos December 30, 2016 at 7:07 pm

        Thanks for answer. I have assumed so , but I had to ask :).
        Great and really useful articles I have found on your site :).

  28. Deepak January 30, 2017 at 3:56 pm

    When I use the model.predict, the following error is seen. Please help me

    TypeError Traceback (most recent call last)
    in ()
    —-> 1 model.predict(tmp)

    /usr/local/lib/python2.7/dist-packages/keras/models.pyc in predict(self, x, batch_size, verbose)
    722 if self.model is None:
    723 self.build()
    –> 724 return self.model.predict(x, batch_size=batch_size, verbose=verbose)
    726 def predict_on_batch(self, x):

    /usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in predict(self, x, batch_size, verbose)
    1266 f = self.predict_function
    1267 return self._predict_loop(f, ins,
    -> 1268 batch_size=batch_size, verbose=verbose)
    1270 def train_on_batch(self, x, y,

    /usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in _predict_loop(self, f, ins, batch_size, verbose)
    944 ins_batch = slice_X(ins, batch_ids)
    –> 946 batch_outs = f(ins_batch)
    947 if not isinstance(batch_outs, list):
    948 batch_outs = [batch_outs]

    /usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.pyc in __call__(self, inputs)
    957 def __call__(self, inputs):
    958 assert isinstance(inputs, (list, tuple))
    –> 959 return self.function(*inputs)

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    786 s.storage[0] = s.type.filter(
    787 arg, strict=s.strict,
    –> 788 allow_downcast=s.allow_downcast)
    790 except Exception as e:

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/tensor/type.pyc in filter(self, data, strict, allow_downcast)
    115 if allow_downcast:
    116 # Convert to self.dtype, regardless of the type of data
    –> 117 data = theano._asarray(data, dtype=self.dtype)
    118 # TODO: consider to pad shape with ones to make it consistent
    119 # with self.broadcastable… like vector->row type thing

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/misc/safe_asarray.pyc in _asarray(a, dtype, order)
    32 dtype = theano.config.floatX
    33 dtype = numpy.dtype(dtype) # Convert into dtype object.
    —> 34 rval = numpy.asarray(a, dtype=dtype, order=order)
    35 # Note that dtype comparison must be done by comparing their num
    36 # attribute. One cannot assume that two identical data types are pointers

    /home/yashwanth/.local/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    530 “””
    –> 531 return array(a, dtype, copy=False, order=order)

    TypeError: Bad input argument to theano function with name “/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py:955” at index 0 (0-based).
    Backtrace when that variable is created:

    File “/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py”, line 2821, in run_ast_nodes
    if self.run_code(code, result):
    File “/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py”, line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    File “”, line 2, in
    model.add(Convolution1D(64, 2, input_shape=[1,4], border_mode=’same’, activation=’relu’, W_constraint=maxnorm(3)))
    File “/usr/local/lib/python2.7/dist-packages/keras/models.py”, line 299, in add
    layer.create_input_layer(batch_input_shape, input_dtype)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 397, in create_input_layer
    dtype=input_dtype, name=name)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 1198, in Input
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 1116, in __init__
    File “/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py”, line 110, in placeholder
    x = T.TensorType(dtype, broadcast)(name)
    float() argument must be a string or a number

    • Avatar
      Jason Brownlee February 1, 2017 at 10:21 am #

      I’m sorry to hear that.

      The cause is not obvious to me; the stack trace is hard to read.

      Perhaps you could try posting to Stack Overflow or the Keras Google group?

  29. Avatar
    Rajesh February 6, 2017 at 4:08 am #

    Hi Jason,

    Any idea about the following error? Everything looked good till the model.summary point. But when I tried to fit the model, I am seeing the following error.

    ValueError Traceback (most recent call last)
    in ()
    1 # Fit the model
    —-> 2 model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=epochs, batch_size=32)
    3 # Final evaluation of the model
    4 scores = model.evaluate(X_test, y_test, verbose=0)
    5 print(“Accuracy: %.2f%%” % (scores[1]*100))

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/models.pyc in fit(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    670 class_weight=class_weight,
    671 sample_weight=sample_weight,
    –> 672 initial_epoch=initial_epoch)
    674 def evaluate(self, x, y, batch_size=32, verbose=1,

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch)
    1115 class_weight=class_weight,
    1116 check_batch_axis=False,
    -> 1117 batch_size=batch_size)
    1118 # prepare validation data
    1119 if validation_data:

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
    1028 self.internal_input_shapes,
    1029 check_batch_axis=False,
    -> 1030 exception_prefix=’model input’)
    1031 y = standardize_input_data(y, self.output_names,
    1032 output_shapes,

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    122 ‘ to have shape ‘ + str(shapes[i]) +
    123 ‘ but got array with shape ‘ +
    –> 124 str(array.shape))
    125 return arrays

    ValueError: Error when checking model input: expected convolution2d_input_6 to have shape (None, 3, 32, 32) but got array with shape (50000, 32, 32, 3)

    • Avatar
      Rajesh February 6, 2017 at 4:22 am #

      I understood the issue. I made a mistake at the input_shape step.

      It is a very nice tutorial, very well written. Thanks.

      • Avatar
        Rajesh February 6, 2017 at 8:59 am #

        Sorry to spam..but I am still seeing the error 🙁

        • Avatar
          Jason Brownlee February 6, 2017 at 9:45 am #

          Hi Rajesh, you may want to confirm that you are not missing any lines of code.

          Also, confirm your version of Keras, TensorFlow/Theano and Python.
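
The ValueError in this thread says the model was declared with channels-first input, (None, 3, 32, 32), while the loaded array is channels-last, (50000, 32, 32, 3). One possible fix, sketched here on the assumption that you want to keep input_shape=(3, 32, 32) in the model, is to move the channel axis with NumPy. The small zero array is only a placeholder for X_train.

```python
import numpy as np

# Placeholder standing in for X_train: 5 images, 32x32 pixels, channels last.
x_channels_last = np.zeros((5, 32, 32, 3), dtype="uint8")
x_channels_last[0, 10, 20, 2] = 255  # mark one pixel so it can be tracked

# Move the channel axis from position 3 to position 1: (N, H, W, C) -> (N, C, H, W).
x_channels_first = np.transpose(x_channels_last, (0, 3, 1, 2))

print(x_channels_first.shape)
```

The alternative is to leave the data alone and declare input_shape=(32, 32, 3) in the first convolutional layer so the model matches the data.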

  30. Avatar
    Sam February 6, 2017 at 9:11 pm #

    Thank you for your example. I tried to run your code, but our network blocks outside access and the code could not get the data from the URL below.


    Can you let me know if there is another way to build the training data set without using the download wrapper method?

    • Avatar
      Sam February 6, 2017 at 9:37 pm #

      Oh, I found a solution through googling. Thanks anyway.

  31. Avatar
    John February 7, 2017 at 2:10 pm #


    How can I recognize a bike in a video? It would be great if you could give an example.

    • Avatar
      Jason Brownlee February 8, 2017 at 9:32 am #

      Great question John, it is an area I’d like to cover in the future.

  32. Avatar
    Ikhsan March 17, 2017 at 2:22 pm #

    Hi Jason,

    Thanks for the tutorial. I have a question about the random.seed(seed). Why do we need to seed it first and where is the random number generator used in the rest of the code?


  33. Avatar
    Ali April 21, 2017 at 5:43 pm #

    Hi jason

    Thanks for this great tutorial. I am trying to run the code but I got the following error. Any suggestions for a fix?

    I am using Keras with the Theano 2.7 back end

    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    epochs = 25
    lrate = 0.01
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

    Traceback (most recent call last):

    File “”, line 2, in
    model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))

    TypeError: __init__() takes at least 4 arguments (4 given)

    • Avatar
      Jason Brownlee April 22, 2017 at 9:24 am #

      I’m not sure Ali, I have not seen this error before.

      Perhaps confirm that you have copied the code exactly?

      Consider removing arguments to help zoom in on the cause of the fault.

  34. Avatar
    AIZEN May 1, 2017 at 12:12 am #

    Hi, I have already managed to detect objects in still images. What am I supposed to do to detect objects in a video input? I intend to use the same Keras CNN.

    • Avatar
      Jason Brownlee May 1, 2017 at 5:57 am #

      You could process each frame of the video as an image with a CNN and use an LSTM to handle sequences of data from the CNN.

  35. Avatar
    Supriya May 2, 2017 at 2:24 am #

    Hi, in this example we have created a CNN model, but how do we test it?

    • Avatar
      Jason Brownlee May 2, 2017 at 6:02 am #

      You can use a train/test split or k-fold cross validation.
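
As a rough illustration of the k-fold idea mentioned above, the sketch below splits sample indices into k folds and uses each fold once as the test set. The function name is hypothetical and there is no shuffling; in practice you would normally shuffle first or use an existing utility such as scikit-learn's KFold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each model would be trained on train_idx and scored on test_idx, and the k scores averaged.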

  36. Avatar
    Chao May 18, 2017 at 3:55 pm #

    What does “kernel_constraint=maxnorm(3)” mean?

    Thanks a lot!
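
A max-norm constraint limits how large the weights feeding into each unit can grow: after an update, if the L2 norm of a unit's incoming weight vector exceeds the limit (3 here), the vector is scaled back down to that limit. The NumPy sketch below illustrates the idea for a dense weight matrix with one column per unit. The function name is hypothetical and this is not the library's own code.

```python
import numpy as np

def apply_max_norm(weights, max_value=3.0, eps=1e-7):
    """Rescale each column of `weights` so its L2 norm is at most max_value."""
    norms = np.sqrt(np.sum(np.square(weights), axis=0, keepdims=True))
    desired = np.clip(norms, 0.0, max_value)
    return weights * (desired / (eps + norms))

w = np.array([[3.0, 0.3],
              [4.0, 0.4]])   # column norms: 5.0 and 0.5
w_constrained = apply_max_norm(w)
print(np.linalg.norm(w_constrained, axis=0))
```

The first column is shrunk to norm 3 while the second, already under the limit, is left essentially unchanged.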

  37. Avatar
    Mohamed Mnete June 1, 2017 at 5:22 pm #

    Hi, say I had an image and I would like the model you just made to predict what is in it. I save it in the same directory as the Python file. How can I load it to pass into the prediction function, and how can I write the prediction function? I would also like to try using my own images with your model. I was also stuck on the technique of converting the images to numpy arrays. I would also like the output to be either -1 or 1. How can I code this up? Please help.

  38. Avatar
    Natthaphon June 8, 2017 at 1:25 pm #

    So can I insert annotations in each image?

  39. Avatar
    Daniel June 18, 2017 at 5:33 am #


    I’m trying to do an image classifier that determines if something should be given a specific hashtag or not. My problem is that the accuracy of the classifier after each epoch remains constant, and is essentially assigning the same class to all images. This makes no sense as the image classes are fairly distinct (#gym and #foraging). I basically copied the smaller CNN you used:

    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, 100, 100), activation='relu'))
    model.add(Convolution2D(32, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(1, activation='softmax'))

    epochs = 10
    lrate = 0.001
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    print('compiling model')
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    print("fitting model")
    model.fit(X_training, Y_training, nb_epoch=epochs, batch_size=100)

    But every time, I get these results:

    fitting model
    Train on 908 samples, validate on 908 samples
    Epoch 1/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 2/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 3/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 4/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 5/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 6/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 7/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 8/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 9/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000
    Epoch 10/10
    908/908 [==============================] – 104s – loss: 7.9712 – acc: 0.5000

    What am I doing wrong?
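
One detail in the code above is consistent with the constant 0.5 accuracy: a softmax taken over a single output unit always evaluates to 1.0, so Dense(1, activation='softmax') produces the same prediction for every image whatever the weights are. The NumPy sketch below shows this next to a sigmoid, which is the usual choice for a single-unit binary output. Whether this is the only problem in this case is an assumption; the thread does not confirm it.

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis, shifted for numerical stability."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[-4.0], [0.0], [2.5]])  # three samples, one output unit each

softmax_out = softmax(logits)   # each row has one element, so it normalises to 1.0
sigmoid_out = sigmoid(logits)   # varies with the logit
print(softmax_out.ravel(), sigmoid_out.ravel())
```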

  40. Avatar
    Darlington Akogo June 28, 2017 at 10:42 am #

    Hello Jason, thanks for your great tutorials! I need some help please. I’m working on a medical image recognition (diagnostics) project based on your tutorials. Here, you import the CIFAR-10 dataset provided via Keras with the load_data() function, but since I am downloading the dataset from a different source, that wouldn’t be possible. From some research, I came across Keras’ flow_from_directory() function for image data processing, which is amazing: I could just separate the images into folders and it would treat them as classes. However, medical images are in the “DICOM” image format, and the Keras image functions don’t seem to support it. With further research, I came across the pydicom module for processing DICOM images in Python, but now I can’t use flow_from_directory(). Can you PLEASE offer some help as to how I can train my ConvNet model with DICOM images and use it to classify (with the predict() function) any new DICOM image?

    Thanks in advance.

    • Avatar
      Jason Brownlee June 29, 2017 at 6:28 am #

      I don’t know about that format.

      Perhaps you can convert the images?
      Perhaps you can put together your own DICOM-compatible flow-from-directory function?

    • Avatar
      shila mosammami December 10, 2017 at 9:21 am #

      Dear Friend,
      try this:

  41. Avatar
    Bruce Wind June 29, 2017 at 11:30 am #

    Hi Jason, thanks for sharing. I tested the code you provided, but my machine does not support CUDA, so it runs very slowly (half an hour per epoch). Since you have such a powerful computer, could you please show the results after hundreds or thousands of epochs? Thanks.

  42. Avatar
    Nunu July 23, 2017 at 10:00 pm #

    Dear Jason,
    It really is a very nice tutorial :). If you plot acc against acc_val, there is a gap between the two curves. Does this mean overfitting? Also, as I understand it (correct me if I am wrong), the accuracy curve should become asymptotic after a certain number of epochs (that is, the accuracy will not increase anymore).
    Thanks in advance

    • Avatar
      Jason Brownlee July 24, 2017 at 6:54 am #

      If acc is less than val_acc then it may mean that the model is underfitting and that perhaps a larger model or a model fit for longer would do better on the validation set.

      • Avatar
        Nunu July 24, 2017 at 6:55 pm #

        Yes, that is true, but also if acc_val is more than acc then there is overfitting, and I noticed this in both of your results above! What could be the reason for the overfitting, and what can we do to get rid of it?


        • Avatar
          Jason Brownlee July 25, 2017 at 9:40 am #

          Perhaps train less, perhaps train a smaller model, perhaps add some regularization like dropout.

          I hope that helps as a start.

  43. Avatar
    Nunu July 25, 2017 at 7:40 pm #

    Yes I added dropouts and I added one more fully connected layer and i guess it worked.

    Thanks a lot Jason 🙂
    Best regards,

  44. Avatar
    Luna August 27, 2017 at 6:46 pm #

    Hi Jason,

    Normally a CNN for image classification will output a vector containing the probabilities of each possible class. Is there a function in this library to extract the vector? Thank you!

    • Avatar
      Jason Brownlee August 28, 2017 at 6:49 am #

      You can use a softmax activation function on the output layer to get probabilities for multiple classes or sigmoid activation for binary class probabilities.
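
With a softmax output layer, model.predict() returns one row of class probabilities per image. The sketch below shows one way to read such a row. The probability values are made up for illustration, and the class list assumes the standard CIFAR-10 label order.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def top_k(probabilities, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    ranked = sorted(zip(CIFAR10_CLASSES, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up probability vector standing in for one row of model.predict(X).
probs = [0.01, 0.02, 0.05, 0.60, 0.03, 0.20, 0.04, 0.02, 0.02, 0.01]
best = top_k(probs)
print(best)
```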

  45. Avatar
    Walid Ahmed September 2, 2017 at 4:55 am #

    Is there a way to apply the Keras predict function in a vectorized manner so that it can handle more than one input (image) simultaneously?


    • Avatar
      Jason Brownlee September 2, 2017 at 6:17 am #

      Yes, I believe predict() can take a list of samples (X) and return a list of predictions (yhat).
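
In practice the samples are passed as a single array whose first axis indexes the images. A minimal sketch of building such a batch from separate images with NumPy follows; the images are placeholders, and the predict call is left as a comment because no trained model is defined here.

```python
import numpy as np

# Three placeholder images, each 32x32 with 3 channels.
images = [np.full((32, 32, 3), fill, dtype="float32") for fill in (0.0, 0.5, 1.0)]

# Stack along a new leading axis: the result has shape (n_images, 32, 32, 3).
batch = np.stack(images, axis=0)
print(batch.shape)

# With a trained model this would be: yhat = model.predict(batch)
# yhat would then hold one row of predictions per image in the batch.
```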

  46. Avatar
    zoda September 5, 2017 at 3:32 am #


    What can I do if I have one image that contains 6 objects? How can I detect the objects and measure their size in the image? It would be helpful if you gave me some tips.
    Thank you.

    • Avatar
      Jason Brownlee September 7, 2017 at 12:37 pm #

      You can use object localization.

      Sorry, I don’t have an example.

  47. Avatar
    Sunil A Patel September 24, 2017 at 4:10 pm #


    I want to detect a hand in an image. My problem is that the hand is tiny in the image and its size varies from image to image (my image size is 1920*1080). An image contains a hand, cup, face, table, etc.

    My questions are:
    1. Should we train the CNN on various hand shapes at a fixed size, for example 50X50, or something else?
    2. How do we find the location of the hand?

    • Avatar
      Jason Brownlee September 25, 2017 at 5:37 am #

      Sorry, I don’t have examples of object localization. I hope to develop examples in the future.

  48. Avatar
    vinay October 17, 2017 at 12:00 am #

    How to run this code on GPU enabled machine?

  49. Avatar
    Miles October 18, 2017 at 1:28 am #

    Hi Jason,

    In the first example network, you use a maxnorm kernel constraint in all hidden layers (conv and fully connected). I understand the advantages of this, particularly when used in combination with dropout. I’m wondering if there was a reason you removed the kernel constraint from the conv. layers in the second iteration of your model? If so, would you mind explaining the motivation for doing so? Thanks!

    • Avatar
      Jason Brownlee October 18, 2017 at 5:39 am #

      It was a few years ago now, I don’t remember. Perhaps the model achieved better results without the constraint?

  50. Avatar
    Tin Tran October 18, 2017 at 6:44 pm #

    Hi Jason,

    When we completed training the model, how do we use that model with a new picture?

  51. Avatar
    Gabriele Minniti October 28, 2017 at 8:10 pm #

    Hi Jason! My name is Gabriele, and I’m a young data scientist! I’m currently working with CNNs and I have a few questions for you.

    1. I’ve seen that you are using 32 x 32 images. My question is: what is the best shape for a high-resolution image of 1070 x 720? Do the images need to be square?

    2. How significant an improvement in the accuracy metric could more epochs bring? Maybe 100 epochs? 1000? Could they produce a high-performance model?

    Thanks a lot!
    PS. Sorry for my “not perfect English”, I hope that you understand my questions!

  52. Avatar
    Kumar November 7, 2017 at 6:44 pm #

    Dear Jason,

    I am trying to identify multiple objects in an image and count the number of objects for each class in each image. Can you help me on this.

    • Avatar
      Jason Brownlee November 8, 2017 at 9:21 am #

      Sounds like a great problem, but I don’t have material on this type of problem, sorry.

  53. Avatar
    Arun November 15, 2017 at 12:10 am #


    Traceback (most recent call last):

    File “”, line 1, in
    b=model.fit(X, Y,validation_split=0.33, epochs=150, batch_size=10,verbose=0)

    File “/Users/arun/anaconda/lib/python3.6/site-packages/keras/models.py”, line 893, in fit

    File “/Users/arun/anaconda/lib/python3.6/site-packages/keras/engine/training.py”, line 1555, in fit

    File “/Users/arun/anaconda/lib/python3.6/site-packages/keras/engine/training.py”, line 1409, in _standardize_user_data

    File “/Users/arun/anaconda/lib/python3.6/site-packages/keras/engine/training.py”, line 126, in _standardize_input_data
    array = data[i]

    UnboundLocalError: local variable ‘arrays’ referenced before assignment

    I am getting this error when trying to fit the model. It looks like it’s an error in Keras.

  54. Avatar
    Himmat Ram Bairwa November 16, 2017 at 6:37 pm #

    I have given it my own .jpg image and I want to predict the class, but I am getting the wrong result.

    img = cv2.imread('C:/Users/8himmat/Desktop/ML/data/train/cats/dog2.jpg')
    img = cv2.resize(img,(32,32))
    img = numpy.array(img)
    img = numpy.reshape(img,[1,3,32,32])
    preds = model.predict_classes(img)

    • Avatar
      Jason Brownlee November 17, 2017 at 9:23 am #

      Perhaps the model is not good enough for your example?

      Try more examples, try training the model with more examples like the example you tried, try augmenting during training, and so on.
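
Separately from model quality, the preprocessing in the snippet above may also matter. cv2.imread returns a height x width x channel array in BGR order with values 0 to 255, and numpy.reshape to [1, 3, 32, 32] reinterprets the memory rather than moving the channel axis. A hedged sketch of one alternative follows; it assumes the model was trained on RGB inputs scaled to [0, 1] in channels-first layout, and the function name is hypothetical.

```python
import numpy as np

def prepare_image(img_bgr, channels_first=True):
    """Turn an (H, W, 3) uint8 BGR image into a single-image float batch."""
    img = img_bgr[:, :, ::-1]                 # BGR -> RGB
    img = img.astype("float32") / 255.0       # scale to [0, 1]
    if channels_first:
        img = np.transpose(img, (2, 0, 1))    # (H, W, C) -> (C, H, W)
    return np.expand_dims(img, axis=0)        # add the batch axis

# Placeholder for cv2.resize(cv2.imread(path), (32, 32)): pure blue in BGR order.
fake = np.zeros((32, 32, 3), dtype="uint8")
fake[:, :, 0] = 255

batch = prepare_image(fake)
print(batch.shape)
```

If the model uses channels-last input instead, pass channels_first=False and the transpose is skipped.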

  55. Avatar
    Sheila M December 10, 2017 at 9:38 am #

    Dear Jason,
    First thanks a million for the tutorial you have provided.
    I am trying to work on liver segmentation with a convolutional neural network, and my data set is Sliver07, which consists of meta images. There are two folders: one named scan, which holds the original meta images of the CT scans (20 in total), and a second named segment, which holds 20 meta images of segmented livers. Each instance has two files, .mhd and .raw, and they can easily be viewed with ITK-SNAP. I can load the data and get numpy arrays, but the arrays I get for the original images and the segmented images are as below:
    [[[-1007 -997 -1007 …, -973 -969 -1007]
    [-1002 -1000 -1007 …, -979 -999 -1015]
    [-1000 -993 -1003 …, -999 -1009 -1007]
    [ -894 -877 -883 …, -893 -895 -906]
    [ -872 -878 -894 …, -896 -892 -894]
    [ -873 -882 -893 …, -898 -909 -896]]

    [[-1005 -999 -1006 …, -972 -964 -1008]
    [-1008 -995 -1004 …, -966 -995 -1022]
    [-1004 -991 -1001 …, -990 -1015 -1008]
    [ -905 -882 -897 …, -889 -878 -895]
    [ -879 -884 -910 …, -885 -880 -891]
    [ -874 -893 -907 …, -882 -887 -897]]

    [[-1001 -1010 -1009 …, -991 -967 -1000]
    [-1000 -1004 -1006 …, -977 -989 -1019]
    [ -993 -993 -1001 …, -988 -1013 -1001]
    [ -902 -918 -911 …, -899 -888 -892]
    [ -905 -909 -911 …, -899 -887 -888]
    [ -909 -911 -908 …, -900 -901 -896]]

    [[-1001 -1002 -1002 …, -1007 -998 -1004]
    [-1004 -1004 -1005 …, -1003 -1009 -1000]
    [ -996 -1002 -1005 …, -996 -1000 -988]
    [ -888 -896 -897 …, -883 -888 -898]
    [ -879 -875 -866 …, -886 -881 -894]
    [ -878 -866 -867 …, -895 -891 -896]]

    [[ -986 -990 -997 …, -1010 -1004 -1008]
    [ -999 -994 -995 …, -1003 -1007 -1002]
    [-1006 -1011 -1006 …, -1000 -996 -980]
    [ -887 -894 -900 …, -888 -893 -903]
    [ -882 -885 -879 …, -891 -883 -892]
    [ -880 -876 -876 …, -896 -884 -882]]

    [[-1000 -1004 -1003 …, -1002 -1003 -997]
    [-1007 -1002 -1004 …, -1002 -997 -998]
    [-1013 -1011 -997 …, -998 -995 -993]
    [ -887 -894 -891 …, -886 -891 -900]
    [ -892 -892 -885 …, -892 -885 -893]
    [ -895 -893 -885 …, -902 -893 -889]]]
    segmented numpy:
    [[[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]

    [[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]

    [[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]

    [[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]

    [[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]

    [[0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]
    [0 0 0 …, 0 0 0]]]
    1) I have no idea how I can build a model for this.
    2) The numbers seem strange to me. Why are they all negative in the original images and all zero in the segmented ones?
    3) I expected to get 20 arrays for each folder, and I cannot make sense of these results.
    Best Regards

    • Avatar
      Jason Brownlee December 11, 2017 at 5:20 am #

      Perhaps contact the owners of the dataset to ask for more information about it?

  56. Avatar
    Fabrizio January 18, 2018 at 11:29 pm #

    Hi Jason,
    I’m currently working on real-time object detection. I have some constraints to keep in mind: the program should run on an ordinary desktop PC, and the main idea is that it should be able to detect objects taking a video as input. So it shouldn’t be computationally expensive.
    Is a CNN a good approach to this problem? Googling, I’ve seen a pretrained DNN in OpenCV (I should use OpenCV as much as possible), but it runs under Caffe and I know nothing about Caffe.
    That’s why I am here, and I think a CNN would be a nice approach. What do you think? Do you have any extra advice or tips?

    • Avatar
      Jason Brownlee January 19, 2018 at 6:29 am #

      I think running a CNN is straightforward; it is training that is very slow.

  57. Avatar
    ali February 3, 2018 at 6:37 am #

    Hi. I did not see any image identification!

  58. Avatar
    Pratip March 11, 2018 at 12:35 am #

    Hello Sir,

    Can you please tell me what the error is? I just copied the above code but it displays an error.

    ValueError: Error when checking input: expected conv2d_26_input to have shape (3, 32, 32) but got array with shape (32, 32, 3)

    I have tried changing the input shape but the error still exists.

    • Avatar
      Jason Brownlee March 11, 2018 at 6:29 am #

      It looks like the model expects the data to be one size and the data is another. You can change the model or change the data.

  59. Avatar
    Satendra Ramesh Varma March 26, 2018 at 7:56 am #

    It takes too long for CIFAR dataset to download (EST 4 HRS+). Is there a way we can load the data directly using the zip file from their website ?

    • Avatar
      Satendra Ramesh Varma March 26, 2018 at 8:29 am #

      Never mind. I found the solution.
      Download the zip file from the CIFAR dataset website.
      Place the zip file in keras–>datasets folder.
      The zip file contains the batches folder.
      Now use 7zip on windows to compress the file first as tar and then as zip using Add to Archive option in right click menu
      And run the code again.
      Saves a lot of time and no code changes.

    • Avatar
      Jason Brownlee March 26, 2018 at 10:04 am #

      I’m not sure off hand, sorry.
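
Another option, if the archive has already been downloaded and extracted by hand, is to read the batch files directly. In the "python version" of CIFAR-10, each batch file is a pickled dictionary holding a data array of 3072-value rows (red, then green, then blue plane) and a labels list. The sketch below assumes that layout; the function name is hypothetical, and a tiny fake batch is written first so the example runs without the real files.

```python
import os
import pickle
import tempfile
import numpy as np

def load_cifar_batch(path):
    """Load one CIFAR-10 'python version' batch file into (images, labels)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")  # keys are bytes objects
    data = np.asarray(batch[b"data"], dtype="uint8")
    images = data.reshape(-1, 3, 32, 32)          # each row: R plane, G plane, B plane
    labels = np.asarray(batch[b"labels"])
    return images, labels

# Self-contained demo: write a tiny fake batch in the same layout, then read it back.
fake_batch = {b"data": np.zeros((4, 3072), dtype="uint8"), b"labels": [0, 1, 2, 3]}
demo_path = os.path.join(tempfile.mkdtemp(), "data_batch_demo")
with open(demo_path, "wb") as f:
    pickle.dump(fake_batch, f)

images, labels = load_cifar_batch(demo_path)
print(images.shape, labels.shape)
```

With the real dataset, the same function would be called once per file (data_batch_1 to data_batch_5 and test_batch) and the training arrays concatenated.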

  60. Avatar
    Meenakshi Choudhary April 7, 2018 at 4:15 pm #

    Could you please share code for the fusion of multiple convolutional neural networks in Python, R, or Matlab?

  61. Avatar
    Satyajit Pattnaik April 9, 2018 at 3:15 pm #

    I need some direction on how to proceed with crack detection. I have images of some machines, some of which show cracks, and I need to identify the images that have cracks. How can this problem be solved?

    I tried using Canny edge detection, but after finding the edges I am a bit stuck on how to proceed. If there are other ways to achieve this, please reply and I will try working on them.

    • Avatar
      Jason Brownlee April 10, 2018 at 6:13 am #

      Sounds like a great problem. Perhaps you can use transfer learning with a pre-trained model like the VGG to get started with model development?

  62. Avatar
    Raj April 12, 2018 at 11:14 pm #

    Dear Jason,

    I have some images of scatterplots (matplotlib) and an idl file containing the coordinates of the bounding boxes for the tick marks, tick values and points, and I want to do object detection. Can you please suggest some pretrained models with which I can detect the tick marks, tick values and points?

    • Avatar
      Jason Brownlee April 13, 2018 at 6:41 am #

      That sounds like a great project and using a pre-trained model is a great way to accelerate progress.

      Perhaps start with one of these:

      • Avatar
        Raj April 15, 2018 at 2:03 am #

        Dear Jason,
        Thank you for your reply. Actually I have one more question: in an object detection task, if I give the model the images and the bounding boxes (Xmin, Ymin, Xmax, Ymax) during training, what do I give it during testing? Only the images?

        • Avatar
          Jason Brownlee April 15, 2018 at 6:35 am #

          I’m not sure I follow, what do you mean exactly?

  63. Avatar
    Raj April 15, 2018 at 8:26 am #

    I mean that I have training and testing images, with bounding boxes for both. If I give the images and the bounding boxes to my model as labelled data for training, what do I give it during testing: only the testing images, or the images with their bounding boxes (labelled data)?
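
In the usual setup the model is given only the images at test time, and the held-back test boxes are used to score what it predicts. One common score is intersection over union (IoU) between a predicted box and its ground-truth box, sketched below for boxes in (xmin, ymin, xmax, ymax) form. The box values are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (10, 10, 50, 50)
predicted = (30, 30, 70, 70)
score = iou(ground_truth, predicted)
print(score)
```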

  64. Avatar
    Kartik May 10, 2018 at 4:23 pm #

    Hi, I want to detect an object as small as 30×40 pixels in an image that is 6000×4000 pixels. Would this approach be correct ?

  65. Avatar
    Ekaterina May 17, 2018 at 10:49 am #

    Hi everyone, I have a problem with importing cifar10 dataset:

    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    AttributeError Traceback (most recent call last)
    in ()
    4 # load the pre-shuffled train and test data
    —-> 5 (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    ~/anaconda3/lib/python3.6/site-packages/keras/datasets/cifar10.py in load_data()
    18 for i in range(1, 6):
    19 fpath = os.path.join(path, ‘data_batch_’ + str(i))
    —> 20 data, labels = load_batch(fpath)
    21 X_train[(i-1)*10000:i*10000, :, :, :] = data
    22 y_train[(i-1)*10000:i*10000] = labels

    ~/anaconda3/lib/python3.6/site-packages/keras/datasets/cifar.py in load_batch(fpath, label_key)
    14 for k, v in d.items():
    15 del(d[k])
    —> 16 d[k.decode(“utf8”)] = v
    17 f.close()
    18 data = d[“data”]

    AttributeError: ‘str’ object has no attribute ‘decode’

    Did anyone have this problem? Any solution?

    • Avatar
      Ekaterina May 17, 2018 at 1:06 pm #

      Solved by reinstalling Keras

    • Avatar
      Jason Brownlee May 17, 2018 at 3:11 pm #

      I’m sorry to hear that. Perhaps try posting your code and error to Stack Overflow?

  66. Avatar
    Julia June 7, 2018 at 1:48 am #

    Hi, thanks for this tutorial, it is very helpful! I used the same architecture as your deep CNN but it is taking over 10 minutes to do one epoch, but on yours it looks like it only took about 30 seconds. As far as I can tell, my code is the same as yours and my computer has never been this slow before. Any ideas on what might be causing this?

  67. Avatar
    Anindya July 19, 2018 at 10:36 pm #

    Hi Jason,

    Great tutorial, very helpful for my project. 🙂
    It will be very helpful if you can publish any post on real time object detection and localization, like YOLO or SSD. If you already have any post on that can you please share the link here. Thanks 🙂

  68. Avatar
    Inder August 5, 2018 at 4:53 pm #

    Jason, what if our test image contains an object that is not from this dataset, or is simply a plain image with no class object at all?
    The model will still classify it as one of our classes. How can we handle this problem?
    One solution I found is to create a class that contains only negative images (unrelated to the existing classes), but there are too many different kinds of negative images in the world, which makes them hard to collect. Please explain how we can improve on this.

  69. Avatar
    Vishwas August 7, 2018 at 9:02 pm #

    Hi Jason,

    How do we decide on the number of convolutional layers for an image? Is it trial and error, or is there a rule behind it?

  70. Avatar
    Suprasad Kamath August 9, 2018 at 4:26 pm #

    Thanks a lot, Jason, for all your posts. It has given me the confidence to write simple deep learning models using keras. This post really helped me create my first CNN model, something which I have been banging my head around for a few days. I just have few questions.

    1) How did you derive the number of neurons for each hidden layer? I know we can do hyperparameter tuning using GridSearchCV from scikit-learn as detailed out in your other blog post. I am able to tune the neurons for the input layer but I get an error when I try to tune the subsequent layers.

    2) How can I check if my model is overfitting or underfitting? I know you have used Dropout to help reduce overfitting. But is there a way I can be sure if my model is overfitting or underfitting?

    3) Why have you added two Conv2D layers followed by just one MaxPooling2D layer? I was under the impression that max pooling has to be done after every convolution layer. Can you please shed some more light on this?

    4) I am using tensorflow-gpu as my backend. In order for my gpu to take the processing load I have to increase the batch_size. I am just curious to know how batch_size affects accuracy while there is a significant boost to processing speed.

  71. Avatar
    Suprasad Kamath August 11, 2018 at 1:35 pm #

    Thank you, sir. I will go through these links too.

    I do have one small suggestion. The example you have used did not have many steps to prepare the training and test datasets. What if someone wanted to use their own image files to create a CNN model instead of the standard CIFAR-10? This is where I had a lot of problems, something which is not as straightforward as other supervised ML codes. I finally figured it out after a lot of trial and error. It would really help beginners like me who visit your blog posts if an example was provided (if you have not already written a post on this) where own data is used to build a model. This is just my personal opinion as only I know how hard it was for me to write an end to end program. I am assuming there would be many more like me who are new to deep learning.

    Thanks again for all your blog posts.

  72. Avatar
    Deepanshu Agarwal August 13, 2018 at 3:05 pm #

    Hi Jason,

    I am working on a similar project, in which I have to determine whether a handwritten signature is present in a scanned image.

    Could you please suggest an approach to solve this problem, giving yes or no as the output for signature presence?


    • Avatar
      Jason Brownlee August 14, 2018 at 6:14 am #

      You might be able to solve it with classical computer vision techniques; maybe you don’t need a learned model?

      E.g. if white space on document is not empty, then there is a signature.
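
One way to read the white-space suggestion above is as a simple ink-ratio test: count how many pixels in the signature box are darker than the paper and compare that fraction against a threshold. The sketch below assumes the box has already been cropped from the scan as a grayscale NumPy array. The function name `has_signature` and both threshold values are illustrative assumptions that would need tuning on real documents.

```python
import numpy as np


def has_signature(region, dark_threshold=128, min_ink_ratio=0.01):
    """Return True if enough pixels in a grayscale region are dark.

    region: 2D uint8 array (0 = black, 255 = white) covering the signature box.
    dark_threshold and min_ink_ratio are assumed values, not library defaults.
    """
    # Fraction of pixels that are darker than the threshold.
    ink_ratio = np.mean(region < dark_threshold)
    return bool(ink_ratio >= min_ink_ratio)


# Synthetic examples: a blank white box and one with a dark stroke drawn in.
blank = np.full((50, 200), 255, dtype=np.uint8)
signed = blank.copy()
signed[20:25, 30:170] = 0  # a 5 x 140 pixel stroke

print(has_signature(blank))
print(has_signature(signed))
```

A check like this would likely be fooled by stamps, printed lines, or scanner noise in the box, so it is only a starting point before reaching for a learned model.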

  73. Avatar
    Dhiraj August 27, 2018 at 7:47 pm #

    Hi Jason,

    Your article helped me a lot to understand the implementation of the convolutional neural net.
    Thank you.

    I want to know: can I apply the same concept to differentiate between different types of coins using a CNN?

  74. Avatar
    YASH MEHTA October 19, 2018 at 8:44 pm #

    Hi Jason,

    I am detecting water leakage on roads using a CNN with the Keras library. How do I decide the number of convolutional layers to apply and the values to pass to the Conv2D function? Where can I learn the basics of these things? A water leakage dataset is not available, so I am downloading images and creating the dataset myself. How much data do I need to get better results? Can I train a model using 400 images? I am not able to find more.

    • Avatar
      Jason Brownlee October 20, 2018 at 5:55 am #

      I recommend testing a suite of different model configurations in order to discover what works best for your specific problem.
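
Testing a suite of configurations can be organised as a small grid: list the candidate values for each setting, enumerate every combination, and score each one the same way. The sketch below only covers the enumeration and selection steps. The setting names and values in `search_space` are hypothetical, and `score_fn` stands for whatever train-and-validate routine you would write yourself.

```python
from itertools import product

# Hypothetical candidate values; adjust to your own problem.
search_space = {
    "conv_layers": [1, 2, 3],
    "filters": [16, 32],
    "kernel_size": [3, 5],
}


def iter_configs(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def pick_best(space, score_fn):
    """Return the configuration with the highest score.

    score_fn(config) is assumed to build, train, and validate a model
    and return a single number where larger is better.
    """
    return max(iter_configs(space), key=score_fn)


configs = list(iter_configs(search_space))
print(len(configs))  # 3 * 2 * 2 = 12 combinations
print(configs[0])
```

With only a few hundred images, each score is likely to be noisy, so it may be worth averaging over several train/validation splits before comparing configurations.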

  75. Avatar
    Arjun November 10, 2018 at 1:54 pm #

    Hi Jason, this is a great post on CNNs, and thanks for this awesome post.
    I would be glad if you could tell me how we can implement object localisation on this. Thanks in advance.

  76. Avatar
    prisilla January 8, 2019 at 11:16 pm #

    Hi Jason,

    I am trying to evaluate the total accuracy.

    My piece of code is as follows:
    training_set = train_datagen.flow_from_directory('dataset/training_set',
        target_size = (64, 64),
        batch_size = batch_size,
        class_mode = 'binary')
    test_set = test_datagen.flow_from_directory('dataset/test_set',
        target_size = (64, 64),
        batch_size = batch_size,
        class_mode = 'binary')
    H = classifier.fit_generator(training_set,
        steps_per_epoch = 2000,
        epochs = 2,
        validation_data = None)

    # Making Predictions
    import numpy as np
    from keras.preprocessing import image
    test_image = image.load_img(r'dataset\dog.4028.jpg', target_size = (64,64))
    test_image = image.img_to_array(test_image)
    test_image = np.expand_dims(test_image, axis = 0)
    acc = classifier.evaluate(training_set,test_image, verbose = 0)
    print('Test accuracy:', acc)

    These two statements:

    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Accuracy: %.2f%%" % (scores[1]*100))

    are replaced by the following two statements:

    acc = classifier.evaluate(training_set,test_image, verbose = 0)
    print('Test accuracy:', acc)

    but I am not getting the accuracy; it gives an error. What is wrong in the syntax?

    • Avatar
      Jason Brownlee January 9, 2019 at 8:46 am #

      Perhaps try posting your code and error to stackoverflow?
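
For what it is worth, the call in the question passes a single image in the position where `evaluate()` expects labels. In Keras, `evaluate()` scores a model on labelled data, while `predict()` returns the model output for unlabelled inputs. Below is a minimal sketch of the two calls. The random arrays, the labels, and the tiny untrained model are stand-ins assumed purely for illustration, so the printed accuracy is meaningless.

```python
import numpy as np
from tensorflow import keras

# Random stand-ins for a labelled test set of 64x64 RGB images (assumed data).
rng = np.random.default_rng(0)
X_test = rng.random((16, 64, 64, 3)).astype("float32")
y_test = rng.integers(0, 2, size=(16,)).astype("float32")

# A tiny binary classifier, only so that the calls below have something to run on.
classifier = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    keras.layers.Conv2D(4, (3, 3), activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# evaluate(): inputs AND labels, returns [loss, accuracy].
loss, acc = classifier.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", acc)

# predict(): inputs only, here a batch containing one image.
single_image = np.expand_dims(X_test[0], axis=0)
prob = classifier.predict(single_image, verbose=0)
print("Predicted probability:", prob[0][0])
```

With a directory iterator such as `test_set` from the question, passing the iterator on its own to `evaluate()` should work in TensorFlow 2.x, since it already yields both images and labels.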

  77. Avatar
    ajith February 6, 2019 at 3:11 pm #

    Hi, I started with deep learning on cats and dogs,
    but my accuracy is stuck at 50%.
    I tried dropout, regularization, and data augmentation.

  78. Avatar
    Amit February 8, 2019 at 3:19 am #

    Hi Jason,
    I am new to machine learning and wanted to understand how the first layer weight dimensions are calculated for 3 channels. I modified the first layer like this:
    model.add(Conv2D(8, (5, 5), input_shape=(32, 32, 3), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
    and displayed the weight matrix dimensions.

    The result is
    (5, 5, 3, 8)

    From what I have learned so far in theory, a 5*5 filter convolved over a 32*32 input with stride 1*1 and padding same will result in 28*28 for each channel.

    These result matrices (28*28) are then added across all three channels, and the result matrix is still 28*28.

    Finally, for all 8 kernels, the total weight dimension will be 8*28*28.

    But how is it calculated as (5, 5, 3, 8)?


    • Avatar
      Jason Brownlee February 8, 2019 at 7:56 am #

      The filters are the weights, e.g. 8 5×5 filters with 3 channels.
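
To expand on the answer above: the weight tensor describes the filters themselves, so its shape is (kernel height, kernel width, input channels, number of filters) and does not depend on the image size. The 28*28 figure in the question is the size of an output feature map, and only under 'valid' padding. With padding 'same' and stride 1, the output stays 32*32. A small sketch of the arithmetic in plain Python, where the helper names are made up for illustration and are not Keras functions:

```python
import math


def conv2d_kernel_shape(kernel_size, in_channels, filters):
    """Shape of the Conv2D weight tensor: (height, width, in_channels, filters)."""
    kh, kw = kernel_size
    return (kh, kw, in_channels, filters)


def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Number of trainable parameters: all kernel weights plus one bias per filter."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + (filters if use_bias else 0)


def conv2d_output_size(input_size, kernel, stride=1, padding="same"):
    """Spatial size of the output feature map along one dimension."""
    if padding == "same":
        return math.ceil(input_size / stride)
    # "valid" padding: no border is added, so the map shrinks.
    return (input_size - kernel) // stride + 1


print(conv2d_kernel_shape((5, 5), 3, 8))            # (5, 5, 3, 8)
print(conv2d_param_count((5, 5), 3, 8))             # 608
print(conv2d_output_size(32, 5, padding="same"))    # 32
print(conv2d_output_size(32, 5, padding="valid"))   # 28
```

So the layer in the question holds 5*5*3*8 = 600 weights plus 8 biases, and produces an output of shape (32, 32, 8) per image.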

  79. Avatar
    Vatsal March 8, 2019 at 9:39 pm #

    Hi Jason,
    Can you tell me, or send me a link on, how to save the model and use it to predict some new images?

  80. Avatar
    Sakthi Dasan Sekar May 14, 2019 at 8:40 pm #

    This is an image classification problem and not object recognition/detection.

    • Avatar
      Jason Brownlee May 15, 2019 at 8:12 am #

      It is the classification of photos of objects.

      Indeed, it is not object recognition.

  81. Avatar
    Mbunga Blaise July 8, 2019 at 7:50 pm #

    Hi Jason, it’s a pleasure to read all your tutorials on machine learning and deep learning. I would like to thank you for all your tutorials and the responses you give us. I have created two CNN blocks using the Keras functional API to extract the best representations of SMILES (drug sequences) and protein sequences. I then concatenated the outputs of the two blocks, applied dropout, and passed the result to dense layers. When I printed the summary of the model, it was fine. However, when I try to fit the model, the following error message is displayed:

    ValueError: Error when checking input: expected input_8 to have shape (103, 64) but got array with shape (31824, 72)
    Below is my short code

    Regarding the error message, could you be so kind as to tell me what is wrong?

  82. Avatar
    Peter August 30, 2019 at 2:16 am #

    Hello Jason
    I was wondering if loading all the images would at some point exhaust the memory.
    So, is there any way in Keras to read the images directly from a folder without loading all of them into memory at once?
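
The general idea behind reading from a folder is to keep only the file paths in memory and load one batch of images at a time. The `flow_from_directory()` call used in an earlier comment on this page works along these lines. The sketch below shows the pattern with a plain Python generator rather than the Keras API. The names `iter_batches` and `fake_load` are made up for illustration, and `fake_load` merely records the path where a real loader would decode an image file.

```python
def iter_batches(paths, batch_size, load_fn):
    """Yield lists of loaded items, reading only one batch at a time."""
    for start in range(0, len(paths), batch_size):
        yield [load_fn(p) for p in paths[start:start + batch_size]]


# Stand-in loader: notes which files were "read" instead of decoding them.
loaded = []


def fake_load(path):
    loaded.append(path)
    return path.upper()


paths = [f"img_{i}.png" for i in range(5)]
batches = iter_batches(paths, batch_size=2, load_fn=fake_load)

first = next(batches)
print(first)        # ['IMG_0.PNG', 'IMG_1.PNG']
print(len(loaded))  # 2, the remaining three files have not been touched yet
```

Because the generator is lazy, memory use depends on the batch size rather than on the number of files in the folder.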

  83. Avatar
    tiennguyen February 23, 2021 at 12:14 pm #

    Thanks for sharing, Jason Brownlee.
    I have a question: why does this tutorial not use the reshape() function on the data, as in the example at https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/? In more detail:
    # flatten 28*28 images to a 784 vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
    X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')

    • Avatar
      Jason Brownlee February 23, 2021 at 1:25 pm #

      Because the input data already has color channels and is ready to model directly.
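
To make the answer above concrete: the MNIST loader in Keras returns grayscale images as a 3D array with no channel axis, so a reshape is needed to add one, whereas the CIFAR-10 loader already returns a 4D array with 3 color channels. The sketch below uses small zero-filled arrays as stand-ins for the real datasets (10 samples each, an arbitrary choice) so that only the shapes matter.

```python
import numpy as np

# Zero-filled stand-ins with the per-image shapes the Keras loaders return.
mnist_like = np.zeros((10, 28, 28), dtype=np.uint8)     # no channel axis
cifar_like = np.zeros((10, 32, 32, 3), dtype=np.uint8)  # channels already present

# MNIST needs an explicit channel dimension: [samples][width][height][channels].
mnist_ready = mnist_like.reshape(mnist_like.shape[0], 28, 28, 1).astype("float32")

# CIFAR-10 only needs the type conversion.
cifar_ready = cifar_like.astype("float32")

print(mnist_ready.shape)  # (10, 28, 28, 1)
print(cifar_ready.shape)  # (10, 32, 32, 3)
```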

  84. Avatar
    tiennguyen February 23, 2021 at 12:17 pm #

    Sorry, I posted the wrong details. The correct details I wanted to show you are:

    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # reshape to be [samples][width][height][channels]
    X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')

  85. Avatar
    Arpit Gupta December 5, 2021 at 8:52 pm #

    Hi, in this example you mentioned data augmentation but did not use it in this blog post. If we use augmentation in the current example, will it increase the performance?

    • Adrian Tam
      Adrian Tam December 8, 2021 at 7:36 am #

      It can, especially if your augmentation reduces the noise seen by the network so that it learns better.
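
As a rough illustration of what augmentation means for image batches shaped like CIFAR-10, the sketch below flips each image left to right with a given probability. The function name `random_horizontal_flip` is made up for this example, and horizontal flipping is just one common choice. Whether it actually improves accuracy on this problem would need to be checked by experiment.

```python
import numpy as np


def random_horizontal_flip(batch, rng, p=0.5):
    """Flip each image in a (N, H, W, C) batch left-to-right with probability p."""
    flip = rng.random(batch.shape[0]) < p    # one True/False decision per image
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]     # reverse the width axis of chosen images
    return out


rng = np.random.default_rng(42)
batch = rng.random((8, 32, 32, 3)).astype("float32")  # random stand-in images

augmented = random_horizontal_flip(batch, rng, p=0.5)
print(augmented.shape)  # (8, 32, 32, 3)
```

Applying a function like this to each training batch gives the network slightly different inputs every epoch without changing the labels.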
