Object Recognition with Convolutional Neural Networks in the Keras Deep Learning Library

Keras is a Python library for deep learning that wraps the powerful numerical libraries Theano and TensorFlow.

A difficult problem where traditional neural networks fall down is object recognition: the task of identifying the objects present in an image.

In this post, you will discover how to develop and evaluate deep learning models for object recognition in Keras. After completing this tutorial you will know:

  • About the CIFAR-10 object recognition dataset and how to load and use it in Keras.
  • How to create a simple Convolutional Neural Network for object recognition.
  • How to lift performance by creating deeper Convolutional Neural Networks.

Let’s get started.

  • Update Oct/2016: Updated examples for Keras 1.1.0 and TensorFlow 0.10.0.
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.

The CIFAR-10 Problem Description

The problem of automatically identifying objects in photographs is difficult because of the near infinite number of permutations of objects, positions, lighting and so on. It’s a really hard problem.

This is a well-studied problem in computer vision and more recently an important demonstration of the capability of deep learning. A standard computer vision and deep learning dataset for this problem was developed by the Canadian Institute for Advanced Research (CIFAR).

The CIFAR-10 dataset consists of 60,000 photos divided into 10 classes (hence the name CIFAR-10). Classes include common objects such as airplanes, automobiles, birds, cats and so on. The dataset is split in a standard way, where 50,000 images are used for training a model and the remaining 10,000 for evaluating its performance.

The photos are in color with red, green and blue channels, but are small, measuring just 32×32 pixels.

State-of-the-art results are achieved using very large Convolutional Neural Networks. You can learn about state-of-the-art results on CIFAR-10 on Rodrigo Benenson’s webpage. Model performance is reported in classification accuracy, with very good performance above 90%, human performance on the problem at 94% and state-of-the-art results at 96% at the time of writing.

There is a Kaggle competition that makes use of the CIFAR-10 dataset. It is a good place to join the discussion of developing new models for the problem and picking up models and scripts as a starting point.


Loading The CIFAR-10 Dataset in Keras

The CIFAR-10 dataset can easily be loaded in Keras.

Keras has the facility to automatically download standard datasets like CIFAR-10 and store them in the ~/.keras/datasets directory using the cifar10.load_data() function. This dataset is large at 163 megabytes, so it may take a few minutes to download.

Once downloaded, subsequent calls to the function will load the dataset ready for use.

The dataset is stored as pickled training and test sets, ready for use in Keras. Each image is represented as a three-dimensional array, with dimensions for the red, green and blue channels, width and height. We can plot images directly using matplotlib.
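The original listing is not reproduced here, but the loading-and-plotting step can be sketched as follows. The `plot_grid` helper name is mine, not from Keras, and the headless backend line is only there so the sketch also runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt

def plot_grid(images, rows=3, cols=3):
    """Plot the first rows*cols images as a grid and return the figure."""
    fig, axes = plt.subplots(rows, cols)
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i].astype("uint8"))
        ax.axis("off")
    return fig

# Usage with the real dataset (the first call downloads it to ~/.keras/datasets):
#   from keras.datasets import cifar10
#   (X_train, y_train), (X_test, y_test) = cifar10.load_data()
#   plot_grid(X_train)
#   plt.show()
```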

Running the code creates a 3×3 plot of photographs. The images have been scaled up from their small 32×32 size, but you can clearly see trucks, horses and cars. You can also see some distortion in images that have been forced to the square aspect ratio.

Small Sample of CIFAR-10 Images

Simple Convolutional Neural Network for CIFAR-10

The CIFAR-10 problem is best solved using a Convolutional Neural Network (CNN).

We can quickly start off by defining all of the classes and functions we will need in this example.

As is good practice, we next initialize the random number seed with a constant to ensure the results are reproducible.
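In code this is one line; the seed value 7 is an arbitrary choice, and note that seeding NumPy alone does not make a TensorFlow-backed run fully deterministic:

```python
import numpy as np

# fix the NumPy random seed so runs are repeatable
seed = 7
np.random.seed(seed)
```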

Next we can load the CIFAR-10 dataset.

The pixel values are in the range of 0 to 255 for each of the red, green and blue channels.

It is good practice to work with normalized data. Because the input values are well understood, we can easily normalize to the range 0 to 1 by dividing each value by the maximum observed value, which is 255.

Note, the data is loaded as integers, so we must cast it to floating point values in order to perform the division.
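A sketch of the cast-and-divide step, shown here on a small random array standing in for the `X_train`/`X_test` arrays returned by `cifar10.load_data()`:

```python
import numpy as np

# stand-in for the CIFAR-10 pixel arrays: uint8 values in [0, 255]
X = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype("uint8")

# cast to float so the division produces fractional values, then normalize
X = X.astype("float32")
X = X / 255.0
```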

The output variables are defined as a vector of integers from 0 to 9, one class label per image.

We can use a one hot encoding to transform them into a binary matrix in order to best model the classification problem. We know there are 10 classes for this problem, so we can expect the binary matrix to have a width of 10.
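Keras provides `np_utils.to_categorical()` for this step; here is an equivalent pure-NumPy sketch (the `one_hot` name is mine), so the idea is visible even without Keras installed:

```python
import numpy as np

def one_hot(y, num_classes=10):
    """Transform a vector of integer class labels into a binary matrix."""
    y = np.asarray(y).ravel()
    return np.eye(num_classes, dtype="float32")[y]

# label 3 becomes a row of width 10 with a single 1 in column 3
Y = one_hot([3, 0, 9])
```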

Let’s start off by defining a simple CNN structure as a baseline and evaluate how well it performs on the problem.

We will use a structure with two convolutional layers followed by max pooling and a flattening out of the network to fully connected layers to make predictions.

Our baseline network structure can be summarized as follows:

  1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.
  2. Dropout set to 20%.
  3. Convolutional layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.
  4. Max Pool layer with size 2×2.
  5. Flatten layer.
  6. Fully connected layer with 512 units and a rectifier activation function.
  7. Dropout set to 50%.
  8. Fully connected output layer with 10 units and a softmax activation function.

A logarithmic loss function is used with the stochastic gradient descent optimization algorithm, configured with a large momentum and weight decay, starting with a learning rate of 0.01.

We can fit this model with 25 epochs and a batch size of 32.

A small number of epochs was chosen to help keep this tutorial moving. Normally the number of epochs would be one or two orders of magnitude larger for this problem.

Once the model is fit, we evaluate it on the test dataset and print out the classification accuracy.
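Putting the layer list, the compile step and the training calls together, a sketch in Keras 2-style code (layer and argument names follow the Keras 2 API; older listings used `Convolution2D`, `border_mode` and `W_constraint`; the input shape assumes channels-last image ordering):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.constraints import MaxNorm
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same',
                 activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 kernel_constraint=MaxNorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Logarithmic loss with SGD and large momentum; the post also decays the
# learning rate over the run (older Keras passed this as SGD's decay argument).
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()

# Fit and evaluate (assumes X_train, y_train, X_test, y_test prepared as above):
#   model.fit(X_train, y_train, validation_data=(X_test, y_test),
#             epochs=25, batch_size=32)
#   scores = model.evaluate(X_test, y_test, verbose=0)
#   print("Accuracy: %.2f%%" % (scores[1] * 100))
```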

Running this example provides the results below. First the network structure is summarized which confirms our design was implemented correctly.

The classification accuracy and loss are printed each epoch on both the training and test datasets. The model is evaluated on the test set and achieves an accuracy of 70.85%, which is not excellent.

We can improve the accuracy significantly by creating a much deeper network. This is what we will look at in the next section.

Larger Convolutional Neural Network for CIFAR-10

We have seen that a simple CNN performs poorly on this complex problem. In this section we look at scaling up the size and complexity of our model.

Let’s design a deep version of the simple CNN above. We can introduce an additional round of convolutions with many more feature maps. We will use the same pattern of Convolutional, Dropout, Convolutional and Max Pooling layers.

This pattern will be repeated 3 times with 32, 64, and 128 feature maps. The effect is an increasing number of feature maps with a smaller and smaller spatial size given the max pooling layers. Finally, an additional and larger Dense layer will be used at the output end of the network in an attempt to better translate the large number of feature maps to class values.

We can summarize a new network architecture as follows:

  • Convolutional input layer, 32 feature maps with a size of 3×3 and a rectifier activation function.
  • Dropout layer at 20%.
  • Convolutional layer, 32 feature maps with a size of 3×3 and a rectifier activation function.
  • Max Pool layer with size 2×2.
  • Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function.
  • Dropout layer at 20%.
  • Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function.
  • Max Pool layer with size 2×2.
  • Convolutional layer, 128 feature maps with a size of 3×3 and a rectifier activation function.
  • Dropout layer at 20%.
  • Convolutional layer, 128 feature maps with a size of 3×3 and a rectifier activation function.
  • Max Pool layer with size 2×2.
  • Flatten layer.
  • Dropout layer at 20%.
  • Fully connected layer with 1024 units and a rectifier activation function.
  • Dropout layer at 20%.
  • Fully connected layer with 512 units and a rectifier activation function.
  • Dropout layer at 20%.
  • Fully connected output layer with 10 units and a softmax activation function.

We can very easily define this network topology in Keras, as follows:
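The original listing is not reproduced here; a sketch in Keras 2-style code, with the same hedges as before (modern layer names, channels-last input, learning-rate decay details may differ from the original):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# same loss and optimizer configuration as the baseline model
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```

Fitting then reuses the same `model.fit()` call as the baseline, with `batch_size=64`.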

We can fit and evaluate this model using the same procedure as above with the same number of epochs, but a larger batch size of 64, found through some minor experimentation.

Running this example prints the classification accuracy and loss on the training and test datasets each epoch. The estimate of classification accuracy for the final model is 80.18% which is nearly 10 points better than our simpler model.

Extensions To Improve Model Performance

We have achieved good results on this very difficult problem, but we are still a good way from achieving world class results.

Below are some ideas that you can try to extend upon the models and improve model performance.

  • Train for More Epochs. Each model was trained for a very small number of epochs, 25. It is common to train large convolutional neural networks for hundreds or thousands of epochs. I would expect that performance gains can be achieved by significantly raising the number of training epochs.
  • Image Data Augmentation. The objects in the image vary in their position. Another boost in model performance can likely be achieved by using some data augmentation. Methods such as standardization and random shifts and horizontal image flips may be beneficial.
  • Deeper Network Topology. The larger network presented is deep, but larger networks could be designed for the problem. This may involve more feature maps closer to the input and perhaps less aggressive pooling. Additionally, standard convolutional network topologies that have been shown useful may be adopted and evaluated on the problem.
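As a concrete illustration of one of these augmentations, here is a NumPy sketch of a random horizontal flip (the helper name is mine; in Keras this is provided by the `ImageDataGenerator` class via `horizontal_flip=True`):

```python
import numpy as np

def random_horizontal_flip(batch, rng=None):
    """Flip each image in a (N, H, W, C) batch left-to-right with probability 0.5."""
    if rng is None:
        rng = np.random.default_rng()
    out = batch.copy()
    mask = rng.random(len(batch)) < 0.5  # one coin flip per image
    out[mask] = out[mask][:, :, ::-1, :]  # reverse the width axis
    return out
```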


Summary

In this post you discovered how to create deep learning models in Keras for object recognition in photographs.

After working through this tutorial you learned:

  • About the CIFAR-10 dataset and how to load it in Keras and plot ad hoc examples from the dataset.
  • How to train and evaluate a simple Convolutional Neural Network on the problem.
  • How to expand a simple Convolutional Neural Network into a deep Convolutional Neural Network in order to boost performance on the difficult problem.
  • Ideas, such as data augmentation, for getting a further boost on the difficult object recognition problem.

Do you have any questions about object recognition or about this post? Ask your question in the comments and I will do my best to answer.


97 Responses to Object Recognition with Convolutional Neural Networks in the Keras Deep Learning Library

  1. Aakash Nain July 24, 2016 at 9:53 pm #

    Hello Jason,
    What is the use of maxnorm in context of deep learning ?

  2. Aqsa July 31, 2016 at 4:46 pm #

    Hi Jason
    I am doing detection of road signs in real time. The size of my images is 800*1360. The size of the road sign varies from 16*16 to 256*256. How can I use a convolutional neural network for this purpose to get good detection accuracy in real time?

    • Jason Brownlee August 1, 2016 at 6:28 am #

      Consider how you frame the problem Aqsa. Two options are:

      1) You could rescale all images to the same size.
      2) You could zero-pad all images.

      As for the specifics of the network for this problem, you will have to design and test different structures. Perhaps you can leverage an existing well performing structure like VGG or inception.

  3. Jack September 1, 2016 at 12:54 pm #

    What should be the input dimensions for a 3D dataset in PCD or OFF format?
    Btw, I find your tutorials very helpful. :-)

    • Jason Brownlee September 2, 2016 at 8:04 am #

      Sorry, I don’t know what those formats are Jack.

      • Jack September 2, 2016 at 2:48 pm #

        Point cloud data(PCD) contains the x,y and z coordinates of the object….. I want to build a neural network for 3D object classification… The problem I am facing is I don’t know what shd be the input to my network… For a neural network that classifies images you pass the pixel values (0-255), but a pcd file just has the coordinates…Is it wise to pass the coordinates as the inputs?..
        I can extract some features of the object ( from pcd file)… Can I pass those features as input ??
        I am new to this field, so I having difficulty understanding things…

        • Jason Brownlee September 3, 2016 at 6:55 am #

          I wonder if you can rescale the coordinates to all have the range 0-to-1. Then provide them directly.

          From there, you will have a baseline and can start to explore other transforms of your coords, such as perhaps projections into 2D.

  4. shudhan September 2, 2016 at 4:49 pm #

    May i know how to extract features from the images?

    • Jason Brownlee September 3, 2016 at 6:58 am #

      We no longer need to extract features when using deep learning methods as we are performing automatic feature learning. A great benefit of the approach.

  5. Walid Ahmed September 10, 2016 at 4:52 am #

    Thanks a lot.

    I have one question

    In Keras, How can I extract the exact location of the detected object (or objects) within image that includes a background?
    I assume it uses sliding window for object detection

    • Jason Brownlee September 10, 2016 at 7:12 am #

      Great question Walid.

      This is called object identification in an image. I do not have an example at the moment, but I will prepare one in the future.

  6. Vinay September 12, 2016 at 5:04 am #

    Hi…could you please give same example for prima diabetes or airline passenger data set. My question is to apply CNN for direct numeric features. You could give any simple example

  7. Walid Ahmed September 14, 2016 at 3:22 am #

    Thanks Jason, I will look forward to your example.

  8. NotMikeJones September 29, 2016 at 11:31 am #

    Where in Keras are you specifying the input dimension for your first convolution layer? I’d like to try a convolution NN with time series for event detection, and am having issues with keras 1d convolution working. Let’s say each of my samples are a timeseries represented by a 1 x 100 vector, and within the vector, I expect three types of events to occur somewhere in those time frames (unclear what the length of the event would be, but lets say roughly 10 time points across). Would I use three feature maps, and then use a ‘window’ of, say 10 time points, that map onto a single neuron in the convolution later? So I would have a convolution layer of 3 x 10?


    • Jason Brownlee September 30, 2016 at 7:47 am #

      LSTM is the network for dealing with sequences rather than CNN.

      CNN is good at spatial structure such as in images or text.

      This tutorial on time series with LSTMs might be what you’re looking for:

      • NotMikeJones October 1, 2016 at 12:12 am #

        Thanks for that! I actually think I misspoke – I don’t want to forecast values at future time points, but instead identify if a current set of timepoints fit a pattern that indicate a certain event.

        For example, let’s say we have fitness tracker data with various types of sensors (heart rate, pedometer, accelerometer), and we know a person does three types of activity: yoga, running, cooking. I want to train a model to identify these activities based on sensor data, and then be able to pull real-time data and classify what they are currently doing.

        I was thinking a CNN with a windowed-approach would be the best bet, but I might be completely off-base.

        • Jason Brownlee October 1, 2016 at 8:02 am #

          It does sound like an anomaly detection or change detection problem, you may have benefit in framing the problem this way.

  9. Rafi October 18, 2016 at 6:34 pm #

    Hi Jason,

    I’m running the exact same code in this page which produced a 71.82% accuracy on test data.

    only difference is I’m using a validation data set split by 70-30%. I’m getting only less than 11% validation and test accuracy. What could be my mistake? Have you faced such
    results? please help.


    35000/35000 [==============================] – 94s – loss: 2.2974 – acc: 0.1158 – val_loss: 2.3033 – val_acc: 0.0991
    Epoch 24/25
    35000/35000 [==============================] – 94s – loss: 2.2961 – acc: 0.1170 – val_loss: 2.3035 – val_acc: 0.1022
    Epoch 25/25
    35000/35000 [==============================] – 93s – loss: 2.2954 – acc: 0.1213 – val_loss: 2.3036 – val_acc: 0.0987

  10. Walid Ahmed November 2, 2016 at 2:30 am #

    Dear Jason

    I appreciate if you can illustrate In Keras : How can I extract the exact location of the detected object (or objects) within image that includes a background?

    • Jason Brownlee November 2, 2016 at 9:08 am #

      Great question Walid,

      Sorry, I don’t have an example of object localization with Keras yet. It is on the TODO list though.

  11. Augusto Aguirre November 2, 2016 at 2:28 pm #

    Hello! Very good explanation!
    I am using your model to classify images containing either 0 or 1.
    To solve this, I initially propose using a variant in the last hidden layer:

    model.add(Dense(1, activation='sigmoid'))

    This returns a result between 0 and 1.

    When I want to adjust the model gives me the following error:
    ‘Error when checking input model: convolution2d_input_20 expected to have 4 dimensions, but got array With shape (8000, 3072)’
    My dataset are 32×32 RGB images. Therefore, contains 3072 columns, but one with a 0 or 1.

    • Jason Brownlee November 3, 2016 at 7:50 am #

      Hi Augusto, sorry to hear about the error on your own data.

      It is not clear what the cause could be, sorry. Perhaps you are able to experiment and discover the root cause. Try simplifying your example to the minimum required and see if that helps to flush it out.

  12. Walid Ahmed November 11, 2016 at 12:49 am #

    Why would I need to apply a dropout layer before a convolutional layer?

    It just make sense for me when applied to input layer or any other layers in the fully connected layers.

    • Jason Brownlee November 11, 2016 at 10:03 am #

      Hi Walid,

      It’s all about putting pressure on the network to force it to generalize. It may or may not be a good pressure point to force this type of learning. Try and see on your problem.

  13. William Amador November 11, 2016 at 3:51 am #

    Hi,Jason I have a question, how is the procedure for training if in my initial layer I do not handle a single image but a sequence of 30 images. For a specific case a sequence of movements of a person; Do you have any examples of training for a CNN for this case

    Thank you

    • Jason Brownlee November 11, 2016 at 10:05 am #

      Hi William, great question. Sorry, I don’t have any worked examples of working with sequences of images at the moment.

  14. Walid Ahmed November 15, 2016 at 2:47 am #

    Hi Jason

    When I removed the dropout layers before any convolutional layer, my results improved.
    especially when the size of dataset is large.

  15. Walid Ahmed November 16, 2016 at 6:06 am #

    Hi Jason.
    Another question , I know that keras comes with different optimizers, in your code you used sgd,others may use another optimizer like adam.
    any advice or recommendation about the type of optimizer?

    • Jason Brownlee November 16, 2016 at 9:34 am #

      Hi Walid,

      I find optimizers generally make minor differences to the results – move the needle less than the network topology.

      SGD is well understood and a great place to start. ADAM is fast and gives good results and I often use it in practice. I don’t really have much opinions beyond that.

  16. Bharath Paturi November 16, 2016 at 5:54 pm #

    Hi Jason,

    I have very large images to analyze. Probably each image size will be around 500 MB to 1 GB.
    I wanted to apply segmentation to the images. Can we use convolution NN to do unsupervised learning.

    • Jason Brownlee November 17, 2016 at 9:52 am #

      Ouch Bharath, they are massive images.

      It is possible, but you’re going to run out of memory really fast!

      I don’t have good advice, sorry. I have not researched this specific problem.

  17. Michael November 17, 2016 at 8:40 am #


    I saw the line where you add a layer has a typo. It should read

    model.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu',

    • Jason Brownlee November 17, 2016 at 9:58 am #

      Are you sure Michael? It all looks good to me and the example runs with Theano and TensorFlow backends.

      Maybe I’m missing something?

      • Sean July 4, 2017 at 8:35 am #

        I think Michael is right. It throws me an error with input_shape=(3,32,32). input_shape=(32,32,3) should be the correct one

    • Maxim December 19, 2016 at 1:44 am #

      Michael, for solving your problem put the line: K.set_image_dim_ordering('th')

      ABOVE the

      (X_train, y_train), (X_test, y_test)= cifar10.load_data()

  18. Shristi Baral November 21, 2016 at 10:15 pm #

    How do I deal with the error? Got this while running script under “Loading The CIFAR-10 Dataset in Keras”. I tried altering the script of cifar10.py to figure out what the error is. But I couldnot.
    UnicodeDecodeError Traceback (most recent call last)
    in ()
    4 from scipy.misc import toimage
    5 # load data
    —-> 6 (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    7 # create a grid of 3×3 images
    8 for i in range(0, 9):

    /home/kdc/anaconda3/lib/python3.5/site-packages/keras/datasets/cifar10.py in load_data()
    18 for i in range(1, 6):
    19 fpath = os.path.join(path, ‘data_batch_’ + str(i))
    —> 20 data, labels = load_batch(fpath)
    21 X_train[(i-1)*10000:i*10000, :, :, :] = data
    22 y_train[(i-1)*10000:i*10000] = labels

    /home/kdc/anaconda3/lib/python3.5/site-packages/keras/datasets/cifar.py in load_batch(fpath, label_key)
    10 d = cPickle.load(f)
    11 else:
    —> 12 d = cPickle.load(f, encoding=”bytes”)
    13 # decode utf8
    14 for k, v in d.items():

    UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x80 in position 3031: ordinal not in range(128)

    • Jason Brownlee November 22, 2016 at 7:05 am #

      I have not seen this error before, perhaps post to stack overflow or the Keras list?

  19. Aquib Javed Khan November 22, 2016 at 8:35 pm #

    Hi, thanks for the awesome tutorial. Your code and explanations help me a lot in understanding the classification task.

    I actually want to feed new images want to get the return the label matches to it, How can I do it like I am doing like this:

    import keras
    from keras.models import load_model
    from keras.models import Sequential
    import cv2
    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
    model = Sequential()

    model = load_model('firstmodel.h5')

    img = cv2.imread('cat.jpg', 0)
    img = cv2.resize(img, (150, 150))

    classes = model.predict_classes(img, batch_size=32)
    print classes

    I’m getting error:
    Exception: Error when checking : expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150, 150)

    How to fix it?

    • Jason Brownlee November 23, 2016 at 8:58 am #

      My suggestion would be to ensure that your loaded data matches the expected input dimensions exactly. You may have to resize or pad new data to make it match.

  20. s December 1, 2016 at 1:23 pm #

    This is NOT objection detection, it’s classification

  21. Walid December 2, 2016 at 4:40 am #

    Dear Jason

    I am so eager for an example of object localization with Keras yet, I hope you can come with one soon

    • Jason Brownlee December 2, 2016 at 8:18 am #

      I would like to prepare one soon Walid, hopefully in the new year.

  22. Ahmed Desoky December 7, 2016 at 4:55 am #

    Hello Jason,

    Your tutorials are so helpful and awesome.

    I am working on a classification problem using Keras on kitti dataset. I found that kitti is not supported yet in https://keras.io/datasets/ .

    Is there any advice or an introductory point to work out with my problem ?

    Thanks for help

  23. Thanawin December 7, 2016 at 1:31 pm #

    HI Jason

    I have seen many tutorials, but this is the best.
    I am wondering if “scores = model.evaluate(X_test, y_test, verbose=0)” will take the last model from “model.fit()”. I would really appreciate if you can suggest me how can I evaluate my test data using the best model from “model.fit()”

  24. TSchecker December 14, 2016 at 7:35 pm #

    Hi Jason,

    just in case facing the error

    “AttributeError: ‘module’ object has no attribute ‘control_flow_ops'”

    found here:

    import tensorflow as tf
    tf.python.control_flow_ops = tf

  25. yask December 15, 2016 at 3:04 am #

    Hi Jason
    I really like your article
    I would like to extract objects such as ice, water etc. from photograph.
    is this library useful for that? How can I define training areas? Do I need to provide sample image of ice, water as a training area to the classifier? Which classifier servers best here? how about Artificial Neural Network ?

  26. pranoy December 19, 2016 at 7:58 pm #

    which function is used for predicting..

    if i give an image of a horse and i want to predict the output. which function i have to use

    • Jason Brownlee December 20, 2016 at 7:30 am #

      You can use model.predict() to make a prediction on new data.

      The new data must have the same shape as the data used to train the network.

  27. Milos December 29, 2016 at 10:34 pm #

    Hi Jason,

    great article. I have a question. In Dense part you have specified 512 neurons. Can you tell me how did you determine number of neurons?


    • Jason Brownlee December 30, 2016 at 5:50 am #

      Hi Milos, I used trial and error. Selecting the number and size of layer is an art – test a lot of configs.

      • Milos December 30, 2016 at 7:07 pm #

        Thanks for answer. I have assumed so , but I had to ask :).
        Great and really useful articles I have found on your site :).

  28. Deepak January 30, 2017 at 3:56 pm #

    When I use the model.predict, the following error is seen. Please help me

    TypeError Traceback (most recent call last)
    in ()
    —-> 1 model.predict(tmp)

    /usr/local/lib/python2.7/dist-packages/keras/models.pyc in predict(self, x, batch_size, verbose)
    722 if self.model is None:
    723 self.build()
    –> 724 return self.model.predict(x, batch_size=batch_size, verbose=verbose)
    726 def predict_on_batch(self, x):

    /usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in predict(self, x, batch_size, verbose)
    1266 f = self.predict_function
    1267 return self._predict_loop(f, ins,
    -> 1268 batch_size=batch_size, verbose=verbose)
    1270 def train_on_batch(self, x, y,

    /usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in _predict_loop(self, f, ins, batch_size, verbose)
    944 ins_batch = slice_X(ins, batch_ids)
    –> 946 batch_outs = f(ins_batch)
    947 if not isinstance(batch_outs, list):
    948 batch_outs = [batch_outs]

    /usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.pyc in __call__(self, inputs)
    957 def __call__(self, inputs):
    958 assert isinstance(inputs, (list, tuple))
    –> 959 return self.function(*inputs)

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    786 s.storage[0] = s.type.filter(
    787 arg, strict=s.strict,
    –> 788 allow_downcast=s.allow_downcast)
    790 except Exception as e:

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/tensor/type.pyc in filter(self, data, strict, allow_downcast)
    115 if allow_downcast:
    116 # Convert to self.dtype, regardless of the type of data
    –> 117 data = theano._asarray(data, dtype=self.dtype)
    118 # TODO: consider to pad shape with ones to make it consistent
    119 # with self.broadcastable… like vector->row type thing

    /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev5-py2.7.egg/theano/misc/safe_asarray.pyc in _asarray(a, dtype, order)
    32 dtype = theano.config.floatX
    33 dtype = numpy.dtype(dtype) # Convert into dtype object.
    —> 34 rval = numpy.asarray(a, dtype=dtype, order=order)
    35 # Note that dtype comparison must be done by comparing their num
    36 # attribute. One cannot assume that two identical data types are pointers

    /home/yashwanth/.local/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    530 “””
    –> 531 return array(a, dtype, copy=False, order=order)

    TypeError: Bad input argument to theano function with name “/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py:955” at index 0 (0-based).
    Backtrace when that variable is created:

    File “/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py”, line 2821, in run_ast_nodes
    if self.run_code(code, result):
    File “/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py”, line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    File “”, line 2, in
    model.add(Convolution1D(64, 2, input_shape=[1,4], border_mode=’same’, activation=’relu’, W_constraint=maxnorm(3)))
    File “/usr/local/lib/python2.7/dist-packages/keras/models.py”, line 299, in add
    layer.create_input_layer(batch_input_shape, input_dtype)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 397, in create_input_layer
    dtype=input_dtype, name=name)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 1198, in Input
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py”, line 1116, in __init__
    File “/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py”, line 110, in placeholder
    x = T.TensorType(dtype, broadcast)(name)
    float() argument must be a string or a number

    • Jason Brownlee February 1, 2017 at 10:21 am #

      I’m sorry to hear that.

      The cause is not obvious to me, the stack trace is hard to read.

      Perhaps you could try posting to stackoverflow or the Keras google group?

  29. Rajesh February 6, 2017 at 4:08 am #

    Hi Jason,

    Any idea about the following error? Everything looked good till the model.summary point. But when I tried to fit the model, I am seeing the following error.

    ValueError Traceback (most recent call last)
    in ()
    1 # Fit the model
    —-> 2 model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=epochs, batch_size=32)
    3 # Final evaluation of the model
    4 scores = model.evaluate(X_test, y_test, verbose=0)
    5 print(“Accuracy: %.2f%%” % (scores[1]*100))

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/models.pyc in fit(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    670 class_weight=class_weight,
    671 sample_weight=sample_weight,
    --> 672 initial_epoch=initial_epoch)
    674 def evaluate(self, x, y, batch_size=32, verbose=1,

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch)
    1115 class_weight=class_weight,
    1116 check_batch_axis=False,
    -> 1117 batch_size=batch_size)
    1118 # prepare validation data
    1119 if validation_data:

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
    1028 self.internal_input_shapes,
    1029 check_batch_axis=False,
    --> 1030 exception_prefix='model input')
    1031 y = standardize_input_data(y, self.output_names,
    1032 output_shapes,

    /home/rajesh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    122 ' to have shape ' + str(shapes[i]) +
    123 ' but got array with shape ' +
    --> 124 str(array.shape))
    125 return arrays

    ValueError: Error when checking model input: expected convolution2d_input_6 to have shape (None, 3, 32, 32) but got array with shape (50000, 32, 32, 3)

    • Rajesh February 6, 2017 at 4:22 am #

      I understood the issue. I was mistaking at the input_shape step.

      It is a very nice tutorial, very well written. Thanks!

      • Rajesh February 6, 2017 at 8:59 am #

        Sorry to spam, but I am still seeing the error 🙁

        • Jason Brownlee February 6, 2017 at 9:45 am #

          Hi Rajesh, you may want to confirm that you are not missing any lines of code.

          Also, confirm your version of Keras, TensorFlow/Theano and Python.
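For readers hitting the same shape error: the trace says the model expects channels-first input (None, 3, 32, 32) while the loaded CIFAR-10 arrays are channels-last (50000, 32, 32, 3). One way to reconcile them is to reorder the array axes; a minimal sketch with a small dummy array standing in for the CIFAR-10 images:

```python
import numpy as np

# Dummy stand-in for CIFAR-10 training images: channels-last (N, 32, 32, 3)
X_train = np.zeros((8, 32, 32, 3), dtype=np.float32)

# Reorder the axes to channels-first (N, 3, 32, 32), matching a model
# built with input_shape=(3, 32, 32)
X_train_cf = np.transpose(X_train, (0, 3, 1, 2))

print(X_train_cf.shape)  # (8, 3, 32, 32)
```

Alternatively, Keras versions of that era let you switch the expected ordering globally with keras.backend.set_image_dim_ordering('th') or via the "image_dim_ordering" setting in ~/.keras/keras.json.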

  30. Sam February 6, 2017 at 9:11 pm #

    Thank you for your example. I tried to run your code, but our network server is blocked and the code could not get the data from the URL below.


    Can you let me know if there is another way to make the training dataset without using the download wrapper method?

    • Sam February 6, 2017 at 9:37 pm #

      Oh, I found a solution through googling. Thanks anyway.

  31. John February 7, 2017 at 2:10 pm #


    How can I recognize a bike in video? It would be great if you could give an example.

    • Jason Brownlee February 8, 2017 at 9:32 am #

      Great question John, it is an area I’d like to cover in the future.

  32. Ikhsan March 17, 2017 at 2:22 pm #

    Hi Jason,

    Thanks for the tutorial. I have a question about random.seed(seed). Why do we need to seed it first, and where is the random number generator used in the rest of the code?
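For what it's worth, the seed matters because Keras (with the Theano backend of that era) drew its initial layer weights from NumPy's global random number generator, so fixing the seed makes repeated runs start from the same weights. A small sketch of the effect:

```python
import numpy as np

# Seeding NumPy's global RNG makes subsequent "random" draws repeatable.
np.random.seed(7)
first = np.random.rand(3)

np.random.seed(7)  # reset to the same state
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```

Without the seed, each run would initialize the network differently and produce slightly different results.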


  33. Ali April 21, 2017 at 5:43 pm #

    Hi Jason,

    Thanks for this great tutorial. I am trying to run the code, but I got the following error. Any suggestions on how to fix it?

    I am using Keras with the Theano backend (Python 2.7).

    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    epochs = 25
    lrate = 0.01
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

    Traceback (most recent call last):

    File "", line 2, in
    model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))

    TypeError: __init__() takes at least 4 arguments (4 given)

    • Jason Brownlee April 22, 2017 at 9:24 am #

      I’m not sure Ali, I have not seen this error before.

      Perhaps confirm that you have copied the code exactly?

      Consider removing arguments to help zoom in on the cause of the fault.

  34. AIZEN May 1, 2017 at 12:12 am #

    Hi, I already managed to detect objects in still images. What am I supposed to do to detect objects in a video input? I intend to use the same Keras CNN.

    • Jason Brownlee May 1, 2017 at 5:57 am #

      You could process each frame of the video as an image with a CNN and use an LSTM to handle sequences of data from the CNN.
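A simpler baseline than a CNN+LSTM is to score each frame independently with the image classifier and aggregate the per-frame class probabilities, for example by averaging them over the clip. A sketch with made-up prediction arrays (the probabilities here are hypothetical, not real model output):

```python
import numpy as np

# Hypothetical per-frame class probabilities from an image classifier:
# 5 frames, 3 classes (each row sums to 1).
frame_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.8, 0.1, 0.1],
    [0.6, 0.2, 0.2],
])

# Average the probabilities over frames, then pick the top class
# for the whole clip.
clip_probs = frame_probs.mean(axis=0)
clip_class = int(np.argmax(clip_probs))

print(clip_class)  # 0
```

This ignores temporal order, which is exactly what the LSTM would add on top.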

  35. Supriya May 2, 2017 at 2:24 am #

    Hi, in this example we have created a CNN model, but how do we test it?

    • Jason Brownlee May 2, 2017 at 6:02 am #

      You can use a train/test split or k-fold cross validation.

  36. Chao May 18, 2017 at 3:55 pm #

    What does “kernel_constraint=maxnorm(3)” mean?

    Thanks a lot!
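For readers with the same question: the max-norm constraint caps the size of a unit's incoming weight vector. After each update, if the vector's L2 norm exceeds the limit (3 in the tutorial), it is rescaled down to that limit; this regularizer was popularized alongside dropout. A NumPy sketch of the rescaling rule for a single weight vector (Keras applies it per output unit of a layer, not to the whole weight matrix):

```python
import numpy as np

def max_norm(w, limit=3.0):
    """Rescale weight vector w so its L2 norm does not exceed `limit`."""
    norm = np.linalg.norm(w)
    if norm > limit:
        return w * (limit / norm)
    return w

w = np.array([3.0, 4.0])         # L2 norm = 5.0, over the limit
w_capped = max_norm(w)

print(np.linalg.norm(w_capped))  # 3.0
```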

  37. Mohamed Mnete June 1, 2017 at 5:22 pm #

    Hi, say I had an image and I would like the model you just made to predict what is on it. I save it in the same directory as the Python file. How can I load this image to put into the prediction function, and how can I write the prediction function? I would also like to try using my own images on your model. I was also stuck on the technique of changing the images to numpy arrays. I would also like the output to be either -1 or 1. How can I code this up? Please help.

  38. Natthaphon June 8, 2017 at 1:25 pm #

    So can I insert annotations in each image?

  39. Daniel June 18, 2017 at 5:33 am #


    I’m trying to do an image classifier that determines if something should be given a specific hashtag or not. My problem is that the accuracy of the classifier after each epoch remains constant, and is essentially assigning the same class to all images. This makes no sense as the image classes are fairly distinct (#gym and #foraging). I basically copied the smaller CNN you used:

    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, 100, 100), activation='relu'))
    model.add(Convolution2D(32, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(1, activation='softmax'))

    epochs = 10
    lrate = 0.001
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    print('compiling model')
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    print(“fitting model”)
    model.fit(X_training, Y_training, nb_epoch=epochs, batch_size=100)

    But every time, I get these results:

    fitting model
    Train on 908 samples, validate on 908 samples
    Epoch 1/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 2/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 3/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 4/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 5/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 6/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 7/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 8/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 9/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000
    Epoch 10/10
    908/908 [==============================] - 104s - loss: 7.9712 - acc: 0.5000

    What am I doing wrong?
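One likely culprit in the snippet above is Dense(1, activation='softmax'): softmax over a single unit always outputs exactly 1.0 regardless of the logit, so the model assigns the same class to every image and the accuracy cannot move. For binary classification, a single sigmoid unit (or two softmax units with categorical labels) is the usual fix. A quick numerical check of why a one-unit softmax is degenerate:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Softmax over a single logit is always 1.0, whatever the logit is,
# so a Dense(1, activation='softmax') layer can never change its prediction.
for logit in (-10.0, 0.0, 42.0):
    print(softmax(np.array([logit])))  # [1.]

# A sigmoid on the single logit varies with the input, which is what
# binary classification with binary_crossentropy needs.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5
```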

  40. Darlington Akogo June 28, 2017 at 10:42 am #

    Hello Jason, thanks for your great tutorials! I need some help, please. I'm working on a medical image recognition (diagnostics) project based on your tutorials. Here, you import the CIFAR-10 dataset provided via Keras with the load_data() function, but since I am downloading the dataset from a different source, that isn't possible. From some research, I came across Keras' flow_from_directory() function for image data processing, which is amazing: I can just separate the images into folders and it will treat the folders as classes. However, medical images are in the DICOM image format, and Keras' image functions don't seem to support it. With further research, I came across the pydicom module for processing DICOM images in Python, but now I can't use flow_from_directory(). Can you PLEASE offer some help as to how I can train my ConvNet model with DICOM images and then use it to classify (with the predict() function) any new DICOM image?

    Thanks in advance.

    • Jason Brownlee June 29, 2017 at 6:28 am #

      I don’t know about that format.

      Perhaps you can convert the images?
      Perhaps you can put together your own DICOM-compatible flow-from-directory function?

  41. Bruce Wind June 29, 2017 at 11:30 am #

    Hi Jason, thanks for sharing. I tested the code you provided, but my machine does not support CUDA, so it runs very slowly (half an hour per epoch). Since you have such a powerful computer, could you please show the results after hundreds or thousands of epochs? Thanks.

  42. Nunu July 23, 2017 at 10:00 pm #

    Dear Jason,
    Really, it is a very nice tutorial 🙂. If you plot acc vs. val_acc, there is a gap between the two curves; does this mean overfitting? Also (correct me if I am wrong), shouldn't the accuracy curve become asymptotic after a certain number of epochs, that is, the accuracy will not increase anymore?
    Thanks in advance

    • Jason Brownlee July 24, 2017 at 6:54 am #

      If acc is less than val_acc, then it may mean that the model is underfitting, and that perhaps a larger model, or a model fit for longer, would do better on the validation set.

      • Nunu July 24, 2017 at 6:55 pm #

        Yes, that is true, but also if acc is more than val_acc then there is overfitting, and I noticed this in both of your results above! What could be the reason for the overfitting, and what can we do to get rid of it?


        • Jason Brownlee July 25, 2017 at 9:40 am #

          Perhaps train less, perhaps train a smaller model, perhaps add some regularization like dropout.

          I hope that helps as a start.
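Dropout, the regularizer suggested above, can be sketched numerically: during training each activation is zeroed with some probability and the survivors are scaled up so the expected value is unchanged (so-called inverted dropout), which discourages units from co-adapting. A minimal NumPy illustration:

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(x, rate=0.5):
    """Inverted dropout: zero a fraction `rate` of activations and
    scale the survivors so the expected value is unchanged."""
    keep = (rng.rand(*x.shape) >= rate).astype(x.dtype)
    return x * keep / (1.0 - rate)

x = np.ones(10000)
y = dropout(x, rate=0.5)

# Roughly half the units are zeroed, yet the mean stays near 1.0.
print((y == 0).mean())  # ~0.5
print(y.mean())         # ~1.0
```

In Keras this is what a Dropout(0.5) layer does at training time; at test time the layer is a no-op.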

  43. Nunu July 25, 2017 at 7:40 pm #

    Yes, I added dropout and one more fully connected layer, and I guess it worked.

    Thanks a lot Jason 🙂
    Best regards,

Leave a Reply