A popular demonstration of the capability of deep learning techniques is object recognition in image data.

The “hello world” of object recognition for machine learning and deep learning is the MNIST dataset for handwritten digit recognition.

In this post you will discover how to develop a deep learning model to achieve near state of the art performance on the MNIST handwritten digit recognition task in Python using the Keras deep learning library.

After completing this tutorial, you will know:

- How to load the MNIST dataset in Keras.
- How to develop and evaluate a baseline neural network model for the MNIST problem.
- How to implement and evaluate a simple Convolutional Neural Network for MNIST.
- How to implement a close to state-of-the-art deep learning model for MNIST.

Let’s get started.

**Update Oct/2016**: Updated examples for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18.

## Description of the MNIST Handwritten Digit Recognition Problem

The MNIST problem is a dataset developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem.

The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST). This is where the name for the dataset comes from: the Modified NIST or MNIST dataset.

Images of digits were taken from a variety of scanned documents, normalized in size and centered. This makes it an excellent dataset for evaluating models, allowing the developer to focus on the machine learning with very little data cleaning or preparation required.

Each image is a 28 by 28 pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images are used to test it.

It is a digit recognition task. As such there are 10 digits (0 to 9) or 10 classes to predict. Results are reported using prediction error, which is simply the inverted classification accuracy (100% minus the classification accuracy).

Excellent results achieve a prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks. There is a listing of the state-of-the-art results and links to the relevant papers on the MNIST and other datasets on Rodrigo Benenson’s webpage.


## Loading the MNIST dataset in Keras

The Keras deep learning library provides a convenience method for loading the MNIST dataset.

The dataset is downloaded automatically the first time this function is called and is stored in your home directory in ~/.keras/datasets/mnist.pkl.gz as a 15MB file.

This is very handy for developing and testing deep learning models.

To demonstrate how easy it is to load the MNIST dataset, we will first write a little script to download and visualize the first 4 images in the training dataset.

```python
# Plot ad hoc mnist instances
from keras.datasets import mnist
import matplotlib.pyplot as plt
# load (downloaded if needed) the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# plot 4 images as gray scale
plt.subplot(221)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
# show the plot
plt.show()
```

You can see that downloading and loading the MNIST dataset is as easy as calling the mnist.load_data() function. Running the above example, you should see the image below.

## Baseline Model with Multi-Layer Perceptrons

Do we really need a complex model like a convolutional neural network to get the best results with MNIST?

You can get very good results using a very simple neural network model with a single hidden layer. In this section we will create a simple multi-layer perceptron model that achieves an error rate of 1.74%. We will use this as a baseline for comparing more complex convolutional neural network models.

Let’s start off by importing the classes and functions we will need.

```python
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
```

It is always a good idea to initialize the random number generator to a constant to ensure that the results of your script are reproducible.

```python
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
```

Now we can load the MNIST dataset using the Keras helper function.

```python
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
```

The training dataset is structured as a 3-dimensional array of instance, image width and image height. For a multi-layer perceptron model we must reduce the images down into a vector of pixels. In this case, the 28×28 images become input vectors of 784 pixel values.

We can do this transform easily using the reshape() function on the NumPy array. We can also reduce our memory requirements by forcing the precision of the pixel values to be 32 bit, the default precision used by Keras anyway.

```python
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
```
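The effect of this reshape can be sanity-checked on a small dummy array without downloading the dataset. This is just an illustrative sketch; the fake zero-filled array stands in for the real MNIST pixel data:

```python
import numpy as np

# a toy stand-in for the MNIST training array: 2 "images" of 28x28 pixels
fake_images = np.zeros((2, 28, 28), dtype='uint8')

# the same flattening as above: one row of 784 pixel values per image
num_pixels = fake_images.shape[1] * fake_images.shape[2]
flat = fake_images.reshape(fake_images.shape[0], num_pixels).astype('float32')

print(flat.shape)  # (2, 784): each image is now a single 784-element vector
print(flat.dtype)  # float32
```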

The pixel values are gray scale between 0 and 255. It is almost always a good idea to perform some scaling of input values when using neural network models. Because the scale is well known and well behaved, we can very quickly normalize the pixel values to the range 0 and 1 by dividing each value by the maximum of 255.

```python
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
```

Finally, the output variable is an integer from 0 to 9. This is a multi-class classification problem. As such, it is good practice to use a one hot encoding of the class values, transforming the vector of class integers into a binary matrix.

We can easily do this using the built-in np_utils.to_categorical() helper function in Keras.

```python
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
```
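Roughly speaking, a one hot encoding turns each class integer into a row with a 1 in that class's column. A minimal NumPy sketch of the idea (np_utils.to_categorical is the convenient way to do this in Keras; the sketch below is just for intuition):

```python
import numpy as np

labels = np.array([0, 2, 1, 2])     # example class integers
num_classes = labels.max() + 1

# identity-matrix trick: row i of np.eye is the one hot vector for class i
one_hot = np.eye(num_classes)[labels]

print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```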

We are now ready to create our simple neural network model. We will define our model in a function. This is handy if you want to extend the example later and try to get a better score.

```python
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```

The model is a simple neural network with one hidden layer with the same number of neurons as there are inputs (784). A rectifier activation function is used for the neurons in the hidden layer.

A softmax activation function is used on the output layer to turn the outputs into probability-like values and allow one class of the 10 to be selected as the model’s output prediction. Logarithmic loss is used as the loss function (called categorical_crossentropy in Keras) and the efficient ADAM gradient descent algorithm is used to learn the weights.
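For intuition, the softmax and logarithmic loss calculations can be sketched in a few lines of NumPy. This only illustrates the math, it is not how Keras implements them internally:

```python
import numpy as np

def softmax(z):
    # shift by the max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

# three raw output scores ("logits"), one per class
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)  # probability-like values that sum to 1

# logarithmic loss (categorical cross-entropy) for a one hot target is the
# negative log of the probability assigned to the true class
true_class = 0
loss = -np.log(probs[true_class])

print(probs)
print(loss)
```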

We can now fit and evaluate the model. The model is fit over 10 epochs with updates every 200 images. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains. A verbose value of 2 is used to reduce the output to one line for each training epoch.

Finally, the test dataset is used to evaluate the model and a classification error rate is printed.

```python
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))
```

Running the example might take a few minutes when run on a CPU. You should see the output below. This very simple network defined in very few lines of code achieves a respectable error rate of 1.74%.

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
11s - loss: 0.2791 - acc: 0.9203 - val_loss: 0.1422 - val_acc: 0.9583
Epoch 2/10
11s - loss: 0.1121 - acc: 0.9680 - val_loss: 0.0994 - val_acc: 0.9697
Epoch 3/10
12s - loss: 0.0724 - acc: 0.9790 - val_loss: 0.0786 - val_acc: 0.9748
Epoch 4/10
12s - loss: 0.0508 - acc: 0.9856 - val_loss: 0.0790 - val_acc: 0.9762
Epoch 5/10
12s - loss: 0.0365 - acc: 0.9897 - val_loss: 0.0631 - val_acc: 0.9795
Epoch 6/10
12s - loss: 0.0263 - acc: 0.9931 - val_loss: 0.0644 - val_acc: 0.9798
Epoch 7/10
12s - loss: 0.0188 - acc: 0.9956 - val_loss: 0.0613 - val_acc: 0.9803
Epoch 8/10
12s - loss: 0.0149 - acc: 0.9967 - val_loss: 0.0628 - val_acc: 0.9814
Epoch 9/10
12s - loss: 0.0108 - acc: 0.9980 - val_loss: 0.0595 - val_acc: 0.9816
Epoch 10/10
12s - loss: 0.0072 - acc: 0.9989 - val_loss: 0.0577 - val_acc: 0.9826
Baseline Error: 1.74%
```

## Simple Convolutional Neural Network for MNIST

Now that we have seen how to load the MNIST dataset and train a simple multi-layer perceptron model on it, it is time to develop a more sophisticated convolutional neural network or CNN model.

Keras does provide a lot of capability for creating convolutional neural networks.

In this section we will create a simple CNN for MNIST that demonstrates how to use all of the aspects of a modern CNN implementation, including Convolutional layers, Pooling layers and Dropout layers.

The first step is to import the classes and functions needed.

```python
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')
```

Again, we always initialize the random number generator to a constant seed value for reproducibility of results.

```python
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
```

Next we need to load the MNIST dataset and reshape it so that it is suitable for training a CNN. In Keras, the layers used for two-dimensional convolutions expect pixel values with the dimensions [pixels][width][height].

In the case of RGB, the first dimension pixels would be 3 for the red, green and blue components and it would be like having 3 image inputs for every color image. In the case of MNIST where the pixel values are gray scale, the pixel dimension is set to 1.

```python
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][pixels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
```
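Again this can be checked on a dummy array: the reshape only adds a channel dimension of size 1, it does not alter the pixel values themselves. A small sketch with a fake array standing in for MNIST:

```python
import numpy as np

# fake grayscale pixel data: 2 images of 28x28, values in 0-255
fake = (np.arange(2 * 28 * 28) % 256).astype('uint8').reshape(2, 28, 28)

# add the channels dimension expected by the 2D convolution layers
reshaped = fake.reshape(fake.shape[0], 1, 28, 28).astype('float32')

print(reshaped.shape)                     # (2, 1, 28, 28)
print((reshaped[0, 0] == fake[0]).all())  # True: pixel values unchanged
```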

As before, it is a good idea to normalize the pixel values to the range 0 and 1 and one hot encode the output variables.

```python
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
```

Next we define our neural network model.

Convolutional neural networks are more complex than standard multi-layer perceptrons, so we will start with a simple structure that nonetheless uses all of the elements needed for state of the art results. Below is a summary of the network architecture.

- The first hidden layer is a convolutional layer called a Convolution2D. The layer has 32 feature maps, each with a size of 5×5 and a rectifier activation function. This is the input layer, expecting images with the structure outlined above: [pixels][width][height].
- Next we define a pooling layer that takes the max called MaxPooling2D. It is configured with a pool size of 2×2.
- The next layer is a regularization layer using dropout called Dropout. It is configured to randomly exclude 20% of neurons in the layer in order to reduce overfitting.
- Next is a layer that converts the 2D matrix data to a vector called Flatten. It allows the output to be processed by standard fully connected layers.
- Next is a fully connected layer with 128 neurons and a rectifier activation function.
- Finally, the output layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like predictions for each class.
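It can help to trace the tensor shapes through these layers by hand. With 'valid' border mode a 5×5 filter on a 28×28 image leaves 24×24 outputs, and 2×2 max pooling halves each dimension; the arithmetic can be sketched directly:

```python
# hand-computed shapes through the simple CNN (valid convolution, no padding)
conv_out = 28 - 5 + 1                 # 5x5 filter on 28x28 -> 24x24
pool_out = conv_out // 2              # 2x2 max pooling -> 12x12
flat_len = 32 * pool_out * pool_out   # 32 feature maps flattened for the Dense layer

print(conv_out, pool_out, flat_len)   # 24 12 4608
```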

As before, the model is trained using logarithmic loss and the ADAM gradient descent algorithm.

```python
def baseline_model():
    # create model
    model = Sequential()
    model.add(Convolution2D(32, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```

We evaluate the model the same way as before with the multi-layer perceptron. The CNN is fit over 10 epochs with a batch size of 200.

```python
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Classification Error: %.2f%%" % (100-scores[1]*100))
```

Running the example, the accuracy on the training and validation data is printed each epoch, and at the end the classification error rate is printed.

Epochs may take 60 to 90 seconds to run on the CPU, or about 15 minutes in total, depending on your hardware. You can see that the network achieves an error rate of 1.10%, which is better than our simple multi-layer perceptron model above.

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
84s - loss: 0.2065 - acc: 0.9370 - val_loss: 0.0759 - val_acc: 0.9756
Epoch 2/10
84s - loss: 0.0644 - acc: 0.9802 - val_loss: 0.0475 - val_acc: 0.9837
Epoch 3/10
89s - loss: 0.0447 - acc: 0.9864 - val_loss: 0.0402 - val_acc: 0.9877
Epoch 4/10
88s - loss: 0.0346 - acc: 0.9891 - val_loss: 0.0358 - val_acc: 0.9881
Epoch 5/10
89s - loss: 0.0271 - acc: 0.9913 - val_loss: 0.0342 - val_acc: 0.9891
Epoch 6/10
89s - loss: 0.0210 - acc: 0.9933 - val_loss: 0.0391 - val_acc: 0.9880
Epoch 7/10
89s - loss: 0.0182 - acc: 0.9943 - val_loss: 0.0345 - val_acc: 0.9887
Epoch 8/10
89s - loss: 0.0142 - acc: 0.9956 - val_loss: 0.0323 - val_acc: 0.9904
Epoch 9/10
88s - loss: 0.0120 - acc: 0.9961 - val_loss: 0.0343 - val_acc: 0.9901
Epoch 10/10
89s - loss: 0.0108 - acc: 0.9965 - val_loss: 0.0353 - val_acc: 0.9890
Classification Error: 1.10%
```

## Larger Convolutional Neural Network for MNIST

Now that we have seen how to create a simple CNN, let’s take a look at a model capable of close to state of the art results.

We import the classes and functions, then load and prepare the data the same as in the previous CNN example.

```python
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][pixels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
```

This time we define a large CNN architecture with additional convolutional, max pooling layers and fully connected layers. The network topology can be summarized as follows.

- Convolutional layer with 30 feature maps of size 5×5.
- Pooling layer taking the max over 2×2 patches.
- Convolutional layer with 15 feature maps of size 3×3.
- Pooling layer taking the max over 2×2 patches.
- Dropout layer with a probability of 20%.
- Flatten layer.
- Fully connected layer with 128 neurons and rectifier activation.
- Fully connected layer with 50 neurons and rectifier activation.
- Output layer.
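Tracing the shapes by hand again (a sketch of the arithmetic, assuming the first convolution uses 'valid' border mode as in the code below):

```python
# hand-computed shapes through the larger CNN
c1 = 28 - 5 + 1          # first conv, 5x5 on 28x28 -> 24x24
p1 = c1 // 2             # first 2x2 pool -> 12x12
c2 = p1 - 3 + 1          # second conv, 3x3 on 12x12 -> 10x10
p2 = c2 // 2             # second 2x2 pool -> 5x5
flat_len = 15 * p2 * p2  # 15 feature maps of 5x5 -> 375 inputs to the dense layers

print(c1, p1, c2, p2, flat_len)  # 24 12 10 5 375
```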

```python
def larger_model():
    # create model
    model = Sequential()
    model.add(Convolution2D(30, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(15, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```

Like the previous two experiments, the model is fit over 10 epochs with a batch size of 200.

```python
# build the model
model = larger_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Classification Error: %.2f%%" % (100-scores[1]*100))
```

Running the example prints accuracy on the training and validation datasets each epoch and a final classification error rate.

The model takes about 100 seconds per epoch to run. This slightly larger model achieves a respectable classification error rate of 0.89%.

```
Using Theano backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
102s - loss: 0.3263 - acc: 0.8962 - val_loss: 0.0690 - val_acc: 0.9785
Epoch 2/10
103s - loss: 0.0858 - acc: 0.9737 - val_loss: 0.0430 - val_acc: 0.9862
Epoch 3/10
102s - loss: 0.0627 - acc: 0.9806 - val_loss: 0.0379 - val_acc: 0.9875
Epoch 4/10
101s - loss: 0.0501 - acc: 0.9842 - val_loss: 0.0342 - val_acc: 0.9891
Epoch 5/10
102s - loss: 0.0444 - acc: 0.9856 - val_loss: 0.0338 - val_acc: 0.9889
Epoch 6/10
101s - loss: 0.0389 - acc: 0.9878 - val_loss: 0.0302 - val_acc: 0.9897
Epoch 7/10
101s - loss: 0.0335 - acc: 0.9894 - val_loss: 0.0260 - val_acc: 0.9916
Epoch 8/10
102s - loss: 0.0305 - acc: 0.9898 - val_loss: 0.0267 - val_acc: 0.9911
Epoch 9/10
101s - loss: 0.0296 - acc: 0.9904 - val_loss: 0.0211 - val_acc: 0.9933
Epoch 10/10
102s - loss: 0.0272 - acc: 0.9911 - val_loss: 0.0269 - val_acc: 0.9911
Classification Error: 0.89%
```

This is not an optimized network topology. Nor is it a reproduction of a network topology from a recent paper. There is a lot of opportunity for you to tune and improve upon this model.

What is the best error rate score you can achieve?

Post your configuration and best score in the comments.

## Resources on MNIST

The MNIST dataset is very well studied. Below are some additional resources you might like to look into.

- The Official MNIST dataset webpage.
- Rodrigo Benenson’s webpage that lists state of the art results.
- Kaggle competition that uses this dataset (check the scripts and forum sections for sample code)
- Read-only model trained on MNIST that you can test in your browser (very cool)

## Summary

In this post you discovered the MNIST handwritten digit recognition problem and deep learning models developed in Python using the Keras library that are capable of achieving excellent results.

Working through this tutorial you learned:

- How to load the MNIST dataset in Keras and generate plots of the dataset.
- How to reshape the MNIST dataset and develop a simple but well performing multi-layer perceptron model on the problem.
- How to use Keras to create convolutional neural network models for MNIST.
- How to develop and evaluate larger CNN models for MNIST capable of near world class results.

Do you have any questions about handwriting recognition with deep learning or this post? Ask your question in the comments and I will do my best to answer.


Thanks for this tutorial. It was great. Though (it might sound silly) how do I see it in action? I mean if I want to see it predict an answer for an image, how do I do that?

Thanks again.

In its current form it is not a robust system.

You will have to provide a digit image with the same dimensions.

great work!!

but can you show that in action with a sample image

Do you have a working program which recognizes the numbers?

Just the examples in this tutorial Adrian.

When I try the baseline model with MLPs I get much worse performance than what you are showing (an error rate of 53.64%). Any idea why I could be seeing such vastly different results when I’m using the same code? Thanks.

Hi Matthew, that is surprising that the numbers are so different.

Theano backend? or TensorFlow? What Platform? What version of Python?

Try running the example 3 times and report all 3 scores.

I have the same problem. I am using the Theano backend. Platform: PyCharm. Version 3.5.

Sorry to hear that Adrian.

Does it work if you run on the command line?

To get an error rate that high, the code must have been copied incorrectly or something similar. Beyond that, do notice that each time you run this, the final output will be slightly different each time because of the Dropout layer in the neural network. It will randomly choose that 20% each time it runs thereby slightly affecting the final outcome.

Could you please give some simple example for CNN for ex may be in uci repository data set. Whether is possible to apply CNN for numeric features.

Sorry, I don’t have such an example.

Hello Jason, I tried running the script, but the baseline model is taking too much time.. its running from past 20 hours and still is on 4th EPoch,, can you please suggest some way to speed up the process.. I am using 4 gb ram computer, and running on Anaconda Theano backened Keras

Sorry to hear that Dinesh.

Perhaps try training on AWS:

http://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/

Hi Jason,

What is the configuration of machine that you used to run the model.. have you used GPU to improve performance? How much time it took for you?

Also AWS is a paid platform, is there any free platform for running ML algorithms?

Thanks

I used at 8 core machine with 8GB of RAM. It completed in reasonable time from memory.

AWS is very reasonably priced, I think less than $1 USD per hour. Great for one-off models like this.

Hi! Great post! I tried it, but the first CNN does not seem to compile. I got:

ValueError: Filter must not be larger than the input: Filter: (5, 5) Input: (1, 28)

just after model = baseline_model()

I have updated the examples, try again!

Hi Jason, I tried it, but I got the error below. I use tensorflow r0.11. I’m not sure whether it is the cause.

```
Using TensorFlow backend.
Traceback (most recent call last):
  File "/Users/Jack/.pyenv/versions/3.5.1/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 594, in call_cpp_shape_fn
    status)
  File "/Users/Jack/.pyenv/versions/3.5.1/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/Jack/.pyenv/versions/3.5.1/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Negative dimension size caused by subtracting 5 from 1
```

Ouch Jack, that does not look good.

It looks like the API has changed. I’ll dive into it and fix up the examples.

OK, I have updated the examples.

Firstly, I recommend using TensorFlow 0.10.0, NOT 0.11 as there are issues with the latest version.

Secondly, you must add the following two lines to make the CNNs work:

Fix taken from here: https://github.com/fchollet/keras/issues/2681

I hope that helps Jack.

Hi Jason.

Thanks for the great tutorial.

Your comment has not solved the problem yet, and we still get the same error. Could you please modify your model to work with the TF backend?

Hi, thanks for the great tutorial !

I tried predicting with a test set and got the one-hot encoded predictions. I was just wondering if there’s a built-in function to convert it back to original labels (0,1,2,3…).

Great question Abhai.

If you use scikit-learn to perform the one hot encoding, it offers an inverse transform to turn the encoded prediction back into the original values.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

That would be my preference as a starting point.
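If the predictions are already a NumPy array of one hot rows (or softmax probabilities), a simple argmax along the rows also recovers the integer labels, no encoder object needed. A small sketch with made-up prediction values:

```python
import numpy as np

# probability-like predictions, one row per sample
preds = np.array([[0.1, 0.8, 0.1],
                  [0.9, 0.05, 0.05]])

labels = preds.argmax(axis=1)  # index of the largest value in each row
print(labels)                  # [1 0]
```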

Hello,

Thank you very much for your usual brief and comprehensive illustration and discussion.

I’m glad you found the post useful Berisha.

Hello,

After finishing learning, how can I recognize my own pictures with this network.

Great question gs, I don’t have an example at the moment.

You will need to encode your own pictures in the same way as the MNIST dataset – mainly rescale to the same size. Then load them as a matrix of pixel values and you can make predictions.
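As a rough sketch of that preparation, assuming you have already resized your picture to a 28×28 grayscale NumPy array (the blank white stand-in array below is hypothetical): MNIST digits are white on a black background, so a typical scan usually needs inverting as well as scaling.

```python
import numpy as np

# stand-in for your own digit: a 28x28 grayscale array with values 0-255,
# black digit on a white background (the usual scan/photo convention)
img = np.full((28, 28), 255, dtype='uint8')   # hypothetical blank white page

# MNIST digits are white on black, so invert, then scale to 0-1 as in training
img = 255 - img
x = img.astype('float32') / 255

# shape it like a batch of one flattened image for the MLP model above
x = x.reshape(1, 28 * 28)
print(x.shape)   # (1, 784)
```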

Hello,

Thanks for great example, but how do I save the state of the net,

I mean that net learns on 60000 examples, then it tests and try to guess 10000

But if I want to use always, every day, for example, how can I use it without training it every day?

Great question, see this post for a tutorial on saving your net:

http://machinelearningmastery.com/save-load-keras-deep-learning-models/

Jason, does your book explain “WHY” you chose the various layers you did in this tutorial and shed light on how and why to choose certain designs for different data sets?

No, just the how John.

The why is hard; in most cases the best results are achieved with trial and error. There is no “theory of neural networks” that helps you configure them.

hello

when i try to make a prediction for my own image, the net get it wrong

this is depressing me.

i use the command model.predict_classes(img)

please is there a way to get correct answer for my handwritten digit

Perhaps you need more and different training examples Nassim?

Perhaps some image augmentation can make your model more robust?

Great tutorial Jason, in fact you are the best, very easy to follow, I enjoy all your tutorials, thank you! In fact,

I achieved an error rate of 0.74% at one point using a GPU and it took about 30 seconds to run.

Well done Anthony!

Thanks for the great tutorial. Just one thing I didn’t understand. In the Convolution2D layer, there is a border_mode=”valid” parameter. What does this do? What’s its purpose? The Keras documentation doesn’t seem to have an explanation for it either.

Excellent tutorial Jason. I really enjoyed reading it and implementing it. I just figured that if you have cuDNN installed it makes things waay fast (at least for the toy examples I’ve tried). I recommend anyone reading this to install cuDNN and configure theano to use it. You just have to put

```
[dnn]
enabled = True
```

in theanorc file.

Nice, thank for the tip Sanjaya.

See this post for how to run on AWS with GPUs if you do not have the hardware locally:

http://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/

Hi Jason,

I am trying to apply the CONVOLUTION1D for the IRIS Data.

The code is as below

```python
max_features = 150
maxlen = 4
batch_size = 16
embedding_dims = 3
nb_epoch = 3
nb_classes = 3
dropoutVal = 0.5
nb_filter = 5
hidden_dims = 500
filter_length = 4

import pandas as pd
data_load = pd.read_csv("iris.csv")
data = data_load.ix[:,0:4]
target = data_load.ix[:,4]

X_train = np.array(data[:100].values.astype('float32'))
Y_train = np.array(target[:100])
Y_train = np_utils.to_categorical(Y_train,nb_classes)
X_test = np.array(data[100:].values.astype('float32'))
Y_test = np.array(target[100:])
Y_test = np_utils.to_categorical(Y_test,nb_classes)

std = StandardScaler()
X_train = X_train_scaled = std.fit_transform(X_train)
X_test = X_test_scaled = std.transform(X_test)
X_train1 = sequence.pad_sequences(X_train_scaled,maxlen=maxlen)
X_test1 = sequence.pad_sequences(X_test_scaled,maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features,embedding_dims,input_length=maxlen))
model.add(Convolution1D(nb_filter=nb_filter,filter_length=filter_length, border_mode='valid',activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(hidden_dims,activation='softmax'))
model.add(Dense(nb_classes))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train1, Y_train, nb_epoch=5, batch_size=10)
scores = model.evaluate(X_test1, Y_test, verbose=0)
predictions = model.predict(X_test1)
```

I want to check if I am in the right direction on this.

I am not getting the accuracy more than 66% which is quite surprising.

Am I doing the Embedding layer correctly? When I look at the embedding layer weights, I see a difference between the layer parameters I set and the weights I retrieve.

Please advise.

Regards

Ganesh

I would recommend using an MLP rather than a CNN for the iris flowers dataset.

See this post:

http://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

Hi Jason

Thank you so much for your great tutorial.

By the way, can you explain why MLP is better than CNN for iris flowers dataset. Thanks a lot.

Best wishes,

Lua

Because the data is tabular (e.g. measurements of flowers), not images (e.g. photos).

If the data was photos, then a CNN would be the method of choice.

Thank you very much for this post. (:

You’re welcome Ger.

can you tell me how I can give the system an image and have it tell me what number it is? sorry, I am new to this, thank you!

Hi Remon,

The image will have to be scaled to the same dimensions as those expected by the network.

Also, in this example, the network expects images to have a specific set of proportions and to be white digits on a black background. New examples will have to be prepared in the same way.

Hi jason,

snippet of your code:

————————-

in the step # load data

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape to be [samples][pixels][width][height]

X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')

X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

You are using MNIST data. What kind of data structure is it?

How do I preprocess images (in a list) and labels (in a list) into this structure and feed it to a Keras model?

What exactly does this line do?

X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')

thanks

joseph

Hi Joe,

The MNIST data is available within Keras.

It is stored as NumPy arrays of pixel data.

When used with a CNN, the data is reshaped into the format: [samples, channels, width, height], where the channel dimension is 1 for grayscale images.

I hope that helps.
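To make that concrete, here is a numpy-only sketch of the same reshape on a small stand-in array (the real arrays come from `mnist.load_data()` and have 60,000 training samples):

```python
import numpy as np

# Stand-in for the pixel arrays returned by mnist.load_data():
# the real X_train has shape (60000, 28, 28); five samples suffice here.
X_train = np.zeros((5, 28, 28), dtype="uint8")

# Insert a channel dimension of size 1 (grayscale) so the CNN input has
# the Theano-ordered shape [samples][channels][width][height].
X_cnn = X_train.reshape(X_train.shape[0], 1, 28, 28).astype("float32")
print(X_cnn.shape)  # (5, 1, 28, 28)
```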

Hi Jason, what led you to choose 128 neurons for the fully-connected layer? (Calculating the number of activations leading into the fully-connected layer, it’s much larger than 128) Thanks!

Trial and error Amy.

Thank you!

Hi @Jason Brownlee, first things first: awesome tutorial you have there!

There's a small problem at the following lines:

Small CNN:

# build the model

model = baseline_model()

Large CNN:

# build the model

model = larger_model()

Using recent versions of TensorFlow throws an AttributeError:

=> AttributeError: module 'tensorflow.python' has no attribute 'control_flow_ops' <=

solution:

Add following lines:

import tensorflow as tf

tf.python.control_flow_ops = tf

ref:

https://github.com/fchollet/keras/issues/3857

Can you please update the code?

Once again, thanks for the great tutorial.

Keep up the good work!

Thanks for the note Shaik, I’ll investigate.

Your posts are great for an awesome start with Keras. I'm loving it, sir.

Thanks Mouz.

Hello sir, thanks for your great writing and clear explanation. I have tried it and it works. But how can I train the network using my own handwriting data set instead of the MNIST data?

It would be very helpful if you shed some light on this..

Thanks in advance.

Hi Faruk,

Generally, you will need to make the data consistent in dimensions as a starting point.

From there, you can separate the data into train/test sets or similar and begin exploring different configurations.

Does that help? Perhaps I misunderstand the question?

Hi

Thanks for your nice explanation. I successfully trained all the networks you introduced here. However, when I want to use the trained model to make some predictions, using this piece of code:

im = misc.imread('test8.png')

im=im[:,:,1]

im=im.flatten()

print(model.predict(im))

it gives me the error:

Error when checking : expected dense_input_1 to have shape (None, 784) but got array with shape (784, 1)

the 'im' array has the shape (784,). How can I feed in an array of size (None, 784)?

Hi Arash,

Consider reshaping as follows:
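A minimal sketch of that reshape (assuming the flattened image is the NumPy array `im` from the question; `model.predict` expects a 2D batch of samples, so add a leading axis):

```python
import numpy as np

im = np.zeros(784, dtype="float32")  # stand-in for the flattened image

# The Dense input layer expects a batch of samples, shape (n, 784),
# so add a leading batch axis before calling model.predict.
im = im.reshape(1, 784)
print(im.shape)  # (1, 784)
```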

Hi Jason,

Thank you for your wonderful tutorial!

I have a question about 'model.add(Dropout(0.2))'. As you stated in the post, 'The next layer is a regularization layer using dropout called Dropout. It is configured to randomly exclude 20% of neurons in the layer in order to reduce overfitting.' Dropout is treated as a separate layer in Keras, instead of a regularization operation on an existing layer such as a convolution layer or fully-connected layer. How is this achieved?

Since this Dropout is between MaxPooling and the next fully-connected layer, which part of the weights does Dropout apply to?

Thank you very much!

Good question. Dropout zeroes a random fraction of the activations flowing between the layers where it is inserted; in effect this regularizes the weights on either side.
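To make that concrete, here is a numpy-only sketch of what a Dropout layer does to the activations passing through it at training time. The scaling by 1/(1 - rate) is the "inverted dropout" form used by modern Keras; the array contents are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones((1, 10), dtype="float32")  # output of the previous layer

rate = 0.2
# Randomly zero ~20% of the activations and scale the survivors by
# 1/(1 - rate), so the expected magnitude seen by the next layer is
# unchanged between training and inference.
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1.0 - rate)
```

Note that the weights themselves are untouched; only the activations flowing into the next layer are masked.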

Hello sir, thanks for your clear explanation. I have tried it and it works well. But how can I train the network using my own handwriting data set instead of the MNIST data set?

I would be very thankful if you could shed some light on this.

Thanks in advance.

You will need to load the data from file, adjust it so that it all has the same dimensions, then fit your model.

I do not have an example of working with custom data at the moment, sorry.

I tried the simple CNN with the Theano backend.

'''ImportError: ('The following error happened while compiling the node', DotModulo(A, s, m, A2, s2, m2), '\n', '/home/pramod/.theano/compiledir_Linux-4.8–generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpXpzrkl/d16654b784f584f17fdc481825fd2cca.so: undefined symbol: _ZdlPvm', '[DotModulo(A, s, m, A2, s2, m2)]')'''

I got this error while running the baseline model.

Can you please tell me how to correct this? I tried multiple ways of installing Theano, including pip and conda.

I'm guessing my Theano installation is faulty.

I'm clueless on how to proceed. Please help.

Thank you

I have not seen this error, sorry.

Many of my students have great success using Keras and Theano with Anaconda Python.

Got the verbatim code from above with one change:

X_train = X_train[:-20000 or None]

y_train = y_train[:-20000 or None]

to reduce the memory usage to run on a Mac OSX El Capitan (GeForce 650M with 512MB)

The error rate was a little higher, at 1.51%.

I used keras with the tensorflow-GPU backend.

Thanks for the note Chris!

Hi Jason,

Really awesome introduction to keras and digit recognition for a beginner like me.

You are using the MNIST dataset, which is in the form of a pickled object (I guess). But my question is: how would you convert a set of existing images to this pickled object?

Secondly, you are calculating the error rate against your test dataset. But suppose I have an image with a number written on it; how would you return its class label without making many changes to the above program?

Thanks Vikalp.

I would recommend loading your image data as numpy arrays and working with them directly.

You can make a prediction with the network (y = model.predict(x)) and use the numpy argmax() function to convert the one hot encoded output into a class index.
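A numpy-only sketch of that conversion, using a made-up probability vector in place of the `model.predict` output:

```python
import numpy as np

# Stand-in for y = model.predict(x): one sample, ten class probabilities.
y = np.array([[0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])

# argmax over the class axis converts the one hot style probability
# vector back into a digit (class index).
digit = int(np.argmax(y, axis=1)[0])
print(digit)  # 2
```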

Hi Jason,

Thanks for quick reply.

I was looking into the way you suggested. Following is the code for that:

color_image = cv2.imread("two.jpg")

gray_image = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)

a = model.predict(numpy.array(gray_image))

print(a)

But getting following error:

ValueError: Error when checking : expected dense_1_input to have shape (None, 784) but got array with shape (1024, 791)

I am not sure if I am doing this correctly. Please guide me on this. Thank you.

The loaded image must have the exact same dimensions as the data used to fit the model.

You may need to resize it.
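A numpy sketch of the preparation after resizing. The 28x28 array here stands in for the output of something like `cv2.resize(gray_image, (28, 28))`, which is an assumption about how the resize would be done:

```python
import numpy as np

# Stand-in for cv2.resize(gray_image, (28, 28)): the loaded image must be
# resized to the 28x28 dimensions the model was trained on.
resized = np.zeros((28, 28), dtype="float32")

# Flatten to 784 features, add a batch axis, and scale pixels to [0, 1]
# just as the training data was prepared.
x = resized.reshape(1, 784) / 255.0
print(x.shape)  # (1, 784)
```

`x` can then be passed to `model.predict`.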