How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as classifying images.

That is, given a photograph of an object, the model answers the question of which of 1,000 specific objects the photograph shows.

A competition-winning model for this task is the VGG model by researchers at Oxford. What is important about this model, besides its capability of classifying objects in photographs, is that the model weights are freely available and can be loaded and used in your own models and applications.

In this tutorial, you will discover the VGG convolutional neural network models for image classification.

After completing this tutorial, you will know:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. ImageNet
  2. The Oxford VGG Models
  3. Load the VGG Model in Keras
  4. Develop a Simple Photo Classifier

ImageNet

ImageNet is a research project to develop a large database of images with annotations, e.g. images and their descriptions.

The images and their annotations have been the basis for an image classification challenge called the ImageNet Large Scale Visual Recognition Challenge or ILSVRC since 2010. The result is that research organizations battle it out on pre-defined datasets to see who has the best model for classifying the objects in images.

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.

ImageNet Large Scale Visual Recognition Challenge, 2015.

For the classification task, images must be classified into one of 1,000 different categories.

For the last few years very deep convolutional neural network models have been used to win these challenges and results on the tasks have exceeded human performance.

Sample of Images from the ImageNet Dataset used in the ILSVRC Challenge
Taken From “ImageNet Large Scale Visual Recognition Challenge”, 2015.

The Oxford VGG Models

Researchers from the Oxford Visual Geometry Group, or VGG for short, participate in the ILSVRC challenge.

In 2014, convolutional neural network (CNN) models developed by the VGG took first place in the localization task and second place in the classification task of the challenge.

ILSVRC Results in 2014 for the Classification task

After the competition, the participants wrote up their findings in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition” (2014).

They also made their models and learned weights available online.

This allowed other researchers and developers to use a state-of-the-art image classification model in their own work and programs.

This helped to fuel a rash of transfer learning work where pre-trained models are used with minor modification on wholly new predictive modeling tasks, harnessing the state-of-the-art feature extraction capabilities of proven models.

… we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines (e.g. deep features classified by a linear SVM without fine-tuning). We have released our two best-performing models to facilitate further research.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

VGG released two different CNN models, specifically a 16-layer model and a 19-layer model.

Refer to the paper for the full details of these models.

The VGG models are no longer state-of-the-art, although they trail the best models by only a few percentage points. Nevertheless, they are very powerful models and useful both as image classifiers and as the basis for new models that use image inputs.

In the next section, we will see how we can use the VGG model directly in Keras.

Load the VGG Model in Keras

The VGG model can be loaded and used in the Keras deep learning library.

Keras provides an Applications interface for loading and using pre-trained models.

Using this interface, you can create a VGG model using the pre-trained weights provided by the Oxford group and use it as a starting point in your own model, or use it as a model directly for classifying images.

In this tutorial, we will focus on the use case of classifying new images using the VGG model.

Keras provides both the 16-layer and 19-layer versions via the VGG16 and VGG19 classes. Let’s focus on the VGG16 model.

The model can be created as follows:
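A minimal sketch of that call, assuming Keras is installed as part of TensorFlow 2.x (with the standalone package the import would start with keras instead). The helper name load_vgg16 is made up for this example and is not part of Keras.

```python
def load_vgg16():
    """Create the 16-layer VGG model with the pre-trained ImageNet weights."""
    # Assumption: TensorFlow 2.x with its bundled Keras. The import sits inside
    # the function so TensorFlow is only loaded when a model is requested.
    from tensorflow.keras.applications.vgg16 import VGG16

    # With no arguments, VGG16 uses weights='imagenet' and include_top=True.
    return VGG16()


# Usage (the first call downloads the weights):
# model = load_vgg16()
```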

That’s it.

The first time you run this example, Keras will download the weight files from the Internet and store them in the ~/.keras/models directory.

Note that the weights are about 528 megabytes, so the download may take a few minutes depending on the speed of your Internet connection.

The weights are only downloaded once. The next time you run the example, the weights are loaded locally and the model should be ready to use in seconds.

We can use the standard Keras tools for inspecting the model structure.

For example, you can print a summary of the network layers as follows:
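As a rough sketch, the summary can be printed with the model's summary() method. The wrapper below, summarize_model, is a hypothetical helper that also returns the layer and parameter counts so they can be inspected in code; it expects an already loaded Keras model.

```python
def summarize_model(model):
    """Print the layer-by-layer summary and return (layer count, parameter count)."""
    # summary() prints each layer with its output shape and number of weights.
    model.summary()
    # layers and count_params() are standard members of a Keras model.
    return len(model.layers), model.count_params()


# Usage, assuming model is the loaded VGG16 model:
# n_layers, n_params = summarize_model(model)
```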

You can see that the model is huge.

You can also see that, by default, the model expects images as input with the size 224 x 224 pixels with 3 channels (e.g. color).

We can also create a plot of the layers in the VGG model, as follows:
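One way to do this is with the plot_model() utility, sketched below. It assumes TensorFlow 2.x and that the pydot package and Graphviz are installed, since plot_model() depends on both. The helper name plot_vgg and the output filename vgg.png are choices made for this example.

```python
def plot_vgg(model, filename="vgg.png"):
    """Write a diagram of the model's layers to an image file."""
    # Assumption: pydot and Graphviz are available; plot_model needs both.
    from tensorflow.keras.utils import plot_model

    plot_model(model, to_file=filename)
    return filename


# Usage, assuming model is the loaded VGG16 model:
# plot_vgg(model)
```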

Again, because the model is large, the plot is a little too large and perhaps unreadable. Nevertheless, it is provided below.

Plot of Layers in the VGG Model

The VGG16() and VGG19() classes take a few arguments that may only interest you if you are looking to use the model in your own project, e.g. for transfer learning.

For example:

  • include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
  • weights (‘imagenet’): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
  • input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
  • input_shape (None): The size of images that the model is expected to take if you change the input layer.
  • pooling (None): The type of pooling to use when you are training a new set of output layers.
  • classes (1000): The number of classes (e.g. size of output vector) for the model.
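To make these arguments concrete, here is a hedged sketch of how they might be combined for transfer learning: the output layers are dropped and the pre-trained convolutional layers are kept as a feature extractor. The helper name build_feature_extractor and the choice of average pooling are assumptions for illustration, not something this tutorial depends on.

```python
def build_feature_extractor(input_shape=(224, 224, 3)):
    """VGG16 without its output layers, for use as a feature extractor."""
    # Assumption: TensorFlow 2.x with its bundled Keras.
    from tensorflow.keras.applications.vgg16 import VGG16

    return VGG16(
        include_top=False,        # drop the fully connected output layers
        weights="imagenet",       # keep the pre-trained convolutional weights
        input_shape=input_shape,  # rows, columns, channels of the new images
        pooling="avg",            # average each final feature map to one value
    )


# Usage (downloads the weights without the output layers on the first call):
# extractor = build_feature_extractor()
```

With these settings the model outputs one vector of 512 values per image, which can then be fed to a new set of output layers.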

Next, let’s look at using the loaded VGG model to classify ad hoc photographs.

Develop a Simple Photo Classifier

Let’s develop a simple image classification script.

1. Get a Sample Image

First, we need an image we can classify.

You can download a random photograph of a coffee mug from Flickr here.

Coffee Mug
Photo by jfanaian, some rights reserved.

Download the image and save it to your current working directory with the filename ‘mug.jpg’.

2. Load the VGG Model

Load the weights for the VGG-16 model, as we did in the previous section.

3. Load and Prepare Image

Next, we can load the image as pixel data and prepare it to be presented to the network.

Keras provides some tools to help with this step.

First, we can use the load_img() function to load the image and resize it to the required size of 224×224 pixels.

Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We can use the img_to_array() function for this.
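The two steps so far, loading and converting, might look like the sketch below. It assumes TensorFlow 2.x with Pillow installed; the import path shown is one of several that have existed across Keras versions, and load_image_as_array is a hypothetical helper name.

```python
def load_image_as_array(filename, target_size=(224, 224)):
    """Load an image file, resize it, and return the pixels as a NumPy array."""
    # Assumption: TensorFlow 2.x; newer releases also expose these two
    # functions under tensorflow.keras.utils.
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    # load_img returns a PIL image resized to target_size (rows, columns).
    image = load_img(filename, target_size=target_size)
    # img_to_array converts it to an array of shape (rows, columns, channels).
    return img_to_array(image)


# Usage, assuming mug.jpg is in the current working directory:
# image = load_image_as_array("mug.jpg")
```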

The network expects one or more images as input; that means the input array will need to be 4-dimensional: samples, rows, columns, and channels.

We only have one sample (one image). We can reshape the array by calling reshape() and adding the extra dimension.
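A small NumPy sketch of that reshape is shown below. An array of zeros stands in for the loaded photograph so the example runs on its own, and add_sample_dimension is a name chosen for this illustration.

```python
import numpy as np


def add_sample_dimension(image):
    """Turn one (rows, columns, channels) image into a batch of one image."""
    return image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))


# Stand-in for the loaded photograph: 224 x 224 pixels, 3 color channels.
image = np.zeros((224, 224, 3), dtype="float32")
batch = add_sample_dimension(image)
print(batch.shape)  # prints (1, 224, 224, 3)
```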

Next, the image pixels need to be prepared in the same way as the ImageNet training data was prepared. Specifically, from the paper:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

Keras provides a function called preprocess_input() to prepare new input for the network.
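In the pipeline itself this step is a single call, image = preprocess_input(image), with preprocess_input imported from the same vgg16 module as the model. To show what the mean subtraction amounts to, the NumPy sketch below approximates it. The per-channel means and the RGB-to-BGR channel swap reflect my understanding of the Keras implementation for VGG16 and should be read as assumptions; subtract_imagenet_mean is a hypothetical name.

```python
import numpy as np

# Per-channel ImageNet means in blue, green, red order (assumed to match
# the constants Keras uses for VGG16).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")


def subtract_imagenet_mean(batch):
    """Approximate VGG preprocessing for a (samples, rows, columns, 3) RGB batch."""
    batch = np.asarray(batch, dtype="float32")
    # Reorder the channels from RGB to BGR, the order the original weights expect.
    bgr = batch[..., ::-1]
    # Center each channel by subtracting the training-set mean.
    return bgr - IMAGENET_MEAN_BGR
```

For real use, prefer preprocess_input() itself so the preparation stays in sync with the weights that Keras ships.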

We are now ready to make a prediction for our loaded and prepared image.

4. Make a Prediction

We can call the predict() function on the model in order to get a prediction of the probability of the image belonging to each of the 1000 known object types.
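A sketch of this step is below. The shape check is an addition of mine to catch a missing samples dimension early, and predict_probabilities is a hypothetical helper name; the only Keras call involved is model.predict().

```python
def predict_probabilities(model, batch):
    """Return the class probabilities for a prepared batch of images."""
    if batch.ndim != 4:
        raise ValueError("expected a 4-D array: samples, rows, columns, channels")
    # predict() returns one row per image; for VGG16 each row has 1,000 entries.
    return model.predict(batch)


# Usage, assuming model is the loaded VGG16 model and image is the prepared array:
# yhat = predict_probabilities(model, image)
```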

We are nearly there; now we need to interpret the probabilities.

5. Interpret Prediction

Keras provides a function to interpret the probabilities called decode_predictions().

It can return a list of classes and their probabilities in case you would like to present the top 3 objects that may be in the photo.

We will just report the single most likely object.
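The sketch below formats that single result. It relies on the structure decode_predictions() returns: one list per image, each entry a tuple of class identifier, class name, and probability, ordered from most to least likely. The helper name describe_top_prediction is made up for this example.

```python
def describe_top_prediction(decoded):
    """Format the most likely class from the output of decode_predictions()."""
    # decoded[0] is the list for the first image; its first entry is the
    # most likely class as (class id, class name, probability).
    _, name, probability = decoded[0][0]
    return "%s (%.2f%%)" % (name, probability * 100)


# Usage, assuming yhat holds the output of model.predict():
# from tensorflow.keras.applications.vgg16 import decode_predictions
# print(describe_top_prediction(decode_predictions(yhat)))
```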

And that’s it.

Complete Example

Tying all of this together, the complete example is listed below:
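What follows is a sketch of the full pipeline rather than the original listing. It assumes TensorFlow 2.x with its bundled Keras and Pillow installed, and wraps the steps in a hypothetical function, classify_image, so the model and weights are only loaded when the function is called.

```python
def classify_image(filename):
    """Classify one photograph with VGG16 and return (class name, probability)."""
    # Assumption: TensorFlow 2.x; import paths differ in other Keras versions.
    from tensorflow.keras.applications.vgg16 import (
        VGG16,
        decode_predictions,
        preprocess_input,
    )
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    # Load the model with the pre-trained ImageNet weights.
    model = VGG16()
    # Load the image at the input size the model expects.
    image = load_img(filename, target_size=(224, 224))
    # Convert the pixels to a NumPy array.
    image = img_to_array(image)
    # Add the samples dimension: (1, rows, columns, channels).
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # Prepare the pixels in the same way as the training data.
    image = preprocess_input(image)
    # Predict the probability of each of the 1,000 classes.
    yhat = model.predict(image)
    # Map the probabilities to class labels and keep the most likely one.
    _, name, probability = decode_predictions(yhat)[0][0]
    return name, float(probability)


# Usage, assuming mug.jpg is in the current working directory:
# name, probability = classify_image("mug.jpg")
# print("%s (%.2f%%)" % (name, probability * 100))
```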

Running the example, we can see that the image is correctly classified as a “coffee mug” with a 75% likelihood.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Create a Function. Update the example and add a function that, given an image filename and the loaded model, returns the classification result.
  • Command Line Tool. Update the example so that given an image filename on the command line, the program will report the classification for the image.
  • Report Multiple Classes. Update the example to report the top 5 most likely classes for a given image and their probabilities.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the VGG convolutional neural network models for image classification.

Specifically, you learned:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

16 Responses to How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

  1. Thabet November 8, 2017 at 4:56 pm #

    Thank you Jason !

  2. Alexander Kireev November 8, 2017 at 6:04 pm #

    Thank you, Jason. Very interesting work.
    From this point, do we continue our journey toward computer vision?
    If possible, please tell us about the region of interest technique in a future post. It is difficult for a beginner to understand, but very useful in practice.

    • Jason Brownlee November 9, 2017 at 9:54 am #

      Great suggestion, thanks Alexander.

      For the next few months the focus will be NLP with posts related to my new book on the topic.

  3. Gerrit Govaerts November 8, 2017 at 7:09 pm #

    I don’t want to crash your party , but…

    http://www.bbc.com/news/technology-41845878

    • Jason Brownlee November 9, 2017 at 9:56 am #

      Yes I saw that.

      We are still making impressive progress and achieving amazing results we could not dream of 10 years ago.

  4. Ritika November 10, 2017 at 4:51 am #

    Thank you Jason for the wonderful article. Can you please suggest which pretrained model can be used for recognizing individual alphabets and digits?

    • Jason Brownlee November 10, 2017 at 10:40 am #

      Good question, I am not sure off the cuff, perhaps try a google search. I expect there are such models available.

      If you discover some, please let me know.

  5. Sam Ranade November 10, 2017 at 7:46 am #

    Thank you Jason,
    Someday can you take time to write about training VGG for objects not belonging to original 1000 classes (Imagenet dataset) but completely new 2000 classes. I am specially interested in training times for starting from scratch and training times for fine-tuning. Do the no_top weights reduce training time much?
    Once again thank you for the post

    • Jason Brownlee November 10, 2017 at 10:43 am #

      Great suggestion, thanks Sam. I hope to.

      Yes, the layers just before the output layer do contain valuable info! I have tested this on some image captioning examples.

  6. Adel November 10, 2017 at 9:15 am #

    Thank you Jason for the wonderful article. We really hope you write a post on object detection, such as SSD (Single Shot Multibox Detector) for standard data and custom data, or on semantic segmentation, such as FCN or U-Net. That would be very cool.

  7. Reza November 11, 2017 at 1:14 am #

    Many thanks for That.

  8. krisna November 17, 2017 at 1:29 am #

    I’m still confused. Can I change the image dataset and train it with VGG?
    Thanks

    • Jason Brownlee November 17, 2017 at 9:27 am #

      Sorry, I don’t follow, perhaps you can restate your question?
