How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as classifying images.

That is, given a photograph of an object, answer the question as to which of 1,000 specific objects the photograph shows.

A competition-winning model for this task is the VGG model by researchers at Oxford. What is important about this model, besides its capability of classifying objects in photographs, is that the model weights are freely available and can be loaded and used in your own models and applications.

In this tutorial, you will discover the VGG convolutional neural network models for image classification.

After completing this tutorial, you will know:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. ImageNet
  2. The Oxford VGG Models
  3. Load the VGG Model in Keras
  4. Develop a Simple Photo Classifier

ImageNet

ImageNet is a research project to develop a large database of images with annotations, e.g. images and their descriptions.

The images and their annotations have been the basis for an image classification challenge called the ImageNet Large Scale Visual Recognition Challenge or ILSVRC since 2010. The result is that research organizations battle it out on pre-defined datasets to see who has the best model for classifying the objects in images.

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.

ImageNet Large Scale Visual Recognition Challenge, 2015.

For the classification task, images must be classified into one of 1,000 different categories.

For the last few years, very deep convolutional neural network models have been used to win these challenges, and results on the tasks have exceeded human performance.

Sample of Images from the ImageNet Dataset used in the ILSVRC Challenge
Taken from “ImageNet Large Scale Visual Recognition Challenge”, 2015.

The Oxford VGG Models

Researchers from the Oxford Visual Geometry Group, or VGG for short, participate in the ILSVRC challenge.

In 2014, convolutional neural network (CNN) models developed by the VGG group won the image classification tasks.

ILSVRC Results in 2014 for the Classification Task

After the competition, the participants wrote up their findings in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition.

They also made their models and learned weights available online.

This allowed other researchers and developers to use a state-of-the-art image classification model in their own work and programs.

This helped to fuel a wave of transfer learning work, where pre-trained models are used with minor modification on wholly new predictive modeling tasks, harnessing the state-of-the-art feature extraction capabilities of proven models.

… we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines (e.g. deep features classified by a linear SVM without fine-tuning). We have released our two best-performing models to facilitate further research.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

VGG released two different CNN models, specifically a 16-layer model and a 19-layer model.

Refer to the paper for the full details of these models.

The VGG models are no longer state-of-the-art, trailing the best results by only a few percentage points. Nevertheless, they are very powerful models, useful both as image classifiers and as the basis for new models that use image inputs.

In the next section, we will see how we can use the VGG model directly in Keras.

Load the VGG Model in Keras

The VGG model can be loaded and used in the Keras deep learning library.

Keras provides an Applications interface for loading and using pre-trained models.

Using this interface, you can create a VGG model using the pre-trained weights provided by the Oxford group and use it as a starting point in your own model, or use it as a model directly for classifying images.

In this tutorial, we will focus on the use case of classifying new images using the VGG model.

Keras provides both the 16-layer and 19-layer versions via the VGG16 and VGG19 classes. Let’s focus on the VGG16 model.

The model can be created as follows:
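A minimal sketch using the Keras Applications API (the import path below assumes a TensorFlow-bundled Keras; older standalone Keras used `from keras.applications.vgg16 import VGG16`):

```python
from tensorflow.keras.applications.vgg16 import VGG16

# build VGG16 with the pre-trained ImageNet weights
# (the ~528 MB weight file is downloaded on first use)
model = VGG16()
```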

That’s it.

The first time you run this example, Keras will download the weight files from the Internet and store them in the ~/.keras/models directory.

Note that the weights are about 528 megabytes, so the download may take a few minutes depending on the speed of your Internet connection.

The weights are only downloaded once. The next time you run the example, the weights are loaded locally and the model should be ready to use in seconds.

We can use the standard Keras tools for inspecting the model structure.

For example, you can print a summary of the network layers as follows:
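A sketch of printing the summary, assuming a TensorFlow-bundled Keras:

```python
from tensorflow.keras.applications.vgg16 import VGG16

model = VGG16()
# print a layer-by-layer summary: layer names, output shapes,
# and parameter counts (about 138 million parameters in total)
model.summary()
```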

You can see that the model is huge.

You can also see that, by default, the model expects images as input with the size 224 x 224 pixels with 3 channels (e.g. color).

We can also create a plot of the layers in the VGG model, as follows:
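A sketch of plotting the model; note that `plot_model()` requires the pydot and graphviz packages to be installed:

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.utils import plot_model

model = VGG16()
# write a diagram of the layer graph to an image file
plot_model(model, to_file='vgg.png')
```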

Again, because the model is large, the plot is a little too large and perhaps unreadable. Nevertheless, it is provided below.

Plot of Layers in the VGG Model

The VGG16() class takes a few arguments that may only interest you if you are looking to use the model in your own project, e.g. for transfer learning.

For example:

  • include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
  • weights (‘imagenet‘): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
  • input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
  • input_shape (None): The size of images that the model is expected to take if you change the input layer.
  • pooling (None): The type of pooling to use when you are training a new set of output layers.
  • classes (1000): The number of classes (e.g. size of output vector) for the model.
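As an illustration of these arguments, here is a hypothetical transfer-learning setup: the output layers are dropped, the input is changed to 300×300 images, and the final feature maps are average-pooled into one vector per image. The `weights=None` setting is used here only to skip the weight download for this sketch; you would normally keep the default `'imagenet'` weights.

```python
from tensorflow.keras.applications.vgg16 import VGG16

# drop the classifier head, accept 300x300 RGB images,
# and average-pool the final feature maps into one vector
model = VGG16(include_top=False, weights=None,
              input_shape=(300, 300, 3), pooling='avg')
print(model.output_shape)  # a 512-element feature vector per image
```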

Next, let’s look at using the loaded VGG model to classify ad hoc photographs.

Develop a Simple Photo Classifier

Let’s develop a simple image classification script.

1. Get a Sample Image

First, we need an image we can classify.

You can download a random photograph of a coffee mug from Flickr here.

Coffee Mug
Photo by jfanaian, some rights reserved.

Download the image and save it to your current working directory with the filename ‘mug.jpg‘.

2. Load the VGG Model

Load the weights for the VGG-16 model, as we did in the previous section.

3. Load and Prepare Image

Next, we can load the image as pixel data and prepare it to be presented to the network.

Keras provides some tools to help with this step.

First, we can use the load_img() function to load the image and resize it to the required size of 224×224 pixels.

Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We can use the img_to_array() function for this.

The network expects one or more images as input; that means the input array will need to be 4-dimensional: samples, rows, columns, and channels.

We only have one sample (one image). We can reshape the array by calling reshape() and adding the extra dimension.
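For example, adding the leading samples dimension to a single image array:

```python
import numpy as np

# a single 224x224 RGB image as an array of pixels
pixels = np.zeros((224, 224, 3), dtype='float32')

# add a leading 'samples' dimension: (1, 224, 224, 3)
image = pixels.reshape((1, 224, 224, 3))
print(image.shape)
```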

Next, the image pixels need to be prepared in the same way as the ImageNet training data was prepared. Specifically, from the paper:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

Keras provides a function called preprocess_input() to prepare new input for the network.
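The loading and preparation steps above can be collected into one small helper. This is a sketch: the `prepare_image` name is introduced here, and the function assumes the named image file exists in the working directory. Older standalone Keras imported `load_img` and `img_to_array` from `keras.preprocessing.image`.

```python
from tensorflow.keras.utils import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input

def prepare_image(filename):
    # load the image, resizing it to the 224x224 pixels VGG expects
    image = load_img(filename, target_size=(224, 224))
    # convert the PIL image to a NumPy array of pixels
    image = img_to_array(image)
    # add a leading 'samples' dimension: (1, 224, 224, 3)
    image = image.reshape((1,) + image.shape)
    # subtract the ImageNet mean RGB values, as described in the paper
    return preprocess_input(image)
```

Calling `prepare_image('mug.jpg')` returns an array ready to be passed to the network.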

We are now ready to make a prediction for our loaded and prepared image.

4. Make a Prediction

We can call the predict() function on the model in order to get a prediction of the probability of the image belonging to each of the 1000 known object types.
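A sketch of the prediction step. To keep the example download-free it builds the model with random weights (`weights=None`) and a blank input image; in practice you would use the default ImageNet weights and your prepared photograph.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16

# weights=None avoids the large download for this sketch;
# use the default ImageNet weights for real predictions
model = VGG16(weights=None)
image = np.zeros((1, 224, 224, 3), dtype='float32')

# one probability per class for each input sample
yhat = model.predict(image)
print(yhat.shape)  # (1, 1000)
```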

Nearly there, now we need to interpret the probabilities.

5. Interpret Prediction

Keras provides a function to interpret the probabilities called decode_predictions().

It can return a list of classes and their probabilities in case you would like to present the top 3 objects that may be in the photo.

We will just report the first most likely object.
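A sketch of decoding a prediction vector; the stand-in probabilities below are made up for illustration, and `yhat` would normally come from `model.predict()`. Each decoded entry is a (class id, class name, probability) tuple.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import decode_predictions

# a stand-in prediction vector: 1000 made-up probabilities
yhat = np.random.rand(1, 1000).astype('float32')
yhat /= yhat.sum()

# map the top 3 probabilities to (class_id, name, probability) tuples
for class_id, name, prob in decode_predictions(yhat, top=3)[0]:
    print('%s (%.4f)' % (name, prob))
```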

And that’s it.

Complete Example

Tying all of this together, the complete example is listed below:
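A sketch of the complete example, gathered into a single function for convenience (the `classify` name is introduced here). It assumes the image file exists in the working directory; the import paths assume a TensorFlow-bundled Keras.

```python
from tensorflow.keras.utils import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

def classify(filename):
    # load the model (downloads the ~528 MB weight file on first use)
    model = VGG16()
    # load the image and resize it to the required 224x224 pixels
    image = load_img(filename, target_size=(224, 224))
    # convert the pixels to a NumPy array and add a samples dimension
    image = img_to_array(image).reshape((1, 224, 224, 3))
    # prepare the pixels in the same way as the ImageNet training data
    image = preprocess_input(image)
    # predict the probability across all 1000 output classes
    yhat = model.predict(image)
    # retrieve the single most likely class and report it
    _, name, probability = decode_predictions(yhat)[0][0]
    print('%s (%.2f%%)' % (name, probability * 100))
```

Calling `classify('mug.jpg')` should report the coffee mug class with about 75% likelihood; the exact probability may vary slightly across library versions.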

Running the example, we can see that the image is correctly classified as a “coffee mug” with a 75% likelihood.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Create a Function. Update the example and add a function that given an image filename and the loaded model will return the classification result.
  • Command Line Tool. Update the example so that given an image filename on the command line, the program will report the classification for the image.
  • Report Multiple Classes. Update the example to report the top 5 most likely classes for a given image and their probabilities.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the VGG convolutional neural network models for image classification.

Specifically, you learned:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


86 Responses to How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

  1. Thabet November 8, 2017 at 4:56 pm #

    Thank you Jason !

  2. Alexander Kireev November 8, 2017 at 6:04 pm #

    Thank you, Jason. Very interesting work.
    From this point we continue our journey toward Computer vision?
    If it is possible, tell us please in future works about Region of interest technique. It is difficult to understand for beginner, but very useful in practice.

    • Jason Brownlee November 9, 2017 at 9:54 am #

      Great suggestion, thanks Alexander.

      For the next few months the focus will be NLP with posts related to my new book on the topic.

  3. Gerrit Govaerts November 8, 2017 at 7:09 pm #

    I don’t want to crash your party , but…

    http://www.bbc.com/news/technology-41845878

    • Jason Brownlee November 9, 2017 at 9:56 am #

      Yes I saw that.

      We are still making impressive progress and achieving amazing results we could not dream of 10 years ago.

  4. Ritika November 10, 2017 at 4:51 am #

    Thank you Jason for the wonderful article. Can you please suggest which pretrained model can be used for recognizing individual alphabets and digits?

    • Jason Brownlee November 10, 2017 at 10:40 am #

      Good question, I am not sure off the cuff, perhaps try a google search. I expect there are such models available.

      If you discover some, please let me know.

  5. Sam Ranade November 10, 2017 at 7:46 am #

    Thank you Jason,
    Someday can you take time to write about training VGG for objects not belonging to original 1000 classes (Imagenet dataset) but completely new 2000 classes. I am specially interested in training times for starting from scratch and training times for fine-tuning. Do the no_top weights reduce training time much?
    Once again thank you for the post

    • Jason Brownlee November 10, 2017 at 10:43 am #

      Great suggestion, thanks Sam. I hope to.

      Yes, the layers just before the output layer do contain valuable info! I have tested this on some image captioning examples.

  6. Adel November 10, 2017 at 9:15 am #

    Thank you Jason for the wonderful article. We really hope you do a post on Object Detection, like SSD (Single Shot Multibox Detector) for standard and custom data, or semantic segmentation like FCN or U-Net; that would be very cool.

  7. Reza November 11, 2017 at 1:14 am #

    Many thanks for That.

  8. krisna November 17, 2017 at 1:29 am #

    i’m still confused , can i change the image dataset and train it with VGG ?
    thanks

    • Jason Brownlee November 17, 2017 at 9:27 am #

      Sorry, I don’t follow, perhaps you can restate your question?

  9. Jeff November 21, 2017 at 8:38 am #

    Hello Jason,

    I am now learning Deep learning and your Website is a treasure trove for that.
    Thank you so much.

    I just finished “How to use pre-trained VGG model to Classify objects in Photographs”, which was very useful.
    Keras + VGG16 are really super helpful at classifying Images.
    Your write-up makes it easy to learn. A world of thanks.

    I would like to know what tool I can use to perform Medical Image Analysis.
    Any specific library that would help me to Analyse Medical Images? VGG could not.

    Your Response would be highly appreciated.

    • Jason Brownlee November 22, 2017 at 10:46 am #

      Sorry, I don’t have experience in that domain. I cannot give you specific advice.

  10. Hung Manh Nguyen November 30, 2017 at 4:41 am #

    Can you help me with where to save the “mug.jpg”.
    I’ve tried to save it in some directory but it always returns the following error.

    FileNotFoundError: [Errno 2] No such file or directory: ‘mug.jpg’

    Thank you very much!!

    • Jason Brownlee November 30, 2017 at 8:25 am #

      In the same directory as the code file, and run the code from the command line to avoid any issues from IDEs and notebooks.

  11. Leo December 14, 2017 at 12:48 pm #

    Hi Jason,

    Thanks for the sharing. I want to know if VGG16 model can identify different objects in an image and then extract features of each object, or is there any way to do this through Keras library?

  12. Sasikanth December 15, 2017 at 2:52 am #

    Hello Jason,
    Is there a similar package in R language?

  13. Bastien M January 13, 2018 at 1:35 am #

    Is there a way to use a format different than 224×224 ?
    The only example I found is here: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.3-using-a-pretrained-convnet.ipynb

    Where basically we need to add another level on top of the model and use a custom classifier.
    I guess that since the model was trained for 224×224 image it would not work as it is with different size, am I right ?

  14. Moses Wong January 25, 2018 at 6:17 am #

    Simple yet works well with the 20 test image files I provided to this program! Great job! Thank you !

  15. Moses Wong January 25, 2018 at 6:19 am #

    Grateful if you could also point out how to expand the VGG16 into actual Keras or Tensorflow code so learner can modify the code on their own for training, inference, or transfer learning purpose.

  16. Namrata Nayak April 10, 2018 at 7:12 pm #

    What all classes of images are feed into the VGG model which is predicting objects?
    How can we see that?

    • Jason Brownlee April 11, 2018 at 6:36 am #

      Good question, there may be a way.

      Off the cuff, one way would be to enumerate all inputs to decode_predictions()

  17. SATYAJIT PATTNAIK April 10, 2018 at 7:29 pm #

    Hi Jason,

    I have a similar question like Namrata, if i want to train my VGG model with some new classes, how can i do that?

  18. SATYAJIT PATTNAIK April 11, 2018 at 8:43 pm #

    @Jason,

    The link you have given shows the list of classes being trained in the VGG model, my question was, can we write our own VGG model and provide the classes?

    If there’s any link or a way to do it, please let me know

    • Jason Brownlee April 12, 2018 at 8:40 am #

      I do not have a link for this.

      Perhaps you can look at the Keras code and adapt an existing example in the API for your use case?

  19. dsds April 29, 2018 at 7:34 am #

    Thanks for all efforts. U make dreams come true for researchers 🙂

  20. yuri May 2, 2018 at 5:13 am #

    Thanks for this great post.
    I am new to deep learning. I have a question: can the model provide the exact position of the object so we can put a bounding box on it? And can the VGG16 model detect several objects in one image and give their positions?

    • Jason Brownlee May 2, 2018 at 5:46 am #

      It can, it is called object localization and requires more than just a VGG type model. Sorry, I don’t have a worked example.

    • Claire October 30, 2018 at 3:48 am #

      Hello Yuri,
      I am dealing with the same question than you, did you make progresses on your research?

  21. K.Choi May 9, 2018 at 6:57 pm #

    Thank you for all your kind demonstration. However, I wonder how to use pre-trained VGG net to classify my grayscale images, because number of channels of images for VGG net is 3, not 1. Can I change the number of channels of images for VGG net? for example, 2?

    • Jason Brownlee May 10, 2018 at 6:27 am #

      Great question!

      Perhaps cut off the input layers for the model and train new input layers that expect 1 channel.

  22. Sayan May 12, 2018 at 2:18 am #

    Awesome, superb work! Appreciate that.

  23. Yassine May 13, 2018 at 10:06 am #

    Thanks sir for this tutorial, please can i use the vgg16 to classify some images belonging to a specific domain and does not exists in the ImageNet database.

  24. Anirban Ghosh May 25, 2018 at 11:46 pm #

    Sir,
    I am a regular reader of your blog. I have read your work and like it. Further, in this example of yours I could see you fed the picture to the network. I am also a fan of Dr. Adrian’s work. I was reading about transfer learning, where we removed the FC layers at the end and passed in a logistic regression there to classify a dataset (say Caltech 101), where we could get 98% accuracy. The VGG16 is trained on ImageNet, but transfer learning allows us to use it on Caltech 101.
    Thank you guys are teaching incredible things to us mortals. One request can you please show a similar example of transfer learning using pre trained word embedding like GloVe or wordnet to detect sentiment in a movie review.

    • Jason Brownlee May 26, 2018 at 5:59 am #

      Thanks.

      I give examples of reusing word embeddings, search the blog. or Check my NLP book.

      • Anirban Ghosh May 26, 2018 at 2:05 pm #

        Yes, I know you have included them in your book on NLP, using a CNN and word embedding to classify the sentiments, I have implemented it too. Anyways thanks for replying.

        Regards,

        Anirban Ghosh.

  25. Vineeth June 11, 2018 at 8:46 pm #

    Hey Hi,
    thanks for the article but I have a doubt,
    The last layer in the network is a softmax layer and we have 1000 neurons in the fully connected layer before this layer right? so we can use this for classification of 1000 objects.
    What my doubt is that, is this 1000 fixed for all vgg networks even though we are trying to classify only a few say 100( some number less than 1000) or this number (number of neurons in the last fully connected layer) depends on the number of classifications we are trying to address.

    • Jason Brownlee June 12, 2018 at 6:40 am #

      The prediction is a softmax over 1000 neurons.

      It is fixed at 1000, but you can re-fit the network on a different dataset with more/less classes if you wish.

      • Vineeth June 12, 2018 at 2:52 pm #

        Ok, so as I said if we want to predict 100 classes, we still will have 1000 neurons but only 100 of them will be used for classification. Is that what you meant? If so what happens to the other 900 neurons, can softmax layer work that way, using only some neurons out of all the available ones?
        sorry if this seems so basic, I just started working with deep learning and these things confuse a bit. thanks

        • Jason Brownlee June 13, 2018 at 6:13 am #

          If you have 100 classes, you have 100 nodes in the output layer, not 1000.

          • Vineeth June 13, 2018 at 2:44 pm #

            got it! Thanks for the reply

  26. JG June 17, 2018 at 9:56 pm #

    Thank you very much Mr. Jason Brownlee! You are doing a great job! I have been following some of your Machine Learning Mastery “How to…” and “Intro…” posts. I am very impressed by how you approach, outreach and advance some of the “hot and trending” topics of Deep Learning, explaining them in plain text (including basic Python concepts, and of course the Keras API, the TensorFlow library, …).
    To me the main issue is your capability to communicate the WHOLE SOLUTIONS covering everything in between of the problem starting, with math or Deep Learning intuitions concepts, following by programming language, operative ideas of libraries modules used, references list , etc. And finally but not least providing an operative code to start experimenting by ourselves all the concepts introduced by you.

    Many thanks for your really great mastery work , from JG !!

  27. Zeyu July 11, 2018 at 12:08 am #

    I wander what I should do if I would like to train my own dataset to get a new weights based on the VGG model, and do prediction on the new weights

    • Jason Brownlee July 11, 2018 at 5:59 am #

      Keep the whole VGG model fixed and only train some new output weights to interpret the VGG output.

  28. Vikas July 23, 2018 at 4:45 am #

    Hi, can you help me localization of an object suppose number plate in an image. I know YOLO and Faster-RCNN can be used for this. But i am facing problem in implementing Region proposals using Anchor boxes. could you please suggest something?

  29. JG July 24, 2018 at 4:48 am #

    One more time Mr. Jason Brownlee thank you very much for your VGG16 Keras apps introduction, I think your code and explanation it is perfect (at least for my level) before diving into deeper waters, such as building your own models on Keras. I like the way you structure your pieces of codes before running the full system. I appreciate your “free” job for all of us . You do a lot of appreciable things for our Machine Learning community!!. I wish you a long running on these matters !

  30. Fork Esther July 25, 2018 at 12:23 am #

    Hi Jason,
    Your blog is the best for machine learning!
    I have a question regarding the performance of VGG.
    For coffee mug, it is exactly detecting the object.
    But I tried a very obvious snake picture (https://reikiserpent.files.wordpress.com/2013/03/snakes-guam.jpg); however the results are not that promising:

    [[(‘n01833805’, ‘hummingbird’, 0.22024027),
    (‘n01665541’, ‘leatherback_turtle’, 0.10800469),
    (‘n01664065’, ‘loggerhead’, 0.088614523),
    (‘n02641379’, ‘gar’, 0.083981715),
    (‘n01496331’, ‘electric_ray’, 0.061437886)]]

    Knowing that VGG is performing very well, is there any way to improve the model results (maybe some fine tuning?) without retraining the model?

    Thanks a lot,

  31. Fork Esther July 27, 2018 at 6:34 am #

    I tried ResNet as well, but results are still far from reality.

    • Jason Brownlee July 27, 2018 at 11:03 am #

      I guess the test images will have to be much like the images used to train the model, e.g. imagenet.

  32. AMM August 10, 2018 at 5:43 pm #

    hi sir thank you for this tutorial
    I noticed some places using VGG16 where they input images of different sizes and aspect ratios, such as 192×99 or 69×81, and I can’t understand how they get the output. Can VGG16 take an image with a size other than 224×224 without resizing it, and what would the result be? Thank you.

    • Jason Brownlee August 11, 2018 at 6:07 am #

      Perhaps resize the image?
      Perhaps change the input shape of the network to be much larger and zero-pad smaller images?

  33. Maryam September 13, 2018 at 1:20 am #

    Hello,
    I tried to change the type of vgg16 to sequential, but, after changing it removes the input layer.
    I don’t know why. how can I fix it?

    thanks

  34. Tin September 21, 2018 at 11:26 am #

    Hi Jason,

    I like it very much and am wondering any following ups for the fine-tune VGG?

    • Jason Brownlee September 21, 2018 at 2:19 pm #

      Thanks.

      Great question!

      Small and decaying learning rate and early stopping would be a good start.

  35. Aksasse hamid October 16, 2018 at 5:21 am #

    Thank you very much for this great work. I wonder is it possible to use this model (VGG16) in order to be able to classify daily activities.

  36. Foxrol November 20, 2018 at 2:12 am #

    Thank you Jason ! I’m speechless

  37. Nagabhushan S N November 20, 2018 at 4:16 pm #

    Hi,
    I’ve already downloaded the vgg19.npy model. Is it possible to load from this directly instead of downloading again?

    • Jason Brownlee November 21, 2018 at 7:47 am #

      Perhaps, I don’t have an example of loading the model manually, sorry.

  38. Ebtihal November 24, 2018 at 8:53 pm #

    Thank you so much for this valuable post. Really helpful.

    I have question please,
    How can I retrieve the index position of top n probabilities

    for example, the prediction vector of the mug will produce a vector with 1000* 1 which contains the probabilities values for each class.

    let’s say that the probabilities are:
    [.1
    .2
    .3 (top 1)
    .001
    .002
    .25 (top 2)
    .24 (top 3)
    .1
    .01
    …
    etc.]

    I want to retrieve the position/index in which the top 3 probabilities are located.
    in previous example, I want to retrieve the position of
    .3 (top 1)
    and
    .25(top2)
    and .24 (top3)

    which is [2,5,6]

    Thank you .

  39. Tapan Kumar November 28, 2018 at 11:36 pm #

    Hi, Guys Thanks for this awesome tutorial. Do You guys have any tutorial on How To train with our own images..(Custom Classifier) with whatever architecture you are following now. So Please let me know. Thanks for the help.

    • Jason Brownlee November 29, 2018 at 7:42 am #

      Sure, you can load your images and perhaps use transfer learning with a VGG model as a starting point.
