How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as classifying images.

That is, given a photograph of an object, answer the question as to which of 1,000 specific objects the photograph shows.

A competition-winning model for this task is the VGG model by researchers at Oxford. What is important about this model, besides its capability of classifying objects in photographs, is that the model weights are freely available and can be loaded and used in your own models and applications.

In this tutorial, you will discover the VGG convolutional neural network models for image classification.

After completing this tutorial, you will know:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. ImageNet
  2. The Oxford VGG Models
  3. Load the VGG Model in Keras
  4. Develop a Simple Photo Classifier

ImageNet

ImageNet is a research project to develop a large database of images with annotations, e.g. images and their descriptions.

The images and their annotations have been the basis for an image classification challenge called the ImageNet Large Scale Visual Recognition Challenge or ILSVRC since 2010. The result is that research organizations battle it out on pre-defined datasets to see who has the best model for classifying the objects in images.

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.

ImageNet Large Scale Visual Recognition Challenge, 2015.

For the classification task, images must be classified into one of 1,000 different categories.

For the last few years very deep convolutional neural network models have been used to win these challenges and results on the tasks have exceeded human performance.

Sample of Images from the ImageNet Dataset used in the ILSVRC Challenge
Taken From “ImageNet Large Scale Visual Recognition Challenge”, 2015.

The Oxford VGG Models

Researchers from the Oxford Visual Geometry Group, or VGG for short, participate in the ILSVRC challenge.

In 2014, convolutional neural network (CNN) models developed by the VGG took first place in the localization task and second place in the classification task.

ILSVRC Results in 2014 for the Classification task

After the competition, the participants wrote up their findings in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition” (2014).

They also made their models and learned weights available online.

This allowed other researchers and developers to use a state-of-the-art image classification model in their own work and programs.

This helped to fuel a rash of transfer learning work where pre-trained models are used with minor modification on wholly new predictive modeling tasks, harnessing the state-of-the-art feature extraction capabilities of proven models.

… we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines (e.g. deep features classified by a linear SVM without fine-tuning). We have released our two best-performing models to facilitate further research.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

VGG released two different CNN models, specifically a 16-layer model and a 19-layer model.

Refer to the paper for the full details of these models.

The VGG models are no longer state-of-the-art, although they trail the best models by only a few percentage points. Nevertheless, they are very powerful models and useful both as image classifiers and as the basis for new models that use image inputs.

In the next section, we will see how we can use the VGG model directly in Keras.

Load the VGG Model in Keras

The VGG model can be loaded and used in the Keras deep learning library.

Keras provides an Applications interface for loading and using pre-trained models.

Using this interface, you can create a VGG model using the pre-trained weights provided by the Oxford group and use it as a starting point in your own model, or use it as a model directly for classifying images.

In this tutorial, we will focus on the use case of classifying new images using the VGG model.

Keras provides both the 16-layer and 19-layer versions via the VGG16 and VGG19 classes. Let’s focus on the VGG16 model.

The model can be created as follows:
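
A minimal sketch of that call, assuming the Keras that ships with TensorFlow (tensorflow.keras); with the standalone package the import path would be keras.applications.vgg16 instead. The create_vgg16() wrapper is a hypothetical helper, used only so that nothing is downloaded until the function is called:

```python
def create_vgg16():
    """Build the 16-layer VGG model with its ImageNet weights."""
    # Assumption: the Keras bundled with TensorFlow is installed.
    # Imported inside the function so that the TensorFlow start-up cost and
    # the weight download (first use only) are paid when it is called.
    from tensorflow.keras.applications.vgg16 import VGG16

    # Defaults: include_top=True, weights='imagenet', classes=1000.
    return VGG16()
```

Calling model = create_vgg16() returns the ready-to-use model.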

That’s it.

The first time you run this example, Keras will download the weight files from the Internet and store them in the ~/.keras/models directory.

Note that the weights are about 528 megabytes, so the download may take a few minutes depending on the speed of your Internet connection.

The weights are only downloaded once. The next time you run the example, the weights are loaded locally and the model should be ready to use in seconds.

We can use the standard Keras tools for inspecting the model structure.

For example, you can print a summary of the network layers as follows:
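
The call itself is simply model.summary() on the loaded model. As a cross-check on the totals it prints, the parameter count can be worked out by hand from the configuration in the VGG paper: thirteen 3×3 convolutional layers in five blocks, followed by three fully connected layers. The sketch below does that arithmetic in plain Python; the names CONV_BLOCKS, DENSE_UNITS and count_vgg16_parameters() are hypothetical.

```python
# Output channels of the 13 convolutional layers of VGG16 (3x3 kernels),
# grouped into the five blocks that each end in a 2x2 max-pool.
CONV_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
               [512, 512, 512], [512, 512, 512]]
# Units in the three fully connected layers at the top of the network.
DENSE_UNITS = [4096, 4096, 1000]


def count_vgg16_parameters(input_size=224, input_channels=3):
    """Return (convolutional, fully_connected) parameter counts."""
    conv_params = 0
    channels = input_channels
    size = input_size
    for block in CONV_BLOCKS:
        for filters in block:
            # One 3x3 kernel per input/output channel pair, plus one bias
            # per output channel.
            conv_params += 3 * 3 * channels * filters + filters
            channels = filters
        size //= 2  # the max-pool closing each block halves width and height

    dense_params = 0
    inputs = size * size * channels  # flattened 7 x 7 x 512 feature map
    for units in DENSE_UNITS:
        dense_params += inputs * units + units  # weights plus biases
        inputs = units
    return conv_params, dense_params


conv_params, dense_params = count_vgg16_parameters()
total = conv_params + dense_params
print('convolutional layers: {:,}'.format(conv_params))
print('fully connected layers: {:,}'.format(dense_params))
print('total: {:,}'.format(total))
# Stored as 32-bit floats, the weights need 4 bytes per parameter.
print('approx. size: {:.0f} MiB'.format(total * 4 / 2**20))
```

The total of 138,357,544 parameters is the figure reported at the bottom of the summary, and at four bytes per weight it accounts for the download of about 528 megabytes mentioned above.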

You can see that the model is huge.

You can also see that, by default, the model expects images as input with the size 224 x 224 pixels with 3 channels (e.g. color).

We can also create a plot of the layers in the VGG model, as follows:
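
A minimal sketch of that step, assuming TensorFlow's Keras together with the pydot package and the Graphviz binaries, which plot_model() needs in order to render the image. The plot_vgg16() helper and the default filename vgg.png are hypothetical:

```python
def plot_vgg16(model, filename='vgg.png'):
    """Write a diagram of the model's layers to an image file."""
    # Assumption: tensorflow, pydot and Graphviz are installed.
    from tensorflow.keras.utils import plot_model

    plot_model(model, to_file=filename)
    return filename
```

Calling plot_vgg16(model) on the loaded model writes the diagram to the current working directory.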

Again, because the model is large, the plot is a little too large and perhaps unreadable. Nevertheless, it is provided below.

Plot of Layers in the VGG Model

The VGG16() class takes a few arguments that may only interest you if you are looking to use the model in your own project, e.g. for transfer learning.

For example:

  • include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
  • weights (‘imagenet’): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
  • input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
  • input_shape (None): The size of images that the model is expected to take if you change the input layer.
  • pooling (None): The type of pooling to use when you are training a new set of output layers.
  • classes (1000): The number of classes (e.g. size of output vector) for the model.
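
As a sketch of how these arguments combine for transfer learning, the hypothetical helper below builds only the convolutional part of the network for a smaller input. The 150×150 size is an arbitrary illustration, and the import assumes TensorFlow's Keras. The small feature_map_size() function, also hypothetical, shows why the input size matters: each of the five pooling stages halves the width and height.

```python
def feature_map_size(input_size, pool_stages=5):
    """Width/height left after VGG's five 2x2 max-pooling stages."""
    size = input_size
    for _ in range(pool_stages):
        size //= 2  # pooling with stride 2 drops any odd remainder
    return size


def create_feature_extractor(input_shape=(150, 150, 3)):
    """VGG16 without its fully connected output layers."""
    # Assumption: the Keras bundled with TensorFlow is installed.
    from tensorflow.keras.applications.vgg16 import VGG16

    return VGG16(include_top=False,        # drop the 1,000-class classifier
                 weights='imagenet',       # keep the pre-trained filters
                 input_shape=input_shape)  # custom size needs include_top=False


print(feature_map_size(224))  # 7: the 7 x 7 x 512 map the default top expects
print(feature_map_size(150))  # 4: a 150 x 150 input yields a 4 x 4 x 512 map
```

Because the fully connected layers expect a 7×7×512 feature map, a different input size is only possible once they are removed with include_top=False.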

Next, let’s look at using the loaded VGG model to classify ad hoc photographs.

Develop a Simple Photo Classifier

Let’s develop a simple image classification script.

1. Get a Sample Image

First, we need an image we can classify.

You can download a random photograph of a coffee mug from Flickr here.

Coffee Mug
Photo by jfanaian, some rights reserved.

Download the image and save it to your current working directory with the filename ‘mug.jpg’.

2. Load the VGG Model

Load the weights for the VGG-16 model, as we did in the previous section.

3. Load and Prepare Image

Next, we can load the image as pixel data and prepare it to be presented to the network.

Keras provides some tools to help with this step.

First, we can use the load_img() function to load the image and resize it to the required size of 224×224 pixels.

Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We can use the img_to_array() function for this.

The network expects one or more images as input; that means the input array will need to be 4-dimensional: samples, rows, columns, and channels.

We only have one sample (one image). We can reshape the array by calling reshape() and adding the extra dimension.
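
These three steps can be sketched together. The NumPy part below runs on a stand-in array of zeros so that the shape change is visible without an image file; load_image_batch() is a hypothetical helper that assumes TensorFlow's Keras for load_img() and img_to_array().

```python
import numpy as np


def load_image_batch(filename, size=(224, 224)):
    """Load an image file as a batch of one, shaped (1, rows, cols, 3)."""
    # Assumption: the Keras bundled with TensorFlow (plus Pillow) is installed.
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    image = load_img(filename, target_size=size)  # PIL image, resized
    pixels = img_to_array(image)                  # float32, (rows, cols, 3)
    return pixels.reshape((1,) + pixels.shape)    # add the samples dimension


# Stand-in for the output of img_to_array(): one 224 x 224 RGB image.
pixels = np.zeros((224, 224, 3), dtype='float32')
batch = pixels.reshape((1, pixels.shape[0], pixels.shape[1], pixels.shape[2]))
print(pixels.shape)  # (224, 224, 3)
print(batch.shape)   # (1, 224, 224, 3)
```

np.expand_dims(pixels, axis=0) gives the same result as the explicit reshape() and is a common alternative.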

Next, the image pixels need to be prepared in the same way as the ImageNet training data was prepared. Specifically, from the paper:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

Keras provides a function called preprocess_input() to prepare new input for the network.
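
To make that step concrete, the NumPy sketch below mirrors what preprocess_input() does for VGG16 in its default mode: it reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means, with no further scaling. The function name subtract_imagenet_mean() is hypothetical; it is an illustration only, and in the classifier itself the Keras function is the one to call.

```python
import numpy as np

# ImageNet channel means in BGR order, as used for the original VGG weights.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype='float32')


def subtract_imagenet_mean(batch_rgb):
    """Return a zero-centred BGR copy of a (samples, rows, cols, 3) RGB batch."""
    batch_bgr = batch_rgb[..., ::-1].astype('float32')  # RGB -> BGR (a copy)
    return batch_bgr - IMAGENET_MEAN_BGR  # broadcast over the channel axis


# A single pure-red 2 x 2 "image" as a tiny worked example.
red = np.zeros((1, 2, 2, 3), dtype='float32')
red[..., 0] = 255.0  # channel 0 is red in RGB order
prepared = subtract_imagenet_mean(red)
print(prepared[0, 0, 0])  # blue, green, red values after centring
```

For the red pixel, the blue and green channels become negative (zero minus the mean) while the red channel ends up at 255 minus 123.68.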

We are now ready to make a prediction for our loaded and prepared image.

4. Make a Prediction

We can call the predict() function on the model in order to get a prediction of the probability of the image belonging to each of the 1000 known object types.

We are nearly there; now we need to interpret the probabilities.

5. Interpret Prediction

Keras provides a function to interpret the probabilities called decode_predictions().

It can return a list of classes and their probabilities in case you would like to present the top 3 objects that may be in the photo.

We will just report the single most likely object.
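
The ranking that decode_predictions() performs can be illustrated with plain NumPy. The sketch below uses a made-up five-class probability vector and made-up labels rather than the real 1,000 ImageNet classes, and top_classes() is a hypothetical helper, not part of Keras:

```python
import numpy as np


def top_classes(probabilities, labels, top=3):
    """Return the `top` (index, label, probability) triples, best first."""
    order = np.argsort(probabilities)[::-1][:top]  # indices, highest first
    return [(int(i), labels[i], float(probabilities[i])) for i in order]


# Made-up output for a single image over five made-up classes.
labels = ['coffee_mug', 'cup', 'teapot', 'pitcher', 'espresso']
yhat = np.array([[0.75, 0.10, 0.02, 0.05, 0.08]])

for index, label, probability in top_classes(yhat[0], labels):
    print('%d %s (%.2f%%)' % (index, label, probability * 100))
```

With the real model, decode_predictions(yhat, top=3) returns one such list per image, each entry holding the class identifier, the readable label and the probability; taking element [0][0] of the result gives the most likely class for the first image.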

And that’s it.

Complete Example

Tying all of this together, the complete example is listed below:
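
A sketch of how the pieces fit together, assuming TensorFlow's Keras and the mug.jpg file saved earlier. The steps are wrapped in a hypothetical classify() function so that the model is only downloaded and run when it is called; format_label() is likewise a hypothetical helper for the final print-out.

```python
def format_label(label):
    """Format one (class_id, name, probability) entry for printing."""
    return '%s (%.2f%%)' % (label[1], label[2] * 100)


def classify(filename):
    """Return the most likely ImageNet class for the image in `filename`."""
    # Assumption: the Keras bundled with TensorFlow is installed.
    from tensorflow.keras.applications.vgg16 import (
        VGG16, decode_predictions, preprocess_input)
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    # load the model (weights are downloaded on the first call only)
    model = VGG16()
    # load the image from file, resized to the input size of the network
    image = load_img(filename, target_size=(224, 224))
    # convert the image pixels to a NumPy array
    image = img_to_array(image)
    # reshape to (samples, rows, columns, channels)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # prepare the pixels in the same way as the training data
    image = preprocess_input(image)
    # predict the probability across all 1,000 output classes
    yhat = model.predict(image)
    # convert the probabilities to class labels, most likely first
    labels = decode_predictions(yhat)
    # keep the top entry for the first (and only) image
    return labels[0][0]


# Formatting check with a placeholder entry (the class id is made up).
print(format_label(('n00000000', 'coffee_mug', 0.75)))
```

With the image in place, print(format_label(classify('mug.jpg'))) reports the predicted label and its probability.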

Running the example, we can see that the image is correctly classified as a “coffee mug” with a 75% likelihood.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Create a Function. Update the example and add a function that, given an image filename and the loaded model, will return the classification result.
  • Command Line Tool. Update the example so that given an image filename on the command line, the program will report the classification for the image.
  • Report Multiple Classes. Update the example to report the top 5 most likely classes for a given image and their probabilities.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the VGG convolutional neural network models for image classification.

Specifically, you learned:

  • About the ImageNet dataset and competition and the VGG winning models.
  • How to load the VGG model in Keras and summarize its structure.
  • How to use the loaded VGG model to classify objects in ad hoc photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

205 Responses to How to Use The Pre-Trained VGG Model to Classify Objects in Photographs

  1. Thabet November 8, 2017 at 4:56 pm #

    Thank you Jason !

    • Jason Brownlee November 9, 2017 at 9:53 am #

      You’re welcome.

      • Yacine April 18, 2018 at 9:47 pm #

        Hi, I’m a PhD researcher. I want to apply this method to a desert zone to detect dunes. Is it possible?

      • Aldiak November 2, 2019 at 12:59 am #

        Hello, I am new in this area and for my master thesis I need to work on plant leaf disease detection using a new neural network architecture without much code, that is what my supervisor told me, so if you can give a hint, because I am a bit lost. Thanks

      • wilson June 2, 2020 at 4:51 am #

        Hey Jason Brownlee could you help me with a deep learning project on explicit content detection?
        this is my twitter handle: wilson_exex
        i really need help to do this project

        • Jason Brownlee June 2, 2020 at 6:22 am #

          I don’t have the capacity to join your project, sorry.

  2. Alexander Kireev November 8, 2017 at 6:04 pm #

    Thank you, Jason. Very interesting work.
    From this point we continue our journey toward Computer vision?
    If it is possible, tell us please in future works about Region of interest technique. It is difficult to understand for beginner, but very useful in practice.

    • Jason Brownlee November 9, 2017 at 9:54 am #

      Great suggestion, thanks Alexander.

      For the next few months the focus will be NLP with posts related to my new book on the topic.

  3. Gerrit Govaerts November 8, 2017 at 7:09 pm #

    I don’t want to crash your party , but…

    http://www.bbc.com/news/technology-41845878

    • Jason Brownlee November 9, 2017 at 9:56 am #

      Yes I saw that.

      We are still making impressive progress and achieving amazing results we could not dream of 10 years ago.

  4. Ritika November 10, 2017 at 4:51 am #

    Thank you Jason for the wonderful article. Can you please suggest which pretrained model can be used for recognizing individual alphabets and digits?

    • Jason Brownlee November 10, 2017 at 10:40 am #

      Good question, I am not sure off the cuff, perhaps try a google search. I expect there are such models available.

      If you discover some, please let me know.

  5. Sam Ranade November 10, 2017 at 7:46 am #

    Thank you Jason,
    Someday can you take time to write about training VGG for objects not belonging to original 1000 classes (Imagenet dataset) but completely new 2000 classes. I am specially interested in training times for starting from scratch and training times for fine-tuning. Do the no_top weights reduce training time much?
    Once again thank you for the post

    • Jason Brownlee November 10, 2017 at 10:43 am #

      Great suggestion, thanks Sam. I hope to.

      Yes, the layers just before the output layer do contain valuable info! I have tested this on some image captioning examples.

  6. Adel November 10, 2017 at 9:15 am #

    Thank you Jason for the wonderful article. We really hope you write a post on Object Detection stuff like SSD (Single Shot Multibox Detector) for standard data and custom data, or semantic segmentation stuff like FCN or U-Net. That would be very cool.

  7. Reza November 11, 2017 at 1:14 am #

    Many thanks for That.

  8. krisna November 17, 2017 at 1:29 am #

    i’m still confused , can i change the image dataset and train it with VGG ?
    thanks

    • Jason Brownlee November 17, 2017 at 9:27 am #

      Sorry, I don’t follow, perhaps you can restate your question?

  9. Jeff November 21, 2017 at 8:38 am #

    Hello Jason,

    I am now learning Deep learning and your Website is a treasure trove for that.
    Thank you so much.

    I just finished “How to use pre-trained VGG model to Classify objects in Photographs”, which was very useful.
    Keras + VGG16 are really super helpful at classifying Images.
    Your write-up makes it easy to learn. A world of thanks.

    I would like to know what tool I can use to perform Medical Image Analysis.
    Any specific library that would help me to Analyse Medical Images? VGG could not.

    Your Response would be highly appreciated.

    • Jason Brownlee November 22, 2017 at 10:46 am #

      Sorry, I don’t have experience in that domain. I cannot give you specific advice.

  10. Hung Manh Nguyen November 30, 2017 at 4:41 am #

    Can you help me with where to save the “mug.jpg”.
    I’ve tried to save it in some directory but it always returns the following error.

    FileNotFoundError: [Errno 2] No such file or directory: ‘mug.jpg’

    Thank you very much!!

    • Jason Brownlee November 30, 2017 at 8:25 am #

      In the same directory as the code file, and run the code from the command line to avoid any issues from IDEs and notebooks.

  11. Leo December 14, 2017 at 12:48 pm #

    Hi Jason,

    Thanks for the sharing. I want to know if VGG16 model can identify different objects in an image and then extract features of each object, or is there any way to do this through Keras library?

  12. Sasikanth December 15, 2017 at 2:52 am #

    Hello Jason,
    Is there a similar package in R language?

  13. Bastien M January 13, 2018 at 1:35 am #

    Is there a way to use a format different than 224×224 ?
    The only example I found is here: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.3-using-a-pretrained-convnet.ipynb

    Where basically we need to add another level on top of the model and use a custom classifier.
    I guess that since the model was trained for 224×224 image it would not work as it is with different size, am I right ?

    • Jason Brownlee January 13, 2018 at 5:34 am #

      Yes, you would need to train a new front end.

    • Aditya July 6, 2019 at 12:54 pm #

      Hi Jason, I was trying to use the VGG16 model from Keras, but I have a serious problem with that. Whenever I do
      Vgg_model = VGG16() my computer just freezes with this warning

      tensorflow/core/framework/allocator.cc.124 allocation of 449576960 exceeds 10% of system memory

      I am currently using a 64 bit, 4gb ram linux mint 18 os.
      I don’t have access to any kind of GPU.
      I think this problem has to do something with my limited ram?
      Regards, aditya

      • Jason Brownlee July 7, 2019 at 7:47 am #

        It might be because of limited RAM.

        Perhaps try on another machine or on an EC2 instance?

  14. Moses Wong January 25, 2018 at 6:17 am #

    Simple yet works well with the 20 test image files I provided to this program! Great job! Thank you !

  15. Moses Wong January 25, 2018 at 6:19 am #

    Grateful if you could also point out how to expand the VGG16 into actual Keras or Tensorflow code so learner can modify the code on their own for training, inference, or transfer learning purpose.

    • Jason Brownlee January 25, 2018 at 9:08 am #

      Great suggestion, thanks.

      • Arfi June 19, 2021 at 4:06 am #

        Hello
        I need a suggestion on
        CSV file data set with some image dataset.
        It has 6 columns each columns have value like (1,0,-1)
        I want to use VGG16 and get a multilevel classification.
        How to deal with the problem, any idea or suggestions or paper will be much helpful.
        Thanks in Advance.

        • Jason Brownlee June 19, 2021 at 5:56 am #

          Generally images are not stored in CSV format, they are stored in a binary image format like JPEG or PNG.

  16. Namrata Nayak April 10, 2018 at 7:12 pm #

    What classes of images are fed into the VGG model which is predicting objects?
    How can we see that?

    • Jason Brownlee April 11, 2018 at 6:36 am #

      Good question, there may be a way.

      Off the cuff, one way would be to enumerate all inputs to decode_predictions()

  17. SATYAJIT PATTNAIK April 10, 2018 at 7:29 pm #

    Hi Jason,

    I have a similar question like Namrata, if i want to train my VGG model with some new classes, how can i do that?

  18. SATYAJIT PATTNAIK April 11, 2018 at 8:43 pm #

    @Jason,

    The link you have given shows the list of classes being trained in the VGG model, my question was, can we write our own VGG model and provide the classes?

    If there’s any link or a way to do it, please let me know

    • Jason Brownlee April 12, 2018 at 8:40 am #

      I do not have a link for this.

      Perhaps you can look at the Keras code and adapt an existing example in the API for your use case?

  19. dsds April 29, 2018 at 7:34 am #

    Thanks for all efforts. U make dreams come true for researchers 🙂

  20. yuri May 2, 2018 at 5:13 am #

    Thanks for this great post.
    I am new to deep learning. I have a question: can the model provide the exact position of the object so we can put a bounding box on it? And can the vgg16 model detect several objects in one image and give their positions?

    • Jason Brownlee May 2, 2018 at 5:46 am #

      It can, it is called object localization and requires more than just a VGG type model. Sorry, I don’t have a worked example.

    • Claire October 30, 2018 at 3:48 am #

      Hello Yuri,
      I am dealing with the same question as you. Did you make progress on your research?

  21. K.Choi May 9, 2018 at 6:57 pm #

    Thank you for all your kind demonstration. However, I wonder how to use pre-trained VGG net to classify my grayscale images, because number of channels of images for VGG net is 3, not 1. Can I change the number of channels of images for VGG net? for example, 2?

    • Jason Brownlee May 10, 2018 at 6:27 am #

      Great question!

      Perhaps cut off the input layers for the model and train new input layers that expect 1 channel.

  22. Sayan May 12, 2018 at 2:18 am #

    Awesome, superb work! Appreciate that.

  23. Yassine May 13, 2018 at 10:06 am #

    Thanks sir for this tutorial. Please, can I use the vgg16 to classify some images belonging to a specific domain that do not exist in the ImageNet database?

  24. Anirban Ghosh May 25, 2018 at 11:46 pm #

    Sir,
    I am a regular reader of your blog. I have read your work and like it. Further, in this example of yours I could see you fed the picture to the network. I am also a fan of Dr. Adrian’s work. I was reading about transfer learning, where we removed the FC layers at the end and passed in a logistic regression there to classify a dataset (say Caltech 101) where we could get 98% accuracy. The vgg16 is trained on Imagenet but transfer learning allows us to use it on Caltech 101.
    Thank you guys are teaching incredible things to us mortals. One request can you please show a similar example of transfer learning using pre trained word embedding like GloVe or wordnet to detect sentiment in a movie review.

    • Jason Brownlee May 26, 2018 at 5:59 am #

      Thanks.

      I give examples of reusing word embeddings; search the blog, or check my NLP book.

      • Anirban Ghosh May 26, 2018 at 2:05 pm #

        Yes, I know you have included them in your book on NLP, using a CNN and word embedding to classify the sentiments, I have implemented it too. Anyways thanks for replying.

        Regards,

        Anirban Ghosh.

  25. Vineeth June 11, 2018 at 8:46 pm #

    Hey Hi,
    thanks for the article but I have a doubt,
    The last layer in the network is a softmax layer and we have 1000 neurons in the fully connected layer before this layer right? so we can use this for classification of 1000 objects.
    What my doubt is that, is this 1000 fixed for all vgg networks even though we are trying to classify only a few say 100( some number less than 1000) or this number (number of neurons in the last fully connected layer) depends on the number of classifications we are trying to address.

    • Jason Brownlee June 12, 2018 at 6:40 am #

      The prediction is a softmax over 1000 neurons.

      It is fixed at 1000, but you can re-fit the network on a different dataset with more/less classes if you wish.

      • Vineeth June 12, 2018 at 2:52 pm #

        Ok, so as I said if we want to predict 100 classes, we still will have 1000 neurons but only 100 of them will be used for classification. Is that what you meant? If so what happens to the other 900 neurons, can softmax layer work that way, using only some neurons out of all the available ones?
        sorry if this seems so basic, I just started working with deep learning and these things confuse a bit. thanks

        • Jason Brownlee June 13, 2018 at 6:13 am #

          If you have 100 classes, you have 100 nodes in the output layer, not 1000.

          • Vineeth June 13, 2018 at 2:44 pm #

            got it! Thanks for the reply

  26. JG June 17, 2018 at 9:56 pm #

    Thank you very much Mr. Jason Brownlee ! You are doing a great job ! I have been following some of your machine learning mastery “How to …” , “Intro..” . I am very impressed by how you approach, outreach and advance some of the “hot and trending” topics of Deep Learning…explaining them in plain text (including basic Python concepts, and of course Keras API, tensorflow Library, …)
    To me the main issue is your capability to communicate the WHOLE SOLUTIONS covering everything in between of the problem starting, with math or Deep Learning intuitions concepts, following by programming language, operative ideas of libraries modules used, references list , etc. And finally but not least providing an operative code to start experimenting by ourselves all the concepts introduced by you.

    Many thanks for your really great mastery work , from JG !!

  27. Zeyu July 11, 2018 at 12:08 am #

    I wonder what I should do if I would like to train on my own dataset to get new weights based on the VGG model, and do prediction with the new weights

    • Jason Brownlee July 11, 2018 at 5:59 am #

      Keep the whole VGG model fixed and only train some new output weights to interpret the VGG output.

  28. Vikas July 23, 2018 at 4:45 am #

    Hi, can you help me localization of an object suppose number plate in an image. I know YOLO and Faster-RCNN can be used for this. But i am facing problem in implementing Region proposals using Anchor boxes. could you please suggest something?

  29. JG July 24, 2018 at 4:48 am #

    One more time Mr. Jason Brownlee thank you very much for your VGG16 Keras apps introduction, I think your code and explanation it is perfect (at least for my level) before diving into deeper waters, such as building your own models on Keras. I like the way you structure your pieces of codes before running the full system. I appreciate your “free” job for all of us . You do a lot of appreciable things for our Machine Learning community!!. I wish you a long running on these matters !

  30. Fork Esther July 25, 2018 at 12:23 am #

    Hi Jason,
    Your blog is the best for machine learning!
    I have a question regarding the performance of VGG.
    For coffee mug, it is exactly detecting the object.
    But I tried a very obvious snake picture (https://reikiserpent.files.wordpress.com/2013/03/snakes-guam.jpg); however the results are not that promising:

    [[(‘n01833805’, ‘hummingbird’, 0.22024027),
    (‘n01665541’, ‘leatherback_turtle’, 0.10800469),
    (‘n01664065’, ‘loggerhead’, 0.088614523),
    (‘n02641379’, ‘gar’, 0.083981715),
    (‘n01496331’, ‘electric_ray’, 0.061437886)]]

    Knowing that VGG is performing very well, is there any way to improve the model results (maybe some fine tuning?) without retraining the model?

    Thanks a lot,

  31. Fork Esther July 27, 2018 at 6:34 am #

    I tried ResNet as well, but results are still far from reality.

    • Jason Brownlee July 27, 2018 at 11:03 am #

      I guess the test images will have to be much like the images used to train the model, e.g. imagenet.

  32. AMM August 10, 2018 at 5:43 pm #

    hi sir thank you for this tutorial
    I noticed some places using vgg16 but they input images of different sizes and aspect ratio such as 192×99 or 69×81 and more other and i can’t understand how they get the output, can vgg16 take image with size other than 224×224 without resize it and what is the result will be? Thank you.

    • Jason Brownlee August 11, 2018 at 6:07 am #

      Perhaps resize the image?
      Perhaps change the input shape of the network to be much larger and zero-pad smaller images?

  33. Maryam September 13, 2018 at 1:20 am #

    Hello,
    I tried to change the type of vgg16 to sequential, but, after changing it removes the input layer.
    I don’t know why. how can I fix it?

    thanks

  34. Tin September 21, 2018 at 11:26 am #

    Hi Jason,

    I like it very much and am wondering about any follow-ups on fine-tuning VGG?

    • Jason Brownlee September 21, 2018 at 2:19 pm #

      Thanks.

      Great question!

      Small and decaying learning rate and early stopping would be a good start.

  35. Aksasse hamid October 16, 2018 at 5:21 am #

    Thank you very much for this great work. I wonder is it possible to use this model (VGG16) in order to be able to classify daily activities.

  36. Foxrol November 20, 2018 at 2:12 am #

    Thank you Jason ! I’m speechless

  37. Nagabhushan S N November 20, 2018 at 4:16 pm #

    Hi,
    I’ve already downloaded the vgg19.npy model. Is it possible to load from this directly instead of downloading again?

    • Jason Brownlee November 21, 2018 at 7:47 am #

      Perhaps, I don’t have an example of loading the model manually, sorry.

  38. Ebtihal November 24, 2018 at 8:53 pm #

    Thank you so much for this valuable post. Really helpful.

    I have question please,
    How can I retrieve the index position of top n probabilities

    for example, the prediction vector of the mug will produce a vector with 1000* 1 which contains the probabilities values for each class.

    lets say that the probabilities are :
    [.1
    .2
    .3 (top 1)
    .001
    .002
    .25(top2)
    .24 (top3)
    .1
    .01
    ..
    etc.]

    I want to retrieve the position/index in which the top 3 probabilities are located.
    in previous example, I want to retrieve the position of
    .3 (top 1)
    and
    .25(top2)
    and .24 (top3)

    which is [2,5,6]

    Thank you .

  39. Tapan Kumar November 28, 2018 at 11:36 pm #

    Hi, Guys Thanks for this awesome tutorial. Do You guys have any tutorial on How To train with our own images..(Custom Classifier) with whatever architecture you are following now. So Please let me know. Thanks for the help.

    • Jason Brownlee November 29, 2018 at 7:42 am #

      Sure, you can load your images and perhaps use transfer learning with a VGG model as a starting point.

  40. Anam January 13, 2019 at 9:34 pm #

    Dear Jason,
    Very helpful post. Also, I have a question: I want to use a pretrained model with a different input shape. For example, the input of the pretrained model is (None, 3661, 128) and the input shape of the new dataset which I am applying to the pretrained model is (None, 900, 165). So, I want to know how to set the input shape of the pretrained model for the new dataset, because I am getting an error:
    “ValueError: “input_length” is 3661, but received input has shape (None, 900, 165)”.
    Thanx in advance

    • Jason Brownlee January 14, 2019 at 5:28 am #

      You can add a new hidden layer after the new input layer and only train the weights of this new layer.

      Or resize inputs to meet the old model.

  41. Anam January 16, 2019 at 2:20 am #

    Dear Jason, I want to know whether the pre-trained models (used for transfer learning) also contain the testing phase or only the training phase. In other words, does the pretrained model contain both the training and testing phases, or only the training phase?
    Thanks in advance.

    • Jason Brownlee January 16, 2019 at 5:49 am #

      They are used like any other model, e.g. fine tuning/training then testing/evaluation.

      • Anam January 16, 2019 at 3:03 pm #

        I can’t understand your point. Kindly can you explain it more. Thanx for your response.

        • Jason Brownlee January 17, 2019 at 5:21 am #

          Which part?

        • Busayo Olukunle April 24, 2019 at 3:50 pm #

          Hi Anam, here’s a brief explanation to your question. The network (VGG16) had been trained and tested before being deployed as a model, so, there’s no need talking about training and test sets again. When you feed in an image to be classified, all you’re doing is using a pre-trained model to do your classification. I hope this helps, otherwise, let me know if you need further clarification. @Jason Brownlee is doing a great job!!

          • Jason Brownlee April 25, 2019 at 8:08 am #

            Great explanation.

          • Afreen F June 26, 2019 at 8:09 pm #

            @Busayo Not really. You can use VGG16 for either of following-:
            1) Only architecture and not weights. In which case you train the model on your dataset
            2) Keep only some of the initial layers along with their weights and train for latter layers using your dataset
            3) Use complete VGG16 as a pre-trained model and use your dataset for only testing purposes.

          • Jason Brownlee June 27, 2019 at 7:49 am #

            Great summary!

  42. Hansal January 17, 2019 at 11:26 pm #

    How can I train the model myself using my own custom training dataset?

  43. sylvain February 5, 2019 at 4:35 am #

    I suppose it is the same principle if I want to use VGG Face for facial recognition, right?

    • Jason Brownlee February 5, 2019 at 8:29 am #

      Perhaps, but face recognition is a very different type of problem than simple classification.

  44. Gia February 21, 2019 at 12:03 am #

    I am currently working on an app using Keras, ImageNet, and VGG16.
    I was wondering if it is possible to check whether an image falls into one of the broad classes like Plant, Animal, Food, etc., instead of just checking what type of plant or food it is?

    • Jason Brownlee February 21, 2019 at 8:12 am #

      Yes, perhaps the output or classifier part of the model needs to be re-trained on higher order class labels?

  45. Yancho Basil March 2, 2019 at 1:26 am #

    Hi sir, thanks for the tutorials. I am using the pre-trained VGG16 model and fine-tuning it to classify objects in photographs into 6 classes that do not belong to ImageNet. When I run my code, I get the error: ValueError: ‘decode_predictions’ expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)), found an array of shape (1, 6). Can you please help me resolve this problem?
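The error quoted above arises because decode_predictions maps model outputs onto the 1,000 ImageNet class names, so it rejects an array of shape (1, 6). Below is a minimal sketch of one workaround, assuming you keep your own list of class names in the same order that was used during training. The helper decode_custom_predictions and the class names are hypothetical and not part of Keras.

```python
import numpy as np

# Hypothetical class names; use the order your model was trained with.
CLASS_NAMES = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]


def decode_custom_predictions(preds, class_names, top=3):
    """Return the top (label, probability) pairs for each row of preds."""
    preds = np.asarray(preds, dtype=float)
    if preds.ndim == 1:
        # Add the batch dimension if a single prediction vector was passed.
        preds = preds[np.newaxis, :]
    if preds.shape[1] != len(class_names):
        raise ValueError(
            "expected %d scores per sample, got %d"
            % (len(class_names), preds.shape[1])
        )
    results = []
    for row in preds:
        # Indices of the highest scores, best first.
        top_idx = np.argsort(row)[::-1][:top]
        results.append([(class_names[i], float(row[i])) for i in top_idx])
    return results


# Stand-in for model.predict(image): one sample, six class probabilities.
example_preds = np.array([[0.05, 0.10, 0.60, 0.05, 0.15, 0.05]])
decoded = decode_custom_predictions(example_preds, CLASS_NAMES, top=2)
print(decoded[0])
```

In a real script, example_preds would be replaced by the array returned from your fine-tuned model's predict() call.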

  46. Ramdas Khillare March 13, 2019 at 9:32 pm #

    I have read all of your articles, but I don’t understand where and how to train on a dataset and how to predict using the above code. Please elaborate a little bit, step by step.
    I have an image dataset, so am I required to label every image for classification or not? And how do I train on the dataset and make predictions? Please help me.

    • Jason Brownlee March 14, 2019 at 9:22 am #

      Yes, every image requires a label.

      I hope to provide a tutorial on what you’re asking about soon.

  47. sunita March 18, 2019 at 4:23 pm #

    Please, I want to know: you applied this algorithm to one image, ‘mug.jpg’, but if I have many images such as image1, image2, image3, how do I code that?

  48. soumya bhattacharya April 19, 2019 at 8:13 pm #

    thanks for this well explained tutorial.

  49. SY April 26, 2019 at 5:38 am #

    Hi,

    Thanks for the tutorial. Is it possible to use VGG pretrained network for time series regression? How should the input and output layers change?

    • Jason Brownlee April 26, 2019 at 8:37 am #

      Yes, but it would not make sense to use a model for image classification for time series prediction.

  50. SY April 27, 2019 at 12:57 am #

    Thank you for your reply. Do you know of any pre-trained RNN that I can use? I have done an extensive search online but cannot find one.

    • Jason Brownlee April 27, 2019 at 6:35 am #

      I am not aware of pre-trained models for time series, sorry.

  51. namit June 5, 2019 at 5:19 pm #

    I’m not able to run the following commands in Spyder:
    from keras.applications.vgg16 import VGG16
    model = VGG16()
    It shows a lot of errors.

  52. ishrat June 6, 2019 at 2:10 am #

    Sir, how can I use this pre-trained model with some other dataset?

  53. Venkatesh Roshan July 9, 2019 at 11:56 pm #

    Name = decode_predictions(pre[0])
    -->
    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    in
    ----> 1 Name = decode_predictions(pre[0])
    2 Name = Name[0][0]

    ~\Anaconda3\lib\site-packages\keras\applications\__init__.py in wrapper(*args, **kwargs)
    26 kwargs['models'] = models
    27 kwargs['utils'] = utils
    ---> 28 return base_fun(*args, **kwargs)
    29
    30 return wrapper

    ~\Anaconda3\lib\site-packages\keras\applications\vgg16.py in decode_predictions(*args, **kwargs)
    14 @keras_modules_injection
    15 def decode_predictions(*args, **kwargs):
    ---> 16 return vgg16.decode_predictions(*args, **kwargs)
    17
    18

    ~\Anaconda3\lib\site-packages\keras_applications\imagenet_utils.py in decode_predictions(preds, top, **kwargs)
    220 'a batch of predictions '
    221 '(i.e. a 2D array of shape (samples, 1000)). '
    --> 222 'Found array with shape: ' + str(preds.shape))
    223 if CLASS_INDEX is None:
    224 fpath = keras_utils.get_file(

    ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (2,)

  54. Sabbir July 16, 2019 at 1:39 pm #

    Can the VGG-16 model with pre-trained weights be used for a face recognition problem with 10 people?

  55. Sabbir July 21, 2019 at 11:01 pm #

    Thanks for your reply. It really helps me with my work. The model can identify a face from my own dataset if I use embeddings and an SVC. But I don’t want to use embeddings and an SVC classifier for identification. If I add a softmax as the last layer of the FaceNet model and fine-tune the model’s last layer with images from my own dataset, it gives 100% accuracy at training time, but if I test on some random image it can’t identify that face. I can’t work out why this happens.

  56. mohsin September 17, 2019 at 4:01 pm #

    How do I load trained VGG16 weights other than the default ones?

  57. sneh October 18, 2019 at 11:12 pm #

    How do I load the VGG16 pre-trained weights into my script and use the model as a classifier for the cats_and_dogs dataset?

  58. tural October 25, 2019 at 4:05 am #

    Thank you!

  59. babi November 4, 2019 at 4:22 am #

    I am using VGG16 and VGG19 on my own dataset. I changed the image shape to 32x32. My validation accuracy doesn’t change. What is the problem with my code? I am stuck on this.
    I don’t want to use the transfer learning method. Please do help.

    Epoch 1/30
    52/52 [==============================] - 116s 2s/step - loss: nan - acc: 0.2558 - val_loss: nan - val_acc: 0.2540
    Epoch 2/30
    52/52 [==============================] - 119s 2s/step - loss: nan - acc: 0.2558 - val_loss: nan - val_acc: 0.2540
    Epoch 3/30
    52/52 [==============================] - 121s 2s/step - loss: nan - acc: 0.2505 - val_loss: nan - val_acc: 0.2540
    Epoch 4/30
    52/52 [==============================] - 126s 2s/step - loss: nan - acc: 0.2522 - val_loss: nan - val_acc: 0.2540
    Epoch 5/30
    52/52 [==============================] - 122s 2s/step - loss: nan - acc: 0.2571 - val_loss: nan - val_acc: 0.2540
    Epoch 6/30
    52/52 [==============================] - 121s 2s/step - loss: nan - acc: 0.2510 - val_loss: nan - val_acc: 0.2540

  60. Maryam November 15, 2019 at 6:01 pm #

    Thanks for your great tutorials!
    I am interested in the parent category of predictions.
    For example, if the model predicts a dog, I would like to get the category “animals”.

  61. Rahul December 14, 2019 at 6:08 am #

    Sir, you may think this question is silly, but please clear up my doubt.
    1) Does transfer learning (VGG-16) work when we have different classes of data, meaning the model was not trained on the new classes, or the new data is not from the ImageNet dataset?

    2) Sir, can you explain how transfer learning works if we turn off all the VGG-16 layers using vgg.trainable = False and add our custom Conv layers on top of it (the images are not from those 1000 classes)? How do we get information from VGG-16 to the custom layers if we turn off the layers?

    • Jason Brownlee December 14, 2019 at 6:28 am #

      You can use the model with the same classes or different classes as imagenet. If you use different classes, you will have to train the new layers on your new classes/dataset.

      It works by only training the new layers you add and leaving all other layers untouched. The existing layers will extract features from the photos and your new layer will interpret those features and classify them – it’s still amazing to me!
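The reply above can be sketched in Keras as follows. This is a minimal illustration and not the tutorial's own code: the helper name build_transfer_model, the size of the new hidden layer, and the choice of optimizer are all assumptions. Freezing a layer only stops its weights from being updated; images still pass through the frozen layers on every forward pass, which is how features reach the new layers.

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model


def build_transfer_model(num_classes, weights="imagenet"):
    """VGG16 feature extractor with frozen weights plus a new classifier head."""
    # include_top=False drops the original 1,000-class classifier layers.
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    # Freeze the existing layers so training leaves their weights untouched.
    for layer in base.layers:
        layer.trainable = False
    # New layers that interpret the extracted features (sizes are illustrative).
    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model
```

Calling build_transfer_model(num_classes=6) downloads the ImageNet weights on first use; fitting the returned model on your own labeled images then updates only the two new Dense layers.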

  62. Sujata February 16, 2020 at 9:52 pm #

    Jason, wonderful article on pre-trained models. Can you tell me which model I can use for EEG signal processing for emotion detection? Thank you.

  63. AKSHAT Singh February 29, 2020 at 5:43 am #

    Hi Jason,

    First of all, thank you very much for the work you are putting in. These are really nice tutorials and I always visit this site whenever I want to look up a particular machine/deep learning concept. However, I am confused about loading a pre-trained model and predicting with it. I have a VGG model trained from scratch and saved in an .h5 file. I am able to load it using:

    from keras.models import load_model
    saved_model = load_model("/content/vgglite.h5")
    saved_model.layers[0].input_shape  # (None, 224, 224, 3)

    But when I tried predicting, the test folder was not converted to an array and I got an IsADirectoryError. I was using:

    import os
    from keras.preprocessing import image
    import numpy as np

    batch_holder = np.zeros((20, 224, 224, 3))
    img_dir = '/content/drive/My Drive/COMPUTER VISION DOCS/imagenette_6class/test/'
    for i, img in enumerate(os.listdir(img_dir)):
        img = image.load_img(os.path.join(img_dir, img), target_size=(224, 224))
        batch_holder[i, :] = img

    Kindly explain how to load a pre-trained model and predict using the test set. Thanks in advance.

    • Jason Brownlee February 29, 2020 at 7:22 am #

      You’re welcome.

      Perhaps start with the example in the tutorial, confirm it works on your workstation, then slowly adapt it for your project.
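Regarding the IsADirectoryError in the question above: one likely cause, assuming the test folder contains one subfolder per class, is that os.listdir returns those subfolders and load_img is then asked to open a directory. The sketch below collects only image files, including those inside subfolders. The helper name list_image_files and the set of extensions are assumptions.

```python
import os

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_image_files(root_dir):
    """Return sorted paths of image files under root_dir, including subfolders."""
    paths = []
    # os.walk yields every directory below root_dir along with its files.
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Each returned path can then be passed to load_img with target_size=(224, 224), and the array holding the batch can be sized from len(paths) instead of a fixed 20.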

  64. Gideon Ekpo Akpata March 19, 2020 at 10:18 pm #

    Jason, thank you for this awesome work of enlightening people. I really appreciate it. I’m working on a research project to develop a system that differentiates fake images from original ones. Can I make use of this VGG-16 model in developing it?

    • Jason Brownlee March 20, 2020 at 8:43 am #

      You’re welcome.

      Perhaps use it as a starting point?

  65. Shailaja Natarajan April 19, 2020 at 10:25 am #

    Hi Jason,
    Thanks for your great work. I tried the same example with an image (cup.jpg).
    I cropped the image from your original image and saved it as “cup.jpg”. When I ran it through the VGG16 model using the same code as yours, the prediction was wrong.
    The model outputs “mosquito_net” with a probability of 1.7%.

    Could you please let me know why my prediction went wrong with the same image?

    • Jason Brownlee April 19, 2020 at 1:16 pm #

      Perhaps ensure that you have loaded the image correctly and prepared the pixels in the expected manner – as we do in the tutorial.

  66. Varsha April 20, 2020 at 2:39 am #

    Hi , I am getting the below error after executing all the code:

    Could not import PIL.Image. The use of load_img requires PIL.

  67. khouloud yengui April 26, 2020 at 11:46 pm #

    Thank you, Jason, for this tutorial. This code was for one image; can you tell me how to prepare a dataset like FER2013 for the VGG16 CNN?

    • Jason Brownlee April 27, 2020 at 5:35 am #

      What do you mean prepare the dataset?

      • khouloud yengui May 1, 2020 at 2:14 am #

        I mean, how can I adapt a dataset like FER2013 for the VGG16 CNN in the same way as the ImageNet training data was prepared? I have a project on facial expression recognition.

  68. ahmed April 30, 2020 at 9:56 am #

    Dear Jason,

    How could I calculate FP, TP and sensitivity from a TL model?

  69. ahmed April 30, 2020 at 10:00 am #

    Could I implement GAN augmentation instead of normal augmentation in TL, and how?
    Sorry for this basic question, as I am a beginner.

    • Jason Brownlee April 30, 2020 at 11:38 am #

      Yes, but I expect it will not be as effective as normal image data augmentation.

  70. Anusha May 26, 2020 at 5:41 am #

    Hello Jason,
    I am a graduate student at the University of Cincinnati. I wanted to know if it is okay for me to use the images in your post as a part of my Master’s Thesis paper while citing the source of the image i.e. this post.
    Please let me know.
    Thank you
    Anusha

  71. Adithya June 11, 2020 at 2:19 am #

    Hi Jason,

    How do I train VGG16 with an image that is not a square matrix, like 640*480? Will I have to change the size of the convolution and pooling filters as well?

    Thank you.

  72. Nada June 15, 2020 at 4:13 am #

    Hi Jason,

    Why is VGG16 more popular and more widely used than ResNet50 in transfer learning and fine-tuning tutorials for training on datasets with more than one class?

    Are there any critical differences or reasons for that?

    Thank you.

    • Jason Brownlee June 15, 2020 at 6:08 am #

      Because it is simple, well understood and good enough for many applications.

  73. Hamza J July 20, 2020 at 8:13 am #

    Can I use vgg16 for cancer images? Or should I prefer resnet/alexnet/inception_3 or any other?

    • Jason Brownlee July 20, 2020 at 1:52 pm #

      I recommend testing a suite of models as a starting point for transfer learning to discover what works best for your specific dataset.

  74. Jayamala Pakhare August 8, 2020 at 4:52 pm #

    Can I use VGG16 for other image datasets?

  75. Jaydev Prakash September 16, 2020 at 12:39 pm #

    Thanks for the wonderful work, but when I use the VGG16 model I get an error about the shape of the input NumPy array.
    Can you shed some light on it?
    Snippet of code:
    def test_on_whole_videos(train_data, train_labels, validation_data, validation_labels):
        x = []
        y = []
        count = 0
        output = 0
        base_model = load_VGG16_model()
        model = train_model(train_data, train_labels, validation_data, validation_labels)
        i = 0
        count = 0
        for filename in os.listdir("./test_im/3"):
            img = cv2.imread("./test_im/3/" + filename, 0)
            x.append(img)
    Error:

    x = np.array(x) in test_on_whole_videos(train_data, train_labels, validation_data, validation_labels)
    17 x = np.array(x)
    18 print(type(x))
    ---> 19 x_features = base_model.predict(x)
    20 answer = model.predict(x_features)
    21 print(answer)

    ValueError: Input 0 of layer block1_conv1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 224, 224]
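A note on the error above: passing 0 as the second argument to cv2.imread loads the image in grayscale, so each image has shape (224, 224) and the stacked batch has shape (N, 224, 224), while the VGG16 input layer expects (N, 224, 224, 3). Below is a minimal NumPy sketch of one fix, assuming the images are already 224x224 as the error message suggests. The helper name gray_batch_to_rgb is hypothetical.

```python
import numpy as np


def gray_batch_to_rgb(images):
    """Stack grayscale images of shape (H, W) into a batch of shape (N, H, W, 3)."""
    batch = np.asarray(images, dtype="float32")
    if batch.ndim != 3:
        raise ValueError(
            "expected a list of 2D grayscale images, got shape %s" % (batch.shape,)
        )
    # Repeat the single channel three times along a new last axis.
    return np.repeat(batch[..., np.newaxis], 3, axis=-1)


# Stand-ins for frames read with cv2.imread(path, 0), already 224x224.
frames = [np.zeros((224, 224), dtype="uint8") for _ in range(2)]
x = gray_batch_to_rgb(frames)
print(x.shape)  # (2, 224, 224, 3)
```

Alternatively, reading the files in color (omitting the 0 flag) yields three channels directly, although OpenCV returns them in BGR order.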

  76. Rod October 24, 2020 at 12:40 am #

    Hey Jason, looking at this line:

    > image = preprocess_input(image)

    It seems Keras’ VGG preprocess_input really just calls imagenet_utils.preprocess_input(x, data_format=data_format, mode='caffe') according to the source code:

    @keras_export('keras.applications.vgg16.preprocess_input')
    def preprocess_input(x, data_format=None):
        return imagenet_utils.preprocess_input(
            x, data_format=data_format, mode='caffe')

    Source: https://www.tensorflow.org/api_docs/python/tf/keras/applications/vgg16/preprocess_input

    I understand this to mean that it defaults to ‘caffe’ mode, which according to the docs:

    > caffe: will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.

    Zero-centering makes sense, as it follows the paper’s preprocessing technique. But what about switching the channels from RGB to BGR…

    Keras’ load_img() defaults to ‘rgb’. So my concern is that using Keras’ preprocess_input(image) will mess with the channel ordering.

    I tested this:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import preprocess_input
    copied_data = np.copy(data)
    prep_data = preprocess_input(copied_data)

    from matplotlib import pyplot as plt
    plt.imshow(data[0].astype('int'))
    plt.show()

    plt.imshow(prep_data[0].astype('int'))
    plt.show()

    And sure enough, the RGB channels were flipped. The yellows/reds in the original image turned into blue-ish hues.

    So what’s the best way to combat this? Load the data as BGR from the get-go?

    • Rod October 24, 2020 at 1:18 am #

      Welp, it seems that asking the question is often the path to enlightenment… I see now that it’s necessary to convert the image from RGB to BGR because the Keras VGG16 model with ‘imagenet’ weights are internally using BGR channel ordering.

      > In the keras link to VGG16, it is stated that: “These weights are ported from the ones released by VGG at Oxford.” So the VGG16 and VGG19 models were trained in Caffe and ported to TensorFlow, hence mode == ‘caffe’ here (range from 0 to 255 and then extract the mean [103.939, 116.779, 123.68]).

      @ https://stackoverflow.com/questions/53092971/keras-vgg16-preprocess-input-modes
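The 'caffe' mode described in this thread can be approximated in plain NumPy, which makes the channel flip easy to see. This is a sketch of the documented behavior (RGB to BGR, then zero-centering with the means quoted above, without scaling), not a copy of the Keras implementation. The function name caffe_style_preprocess is hypothetical and channels-last input is assumed.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as quoted in the comment above.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")


def caffe_style_preprocess(rgb_batch):
    """Approximate 'caffe' mode: RGB -> BGR, then subtract the channel means."""
    x = np.asarray(rgb_batch, dtype="float32")
    # Reverse the last axis to reorder channels from RGB to BGR.
    x = x[..., ::-1]
    # Zero-center each channel; pixel values are not rescaled.
    return x - BGR_MEAN


# One pure-red pixel in RGB, shaped (samples, height, width, channels).
red = np.array([[[[255.0, 0.0, 0.0]]]])
out = caffe_style_preprocess(red)
print(out[0, 0, 0])
```

The red value ends up in the last channel of the output, which is why displaying the preprocessed array with imshow (which assumes RGB ordering) turns reds into blue-ish hues. The model itself expects this ordering, so the preprocessed array should be passed to it unchanged.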

    • Jason Brownlee October 24, 2020 at 7:03 am #

      Interesting, perhaps Keras got things messed up in the latest version/s.

      Perhaps you can implement the data prep manually for your application.

  77. Asha Joseph February 4, 2021 at 2:43 pm #

    Why does the pre-trained model classify common objects accurately but does a bad job when it comes to facial images though Imagenet has a category called person?

    • Jason Brownlee February 5, 2021 at 5:33 am #

      Good question.

      The model is good at classifying photos of objects like those in the training data. The model was trained on objects, not faces/people.

  78. Shobi March 8, 2021 at 9:28 am #

    Hi Jason,

    Thank you for a good article.

    Could you please guide me in choosing the right top-1 accuracy for VGG16? The MobileNet authors report 71.5% top-1 in their paper, the Keras Applications table shows 71.3%, and Papers with Code shows 74.4% under the ImageNet benchmark.

    Who is reporting the correct accuracy? Could you please guide me?

    Thank you!

    • Jason Brownlee March 8, 2021 at 1:30 pm #

      You’re welcome.

      Generally, I recommend testing each model on your dataset and choose the one that performs the best.

      If you want to compare reported numbers, perhaps you can check the papers to see if it is an apples to apples comparison, and if not, perhaps evaluate the models yourself under the conditions you expect to use them.

  79. Vidya March 8, 2021 at 9:52 pm #

    Hi Jason.

    I have the following questions:
    1. When should one use a pre-trained model like VGG16 with transfer learning vs. training a neural network from scratch? Does this depend on the classification task?
    2. For a beginner in neural networks, should one start directly with pre-trained models?

    • Jason Brownlee March 9, 2021 at 5:20 am #

      Pre-trained models can save time and get good results, if they were trained on a similar problem. Use them when they give better results than a model fit from scratch.

      Pre-trained models are an excellent way to get started on most problems.

  80. Vidya March 31, 2021 at 11:27 pm #

    Hi Jason.

    I followed your tutorial above on using VGG16 and tested it on a few images of grocery items such as tea and oil. It gave very poor predictions. So what options do I have now?
    Should I train VGG16 on the images I have and then predict?
    Thanks!

    • Jason Brownlee April 1, 2021 at 8:18 am #

      Perhaps you can try an alternate model?

      Or, perhaps you can use transfer learning to adapt the model to be better suited to your dataset?

  81. Vidya April 1, 2021 at 11:35 am #

    Thanks Jason. What would be the criteria for selecting an alternate pre-trained model? Could you please share any reference for performing transfer learning with a given pre-trained model?
    Thanks!!

    • Jason Brownlee April 2, 2021 at 5:34 am #

      Choose a model that performs well or best for your dataset.

      There are many examples of transfer learning on the blog, you can use the search box at the top of the page.

      • Vidya April 2, 2021 at 4:25 pm #

        Thanks Jason.

  82. ANNAMANENI SANTHOSHINI December 31, 2021 at 4:11 am #

    Hello Jason, thanks for the info! I am really new to deep learning. I need to do my final-year project on accident prediction using VGG16 and ResNet50. Could you lend me a hand with it, please?

    • James Carmichael December 31, 2021 at 10:09 am #

      You are very welcome! Generally, I recommend that you complete homework and assignments yourself.

      You have chosen a course and (perhaps) have even paid money to take the course. You have chosen to invest in yourself via self-education.

      In order to get the most out of this investment, you must do the work.

      Also, you (may) have paid the teachers, lecturers and support staff to teach you. Use that resource and ask for help and clarification about your homework or assignment from them. They work for you in some sense, and no one knows more about your homework or assignment and how it will be assessed than them.

      Nevertheless, if you are still struggling, perhaps you can boil your difficulty down to one sentence and contact me.

  83. Javed Hossain May 16, 2022 at 5:52 am #

    Please give me some suggestions about VGG19. How can it be applied in real life? Could you suggest some practice projects?


Machine Learning Mastery is part of Guiding Tech Media, a leading digital media publisher focused on helping people figure out technology. Visit our corporate website to learn more about our mission and team.