How to Develop a Deep Learning Photo Caption Generator from Scratch

Develop a Deep Learning Model to Automatically
Describe Photographs in Python with Keras, Step-by-Step.

Caption generation is a challenging artificial intelligence problem where a textual description must be generated for a given photograph.

It requires both methods from computer vision to understand the content of the image and a language model from the field of natural language processing to turn the understanding of the image into words in the right order. Recently, deep learning methods have achieved state-of-the-art results on examples of this problem.

What is most impressive about these methods is that a single end-to-end model can be defined to predict a caption, given a photo, instead of requiring sophisticated data preparation or a pipeline of specifically designed models.

In this tutorial, you will discover how to develop a photo captioning deep learning model from scratch.

After completing this tutorial, you will know:

  • How to prepare photo and text data for training a deep learning model.
  • How to design and train a deep learning caption generation model.
  • How to evaluate a trained caption generation model and use it to caption entirely new photographs.

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Update Nov/2017: Added note about a bug introduced in Keras 2.1.0 and 2.1.1 that impacts the code in this tutorial.
  • Update Dec/2017: Updated a typo in the function name when explaining how to save descriptions to file, thanks Minel.
  • Update Apr/2018: Added a new section that shows how to train the model using progressive loading for workstations with minimum RAM.
  • Update Feb/2019: Provided direct links for the Flickr8k_Dataset dataset, as the official site was taken down.
  • Update Jun/2019: Fixed typo in dataset name. Fixed minor bug in create_sequences().
  • Update Aug/2020: Update code for API changes in Keras 2.4.3 and TensorFlow 2.3.
  • Update Dec/2020: Added a section for checking library version numbers.
  • Update Dec/2020: Updated progressive loading to fix error “ValueError: No gradients provided for any variable“.

How to Develop a Deep Learning Caption Generation Model in Python from Scratch
Photo by Living in Monrovia, some rights reserved.

Tutorial Overview

This tutorial is divided into 7 parts; they are:

  1. Photo and Caption Dataset
  2. Prepare Photo Data
  3. Prepare Text Data
  4. Develop Deep Learning Model
  5. Train With Progressive Loading (NEW)
  6. Evaluate Model
  7. Generate New Captions

Python Environment

This tutorial assumes you have a Python SciPy environment installed, ideally with Python 3.

You must have Keras installed with the TensorFlow backend. The tutorial also assumes you have the libraries NumPy and NLTK installed.

If you need help with your environment, see this tutorial:

I recommend running the code on a system with a GPU. You can access GPUs cheaply on Amazon Web Services. Learn how in this tutorial:

Before we move on, let’s check your deep learning library version.

Run the following script and check your version numbers:
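A minimal sketch of such a check, using only the standard library so that a missing package is reported rather than raising an error. The distribution names in the hypothetical PACKAGES tuple are assumptions; adjust them to match how the libraries are installed on your system.

```python
# Report the installed version of each library this tutorial relies on.
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; edit to match your environment.
PACKAGES = ("tensorflow", "keras", "numpy", "nltk")


def library_versions(packages=PACKAGES):
    """Return a dict of package name -> version string, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in library_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```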

Running the script should show the same library version numbers or higher.

Let’s dive in.

Photo and Caption Dataset

A good dataset to use when getting started with image captioning is the Flickr8K dataset.

This is because it is realistic and relatively small, so you can download it and build models on your workstation using a CPU.

The definitive description of the dataset is in the paper “Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics” from 2013.

The authors describe the dataset as follows:

We introduce a new benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events.

The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations.

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, 2013.

The dataset is available for free. You must complete a request form and the links to the dataset will be emailed to you. I would love to link to them for you, but the email expressly requests: “Please do not redistribute the dataset“.

You can use the link below to request the dataset (note, this may not work any more, see below):

Within a short time, you will receive an email that contains links to two files:

  • Flickr8k_Dataset.zip (1 Gigabyte) An archive of all photographs.
  • Flickr8k_text.zip (2.2 Megabytes) An archive of all text descriptions for photographs.

UPDATE (Feb/2019): The official site seems to have been taken down (although the form still works). Here are some direct download links from my datasets GitHub repository:

Download the datasets and unzip them into your current working directory. You will have two directories:

  • Flickr8k_Dataset: Contains 8092 photographs in JPEG format.
  • Flickr8k_text: Contains a number of files containing different sources of descriptions for the photographs.

The dataset has a pre-defined training dataset (6,000 images), development dataset (1,000 images), and test dataset (1,000 images).

One measure that can be used to evaluate the skill of the model is the BLEU score. For reference, below are some ball-park BLEU scores for skillful models when evaluated on the test dataset (taken from the 2017 paper “Where to put the Image in an Image Caption Generator“):

  • BLEU-1: 0.401 to 0.578.
  • BLEU-2: 0.176 to 0.390.
  • BLEU-3: 0.099 to 0.260.
  • BLEU-4: 0.059 to 0.170.

We describe the BLEU metric in more detail later when we evaluate our model.

Next, let’s look at how to load the images.

Prepare Photo Data

We will use a pre-trained model to interpret the content of the photos.

There are many models to choose from. In this case, we will use the Oxford Visual Geometry Group, or VGG, model, which took first place in the localization task and second place in the classification task of the ImageNet competition in 2014. Learn more about the model here:

Keras provides this pre-trained model directly. Note, the first time you use this model, Keras will download the model weights from the Internet, which are about 500 Megabytes. This may take a few minutes depending on your internet connection.

We could use this model as part of a broader image caption model. The problem is, it is a large model and running each photo through the network every time we want to test a new language model configuration (downstream) is redundant.

Instead, we can pre-compute the “photo features” using the pre-trained model and save them to file. We can then load these features later and feed them into our model as the interpretation of a given photo in the dataset. It is no different to running the photo through the full VGG model; it is just we will have done it once in advance.

This is an optimization that will make training our models faster and consume less memory.

We can load the VGG model in Keras using the VGG16 class. We will remove the last layer from the loaded model, as this is the layer used to predict a classification for a photo. We are not interested in classifying images, but we are interested in the internal representation of the photo right before a classification is made. These are the “features” that the model has extracted from the photo.

Keras also provides tools for reshaping the loaded photo into the preferred size for the model (e.g. 3 channel 224 x 224 pixel image).

Below is a function named extract_features() that, given a directory name, will load each photo, prepare it for VGG, and collect the predicted features from the VGG model. The image features are a 1-dimensional 4,096 element vector.

The function returns a dictionary of image identifier to image features.
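A minimal sketch of such a function, assuming TensorFlow's bundled Keras is installed and that every file in the directory is a photo. The helper photo_id() is a hypothetical name introduced here; the Keras imports are deferred into the function body so the helper can be used without TensorFlow.

```python
from os import listdir
from os.path import join, splitext


def photo_id(filename):
    """Strip the extension so 'example.jpg' becomes the identifier 'example'."""
    return splitext(filename)[0]


def extract_features(directory):
    """Return {photo identifier: VGG16 features} for every photo in directory."""
    # Deferred imports: only needed when features are actually extracted.
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.models import Model
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    base = VGG16()  # downloads the ImageNet weights on first use
    # Re-wire the model so it outputs the 4,096-unit layer before the classifier.
    model = Model(inputs=base.inputs, outputs=base.layers[-2].output)
    features = {}
    for name in sorted(listdir(directory)):
        image = load_img(join(directory, name), target_size=(224, 224))
        pixels = img_to_array(image)                  # shape (224, 224, 3)
        pixels = pixels.reshape((1,) + pixels.shape)  # add a batch dimension
        pixels = preprocess_input(pixels)             # VGG-specific scaling
        features[photo_id(name)] = model.predict(pixels, verbose=0)  # shape (1, 4096)
    return features
```

Each stored value keeps its leading batch dimension of 1, which is convenient when the features are later passed back to a Keras model one photo at a time.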

We can call this function to prepare the photo data for testing our models, then save the resulting dictionary to a file named ‘features.pkl‘.

The complete example is listed below.

Running this data preparation step may take a while depending on your hardware, perhaps one hour on the CPU with a modern workstation.

At the end of the run, you will have the extracted features stored in ‘features.pkl‘ for later use. This file will be about 127 Megabytes in size.

Prepare Text Data

The dataset contains multiple descriptions for each photograph and the text of the descriptions requires some minimal cleaning.

If you are new to cleaning text data, see this post:

First, we will load the file containing all of the descriptions.

Each photo has a unique identifier. This identifier is used on the photo filename and in the text file of descriptions.

Next, we will step through the list of photo descriptions. The function load_descriptions() below, given the loaded document text, will return a dictionary of photo identifiers to descriptions. Each photo identifier maps to a list of one or more textual descriptions.
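A minimal sketch of this step, assuming each line of the descriptions file holds a photo file name (optionally followed by a '#index' suffix), then whitespace, then the description. The sample text is made up for illustration.

```python
def load_descriptions(doc):
    """Map each photo identifier to the list of its descriptions."""
    mapping = {}
    for line in doc.split("\n"):
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        # The first token is the file name; everything before the first '.' is the identifier.
        image_id = tokens[0].split(".")[0]
        mapping.setdefault(image_id, []).append(" ".join(tokens[1:]))
    return mapping


# Made-up sample in the assumed format.
sample = "photo1.jpg#0\tA dog runs .\nphoto1.jpg#1\tA brown dog .\nphoto2.jpg#0\tTwo kids play ."
descriptions = load_descriptions(sample)
print(descriptions)
```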

Next, we need to clean the description text. The descriptions are already tokenized and easy to work with.

We will clean the text in the following ways in order to reduce the size of the vocabulary of words we will need to work with:

  • Convert all words to lowercase.
  • Remove all punctuation.
  • Remove all words that are one character or less in length (e.g. ‘a’).
  • Remove all words with numbers in them.

The clean_descriptions() function below, given the dictionary of image identifiers to descriptions, steps through each description and cleans the text.
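The four cleaning rules can be sketched as follows. This is one possible implementation, assuming that "words with numbers in them" can be handled by keeping only purely alphabetic tokens.

```python
import string


def clean_descriptions(descriptions):
    """Clean every description in place."""
    # Translation table that deletes all ASCII punctuation characters.
    table = str.maketrans("", "", string.punctuation)
    for desc_list in descriptions.values():
        for i, desc in enumerate(desc_list):
            # Lowercase each word and strip punctuation from it.
            words = [word.lower().translate(table) for word in desc.split()]
            # Drop one-character (or empty) tokens and tokens containing non-letters.
            words = [word for word in words if len(word) > 1 and word.isalpha()]
            desc_list[i] = " ".join(words)


descriptions = {"photo1": ["A dog runs on the beach .", "2 dogs chase a red ball !"]}
clean_descriptions(descriptions)
print(descriptions)
```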

Once cleaned, we can summarize the size of the vocabulary.

Ideally, we want a vocabulary that is both expressive and as small as possible. A smaller vocabulary will result in a smaller model that will train faster.

For reference, we can transform the clean descriptions into a set and print its size to get an idea of the size of our dataset vocabulary.
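A short sketch of that idea; the function name to_vocabulary() and the sample data are illustrative assumptions.

```python
def to_vocabulary(descriptions):
    """Return the set of distinct words across all descriptions."""
    vocabulary = set()
    for desc_list in descriptions.values():
        for desc in desc_list:
            vocabulary.update(desc.split())
    return vocabulary


sample = {"photo1": ["dog runs", "dog sits"], "photo2": ["cat sits"]}
vocabulary = to_vocabulary(sample)
print("Vocabulary Size: %d" % len(vocabulary))
```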

Finally, we can save the dictionary of image identifiers and descriptions to a new file named descriptions.txt, with one image identifier and description per line.

The save_descriptions() function below, given a dictionary containing the mapping of identifiers to descriptions and a filename, saves the mapping to file.
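A minimal sketch, assuming a single space separates the identifier from the description on each line. The demo writes to a temporary directory rather than the working directory.

```python
import os
import tempfile


def save_descriptions(descriptions, filename):
    """Write one 'identifier description' pair per line."""
    lines = [
        f"{key} {desc}"
        for key, desc_list in descriptions.items()
        for desc in desc_list
    ]
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines))


# Demo with made-up data written to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "descriptions.txt")
save_descriptions({"photo1": ["dog runs", "dog sits"], "photo2": ["cat sits"]}, path)
with open(path, encoding="utf-8") as handle:
    saved = handle.read()
print(saved)
```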

Putting this all together, the complete listing is provided below.

Running the example first prints the number of loaded photo descriptions (8,092) and the size of the clean vocabulary (8,763 words).

Finally, the clean descriptions are written to ‘descriptions.txt‘.

Taking a look at the file, we can see that the descriptions are ready for modeling. The order of descriptions in your file may vary.

Develop Deep Learning Model

In this section, we will define the deep learning model and fit it on the training dataset.

This section is divided into the following parts:

  1. Loading Data.
  2. Defining the Model.
  3. Fitting the Model.
  4. Complete Example.

Loading Data

First, we must load the prepared photo and text data so that we can use it to fit the model.

We are going to train the model on all of the photos and captions in the training dataset. While training, we are going to monitor the performance of the model on the development dataset and use that performance to decide when to save models to file.

The train and development datasets have been predefined in the Flickr_8k.trainImages.txt and Flickr_8k.devImages.txt files respectively, both of which contain lists of photo file names. From these file names, we can extract the photo identifiers and use these identifiers to filter photos and descriptions for each set.

The function load_set() below will load a pre-defined set of identifiers given the train or development sets filename.
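A minimal sketch, assuming the file lists one photo file name per line. The demo file is a made-up stand-in for the real split files.

```python
import os
import tempfile


def load_set(filename):
    """Return the set of photo identifiers listed in filename."""
    with open(filename, "r", encoding="utf-8") as handle:
        names = handle.read().split("\n")
    # Skip empty lines and strip the extension to recover the identifier.
    return {name.split(".")[0] for name in names if name.strip()}


# Demo file standing in for a real split file.
path = os.path.join(tempfile.mkdtemp(), "demo_images.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("photo1.jpg\nphoto2.jpg\n\nphoto1.jpg\n")
identifiers = load_set(path)
print(sorted(identifiers))
```

Returning a set removes duplicate entries and makes the later membership checks fast.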

Now, we can load the photos and descriptions using the pre-defined set of train or development identifiers.

Below is the function load_clean_descriptions() that loads the cleaned text descriptions from ‘descriptions.txt‘ for a given set of identifiers and returns a dictionary of identifiers to lists of text descriptions.

The model we will develop will generate a caption given a photo, and the caption will be generated one word at a time. The sequence of previously generated words will be provided as input. Therefore, we will need a ‘first word’ to kick-off the generation process and a ‘last word‘ to signal the end of the caption.

We will use the strings ‘startseq‘ and ‘endseq‘ for this purpose. These tokens are added to the loaded descriptions as they are loaded. It is important to do this now before we encode the text so that the tokens are also encoded correctly.
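A minimal sketch of the loading and wrapping step, assuming the file stores one identifier followed by its description per line. The demo file contents are made up.

```python
import os
import tempfile


def load_clean_descriptions(filename, dataset):
    """Load descriptions for identifiers in dataset, wrapped in start/end tokens."""
    descriptions = {}
    with open(filename, "r", encoding="utf-8") as handle:
        for line in handle:
            tokens = line.split()
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            image_id, words = tokens[0], tokens[1:]
            if image_id in dataset:
                wrapped = "startseq " + " ".join(words) + " endseq"
                descriptions.setdefault(image_id, []).append(wrapped)
    return descriptions


# Demo with a made-up descriptions file.
path = os.path.join(tempfile.mkdtemp(), "descriptions.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("photo1 dog runs\nphoto2 cat sits\nphoto1 dog sits\n")
train_descriptions = load_clean_descriptions(path, {"photo1"})
print(train_descriptions)
```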

Next, we can load the photo features for a given dataset.

The function load_photo_features() below loads the entire set of photo features, then returns the subset of interest for a given set of photo identifiers.
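A minimal sketch, assuming the features file is a pickled dictionary keyed by photo identifier. The demo uses short lists as stand-ins for the real 4,096-element feature vectors.

```python
import os
import pickle
import tempfile


def load_photo_features(filename, dataset):
    """Load all features from a pickle file and keep only those in dataset."""
    with open(filename, "rb") as handle:
        all_features = pickle.load(handle)
    # A KeyError here means an identifier has no stored features.
    return {key: all_features[key] for key in dataset}


# Demo: short lists stand in for real feature vectors.
path = os.path.join(tempfile.mkdtemp(), "features.pkl")
with open(path, "wb") as handle:
    pickle.dump({"photo1": [0.1, 0.2], "photo2": [0.3, 0.4]}, handle)
train_features = load_photo_features(path, {"photo1"})
print(train_features)
```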

This is not very efficient; nevertheless, this will get us up and running quickly.

We can pause here and test everything developed so far.

The complete code example is listed below.

Running this example first loads the 6,000 photo identifiers in the training dataset. These identifiers are then used to filter and load the cleaned description text and the pre-computed photo features.

We are nearly there.

The description text will need to be encoded to numbers before it can be presented to the model as input or compared to the model’s predictions.

The first step in encoding the data is to create a consistent mapping from words to unique integer values. Keras provides the Tokenizer class that can learn this mapping from the loaded description data.

Below, the to_lines() function converts the dictionary of descriptions into a list of strings, and the create_tokenizer() function fits a Tokenizer given the loaded photo description text.
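To make the word-to-integer mapping concrete without requiring Keras, the sketch below pairs to_lines() with build_word_index(), a hypothetical pure-Python stand-in for the mapping a fitted Tokenizer exposes. It assumes the descriptions are already cleaned, so a plain whitespace split is enough; with Keras installed, the Tokenizer class would be used instead.

```python
from collections import Counter


def to_lines(descriptions):
    """Flatten the dictionary of descriptions into a list of strings."""
    return [desc for desc_list in descriptions.values() for desc in desc_list]


def build_word_index(lines):
    """Stand-in for a fitted tokenizer: the most frequent word gets index 1."""
    counts = Counter(word for line in lines for word in line.split())
    # Words with equal counts keep the order in which they were first seen.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


descriptions = {"photo1": ["startseq dog runs endseq", "startseq dog sits endseq"]}
lines = to_lines(descriptions)
word_index = build_word_index(lines)
vocab_size = len(word_index) + 1  # index 0 is reserved for padding
print(word_index, vocab_size)
```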

We can now encode the text.

Each description will be split into words. The model will be provided one word and the photo and generate the next word. Then the first two words of the description will be provided to the model as input with the image to generate the next word. This is how the model will be trained.

For example, the input sequence “little girl running in field” would be split into 6 input-output pairs to train the model:
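The split can be reproduced with a few lines of Python. The helper name input_output_pairs() is an illustrative assumption; each pair consists of the words seen so far and the next word to predict, and the photo would accompany every pair.

```python
def input_output_pairs(description):
    """Return (words so far, next word) pairs for one description."""
    words = ["startseq"] + description.split() + ["endseq"]
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = input_output_pairs("little girl running in field")
for given, target in pairs:
    print("photo + " + " ".join(given) + " -> " + target)
```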

Later, when the model is used to generate descriptions, the generated words will be concatenated and recursively provided as input to generate a caption for an image.

The function below named create_sequences(), given the tokenizer, a maximum sequence length, and the dictionary of all descriptions and photos, will transform the data into input-output pairs of data for training the model. There are two input arrays to the model: one for photo features and one for the encoded text. There is one output for the model which is the encoded next word in the text sequence.

The input text is encoded as integers, which will be fed to a word embedding layer. The photo features will be fed directly to another part of the model. The model will output a prediction, which will be a probability distribution over all words in the vocabulary.

The output data will therefore be a one-hot encoded version of each word, representing an idealized probability distribution with 0 values at all word positions except the actual word position, which has a value of 1.
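The encoding described above can be sketched in pure Python. Here pad_left() and one_hot() are hypothetical stand-ins for the Keras pad_sequences() and to_categorical() utilities, and a plain word-to-integer dictionary stands in for the fitted Tokenizer; a real run would return NumPy arrays rather than lists.

```python
def pad_left(seq, max_length):
    """Left-pad with zeros, mirroring the default behaviour of pad_sequences()."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def one_hot(index, vocab_size):
    """Return a vector of zeros with a single 1 at the given index."""
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


def create_sequences(word_index, max_length, descriptions, photos, vocab_size):
    """Build photo inputs, padded text inputs and one-hot next-word outputs."""
    X1, X2, y = [], [], []
    for key, desc_list in descriptions.items():
        for desc in desc_list:
            seq = [word_index[w] for w in desc.split() if w in word_index]
            # One sample per position: words before i are input, word i is output.
            for i in range(1, len(seq)):
                X1.append(photos[key])
                X2.append(pad_left(seq[:i], max_length))
                y.append(one_hot(seq[i], vocab_size))
    return X1, X2, y


word_index = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}
X1, X2, y = create_sequences(
    word_index, 4, {"photo1": ["startseq dog runs endseq"]}, {"photo1": [0.1, 0.2]}, 5
)
print(X2)
```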

We will need to calculate the maximum number of words in the longest description. A short helper function named max_length() is defined below.
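One way to write such a helper, assuming descriptions are whitespace-separated strings that already include the start and end tokens:

```python
def max_length(descriptions):
    """Return the number of words in the longest description."""
    return max(
        len(desc.split())
        for desc_list in descriptions.values()
        for desc in desc_list
    )


sample = {
    "photo1": ["startseq dog runs endseq", "startseq dog runs on the beach endseq"],
    "photo2": ["startseq cat sits endseq"],
}
longest = max_length(sample)
print("Description Length: %d" % longest)
```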

We now have enough to load the data for the training and development datasets and transform the loaded data into input-output pairs for fitting a deep learning model.

Defining the Model

We will define a deep learning model based on the “merge-model” described by Marc Tanti, et al. in their 2017 papers:

For a gentle introduction to this architecture, see the post:

The authors provide a nice schematic of the model, reproduced below.

Schematic of the Merge Model For Image Captioning

We will describe the model in three parts:

  • Photo Feature Extractor. This is a 16-layer VGG model pre-trained on the ImageNet dataset. We have pre-processed the photos with the VGG model (without the output layer) and will use the extracted features predicted by this model as input.
  • Sequence Processor. This is a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) recurrent neural network layer.
  • Decoder (for lack of a better name). Both the feature extractor and sequence processor output a fixed-length vector. These are merged together and processed by a Dense layer to make a final prediction.

The Photo Feature Extractor model expects input photo features to be a vector of 4,096 elements. These are processed by a Dense layer to produce a 256 element representation of the photo.

The Sequence Processor model expects input sequences with a pre-defined length (34 words) which are fed into an Embedding layer that uses a mask to ignore padded values. This is followed by an LSTM layer with 256 memory units.

Both the input models produce a 256 element vector. Further, both input models use regularization in the form of 50% dropout. This is to reduce overfitting the training dataset, as this model configuration learns very fast.

The Decoder model merges the vectors from both input models using an addition operation. This is then fed to a Dense 256 neuron layer and then to a final output Dense layer that makes a softmax prediction over the entire output vocabulary for the next word in the sequence.

The function below named define_model() defines and returns the model ready to be fit.
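A sketch of how the three parts described above might be wired together with the Keras functional API, assuming TensorFlow's bundled Keras. The ReLU activations, the categorical cross-entropy loss and the Adam optimizer are assumptions, since the text does not specify them. Imports are deferred into the function so the definition itself does not require TensorFlow.

```python
def define_model(vocab_size, max_length):
    """Build and compile the merge-style caption model."""
    # Deferred imports: only needed when the model is actually built.
    from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, Input, add
    from tensorflow.keras.models import Model

    # Photo feature extractor: 4,096 features -> dropout -> 256 units.
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation="relu")(fe1)

    # Sequence processor: masked embedding -> dropout -> LSTM with 256 units.
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)

    # Decoder: add the two 256-element vectors, then predict the next word.
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation="relu")(decoder1)
    outputs = Dense(vocab_size, activation="softmax")(decoder2)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    # Assumed training configuration; not specified in the text.
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

Calling define_model(vocab_size, max_length) with the vocabulary size and maximum description length from your own data returns a model that takes a photo feature vector and a padded word sequence as its two inputs.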

To get a sense for the structure of the model, specifically the shapes of the layers, see the summary listed below.
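As a rough cross-check on a model summary, the trainable parameter count of each layer can be worked out by hand from the sizes given above. The sketch below assumes a hypothetical vocabulary size of 8,000 purely for illustration; substitute the size reported for your own data.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def embedding_params(vocab_size, dim):
    """One vector of length dim per vocabulary entry."""
    return vocab_size * dim


def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights and biases."""
    return 4 * (units * (n_in + units) + units)


def caption_model_params(vocab_size, units=256, feature_size=4096):
    """Per-layer parameter counts for the merge model described above."""
    return {
        "photo_dense": dense_params(feature_size, units),
        "embedding": embedding_params(vocab_size, units),
        "lstm": lstm_params(units, units),
        "decoder_dense": dense_params(units, units),
        "output_dense": dense_params(units, vocab_size),
    }


counts = caption_model_params(vocab_size=8000)  # hypothetical vocabulary size
for layer, n in counts.items():
    print(f"{layer}: {n:,}")
print(f"total: {sum(counts.values()):,}")
```

Note that the embedding and output layers dominate the total, which is why a smaller vocabulary yields a noticeably smaller model.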

We also create a plot to visualize the structure of the network that better helps understand the two streams of input.

Plot of the Caption Generation Deep Learning Model

Fitting the Model

Now that we know how to define the model, we can fit it on the training dataset.

The model learns fast and quickly overfits the training dataset. For this reason, we will monitor the skill of the trained model on the holdout development dataset. When the skill of the model on the development dataset improves at the end of an epoch, we will save the whole model to file.

At the end of the run, we can then use the saved model with the best skill on the development dataset as our final model.

We can do this by defining a ModelCheckpoint in Keras and specifying it to monitor the minimum loss on the validation dataset and save the model to a file that has both the training and validation loss in the filename.

We can then specify the checkpoint in the call to fit() via the callbacks argument. We must also specify the development dataset in fit() via the validation_data argument.
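Keras fills in the checkpoint file name by applying Python string formatting to the path, using the epoch number and the logged metrics. The sketch below shows that formatting in isolation; the template string is an assumption chosen to include the epoch, training loss and validation loss, and the metric values are placeholders.

```python
# Assumed file name template with placeholders for epoch and both losses.
FILEPATH_TEMPLATE = "model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"


def checkpoint_name(epoch, loss, val_loss, template=FILEPATH_TEMPLATE):
    """Format a checkpoint file name the way a formatted filepath would be filled in."""
    return template.format(epoch=epoch, loss=loss, val_loss=val_loss)


name = checkpoint_name(epoch=2, loss=3.245, val_loss=3.612)
print(name)
```

A template like this would typically be passed as the filepath argument of ModelCheckpoint, together with monitor='val_loss', save_best_only=True and mode='min'.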

We will only fit the model for 20 epochs, but given the amount of training data, each epoch may take 30 minutes on modern hardware.

Complete Example

The complete example for fitting the model on the training data is listed below.

Running the example first prints a summary of the loaded training and development datasets.

After the summary of the model, we can get an idea of the total number of training and validation (development) input-output pairs.

The model then runs, saving the best model to .h5 files along the way.

On my run, the best validation results were saved to the file:

  • model-ep002-loss3.245-val_loss3.612.h5

This model was saved at the end of epoch 2 with a loss of 3.245 on the training dataset and a loss of 3.612 on the development dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Let me know what you get in the comments below.

If you ran the example on AWS, copy the model file back to your current working directory. If you need help with commands on AWS, see the post:

Did you get an error like:

If so, see the next section.

Train With Progressive Loading

Note: If you had no problems in the previous section, please skip this section. This section is for those who do not have enough memory to train the model as described in the previous section (e.g. cannot use AWS EC2 for whatever reason).

The training of the caption model does assume you have a lot of RAM.

The code in the previous section is not memory efficient and assumes you are running on a large EC2 instance with 32GB or 64GB of RAM. If you are running the code on a workstation with 8GB of RAM, you cannot train the model.

A workaround is to use progressive loading. This was discussed in detail in the second-last section titled “Progressive Loading” in the post:

I recommend reading that section before continuing.

If you want to use progressive loading to train this model, this section will show you how.

The first step is we must define a function that we can use as the data generator.

We will keep things very simple and have the data generator yield one photo’s worth of data per batch. This will be all of the sequences generated for a photo and its set of descriptions.

The function below, data_generator(), will be the data generator and will take the loaded textual descriptions, photo features, tokenizer and max length. Here, I assume that you can fit this training data in memory, which I believe a machine with 8GB of RAM should be more than capable of.

How does this work? Read the post I just mentioned above that introduces data generators.

You can see that we are calling the create_sequences() function to create a batch worth of data for a single photo rather than an entire dataset. This means that we must update the create_sequences() function to delete the “iterate over all descriptions” for-loop.

The updated function is as follows:
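A pure-Python sketch of the per-photo create_sequences() together with a generator that uses it. A plain word-to-integer dictionary stands in for the fitted Tokenizer, and lists stand in for NumPy arrays; both are assumptions made to keep the example self-contained.

```python
def create_sequences(word_index, max_length, desc_list, photo, vocab_size):
    """Build the samples for ONE photo and its list of descriptions."""
    X1, X2, y = [], [], []
    for desc in desc_list:
        seq = [word_index[w] for w in desc.split() if w in word_index]
        for i in range(1, len(seq)):
            in_seq = seq[:i][-max_length:]
            target = [0] * vocab_size
            target[seq[i]] = 1  # one-hot encoding of the next word
            X1.append(photo)
            X2.append([0] * (max_length - len(in_seq)) + in_seq)  # left-pad with zeros
            y.append(target)
    return X1, X2, y


def data_generator(descriptions, photos, word_index, max_length, vocab_size):
    """Loop forever, yielding one photo's worth of samples per batch."""
    while True:
        for key, desc_list in descriptions.items():
            X1, X2, y = create_sequences(
                word_index, max_length, desc_list, photos[key], vocab_size
            )
            yield (X1, X2), y


# Sanity check on made-up data: pull a single batch from the generator.
word_index = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}
descriptions = {"photo1": ["startseq dog runs endseq", "startseq dog endseq"]}
photos = {"photo1": [0.5, 0.1]}
generator = data_generator(descriptions, photos, word_index, 4, 5)
(in_photos, in_seqs), out_words = next(generator)
print(len(in_photos), len(in_seqs), len(out_words))
```

The generator never terminates on its own, so the number of batches per epoch must be given to the training call; with one photo per batch, that is the number of photos.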

We now have pretty much everything we need.

Note, this is a very basic data generator. The big memory saving it offers is that the unrolled sequences of train and test data are not held in memory prior to fitting the model; instead, these samples (e.g. results from create_sequences()) are created as needed per photo.

Some off-the-cuff ideas for further improving this data generator include:

  • Randomize the order of photos each epoch.
  • Work with a list of photo ids and load text and photo data as needed to cut even further back on memory.
  • Yield more than one photo’s worth of samples per batch.

I have experimented with these variations myself in the past. Let me know if you do and how you go in the comments.

You can sanity check a data generator by calling it directly, as follows:

Running this sanity check will show what one batch worth of sequences looks like, in this case 47 samples to train on for the first photo.

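Finally, we can use the fit_generator() function on the model to train the model with this data generator. Note that recent versions of TensorFlow deprecate fit_generator(); in those versions, fit() accepts a generator directly.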

In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one with the lowest loss that you can then use in the next section.

The code to train the model with the data generator is as follows:

That’s it. You can now train the model using progressive loading and save a ton of RAM. This may also be a lot slower.

The complete updated example with progressive loading (use of the data generator) for training the caption generation model is listed below.

Perhaps evaluate each saved model and choose the one final model with the lowest loss on a holdout dataset. The next section may help with this.

Did you use this new addition to the tutorial?
How did you go?

Evaluate Model

Once the model is fit, we can evaluate the skill of its predictions on the holdout test dataset.

We will evaluate a model by generating descriptions for all photos in the test dataset and evaluating those predictions with a standard metric.

First, we need to be able to generate a description for a photo using a trained model.

This involves passing in the start description token ‘startseq‘, generating one word, then calling the model recursively with generated words as input until the end of sequence token ‘endseq‘ is reached or the maximum description length is reached.

The function below named generate_desc() implements this behavior and generates a textual description given a trained model, and a given prepared photo as input. It calls the function word_for_id() in order to map an integer prediction back to a word.
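The generation loop can be sketched without Keras by passing in a callable that plays the role of the trained model. In the sketch below, predict_next is a hypothetical stand-in for a call to the model's predict function, a word-to-integer dictionary stands in for the Tokenizer, and fake_predict is a toy predictor used only to exercise the loop.

```python
def word_for_id(integer, word_index):
    """Map an integer back to its word, or None if it is unknown."""
    for word, index in word_index.items():
        if index == integer:
            return word
    return None


def generate_desc(predict_next, word_index, photo, max_length):
    """Greedily generate a description one word at a time."""
    in_text = "startseq"
    for _ in range(max_length):
        # Encode the text so far and left-pad it to the fixed input length.
        seq = [word_index[w] for w in in_text.split() if w in word_index][-max_length:]
        seq = [0] * (max_length - len(seq)) + seq
        probs = predict_next(photo, seq)
        # Pick the most probable next word (argmax).
        best = max(range(len(probs)), key=probs.__getitem__)
        word = word_for_id(best, word_index)
        if word is None:
            break  # prediction does not map to a known word
        in_text += " " + word
        if word == "endseq":
            break
    return in_text


word_index = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}


def fake_predict(photo, seq):
    """Toy predictor: always favours the index after the last input token."""
    probs = [0.0] * 5
    probs[min(seq[-1] + 1, 4)] = 1.0
    return probs


caption = generate_desc(fake_predict, word_index, photo=None, max_length=6)
print(caption)
```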

We will generate predictions for all photos in the test dataset and in the train dataset.

The function below named evaluate_model() will evaluate a trained model against a given dataset of photo descriptions and photo features. The actual and predicted descriptions are collected and evaluated collectively using the corpus BLEU score that summarizes how close the generated text is to the expected text.

BLEU scores are used in text translation for evaluating translated text against one or more reference translations.

Here, we compare each generated description against all of the reference descriptions for the photograph. We then calculate BLEU scores for 1, 2, 3 and 4 cumulative n-grams.

You can learn more about the BLEU score here:

The NLTK Python library implements the BLEU score calculation in the corpus_bleu() function. A higher score close to 1.0 is better, a score closer to zero is worse.
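To show what the cumulative n-gram weights mean, the following is a simplified pure-Python re-implementation of corpus-level BLEU without smoothing. It is an illustrative sketch, not the NLTK code; the function name corpus_bleu_sketch() and the sample sentences are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu_sketch(list_of_references, hypotheses, weights):
    """Corpus BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    max_n = len(weights)
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(list_of_references, hypotheses):
        hyp_len += len(hyp)
        # Use the reference whose length is closest to the hypothesis length.
        ref_len += len(min(refs, key=lambda r: (abs(len(r) - len(hyp)), len(r))))
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            max_ref = Counter()
            for ref in refs:
                for gram, count in Counter(ngrams(ref, n)).items():
                    max_ref[gram] = max(max_ref[gram], count)
            # Clip each n-gram count by its maximum count in any reference.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0:
        return 0.0
    log_sum = 0.0
    for weight, match, total in zip(weights, matches, totals):
        if weight == 0:
            continue
        if match == 0 or total == 0:
            return 0.0  # no smoothing: a zero precision gives a zero score
        log_sum += weight * math.log(match / total)
    brevity = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity * math.exp(log_sum)


references = [[["a", "dog", "runs", "on", "the", "beach"], ["dog", "running", "on", "sand"]]]
hypotheses = [["a", "dog", "runs", "on", "the", "sand"]]
bleu1 = corpus_bleu_sketch(references, hypotheses, (1.0, 0, 0, 0))
bleu2 = corpus_bleu_sketch(references, hypotheses, (0.5, 0.5, 0, 0))
print("BLEU-1: %f" % bleu1)
print("BLEU-2: %f" % bleu2)
```

BLEU-3 and BLEU-4 follow the same pattern with weights spread evenly over the first three and four n-gram orders.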

We can put all of this together with the functions from the previous section for loading the data. We first need to load the training dataset in order to prepare a Tokenizer so that we can encode generated words as input sequences for the model. It is critical that we encode the generated words using exactly the same encoding scheme as was used when training the model.

We then use these functions for loading the test dataset.

The complete example is listed below.

Running the example prints the BLEU scores.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the scores fit within and close to the top of the expected range of a skillful model on the problem. The chosen model configuration is by no means optimized.

Generate New Captions

Now that we know how to develop and evaluate a caption generation model, how can we use it?

Almost everything we need to generate captions for entirely new photographs is in the model file.

We also need the Tokenizer for encoding generated words for the model while generating a sequence, and the maximum length of input sequences, used when we defined the model (e.g. 34).

We can hard code the maximum sequence length. With the encoding of text, we can create the tokenizer and save it to a file so that we can load it quickly whenever we need it without needing the entire Flickr8K dataset. An alternative would be to use our own vocabulary file and mapping to integers function during training.

We can create the Tokenizer as before and save it as a pickle file tokenizer.pkl. The complete example is listed below.
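The save-and-reload round trip can be sketched with the standard pickle module. A plain dictionary stands in for the fitted Tokenizer here (an assumption made so the example runs without Keras), and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted tokenizer: a word-to-integer mapping.
word_index = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}

path = os.path.join(tempfile.mkdtemp(), "tokenizer.pkl")

# Save the object to file.
with open(path, "wb") as handle:
    pickle.dump(word_index, handle)

# Later, load it back without touching the training data.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == word_index)
```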

We can now load the tokenizer whenever we need it without having to load the entire training dataset of annotations.

Now, let’s generate a description for a new photograph.

Below is a new photograph that I chose randomly on Flickr (available under a permissive license).

Photo of a dog at the beach.
Photo by bambe1964, some rights reserved.

We will generate a description for it using our model.

Download the photograph and save it to your local directory with the filename “example.jpg“.

First, we must load the Tokenizer from tokenizer.pkl and define the maximum length of the sequence to generate, needed for padding inputs.

Then we must load the model, as before.

Next, we must load the photo we wish to describe and extract the features.

We could do this by re-defining the model and adding the VGG-16 model to it, or we can use the VGG model to predict the features and use them as inputs to our existing model. We will do the latter and use a modified version of the extract_features() function used during data preparation, but adapted to work on a single photo.

We can then generate a description using the generate_desc() function defined when evaluating the model.

The complete example for generating a description for an entirely new standalone photograph is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, the description generated was as follows:

You could remove the start and end tokens and you would have the basis for a nice automatic photo captioning model.
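A small helper for that clean-up step might look as follows. The function name and the sample caption are illustrative assumptions, not output from the model.

```python
def strip_sequence_tokens(description, start="startseq", end="endseq"):
    """Remove a leading start token and a trailing end token, if present."""
    words = description.split()
    if words and words[0] == start:
        words = words[1:]
    if words and words[-1] == end:
        words = words[:-1]
    return " ".join(words)


print(strip_sequence_tokens("startseq dog runs on the beach endseq"))
```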

It’s like living in the future guys!

It still completely blows my mind that we can do this. Wow.


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Alternate Pre-Trained Photo Models. A small 16-layer VGG model was used for feature extraction. Consider exploring larger models that offer better performance on the ImageNet dataset, such as Inception.
  • Smaller Vocabulary. A large vocabulary of nearly eight thousand words was used in the development of the model. Many of the words supported may be misspellings or only used once in the entire dataset. Refine the vocabulary and reduce the size, perhaps by half.
  • Pre-trained Word Vectors. The model learned the word vectors as part of fitting the model. Better performance may be achieved by using word vectors either pre-trained on the training dataset or trained on a much larger corpus of text, such as news articles or Wikipedia.
  • Tune Model. The configuration of the model was not tuned on the problem. Explore alternate configurations and see if you can achieve better performance.

Did you try any of these extensions? Share your results in the comments below.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Caption Generation Papers

Flickr8K Dataset



Summary

In this tutorial, you discovered how to develop a photo captioning deep learning model from scratch.

Specifically, you learned:

  • How to prepare photo and text data ready for training a deep learning model.
  • How to design and train a deep learning caption generation model.
  • How to evaluate a trained caption generation model and use it to caption entirely new photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

1,196 Responses to How to Develop a Deep Learning Photo Caption Generator from Scratch

  1. Avatar
    Christian Beckmann November 28, 2017 at 3:21 am #

    Hi Jason,

    thanks for this great article about image caption!

    My results after training were a bit worse (loss 3.566 – val_loss 3.859, then started to overfit) so i decided to try keras.applications.inception_v3.InceptionV3 for the base model. Currently it is still running and i am curious to see if it will do better.

    • Avatar
      Jason Brownlee November 28, 2017 at 8:41 am #

      Let me know how you go Christian.

      • Avatar
        zeeshan August 2, 2019 at 8:44 pm #

        Hi Jason, I am receiving this error. Can you please help me with it?

        NameError: name 'Flickr8k_Dataset' is not defined

        • Avatar
          Jason Brownlee August 3, 2019 at 8:02 am #

          You may have missed a line of code or the dataset is not in the same directory as the python file.

          • Avatar
            Bhagyashree January 30, 2022 at 7:35 pm #

            Can you provide complete source code link without split code parts?
            please 🙂

          • Avatar
            James Carmichael January 31, 2022 at 10:52 am #

            Hello Bhagyashree…The tutorial contains full code listing that you may utilize.

      • Avatar
        mo December 16, 2020 at 7:54 pm #

        How do I solve this? An error occurs:


        6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
        7 # fit for one epoch
        ----> 8 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)

        • Avatar
          Jason Brownlee December 17, 2020 at 6:34 am #

          I don’t have enough context to comment, sorry.

          Perhaps these tips will help:

          • Avatar
            sharath May 19, 2021 at 2:23 am #

            Hello Jason,
            I'm facing a ValueError. Could you help?

            ValueError Traceback (most recent call last)
            in ()
            6 image_input=image_input.reshape(2048,)
            7 gen=generate(desc_dict,photo,max_length_of_caption,vocab_size,image_input)
            ----> 8 model.fit(gen,epochs=1,steps_per_epoch=6000,verbose=1)

            5 frames
            in create_sequence(caption, max_length_of_caption, vocab_size, image_input)
            1 def create_sequence(caption,max_length_of_caption,vocab_size,image_input):
            ----> 2 input_sequence=[],image_sequence=[],output_sequence=[]
            3 for caption in captions:
            4 caption=caption.split(' ')
            5 caption=[wordtoindex[w] for w in caption if w in vocab]

            ValueError: not enough values to unpack (expected 2, got 0)
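
The ValueError in this traceback is consistent with how Python parses line 2 of create_sequence. A statement of the form `a=[],b=[],c=[]` is a chained assignment whose middle targets are the pairs `[], b` and `[], c`, so Python tries to unpack an empty list into two targets. The sketch below reproduces the message with stand-in names and shows the tuple-unpacking form that was probably intended; that intent is an assumption.

```python
# Reproduce the failure: the targets are  a,  ([], b)  and  ([], c);  the value is []
try:
    a = [], b = [], c = []
    message = "no error"
except ValueError as err:
    message = str(err)
print(message)

# Probable intent: one tuple assignment creating three separate lists
input_sequence, image_sequence, output_sequence = [], [], []
input_sequence.append(1)
print(image_sequence, output_sequence)
```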

        • Avatar
          asd February 2, 2021 at 12:54 am #

          Hey, did you find a solution? I’m facing the same error.

        • Avatar
          Mustafa Dar October 20, 2021 at 12:53 am #

          What accuracy are you getting in your NLP scores?

      • Avatar
        Rajat December 26, 2020 at 4:10 am #

        Hello Jason, can you help me with the frontend part? I tried using a Flask app but failed.

    • Avatar
      basil June 21, 2018 at 12:03 am #

      Christian / Jason, would batch normalization help here instead? I am facing the same overfitting issue.

      BN should also speed up training and give more accurate results. Any input?

      • Avatar
        Jason Brownlee June 21, 2018 at 6:18 am #

        The model usually does fit in 3-5 epochs.

        You can try batchnorm if you like. Not sure if it will help.

        • Avatar
          basil June 23, 2018 at 4:34 am #

          yep, i agree… not required..thanks..

          am also trying inceptionV3, let you know the results..

          • Avatar
            Jason Brownlee June 23, 2018 at 6:20 am #


          • Avatar
            Ben June 24, 2018 at 8:14 am #

            Hey did anyone try the Inception model? What were the results?

          • Avatar
            abbas November 18, 2018 at 3:37 am #

            hey ben!!!Can you please share the code and results of the inception model?so that we can also try and know more about the inception model.Thanks in advance

    • Avatar
      Shaurya Pratap Singh October 10, 2018 at 7:25 pm #

      can you plz send me the code at shauryaprataps261@gmail.com

      • Avatar
        Asad March 24, 2019 at 7:36 am #

        did you find code ?

    • Avatar
      Janarddan Sarkar November 24, 2018 at 1:16 am #

      I am getting the same

    • Avatar
      vishal July 6, 2020 at 3:05 am #

      i have tried using the inception v3 but the bleu scores are even than that of vgg16 model.
      BLEU-1: 0.514655
      BLEU-2: 0.266434
      BLEU-3: 0.179374
      BLEU-4: 0.078146

      • Avatar
        Jason Brownlee July 6, 2020 at 6:39 am #

        Nice work!

      • Avatar
        Rohit Kushwaha April 15, 2021 at 1:32 pm #

        i also tried Inception i got BLEU-1 0.571

      • Avatar
        afrid May 17, 2021 at 1:26 am #

        @vishal, can you share the inception v3 code ?

    • Avatar
      Karan Aggarwal June 13, 2021 at 3:55 am #

      Hello Christian Sir,

      To avoid overfitting, you used keras.applications.inception_v3.InceptionV3. I am getting an error on this line:

      print('Extracted Features: %d' % len(features))

      TypeError Traceback (most recent call last)
      in ()
      ----> 1 print('Extracted Features: %d' % len(features))

      TypeError: object of type 'NoneType' has no len()

      Please help in resolving this

    • Avatar
      Nagaraj CL April 12, 2022 at 1:07 pm #

      HI Christian, Please can you share working Inception V3 code, I am not able to make InceptionV3 model working, I am getting following error.

      Incompatible shapes: [47,8,8,256] vs. [47,256]
      [[{{node gradient_tape/model_10/add_7/add/BroadcastGradientArgs}}]] [Op:__inference_train_function_1153371]

  2. Avatar
    Akash November 30, 2017 at 4:56 am #

    Hi Jason,
    Once again great Article.
    I ran into some error while executing the code under “Complete example ” section.
    The error I got was
    ValueError: Error when checking target: expected dense_3 to have shape (None, 7579) but got array with shape (306404, 1)
    Any idea how to fix this?

    • Avatar
      Jason Brownlee November 30, 2017 at 8:26 am #

      Hi Akash, nice catch.

      The fault appears to have been introduced in a recent version of Keras in the to_categorical() function. I can confirm the fault occurs with Keras 2.1.1.

      You can learn more about the fault here:

      There are two options:

      1. Downgrade Keras to 2.0.8


      2. Modify the code, change line 104 in the training code example from:


      I hope that helps.
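
Independently of the installed Keras version, the one-hot encoding of a single output word can be built directly with NumPy. This is an alternative sketch, not the fix referenced above: the helper name `one_hot_word` is hypothetical, and `vocab_size` is taken from the size of dense_3 in the error message.

```python
import numpy as np

def one_hot_word(index, vocab_size):
    """Return a 1-D one-hot vector of length vocab_size for one word index."""
    vector = np.zeros(vocab_size, dtype="float32")
    vector[index] = 1.0
    return vector

out_seq = one_hot_word(3, vocab_size=7579)
print(out_seq.shape)
```

Stacking one such vector per training sample gives a target array of shape (samples, vocab_size), which is the shape the error message says dense_3 expects.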

      • Avatar
        Akash November 30, 2017 at 5:38 pm #

        Thanks Jason. It’s working now.
        Can you suggest the changes to be made to use Inception model and word embedding like word2vec.

    • Avatar
      Gaurav Anand August 3, 2018 at 4:02 pm #

      Hi Akash

      Could you please tell me how you got rid of this problem?

      I am facing

      ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (11, 7, 7, 512)

      and after changing input structure to inputs1 = Input(shape=(7, 7, 512,)) I am facing

      ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (11, 3857)

      I have tried with Keras 2.0.8 and latest 2.2.2 versions.
      Any help would be much appreciated.


      • Avatar
        anesh August 7, 2018 at 4:34 pm #

        Did you use a different input shape? If you changed the input shape, then you have to flatten it and add a fully connected dense layer of 4096 neurons.

        • Avatar
          Gaurav Anand August 14, 2018 at 2:59 pm #

          Should I avoid using “include_top = false” while feature extraction ?
          or keep it as true ?

        • Avatar
          abbas November 18, 2018 at 3:45 am #

          Anesh how to fix this error?

          Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)

          • Avatar
            Jason Brownlee November 18, 2018 at 6:48 am #

            Change the data to meet the model or change the model to meet the data.

  3. Avatar
    Zoltan November 30, 2017 at 11:47 pm #

    Hi Jason,

    Big thumbs up, nicely written, really informative article. I especially like the step by step approach.

    But when I tried to go through it, I got an error in load_photo_features saying “name ‘load’ not defined”, which is kind of odd.

    Otherwise everything seems fine.

    • Avatar
      Jason Brownlee December 1, 2017 at 7:35 am #


      Perhaps double check you have the load function imported from pickle?

  4. Avatar
    Bikram Kachari December 1, 2017 at 4:59 pm #

    Hi Jason

    I am a regular follower of your tutorials. They are great. I got to learn a lot. Thank you so much. Please keep up the good work

  5. Avatar
    maibam December 1, 2017 at 7:05 pm #

    Layer (type) Output Shape Param # Connected to
    input_2 (InputLayer) (None, 34) 0
    input_1 (InputLayer) (None, 4096) 0
    embedding_1 (Embedding) (None, 34, 256) 1940224 input_2[0][0]
    dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
    dropout_2 (Dropout) (None, 34, 256) 0 embedding_1[0][0]
    dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
    lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
    add_1 (Add) (None, 256) 0 dense_1[0][0]
    dense_2 (Dense) (None, 256) 65792 add_1[0][0]
    dense_3 (Dense) (None, 7579) 1947803 dense_2[0][0]
    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0

    ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (306404, 7, 7, 512)

    I am getting the error during model.fit:
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    Keras 2.0.8 with tensorflow
    what is wrong ?

    • Avatar
      Jason Brownlee December 2, 2017 at 8:51 am #

      Not sure, did you copy all of the code exactly?

      Is your numpy and tensorflow also up to date?

      • Avatar
        Christian January 16, 2018 at 10:09 pm #

        It looks like he changed the network used for feature extraction. When using include_top=False and weights='imagenet' you get this type of data structure.

    • Avatar
      Kingson June 26, 2018 at 9:54 pm #

      @maibam did you find the solution?

      I am getting similar error –
      ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (17952, 7, 7, 512)

      Please help me out.

      • Avatar
        Jason Brownlee June 27, 2018 at 8:18 am #

        Ensure your version of Keras is up to date. v2.1.6 or better.

        • Avatar
          Kingson June 27, 2018 at 5:26 pm #

          Layer (type) Output Shape Param # Connected to
          input_2 (InputLayer) (None, 27) 0
          input_1 (InputLayer) (None, 4096) 0
          embedding_1 (Embedding) (None, 27, 256) 1058048 input_2[0][0]
          dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
          dropout_2 (Dropout) (None, 27, 256) 0 embedding_1[0][0]
          dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
          lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
          add_1 (Add) (None, 256) 0 dense_1[0][0]
          dense_2 (Dense) (None, 256) 65792 add_1[0][0]
          dense_3 (Dense) (None, 4133) 1062181 dense_2[0][0]
          Total params: 3,760,165
          Trainable params: 3,760,165
          Non-trainable params: 0
          Traceback (most recent call last):
          File “train2.py”, line 179, in
          model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

          ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (10931, 7, 7, 512)

          keras version is – 2.2.0

          Please help me out.

          • Avatar
            Jason Brownlee June 28, 2018 at 6:13 am #

            Looks like the dimensions of your data do not match the expectations of the model.

            You can change the data or change the model.

          • Avatar
            anesh August 7, 2018 at 4:36 pm #

            If you changed the input shape by include_top=False then you have to flatten it and add two FC dense layer of 4096 neurons.
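
The errors in this thread arise when photo features of shape (7, 7, 512), produced with include_top=False, are passed to a model whose photo input expects a flat vector. The sketch below uses random NumPy arrays as stand-ins for real features to show two ways of obtaining a 2-D array. Which one suits the model is a judgment call, and the model's photo input shape would need to change to match (25088 or 512 rather than 4096).

```python
import numpy as np

# stand-in for 11 photos of convolutional features (random values, not real data)
rng = np.random.default_rng(0)
features = rng.random((11, 7, 7, 512), dtype=np.float32)

# option 1: flatten each 7x7x512 map into one long vector
flattened = features.reshape(len(features), -1)

# option 2: average over the two spatial axes (global average pooling)
pooled = features.mean(axis=(1, 2))

print(flattened.shape, pooled.shape)
```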

  6. Avatar
    Vik December 2, 2017 at 7:16 pm #

    Thank you for the article. It is great to see full pipeline.
    Always following your articles with admiration

  7. Avatar
    Gonzalo Gasca Meza December 4, 2017 at 10:42 am #

    In the prepare data section, if using Python 2.7 there is no str.maketrans method.
    To make this work just comment that line and in line 46 do this:
    desc = [w.translate(None, string.punctuation) for w in desc]
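
For comparison, the Python 3 form of this step is sketched below. Calling `str.maketrans` with three arguments builds a table that deletes every character in the third argument. The function name `strip_punctuation` is an assumption for illustration.

```python
import string

# translation table that maps every punctuation character to None (deletion)
table = str.maketrans("", "", string.punctuation)

def strip_punctuation(caption):
    """Split a caption into words and remove punctuation from each word."""
    return [word.translate(table) for word in caption.split()]

print(strip_punctuation("A dog, running!"))
```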

    • Avatar
      Jason Brownlee December 4, 2017 at 4:57 pm #

      Thanks Gonzalo!

    • Avatar
      Bani March 8, 2018 at 4:26 am #

      After using the function to_vocabulary(),
      I am getting a vocabulary of size 24, which is too small, even though I have followed the code line by line.
      Can you help?

      • Avatar
        Jason Brownlee March 8, 2018 at 6:36 am #

        Are you able to confirm that your Python is version 3.5+ and that you have the latest version of all libraries installed?

  8. Avatar
    Minel December 11, 2017 at 6:17 pm #

    Hi Jason,
    I am using your code step by step. There is a slight mistake:
    you wrote
    # save descriptions
    save_doc(descriptions, 'descriptions.txt')

    In fact, the right instruction is
    # save descriptions
    save_descriptions(descriptions, 'descriptions.txt')

    as you wrote in the final example.

  9. Avatar
    Minel December 11, 2017 at 6:34 pm #

    Hi jason
    Another small detail. I had to write
    from pickle import load
    to run the instruction
    all_features = load(open(filename, 'rb'))
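
A minimal round trip with pickle, using a small stand-in dictionary instead of the real features file. The temporary path is created only for this example, and the `with` blocks are a stylistic choice that closes the file handles.

```python
import os
import tempfile
from pickle import dump, load

# stand-in for the photo-id -> feature-vector dictionary
features = {"photo1": [0.1, 0.2], "photo2": [0.3, 0.4]}

path = os.path.join(tempfile.mkdtemp(), "features.pkl")
with open(path, "wb") as f:
    dump(features, f)

# load is only defined because it was imported from pickle above
with open(path, "rb") as f:
    all_features = load(f)

print(len(all_features))
```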


  10. Avatar
    Minel December 11, 2017 at 9:32 pm #

    Hi Jason,
    I met some trouble running your code. I got a MemoryError on the instruction :
    return array(X1), array(X2), array(y)

    I am using a Linux (Debian) virtual machine with Python 3 and 32 GB of memory.
    Could you tell me how much memory the computer you used to check your program had?


  11. Avatar
    Minel December 12, 2017 at 11:34 pm #

    Thanks for the advice. In fact, I upgraded the VM (64 GB, 16 cores) and it worked fine (using 45 GB of memory).

    • Avatar
      Jason Brownlee December 13, 2017 at 5:35 am #

      Nice! Glad to hear it.

      • Avatar
        Vineeth March 3, 2018 at 12:32 am #

        I get the same error even with 64GB VM :/ What to do

        • Avatar
          Jason Brownlee March 3, 2018 at 8:13 am #

          I’m sorry to hear that, perhaps there is something else going on with your workstation?

          I can confirm the example works on workstations and on EC2 instances with and without GPUs.

          • Avatar
            Vineeth March 3, 2018 at 10:06 pm #

            It’s throwing a Value error for input_1 after sometime. I tried everything i can but i am not able to understand. Can you paste the link of your project so i can compare ?

          • Avatar
            Jason Brownlee March 4, 2018 at 6:03 am #

            Are you able to confirm that your Python environment is up to date?

          • Avatar
            Vineeth March 3, 2018 at 10:26 pm #

            And sir, you said the pickle size should be about 127 MB, but mine turns out to be above 700 MB. What did I do wrong?

          • Avatar
            Jason Brownlee March 4, 2018 at 6:04 am #

            The size may be different on different platforms (macos/linux/windows).
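
One other possible explanation, not confirmed in this thread, is that the larger file holds 7×7×512 convolutional feature maps (as reported in several errors in these comments) rather than 4,096-element vectors. The rough arithmetic below assumes 8,091 photos stored as 32-bit floats and ignores pickle overhead; the function name is hypothetical.

```python
def features_size_mib(num_photos, values_per_photo, bytes_per_value=4):
    """Approximate size in MiB of the raw feature values."""
    return num_photos * values_per_photo * bytes_per_value / 2**20

NUM_PHOTOS = 8091  # assumed number of photos in the dataset

fc_mib = features_size_mib(NUM_PHOTOS, 4096)          # flat 4,096-element vectors
conv_mib = features_size_mib(NUM_PHOTOS, 7 * 7 * 512)  # 7x7x512 feature maps
print(round(fc_mib), round(conv_mib))
```

Under these assumptions the first figure is close to the expected 127 MB and the second is above 700 MB.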

  12. Avatar
    Josh Ash December 17, 2017 at 9:56 pm #

    Hi Jason – hello from Queensland 🙂
    Your tutorials on applied ML in Python are the best on the net hands down, thanks for putting them together!

  13. Avatar
    Madhivarman December 18, 2017 at 7:12 pm #

    Hi Jason. When I run the train.py script, my laptop freezes. I don’t know whether it is training or not. Did anyone else face this issue?


  14. Avatar
    Muhammad Awais December 20, 2017 at 3:36 pm #

    Thanks for such great work. I got an error message when running the code:
    FileNotFoundError: [Errno 2] No such file or directory: 'descriptions.txt'
    Please help

    • Avatar
      Jason Brownlee December 20, 2017 at 3:50 pm #

      Ensure you generate the descriptions file before running the prior model – check the tutorial steps again and ensure you execute each in turn.

  15. Avatar
    Daniel F December 21, 2017 at 4:31 am #

    Hi Jason,

    I’m getting a MemoryError when I try to prepare the training sequences:

    Traceback (most recent call last):
    File “C:\Users\Daniel\Desktop\project\deeplearningmodel.py”, line 154, in
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
    File “C:\Users\Daniel\Desktop\project\deeplearningmodel.py”, line 104, in create_sequences
    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
    File “C:\Program Files\Anaconda3\lib\site-packages\keras\utils\np_utils.py”, line 24, in to_categorical
    categorical = np.zeros((n, num_classes))

    any advice? I have 8GB of RAM.
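
A rough estimate shows why the full arrays strain memory. The sample count (306,404), vocabulary size (7,579), description length (34) and feature length (4,096) are taken from output posted in these comments. The element sizes are assumptions: 8 bytes for the float64 array that np.zeros creates by default, and 4 bytes for float32 photo features and int32 sequences.

```python
SAMPLES, VOCAB, MAX_LENGTH, FEATURES = 306404, 7579, 34, 4096
GIB = 2**30

y_gib = SAMPLES * VOCAB * 8 / GIB        # one-hot output words
x1_gib = SAMPLES * FEATURES * 4 / GIB    # repeated photo features
x2_gib = SAMPLES * MAX_LENGTH * 4 / GIB  # padded input sequences

print(f"y: {y_gib:.1f} GiB, X1: {x1_gib:.1f} GiB, X2: {x2_gib:.2f} GiB")
```

The Python lists built before conversion to arrays add further overhead on top of these figures.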

  16. Avatar
    zonetrooper32 December 28, 2017 at 3:12 am #

    Hi Jason,

    Thank you for this amazing article about image captioning.

    Currently I am trying to re-implement the whole code, except that I am doing it in pure Tensorflow. I’m curious to see if my re-implementation is working as smooth as yours.

    Also, a shower thought: better vector representations for words might be obtained by using pretrained embeddings, for example GloVe 6B or the GoogleNews word2vec vectors. Learning embeddings from scratch with only 8k words might cost some performance.

    Again thank you for putting everything together, it will take quite some time to implement from scratch without your tutorial.

    • Avatar
      Jason Brownlee December 28, 2017 at 5:26 am #

      Try it and see if it lifts model skill. Let me know how you go.

  17. Avatar
    Sasikanth January 8, 2018 at 5:04 pm #

    Hello Jason,
    Is there an R package to perform modeling of images?


  18. Avatar
    Marco January 16, 2018 at 10:08 pm #

    Hi Jason! Thanks for your amazing tutorial! I have a question. I don’t understand the meaning of the number 1 on this line (extract_features):
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

    Can you explain to me what reshape does and the meaning of the arguments?

    Thanks in advance.
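
On the reshape question: Keras models operate on batches of samples, so a single image array of shape (height, width, channels) is given a leading dimension of size 1, meaning a batch that contains one image. A NumPy-only sketch, with an array of zeros standing in for a loaded 224×224 RGB photo:

```python
import numpy as np

# stand-in for one loaded photo: height 224, width 224, 3 color channels
image = np.zeros((224, 224, 3), dtype="float32")

# the leading 1 is the number of samples in the batch
batch = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print(image.shape, batch.shape)

# an equivalent way to add the batch dimension
same = np.expand_dims(image, axis=0)
```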

  19. Avatar
    junhyung yu January 22, 2018 at 8:54 pm #

    Hi Jason! thank you for your great code.
    but i have one question.

    How long does it take to execute the code below?

    # define the model
    model = define_model(vocab_size, max_length)

    This code has still not finished after three days.

    I think that “se3 = LSTM(256)(se2)” code in define_model function is causing the problem.

    My computer configuration is like this.

    Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz – 6 core
    Ram 62G
    GeForce GTX TITAN X – 2core

    please help me~~

    • Avatar
      Jason Brownlee January 23, 2018 at 7:55 am #

      Ouch, something is wrong.

      Perhaps try running on AWS?

      Perhaps try other models and test your rig/setup?

      Perhaps try fewer epochs or a smaller model to see if your setup can train the model at all?

      • Avatar
        junhyung yu January 23, 2018 at 3:29 pm #

        1. No. I am running it on my own Linux server using Jupyter Notebook.

        2. No i am using only your code , no other model, no modify


        model.fit([X1train, X2train], ytrain, epochs=20, verbose=1, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

        This code has not yet been executed

        so I do not think epoch is a problem.

        • Avatar
          Jason Brownlee January 24, 2018 at 9:50 am #

          Perhaps run from the command line as a background process without notebook?

          Perhaps check memory usage and cpu/gpu utilization?

  20. Avatar
    krishna January 23, 2018 at 10:41 pm #

    ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

    hi sir… I am getting this error above when i run feature extract code.

    • Avatar
      Jason Brownlee January 24, 2018 at 9:55 am #

      Sorry, I have not seen that error.

    • Avatar
      Hiroshi February 26, 2018 at 1:01 pm #

      Hi Krishna,

      I’m also getting this error time to time. Were you able to solve this issue?

    • Avatar
      anesh August 7, 2018 at 4:40 pm #

      You have to connect to the internet to download the vgg network.

  21. Avatar
    Sathiya_Chakra January 28, 2018 at 7:05 am #

    Hi Jason!

    Is it possible to run this neural network on a 8GB RAM laptop with 2GB Graphics card with Intel core i5 processor?

    • Avatar
      Jason Brownlee January 28, 2018 at 8:28 am #


      You might need to adjust it to use progressive loading so that it does not try to hold the entire dataset in RAM.
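
Progressive loading is typically done with a Python generator that builds the arrays for one photo at a time instead of for the whole dataset. The sketch below is a simplified, NumPy-only illustration and not the tutorial's own generator: the names `photo_batches`, `encoded` and `features` and the toy data are assumptions, and captions are assumed to be already encoded as lists of integers.

```python
import numpy as np

def photo_batches(encoded, features, max_length, vocab_size):
    """Yield ((photo_features, padded_prefixes), one_hot_next_words) per photo, forever."""
    while True:
        for key, sequences in encoded.items():
            X1, X2, y = [], [], []
            for seq in sequences:
                for i in range(1, len(seq)):
                    prefix = seq[:i][-max_length:]
                    # left-pad the prefix with zeros to a fixed length
                    padded = [0] * (max_length - len(prefix)) + prefix
                    target = np.zeros(vocab_size, dtype="float32")
                    target[seq[i]] = 1.0
                    X1.append(features[key])
                    X2.append(padded)
                    y.append(target)
            yield (np.array(X1), np.array(X2)), np.array(y)

# toy data: integer-encoded captions and dummy 4,096-element feature vectors
encoded = {"photo1": [[1, 4, 5, 2]], "photo2": [[1, 6, 2], [1, 7, 8, 2]]}
features = {key: np.ones(4096, dtype="float32") for key in encoded}

generator = photo_batches(encoded, features, max_length=5, vocab_size=10)
(photos, prefixes), targets = next(generator)
print(photos.shape, prefixes.shape, targets.shape)
```

Only the arrays for the current photo are held in memory at any time, which is what keeps RAM use low.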

      • Avatar
        sandhya November 20, 2018 at 4:56 am #

        Hi jason

        Is it possible to run on cpu with progressive loading without any issues??

  22. Avatar
    Ajit Tiwari January 29, 2018 at 10:46 pm #

    Hi Jason,
    Can you provide a link for the tokenizer as well as the model file.
    I Cannot train this model in my system but would like to see if I can use it to create an Android app

  23. Avatar
    Soumya February 1, 2018 at 10:19 pm #

    When I am running

    tokenizer = Tokenizer()

    I am getting error,

    Traceback (most recent call last):
    File “”, line 1, in
    NameError: name ‘Tokenizer’ is not defined

    How to solve this. Any idea please.

  24. Avatar
    Marco February 9, 2018 at 12:41 am #

    Hi Jason, thanks for the tutorial! I want to ask you if you could explain (or send me some links), to better understand, how exactly the fitting works.

    Example description: the girl is …

    The LSTM network during fitting takes the beginning of the sequence of my description (startseq) and it produces a vector with all possible subsequent words. This vector is combined with the vector of the input image features and it is passed within an FF layer where we then take the most probable word (with softmax). it’s right?

    At this point how does the fitting go on? Is the new sequence (e.g startseq – the) passed into the LSTM network, predicts all possible next words, etc.? Continuing this way up to endseq?

    If the network incorrectly generates the next word, what happens? How are the weights arranged? The fitting continues by taking in input “startseq – wrong_word” or continues with the correct one (eg startseq – the)?

    Thanks for your help
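
During fitting, the model's own predictions are not fed back in. Each description is split into (prefix, next word) pairs in advance, and every pair is an independent training sample, so the input is always the correct prefix even where the model would have predicted a wrong word. The weights are then updated by backpropagation on the error for the target word. A small sketch of how the pairs are formed, using a hypothetical helper name and plain words instead of integer indices:

```python
def training_pairs(description):
    """Split one description into (prefix words, next word) training pairs."""
    words = description.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = training_pairs("startseq the girl is running endseq")
for prefix, target in pairs:
    print(" ".join(prefix), "->", target)
```

Feeding predictions back in word by word happens only at inference time, when a caption is generated for a new photo.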

  25. Avatar
    Sumit Das February 13, 2018 at 6:10 pm #

    Hi Jason, great article on the caption generator, I think the best available online so far. I am a newbie in ML (AI). I extracted the features and stored them in the features.pkl file, but I am getting a memory error in the create_sequences function. I can see you have suggested progressive loading, but I do not understand it properly. Could you suggest how to modify the current code for progressive loading?

    [‎2/‎13/‎2018 12:34 PM] Sanchawat, Hardik:
    Using TensorFlow backend.
    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Traceback (most recent call last):
    File “C:\Users\hardik.sanchawat\Documents\Scripts\flickr\test.py”, line 154, in
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
    File “C:\Users\hardik.sanchawat\Documents\Scripts\flickr\test.py”, line 109, in create_sequences
    return array(X1), array(X2), array(y)

    My system configuration is :

    OS: Windows 10
    Processor: AMD A8 PRO-7150B R5, 10 Compute Cores 4C+6G 1.90 GHz
    Memory(RAM): 16 GB (14.9GB Usable)
    System type: 64-bit OS, x64-based processor

  26. Avatar
    Kavya February 14, 2018 at 8:35 am #

    Hi Jason,

    I am trying to use plot_model, but I am getting an error:

    raise ImportError(‘Failed to import pydot. You must install pydot’

    ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

    I tried
    conda install graphviz
    conda install pydotplus

    to install pydot.
    My Python version is 3.x.
    Keras version is 2.1.3.

    Could you please help me solve this problem?

    • Avatar
      Jason Brownlee February 14, 2018 at 2:40 pm #

      I’m sorry to hear that.

      Perhaps the installed libraries are not available in your current Python environment?

      Perhaps try posting the error to stackoverflow? I’m not an expert at debugging workstations.

    • Avatar
      Vineeth February 14, 2018 at 5:13 pm #

      If you are on windows go here and install this, https://graphviz.gitlab.io/_pages/Download/Download_windows.html 2.38 stable msi file.

      after that, add the graphviz’s bin onto your system PATH variables. Restart your computer and the path should be picked up.

      Then you won’t have that error again.
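
Before calling plot_model, it can help to check whether the required pieces are visible from the current Python environment. The sketch below uses only the standard library. The function name is hypothetical, and it only reports what it finds; it does not install anything.

```python
import importlib.util
import shutil

def plotting_dependencies():
    """Report whether pydot, pydotplus and the Graphviz 'dot' program can be found."""
    return {
        "pydot": importlib.util.find_spec("pydot") is not None,
        "pydotplus": importlib.util.find_spec("pydotplus") is not None,
        "graphviz dot executable": shutil.which("dot") is not None,
    }

status = plotting_dependencies()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If the dot executable is reported missing even after installing Graphviz, the PATH seen by the Python process is the likely cause.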

      • Avatar
        Kavya February 17, 2018 at 2:36 pm #

        Thanks Vineeth,
        I am using a Mac. I tried pydotplus, but it is still giving the same error.

    • Avatar
      Precious Angrish May 2, 2018 at 10:34 am #


      I am getting the same error, how did you fix it?

      Precious Angrish

    • Avatar
      Sayan May 14, 2018 at 3:05 am #

      Hey Kavya, I think this should resolve your error, as it worked for me: https://stackoverflow.com/questions/36869258/how-to-use-graphviz-with-anaconda-spyder

  27. Avatar
    Vineeth February 14, 2018 at 9:02 pm #

    I used Progressive Loading from https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/#comment-429470 This tutorial and updated the input layer to inputs1 = Input(shape=(224, 224, 3))

    And i got the error
    ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (13, 4485)

    Then i updated to_categorical function as you mentioned and the error changed to this
    ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (13, 1, 4485)

    I have been trying to figure out the exact input shapes of the model for 2 days, please help 🙁

    • Avatar
      Srinath Hanumantha Rao March 21, 2018 at 7:58 pm #

      Hey Vineeth!

      Were you able to solve this issue? I am stuck on this for a few days too.

      • Avatar
        Jason Brownlee March 22, 2018 at 6:21 am #

        Are you able to confirm your Python and Keras versions?

  28. Avatar
    Alex February 21, 2018 at 12:30 am #

    Hi Jason, why do you apply dropout to the input instead of applying it to the dense layer?

    • Avatar
      Jason Brownlee February 21, 2018 at 6:40 am #

      I used a little experimentation to come up with the model.

      Try changing it up and see if you can lift skill or reduce training time or model complexity Alex. I’m eager to hear how you go.

  29. Avatar
    Sunny February 28, 2018 at 7:23 am #

    Hi Jason,

    I just wanted to know that when you are loading the training data, you are tokenizing the train descriptions. But when you are working with test data, you are not tokenizing the test descriptions, instead working with the previous tokens. Shouldn’t the test descriptions be tokenized too before passing to create_sequence for test ?

  30. Avatar
    Hgarrison March 7, 2018 at 8:44 am #

    Hi Jason,

    This tutorial is of great help to us all, I think. I have a question: Does the model eventually learn to predict captions not present in the corpus? I mean, is it possible for the model to output sentences that are never seen before? In the example you give, the model predicted “startseq dog is running across the beach endseq”. Is this sentence found in the training corpus, or did the model make it up based on previous observations? And also, If it is possible for the model to combine sentences, how much training data do you think it needs to do that?

    • Avatar
      Jason Brownlee March 7, 2018 at 3:04 pm #

      The model attempts to generalize beyond what it has seen during training.

      In fact, this is the goal with a machine learning model.

      Nevertheless, the model will be bounded by the types of text and images seen during training, just not the specific combinations.

  31. Avatar
    Giuseppe March 8, 2018 at 12:05 am #

    Hi Jason, I have a question. What exactly is the LSTM used for? During fitting, does it take an input (e.g. startseq – girl) and output a vector of 256 elements that contains the most probable words after the prefix? Is it trained through backpropagation? Is the purpose of fitting to make sure that, given a prefix as input, the LSTM returns a vector that better represents the possible following words (which is then merged with the features, etc.)?

    • Avatar
      Jason Brownlee March 8, 2018 at 6:32 am #

      It is used for interpreting the text generated so far, needed to generate the next word.

  32. Avatar
    fatma March 16, 2018 at 8:16 pm #

    Hi Jason,

    for the line:

    features = dict()

    I got syntaxerror: invalid syntax

    How can I fix this error?

    • Avatar
      Jason Brownlee March 17, 2018 at 8:36 am #

      Perhaps double check that you have copied the code while maintaining white space?

      Perhaps confirm Python 3?

  33. Avatar
    fatma March 20, 2018 at 10:21 pm #

    Hi Jason,

    Does the following line:

    model = Model(inputs=model.inputs, outputs=model.layers[-1].output)

    mean that we will save the features of the fc2 layer of the VGG16 model?

    • Avatar
      Jason Brownlee March 21, 2018 at 6:33 am #

      We are creating a new model without the last layer.

      • Avatar
        fatma March 21, 2018 at 3:54 pm #

        the new model doesn’t contain any fully connected layer because I read that we can extract the features from the fc2 layers of the pre-trained model also

        • Avatar
          fatma March 21, 2018 at 4:35 pm #

          when I run the line model.summary() I got the last layer is :

          block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808

          but according to the VGG16 it should be

          fc2 (Dense) (None, 4096) 16781312 fc1[0][0]

          I don’t know where is the problem?

          • Avatar
            Saurabh May 6, 2019 at 3:58 pm #

            That is because you must have specified include_top = False in VGG. This will not include the fully connected part of the network.

        • Avatar
          fatma March 23, 2018 at 9:27 pm #

          Hi Jason,

          How can we feed the saved features in the pickle file (features.pkl) to a linear regression model?

          • Avatar
            Jason Brownlee March 24, 2018 at 6:27 am #

            That would be a lot of input features! Sorry, I don’t have a worked example.

  34. Avatar
    Akash March 21, 2018 at 7:04 am #

    ValueError: Error when checking input: expected input_1 to have shape (None, 4096) but got array with shape (0, 1)

    I am getting this error..can anyone help me understand and fix it?

    • Avatar
      Jason Brownlee March 21, 2018 at 3:03 pm #

      Are you able to confirm that you have Python3 and all libs up to date?

      • Avatar
        Akash March 21, 2018 at 9:16 pm #

        Yes, all my libraries are up to date; I have checked.
        I solved the problem I posted before: it was in the data generator.
        I am using progressive loading. After fixing the problem I checked my inputs using this code:

        generator = data_generator(descriptions, tokenizer, max_length)
        inputs, outputs = next(generator)

        and it’s giving me an output like this:

        (13, 224, 224, 3)
        (13, 28)
        (13, 4485)

        but now it’s showing this error:
        ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (8, 224, 224, 3)

        Do I have to change the model architecture for progressive loading?

        NOTE: For progressive loading I used this code: https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/

        • Avatar
          Steven March 22, 2018 at 9:57 pm #

          I am stuck on the same issue. The example above runs into memory problems for me, even on an AWS EC2 g2.2xlarge instance or a laptop with 16 GB RAM. So I tried the progressive loading example you refer to frequently, but I have the same trouble with the model input. I tried to use inputs[0] as inputs1 for the define_model function, but that returned the error ‘Error when checking input: expected input_13 to have 5 dimensions, but got array with shape (13, 224, 224, 3)’. Do I have to reshape inputs[0], or is the problem in inputs2?

          • Avatar
            Akash March 23, 2018 at 6:29 pm #

            I think the model architecture needs to be changed for the progressive loading example, particularly the input shapes.
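The shapes reported in this thread, such as (13, 224, 224, 3), suggest that the generator is yielding raw pixel arrays, whereas the error messages show that the model's first input expects a 4096-element feature vector per photo. Below is a minimal sketch of a generator that yields precomputed features instead. It uses plain Python lists in place of the Keras padding and one-hot helpers, and the names `features`, `encode`, `pad_left`, and `one_hot` are assumptions made for illustration, not code from the tutorial.

```python
def pad_left(seq, max_length):
    """Left-pad a list of ints with zeros (keeping the last max_length items)."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def one_hot(index, vocab_size):
    """Return a list of length vocab_size with a 1 at position index."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def data_generator(descriptions, features, encode, max_length, vocab_size):
    """Yield the training samples for one photo at a time, looping forever.

    descriptions: dict of photo id -> list of caption strings
    features:     dict of photo id -> precomputed feature vector (hypothetical)
    encode:       callable turning a caption into a list of ints (hypothetical)
    """
    while True:
        for photo_id, captions in descriptions.items():
            X1, X2, y = [], [], []
            for caption in captions:
                seq = encode(caption)
                # one sample per prefix: words so far -> next word
                for i in range(1, len(seq)):
                    X1.append(features[photo_id])
                    X2.append(pad_left(seq[:i], max_length))
                    y.append(one_hot(seq[i], vocab_size))
            yield [X1, X2], y


# Tiny made-up example: one photo, one caption, a 4096-element feature vector.
vocab = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}
encode = lambda caption: [vocab[word] for word in caption.split()]
descriptions = {"photo1": ["startseq dog runs endseq"]}
features = {"photo1": [0.0] * 4096}

gen = data_generator(descriptions, features, encode, max_length=5, vocab_size=5)
(X1, X2), y = next(gen)
print(len(X1), len(X1[0]), len(X2[0]), len(y[0]))  # 3 4096 5 5
```

With real data the lists would be converted to NumPy arrays before being passed to the model; the point of the sketch is only that the first input carries feature vectors, not images.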

    • Avatar
      Harsha April 2, 2018 at 9:27 pm #

      I am getting the same error:
      File “fittingmodel.py”, line 189, in
      model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
      File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1630, in fit
      File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1476, in _standardize_user_data
      File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 123, in _standardize_input_data
      ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (1,)

      • Avatar
        Jason Brownlee April 3, 2018 at 6:33 am #

        What version of libs are you using?

        Here’s what I’m running:

  35. Avatar
    Tanisha March 31, 2018 at 5:50 pm #

    Hi Jason,
    Thanks for the article.

    Due to a lack of resources I tried running this on a small amount of data. Everything worked fine, but the part that generates new descriptions gives this error:

    C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
    from ._conv import register_converters as _register_converters
    Using TensorFlow backend.
    2018-03-31 12:07:43.176707: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2018-03-31 12:07:43.574792: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
    name: GeForce 820M major: 2 minor: 1 memoryClockRate(GHz): 1.25
    pciBusID: 0000:08:00.0
    totalMemory: 2.00GiB freeMemory: 1.65GiB
    2018-03-31 12:07:43.584220: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1283] Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
    Traceback (most recent call last):
    File “7_generate_discription.py”, line 72, in
    description = generate_desc(model, tokenizer, photo, max_length)
    File “7_generate_discription.py”, line 48, in generate_desc
    yhat = model.predict([photo,sequence], verbose=0)
    File “C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1817, in predict
    File “C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 123, in _standardize_input_data
    ValueError: Error when checking : expected input_2 to have shape (25,) but got array with shape (34,)

    Any idea how I can fix this?

    • Avatar
      Jason Brownlee April 1, 2018 at 5:46 am #

      Are you able to confirm that your Keras version and TF are up to date?

      Did you copy all of the code as is?

      • Avatar
        Tanisha April 5, 2018 at 11:52 am #

        Yes, those two are up to date. I just changed “max_length = 34” to “max_length = 25” in the code and now it’s working.

        • Avatar
          Jason Brownlee April 5, 2018 at 3:13 pm #

          I’m glad to hear you worked it out.

        • Avatar
          Saurabh May 6, 2019 at 4:02 pm #

          Did changing max_length not cause any errors for you?
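The error in this thread reads as a mismatch between the sequence length the model was trained with (25 on the reduced dataset) and the value hard-coded in the generation script (34). One way to avoid the mismatch is to compute the length from the training descriptions and store it alongside the model. The sketch below assumes descriptions are held as a dict of photo id to a list of caption strings; the helper names and the JSON file are hypothetical.

```python
import json
import os
import tempfile


def max_caption_length(descriptions):
    """Return the word count of the longest caption across all photos."""
    return max(len(caption.split())
               for captions in descriptions.values()
               for caption in captions)


def save_max_length(path, max_length):
    """Store the value so the generation script can read it back later."""
    with open(path, "w") as f:
        json.dump({"max_length": max_length}, f)


def load_max_length(path):
    """Read back the value written by save_max_length."""
    with open(path) as f:
        return json.load(f)["max_length"]


# Made-up captions purely for illustration.
descriptions = {
    "photo1": ["startseq dog runs endseq",
               "startseq brown dog runs on grass endseq"],
    "photo2": ["startseq child plays endseq"],
}

with tempfile.TemporaryDirectory() as tmp:
    settings_path = os.path.join(tmp, "settings.json")
    save_max_length(settings_path, max_caption_length(descriptions))
    restored = load_max_length(settings_path)

print(restored)  # 7
```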

  36. Avatar
    Harsha April 1, 2018 at 2:46 pm #

    I am getting this error:
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
    File “fittingmodel.py”, line 109, in create_sequences
    return array(X1), array(X2), array(y)

  37. Avatar
    pramod choudhari April 1, 2018 at 4:07 pm #

    What backend are you using?

  38. Avatar
    anurag vats April 2, 2018 at 3:26 pm #

    Can someone give me this file: “model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5”?
    My PC doesn’t have enough processing power.

  39. Avatar
    Harsha April 2, 2018 at 6:14 pm #

    File “fittingmodel.py”, line 189, in
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
    File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1522, in fit
    File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1378, in _standardize_user_data
    File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 144, in _standardize_input_data
    ValueError: Error when checking input: expected input_1 to have shape (None, 4096) but got array with shape (0, 1)

    • Avatar
      Jason Brownlee April 3, 2018 at 6:32 am #

      Are you able to confirm that you are using Python 3 and that your version of Keras is up to date?

      • Avatar
        Harsha April 3, 2018 at 2:31 pm #

        Which Keras version should I use?

        • Avatar
          Jason Brownlee April 4, 2018 at 6:04 am #

          The most recent.

          • Avatar
            Harsha April 4, 2018 at 1:42 pm #

            I am still getting the same error. Could you check the model training file? How can I reduce the training size to avoid the memory error?

          • Avatar
            Jason Brownlee April 5, 2018 at 5:52 am #

            You can use progressive loading to reduce the memory requirements for the model.

            Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).

  40. Avatar
    Lazuardi April 3, 2018 at 3:44 am #

    Hello, Jason! Thank you for your tutorial.

    I tried to use the pre-trained model and copy-pasted the code above into my Anaconda Python 3.6 environment with Keras 2.1.5. At first it runs smoothly without any problem and begins working through the image files. Unfortunately, after a while, I get this error:

    “OSError: cannot identify image file ‘Flicker8k_Dataset/find.py’”

    Any idea what is wrong? I am running it on my laptop with GPU NVIDIA GeForce 1050 Ti with Intel Core i7-7700HQ with Windows 10 OS.

    Thank you in advance!

    • Avatar
      Jason Brownlee April 3, 2018 at 6:40 am #

      Looks like something very strange is going on.

      I have not seen this error. Perhaps try running from the command line; notebooks and IDEs often introduce new and crazy faults of their own.

  41. Avatar
    goutham April 4, 2018 at 1:48 pm #

    Using TensorFlow backend.
    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Traceback (most recent call last):
    File “model_fit.py”, line 154, in
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
    File “model_fit.py”, line 109, in create_sequences
    return array(X1), array(X2), array(y)

    How can I reduce the training size to avoid this error?

    • Avatar
      Jason Brownlee April 5, 2018 at 5:52 am #

      You can use progressive loading to reduce the memory requirements for the model.

    • Avatar
      Belgaroui April 15, 2018 at 10:22 pm #

      I got the same error: “OSError: cannot identify image file ‘Flicker8k_Dataset/desktop.ini’”. Did you fix it?

      • Avatar
        Jason Brownlee April 16, 2018 at 6:10 am #

        Looks like you have a Windows file called desktop.ini in the directory for some reason. Delete it.
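As an alternative to deleting stray files by hand, the loop that walks the dataset directory can skip anything that is not an image. The following is a minimal sketch, assuming the photos are identified by their file extension; the helper name `list_image_files` and the extension list are assumptions for illustration.

```python
import os
import tempfile

# Extensions treated as images in this sketch (an assumption).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_image_files(directory):
    """Return the sorted names of regular files that look like images."""
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # skip sub-directories and files such as desktop.ini or find.py
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    return names


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("example.jpg", "desktop.ini", "find.py"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "nested"))
    found = list_image_files(tmp)

print(found)  # ['example.jpg']
```

The feature extraction loop could then iterate over the returned names instead of the raw directory listing.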

  42. Avatar
    harsha April 4, 2018 at 5:58 pm #

    Hi, can you provide me with the weights file? My laptop has 12GB RAM and NVIDIA GeForce 820M graphics with all supported drivers, but I am getting the memory error.

    I have also tried progressive loading, but it is not working. It does not save the weights file even after steps per epoch=70000 has completed. I can’t afford AWS.
    So, I request you to give me the weights file.
    Thanks in advance.

    • Avatar
      Jason Brownlee April 5, 2018 at 5:53 am #

      Sorry, I cannot share the weights file.

      I will schedule time into updating the tutorial to add a progressive loading example.

      Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).

  43. Avatar
    manish April 5, 2018 at 12:58 am #

    I got an error while generating the captions.

    Here is the error:

    Traceback (most recent call last):
    File “generate_captions5.py”, line 64, in
    tokenizer = load(open(‘descriptions.txt’, ‘rb’))
    _pickle.UnpicklingError: could not find MARK

    • Avatar
      Jason Brownlee April 5, 2018 at 6:09 am #

      I have not seen this error before, sorry. Perhaps try running the code again?
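The traceback above shows a plain text file (descriptions.txt) being passed to the pickle loader, which would explain the failure if the tokenizer was saved to a different file. A small guard can make that kind of mix-up easier to spot. The sketch below is a heuristic that only recognizes pickles written with protocol 2 or later (the default in Python 3); the function name and the file name tokenizer.pkl are assumptions for illustration.

```python
import os
import pickle
import tempfile


def load_pickle_checked(path):
    """Load a pickle, failing early with a readable message for other files.

    Heuristic: pickles written with protocol 2 or later start with byte 0x80.
    """
    with open(path, "rb") as f:
        if f.read(1) != b"\x80":
            raise ValueError(
                "%s does not look like a pickle file; "
                "was the wrong filename passed?" % path)
        f.seek(0)
        return pickle.load(f)


# Demonstration with made-up contents in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "tokenizer.pkl")
    txt_path = os.path.join(tmp, "descriptions.txt")
    with open(pkl_path, "wb") as f:
        pickle.dump({"dog": 1}, f)  # stand-in for a real tokenizer object
    with open(txt_path, "w") as f:
        f.write("example_photo_id a dog runs\n")

    loaded = load_pickle_checked(pkl_path)
    try:
        load_pickle_checked(txt_path)
        rejected = False
    except ValueError as exc:
        rejected = True
        print(exc)
```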

  44. Avatar
    harsha April 5, 2018 at 4:50 am #

    startseq man in red shirt is standing on the street endseq

    A caption is generated, but it is the same caption for different images.

    • Avatar
      Jason Brownlee April 5, 2018 at 6:15 am #

      Perhaps your model requires further training?

    • Avatar
      Mohankumar Balasubramaniyam May 3, 2019 at 12:42 am #

      Hi, I am also facing the same issue. Can you tell me what you did to overcome the problem, @harsha?

      • Avatar
        Sayak Paul January 23, 2020 at 6:04 pm #

        Same issue I am facing as well.

        • Avatar
          Roy June 12, 2020 at 4:56 am #

          Hey, have you figured out the problem?

          • Avatar
            Rohan December 21, 2020 at 2:52 am #

            I am having the same issue as well. I first tried it with the MS COCO dataset because it has many more images and captions, but when I ran into the issue, I followed the tutorial with the Flicker Dataset and I am running into the same issue again. Has anyone figured out the solution?

          • Avatar
            Jason Brownlee December 21, 2020 at 6:40 am #

            Are you able to confirm your tensorflow and keras versions?

          • Avatar
            Rohan December 22, 2020 at 1:56 am #

            My TensorFlow version is 2.3.1 and my Keras version is 2.4.3. However, I am using the keras built into tensorflow.

          • Avatar
            Jason Brownlee December 22, 2020 at 6:49 am #

            The versions look good.

            Perhaps these instructions will help you copy the code without error:

          • Avatar
            Rohan December 24, 2020 at 2:24 am #

            I had copied the code correctly, but I had been using the data generator because the COCO dataset has so much data. When I tried again with the Flicker dataset, I used the data generator as well, because I wasn’t sure if my 16 gigs of RAM would be enough to load all the data in at once. I am trying again, but without the data generator. I hope it works

          • Avatar
            Rohan December 24, 2020 at 3:50 am #

            It is not generating the exact same caption for each image, but it does place “a man in a red shirt is” at the beginning of each caption and the captions do not seem to be accurate.

          • Avatar
            Jason Brownlee December 24, 2020 at 5:36 am #

            Perhaps try training the model again?
            Perhaps select a different final model?
            Perhaps tune the learning parameters?

  45. Avatar
    manish April 5, 2018 at 2:16 pm #

    val_loss improves only up to epoch 3; there is no improvement in later epochs.

    model-ep003-loss3.662-val_loss3.824.h5 is the last epoch that has improved so far.
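Because the checkpoint filenames in this thread embed the epoch, training loss, and validation loss, the saved model with the lowest validation loss can be picked out by parsing the names. A minimal sketch follows; the helper name `best_checkpoint` is hypothetical, and every filename in the example other than the one quoted above is invented for illustration.

```python
import re

# Matches names of the form model-ep003-loss3.662-val_loss3.824.h5
CHECKPOINT_PATTERN = re.compile(
    r"model-ep(\d+)-loss([\d.]+)-val_loss([\d.]+)\.h5$")


def best_checkpoint(filenames):
    """Return the filename with the lowest val_loss, or None if none match."""
    best = None
    for name in filenames:
        match = CHECKPOINT_PATTERN.search(name)
        if match is None:
            continue  # ignore unrelated files
        val_loss = float(match.group(3))
        if best is None or val_loss < best[0]:
            best = (val_loss, name)
    return None if best is None else best[1]


files = [
    "model-ep001-loss4.512-val_loss4.071.h5",  # invented
    "model-ep002-loss3.871-val_loss3.902.h5",  # invented
    "model-ep003-loss3.662-val_loss3.824.h5",  # from the comment above
    "notes.txt",
]
print(best_checkpoint(files))  # model-ep003-loss3.662-val_loss3.824.h5
```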

  46. Avatar
    SAI April 8, 2018 at 12:49 am #

    File “”, line 1, in
    runfile(‘C:/Users/Owner/.spyder-py3/ML/4.py’, wdir=’C:/Users/Owner/.spyder-py3/ML’)

    File “C:\Users\Owner\Anaconda_3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
    execfile(filename, namespace)

    File “C:\Users\Owner\Anaconda_3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
    exec(compile(f.read(), filename, ‘exec’), namespace)

    File “C:/Users/Owner/.spyder-py3/ML/4.py”, line 161, in
    model = define_model(vocab_size, max_length)

    File “C:/Users/Owner/.spyder-py3/ML/4.py”, line 129, in define_model
    plot_model(model, to_file=’model.png’, show_shapes=True)

    File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 135, in plot_model
    dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)

    File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 56, in model_to_dot

    File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 31, in _check_pydot
    raise ImportError(‘Failed to import pydot. You must install pydot’

    ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

    I am getting this even though I installed pydot and graphviz.

    • Avatar
      Jason Brownlee April 8, 2018 at 6:22 am #

      Perhaps restart your machine?

      Perhaps comment out the part where you visualize the model?

    • Avatar
      deep_ml April 9, 2018 at 3:23 am #

      I am getting the same error!
      I tried a solution from Stack Overflow and upgraded the packages, but it isn’t working.

      • Avatar
        Jason Brownlee April 9, 2018 at 6:12 am #

        No problem, just skip that part and proceed. Comment out the plotting of the model.
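Instead of commenting out the plotting line, the call can be wrapped so that a missing plotting dependency is reported and skipped rather than stopping the script. This is a generic sketch, not code from the tutorial: the helper `run_optional` is hypothetical, and the stand-in function below only imitates the ImportError shown in the traceback.

```python
def run_optional(step, *args, **kwargs):
    """Run a non-essential step; report and continue if a dependency is missing.

    Returns True if the step ran, False if it was skipped.
    """
    try:
        step(*args, **kwargs)
        return True
    except (ImportError, OSError) as exc:
        print("Skipping optional step:", exc)
        return False


def fake_plot(model, to_file, show_shapes=True):
    """Stand-in that fails the way the plotting call does in this thread."""
    raise ImportError("Failed to import pydot. You must install pydot")


def fake_ok(model, to_file, show_shapes=True):
    """Stand-in for a plotting call that succeeds."""
    return None


skipped = run_optional(fake_plot, "model", to_file="model.png")
ran = run_optional(fake_ok, "model", to_file="model.png")
print(skipped, ran)  # False True
```

In the training script, the same wrapper could be applied to the existing call, for example run_optional(plot_model, model, to_file='model.png', show_shapes=True).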

  47. Avatar
    deep_ml April 9, 2018 at 4:06 pm #

    I have trained the data using progressive loading and I stopped after 4 iterations, with a loss of 3.4952.

    I am unable to understand this part:
    “In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one with the lowest loss that you can then use in the next section.”

    Do you mean we have to load the test set in the same way, using progressive loading?
    Please help me understand how to load the test set.

    • Avatar
      Jason Brownlee April 10, 2018 at 6:15 am #

      I am suggesting that you may want to load the test data in the existing way and evaluate your model (next section).

  48. Avatar
    Jesia April 11, 2018 at 6:25 pm #

    I get an error when running “The complete code example is listed below.” in the Loading Data section:

    Message Body:
    Dataset: 6000
    Descriptions: train=6000
    Traceback (most recent call last):
    File “task2.py”, line 64, in
    train_features = load_photo_features(‘features.pkl’, train)
    File “task2.py”, line 53, in load_photo_features
    features = {k: all_features[k] for k in dataset}
    File “task2.py”, line 53, in
    features = {k: all_features[k] for k in dataset}
    KeyError: ‘878758390_dd2cdc42f6’

    • Avatar
      Jason Brownlee April 12, 2018 at 8:35 am #

      Perhaps confirm that you have the full dataset in place?

      • Avatar
        Jesia April 24, 2018 at 11:22 pm #

        Yes, some images were missing.

        Thank you
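A KeyError like the one above can be turned into an explicit report of which photo ids have no extracted features. The sketch below assumes the features are held in a dict keyed by photo id, as in the dictionary comprehension shown in the traceback; the function name `select_features` and the second id are hypothetical.

```python
def select_features(all_features, dataset):
    """Return (features for ids that exist, sorted list of ids that are missing)."""
    features = {k: all_features[k] for k in dataset if k in all_features}
    missing = sorted(k for k in dataset if k not in all_features)
    return features, missing


# Made-up example: one id has features, one (taken from the error above) does not.
all_features = {"example_photo_id": [0.1, 0.2, 0.3]}
dataset = {"example_photo_id", "878758390_dd2cdc42f6"}

features, missing = select_features(all_features, dataset)
print("Photos: train=%d, missing=%d" % (len(features), len(missing)))
print(missing)  # ['878758390_dd2cdc42f6']
```

A non-empty missing list suggests that the image folder is incomplete or that feature extraction stopped early.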

  49. Avatar
    Belgaroui April 12, 2018 at 12:31 am #

    Hello sir, I’m learning from your articles, which I find very informative and educational. I’ve been trying to run this code:
    # extract features from all images
    directory = ‘Flicker8k_Dataset’
    features = extract_features(directory)
    print(‘Extracted Features: %d’ % len(features))
    # save to file
    dump(features, open(‘features.pkl’, ‘wb’))

    but an error occurred that I don’t understand. Can you help me fix it? Thanks to all of you.
    Here’s the error I got:
    PermissionError Traceback (most recent call last)
    in ()
    1 # extract features from all images
    2 directory = ‘Flicker8k_Dataset’
    —-> 3 features = extract_features(directory)
    4 print(‘Extracted Features: %d’ % len(features))
    5 # save to file

    in extract_features(directory)
    13 # load an image from file
    14 filename = directory + ‘/’ + name
    —> 15 image = load_img(filename, target_size=(224, 224))
    16 # convert the image pixels to a numpy array
    17 image = img_to_array(image)

    ~\Anaconda3\envs\envir1\lib\site-packages\keras\preprocessing\image.py in load_img(path, grayscale, target_size, interpolation)
    360 raise ImportError(‘Could not import PIL.Image. ‘
    361 ‘The use of array_to_img requires PIL.’)
    –> 362 img = pil_image.open(path)
    363 if grayscale:
    364 if img.mode != ‘L’:

    ~\Anaconda3\envs\envir1\lib\site-packages\PIL\Image.py in open(fp, mode)
    2547 if filename:
    -> 2548 fp = builtins.open(filename, “rb”)
    2549 exclusive_fp = True

    PermissionError: [Errno 13] Permission denied: ‘Flicker8k_Dataset/Flicker8k_Dataset’

    • Avatar
      Jason Brownlee April 12, 2018 at 8:47 am #

      Looks like the dataset is missing or is not available on your workstation.

  50. Avatar
    Seaf April 13, 2018 at 1:33 am #

    Hello sir, Thanks for your effort

    I have trained the model using progressive loading and my machine restarted after 11 iterations.
    How can I continue training from that checkpoint?

    • Avatar
      Jason Brownlee April 13, 2018 at 6:42 am #

      Load the last saved model, then continue training. As simple as that.

      I doubt more than a handful of epochs is required on this problem.

      • Avatar
        Seaf April 13, 2018 at 12:44 pm #

        thank you !

        I have loaded the last model (‘model_11.h5’), which had a loss of 3.445. Training now continues with a loss of 5.4461. Is that normal?

        • Avatar
          Jason Brownlee April 13, 2018 at 3:32 pm #

          Interesting, that is a little surprising. I wonder if there is a fault or if indeed the model loss has gotten worse.

          Some careful experiments may be required.
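To resume after an interruption, the most recent saved file has to be identified first. Sorting the names alphabetically would place model_9.h5 after model_11.h5, so the sketch below parses the epoch number instead. It assumes the per-epoch files are named model_N.h5, as in the comment above; the helper name `latest_checkpoint` is hypothetical.

```python
import re

# Matches names of the form model_11.h5
EPOCH_PATTERN = re.compile(r"model_(\d+)\.h5$")


def latest_checkpoint(filenames):
    """Return (epoch, filename) for the highest epoch number, or None."""
    latest = None
    for name in filenames:
        match = EPOCH_PATTERN.search(name)
        if match is None:
            continue  # ignore unrelated files
        epoch = int(match.group(1))
        if latest is None or epoch > latest[0]:
            latest = (epoch, name)
    return latest


files = ["model_%d.h5" % i for i in range(12)] + ["features.pkl"]
print(latest_checkpoint(files))  # (11, 'model_11.h5')
```

The returned filename can then be loaded with the Keras load_model function and training continued from that epoch.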

  51. Avatar
    Belgaroui April 13, 2018 at 3:07 am #

    Thank you, I think so too….

    I already downloaded Flicker8k_Datasets and extracted it in the same file where I work with jupyter notebook.

    I consulted Google and Youtube to try to fix this error but in vain…

    I don’t know but could you be so kind as to direct me and help me fix the problem.
    Thank you very much for your efforts…

    • Avatar
      Jason Brownlee April 13, 2018 at 6:43 am #

      What problem?

      • Avatar
        Belgaroui April 14, 2018 at 12:46 am #

        Hi Jason,
        When I try to run the code that extracts features from all the images, I get a “Permission denied” error. You told me earlier that it looks like the dataset is missing or is not available on my workstation. I tried to fix it, but in vain.
        Do you have any idea how I could do that?
        Do I need a user permission or something like that?
        Or maybe I need to reload the dataset?

        *the error :
        ~\Anaconda3\envs\envir1\lib\site-packages\PIL\Image.py in open(fp, mode)2546
        2547 if filename:
        -> 2548 fp = builtins.open(filename, “rb”)
        2549 exclusive_fp = True

        PermissionError: [Errno 13] Permission denied: ‘Flicker8k_Dataset/Flicker8k_Dataset’

        thanks a lot 🙂 🙂

        • Avatar
          Jason Brownlee April 14, 2018 at 6:47 am #

          You appear to have a problem loading the data from your hard drive. Perhaps you stored the data in a location where you/your code does not have permission to read?

          Perhaps you are using a notebook or an IDE as another user?

          Try running from the command line and check file permissions.
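The path in the error ends in ‘Flicker8k_Dataset/Flicker8k_Dataset’, which may mean the archive unpacked into a nested folder of the same name and the loop tried to open that folder as if it were an image. That is an assumption, but it is quick to check, since opening a directory as a file typically raises a permission error on Windows. The following sketch classifies a path before it is opened; the helper name `diagnose_path` is hypothetical.

```python
import os
import tempfile


def diagnose_path(path):
    """Classify a path as 'missing', 'directory', 'unreadable', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


# Demonstration on a temporary layout that mimics a nested dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "Flicker8k_Dataset")
    os.mkdir(nested)
    photo = os.path.join(nested, "example.jpg")
    open(photo, "w").close()

    nested_status = diagnose_path(nested)
    photo_status = diagnose_path(photo)
    absent_status = diagnose_path(os.path.join(tmp, "absent.jpg"))

print(nested_status, photo_status, absent_status)  # directory ok missing
```

If the entry turns out to be a directory, pointing the script at the inner folder (or moving the images up one level) would be the next thing to try.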

  52. Avatar
    @nkish April 14, 2018 at 4:56 pm #

    Thanks Jason. I really appreciate your knowledge and the way you express it to us through your articles, it’s amazing.

  53. Avatar
    Abdallah April 14, 2018 at 7:15 pm #

    Thank you very much, Mr. Jason, but I have a problem when making predictions after downloading the pretrained model:

    FailedPreconditionError Traceback (most recent call last)
    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
    1349 try:
    -> 1350 return fn(*args)
    1351 except errors.OpError as e:

    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
    1328 feed_dict, fetch_list, target_list,
    -> 1329 status, run_metadata)

    ~/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472 compat.as_text(c_api.TF_Message(self.status.status)),
    –> 473 c_api.TF_GetCode(self.status.status))
    474 # Delete the underlying status object from memory otherwise it stays alive

    FailedPreconditionError: Attempting to use uninitialized value block1_conv2_5/kernel
    [[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]

    During handling of the above exception, another exception occurred:

    FailedPreconditionError Traceback (most recent call last)
    in ()
    24 return features
    25 directory = ‘../ProjectPattern/Flickr8k_Dataset/Flicker8k_Dataset’
    —> 26 features =extract_feature(directory)
    27 dump(features,open(“feature.pkl”,”wb”))

    in extract_feature(directory)
    17 img =preprocess_input(img)
    18 #extract feature by make prediction use the pretrained model
    —> 19 feature = model.predict(img,verbose=0)
    20 #extract img_id
    21 img_id = name.split(‘.’)[0]

    ~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
    1811 f = self.predict_function
    1812 return self._predict_loop(
    -> 1813 f, ins, batch_size=batch_size, verbose=verbose, steps=steps)
    1815 def train_on_batch(self, x, y, sample_weight=None, class_weight=None):

    ~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in _predict_loop(self, f, ins, batch_size, verbose, steps)
    1306 else:
    1307 ins_batch = _slice_arrays(ins, batch_ids)
    -> 1308 batch_outs = f(ins_batch)
    1309 if not isinstance(batch_outs, list):
    1310 batch_outs = [batch_outs]

    ~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in __call__(self, inputs)
    2551 session = get_session()
    2552 updated = session.run(
    -> 2553 fetches=fetches, feed_dict=feed_dict, **self.session_kwargs)
    2554 return updated[:len(self.outputs)]

    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893 try:
    894 result = self._run(None, fetches, feed_dict, options_ptr,
    –> 895 run_metadata_ptr)
    896 if run_metadata:
    897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
    1127 results = self._do_run(handle, final_targets, final_fetches,
    -> 1128 feed_dict_tensor, options, run_metadata)
    1129 else:
    1130 results = []

    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    1342 if handle is None:
    1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
    -> 1344 options, run_metadata)
    1345 else:
    1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

    ~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
    1361 except KeyError:
    1362 pass
    -> 1363 raise type(e)(node_def, op, message)
    1365 def _extend_graph(self):

    FailedPreconditionError: Attempting to use uninitialized value block1_conv2_5/kernel
    [[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]

    Caused by op ‘block1_conv2_5/kernel/read’, defined at:
    File “/usr/lib/python3.6/runpy.py”, line 193, in _run_module_as_main
    “__main__”, mod_spec)
    File “/usr/lib/python3.6/runpy.py”, line 85, in _run_code
    exec(code, run_globals)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel_launcher.py”, line 16, in
    File “/home/abdo96/.local/lib/python3.6/site-packages/traitlets/config/application.py”, line 658, in launch_instance
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelapp.py”, line 478, in start
    File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/ioloop.py”, line 177, in start
    super(ZMQIOLoop, self).start()
    File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/ioloop.py”, line 888, in start
    handler_func(fd_obj, events)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/stack_context.py”, line 277, in null_wrapper
    return fn(*args, **kwargs)
    File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 440, in _handle_events
    File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 472, in _handle_recv
    self._run_callback(callback, msg)
    File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 414, in _run_callback
    callback(*args, **kwargs)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/stack_context.py”, line 277, in null_wrapper
    return fn(*args, **kwargs)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 233, in dispatch_shell
    handler(stream, idents, msg)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 399, in execute_request
    user_expressions, allow_stdin)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/ipkernel.py”, line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
    File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/zmqshell.py”, line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
    File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
    File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2850, in run_ast_nodes
    if self.run_code(code, result):
    File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    File “”, line 26, in
    features =extract_feature(directory)
    File “”, line 2, in extract_feature
    model = VGG19()
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/applications/vgg19.py”, line 117, in VGG19
    x = Conv2D(64, (3, 3), activation=’relu’, padding=’same’, name=’block1_conv2′)(x)
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/engine/topology.py”, line 590, in __call__
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/layers/convolutional.py”, line 138, in build
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py”, line 91, in wrapper
    return func(*args, **kwargs)
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/engine/topology.py”, line 414, in add_weight
    File “/home/abdo96/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py”, line 392, in variable
    v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/variables.py”, line 229, in __init__
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/variables.py”, line 376, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name=”read”)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py”, line 127, in identity
    return gen_array_ops.identity(input, name=name)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 2134, in identity
    “Identity”, input=input, name=name)
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 3160, in create_op
    File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 1625, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

    FailedPreconditionError (see above for traceback): Attempting to use uninitialized value block1_conv2_5/kernel
    [[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]

    • Avatar
      Jason Brownlee April 15, 2018 at 6:25 am #

      Wow. I have not seen this before, sorry.

      Perhaps try searching or posting on stackoverflow?

      • Avatar
        Abdallah April 17, 2018 at 9:18 pm #

        The problem was solved by specifying the weights explicitly: instead of None (random initialization), I used the weights pre-trained on ‘imagenet’ and set the include_top argument to True.

  54. Avatar
    Abdallah April 15, 2018 at 9:45 am #

    When using the merged input in the model, the error below appeared.
    Thanks in advance

    in ()
    29 plot_model(model,to_file=’model.png’,show_shapes=True,show_layer_names=True)
    30 return model
    —> 31 define_model(vocab_size,max_len)

    in define_model(vocab_size, max_length)
    26 model = Model(inputs=[input1,input2],outputs=output)
    —> 28 model.compile(loss=’categorical_crossentropy’,optimizer=’Adam’)(mask)
    29 plot_model(model,to_file=’model.png’,show_shapes=True,show_layer_names=True)
    30 return model

    ~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, weighted_metrics, target_tensors, **kwargs)
    680 # Prepare output masks.
    –> 681 masks = self.compute_mask(self.inputs, mask=None)
    682 if masks is None:
    683 masks = [None for _ in self.outputs]

    ~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in compute_mask(self, inputs, mask)
    785 return self._output_mask_cache[cache_key]
    786 else:
    –> 787 _, output_masks = self._run_internal_graph(inputs, masks)
    788 return output_masks

    ~/.local/lib/python3.6/site-packages/tensorflow/python/layers/network.py in _run_internal_graph(self, inputs, masks)
    897 # Apply activity regularizer if any:
    –> 898 if layer.activity_regularizer is not None:
    899 regularization_losses = [
    900 layer.activity_regularizer(x) for x in computed_tensors

    AttributeError: ‘InputLayer’ object has no attribute ‘activity_regularizer’

    • Avatar
      Jason Brownlee April 16, 2018 at 6:01 am #

      What version of Keras are you using?

      Did you copy all of the code exactly?

      • Avatar
        Abdallah April 16, 2018 at 7:07 pm #

        I used version 2.1.5.
        As for the other question: no, I didn’t copy all of the code exactly. I understood the idea, imitated it in some parts, and wrote other parts on my own.

        • Avatar
          Jason Brownlee April 17, 2018 at 5:56 am #

          Sorry, I cannot help you debug your own modifications.

          • Avatar
            Abdallah April 17, 2018 at 9:10 pm #

            I posted this problem on Stack Overflow but no one answered, so I will try to fix it on my own. Thank you for your answers.

          • Avatar
            Jason Brownlee April 18, 2018 at 8:04 am #

            Hang in there.

  55. Avatar
    prateek bansal April 21, 2018 at 4:03 pm #

    Hi Jason Brownlee, thanks for this fantastic article.
    I am curious to know how the while loop gets stopped in the progressive loading data generator function.
    Please explain this to me.

    def data_generator(descriptions, photos, tokenizer, max_length):
        # loop for ever over images
        while 1:
            for key, desc_list in descriptions.items():
                # retrieve the photo feature
                photo = photos[key][0]
                in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo)
                yield [[in_img, in_seq], out_word]

    • Avatar
      Jason Brownlee April 22, 2018 at 5:58 am #

      Note the yield.

      The number of epochs, together with steps_per_epoch, decides how many times the generator yields to the caller.
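
A small sketch may help show why the endless `while` loop is not a problem. A generator pauses at each `yield` and only continues when the caller asks for the next item, so the caller controls how many times the loop body runs. The names `endless_counter` and `take` below are hypothetical and only illustrate the idea; in the tutorial, the caller is `fit_generator()`, which is passed `steps_per_epoch`.

```python
from itertools import islice


def endless_counter():
    """Yield 0, 1, 2, ... forever; each yield pauses the function."""
    n = 0
    while True:
        yield n
        n += 1


def take(generator, steps):
    """Pull exactly `steps` items, the way a training loop would per epoch."""
    return list(islice(generator, steps))


gen = endless_counter()
first = take(gen, 3)   # the loop body runs only three times here
second = take(gen, 2)  # the generator resumes where it paused
print(first, second)
```

The loop inside the generator never "ends"; the caller simply stops asking for more items once it has consumed the number of steps it needs.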

  56. Avatar
    Jubaer Hossain April 22, 2018 at 12:31 am #

    Great article indeed! But I’m facing problems downloading the model. Every time I try to download the model with the code you provided, the connection gets lost after some time and this message is shown: “ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host”

    Can you give any alternate solution to this problem? I have tried several times but failed.

    • Avatar
      Jason Brownlee April 22, 2018 at 6:01 am #

      I’m sorry to hear that, I have some ideas:

      – Perhaps you can review the code in Keras that downloads the model and download it manually?
      – Perhaps you can use an alternate internet connection to download the model?
      – Perhaps you can setup an EC2 instance and download the model there to work with?
      – Perhaps you can ask a friend or peer to download the model for you?

  57. Avatar
    Sailee April 24, 2018 at 4:15 pm #

    Hello Sir,
    Your article is very interesting and easy to understand.

    For the above code I am getting a very accurate caption if I use the same image as you have shown in the figure. But if I use some other image I am getting some description but not a correct one. So could you please tell me what is the problem here?
    Thanks in advance.

    • Avatar
      Jason Brownlee April 25, 2018 at 6:18 am #

      Perhaps try a suite of images to see how the model performs on average?

  58. Avatar
    Jesia April 24, 2018 at 11:36 pm #

    I have trained the model using progressive loading for 19 iterations.
    A caption is generated for your provided test image. However, for new ones (images of a rabbit and other animals) I got the caption “dog is running …”.
    Is there a way to train the model for more than 19 iterations to get a better result, or another way to solve this issue?

    thank you

  59. Avatar
    Kingson May 10, 2018 at 11:49 pm #

    Hi Jason,

    Can you please share me full github repository of image captioning?

  60. Avatar
    Sayan May 12, 2018 at 4:25 am #

    Hey Jason, the post is really amazing. Can you help me run this, especially the first step (Keras), which will probably take 1 hour on CPU? I want to check that I’m running on the GPU. How can I get Keras to use the GPU so as to save time?
    Thanks Jason.

  61. Avatar
    Sayan May 13, 2018 at 10:33 pm #

    Hey Jason wassup , can you please explain what is meant by these lines :-
    filepath = ‘model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5′
    checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1, save_best_only=True, mode=’min’)
    1. The line in filepath, especially this part: {epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}?

    • Avatar
      Jason Brownlee May 14, 2018 at 6:34 am #

      It is the name of the file that will be saved with placeholders for specific values of the model at the time of saving.
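
As a rough illustration of those placeholders, the same template can be filled in by hand with Python's `str.format()`. The loss values below are made up for the example; during training the callback supplies the real epoch number and losses.

```python
# the filename template from the question
filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

# {epoch:03d}    -> integer, zero-padded to 3 digits
# {loss:.3f}     -> float with 3 decimal places
# {val_loss:.3f} -> float with 3 decimal places
name = filepath.format(epoch=2, loss=3.2451, val_loss=3.6119)
print(name)
```

Each saved file therefore records in its name which epoch it came from and how well the model was doing at that point.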

  62. Avatar
    abbas khan May 20, 2018 at 4:59 pm #

    Hey Jason! I ran the code that returns a dictionary of image identifier to image features, but it didn’t work and gave the following error. Please guide me on how to fix this bug.

    FileNotFoundError Traceback (most recent call last)
    in ()
    39 # extract features from all images
    40 directory = ‘Flicker8k_Dataset’
    —> 41 features = extract_features(directory)
    42 print(‘Extracted Features: %d’ % len(features))
    43 # save to file

    in extract_features(directory)
    18 # extract features from each photo
    19 features = dict()
    —> 20 for name in listdir(directory):
    21 # load an image from file
    22 filename = directory + ‘/’ + name

    FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flicker8k_Dataset’

    • Avatar
      Jason Brownlee May 21, 2018 at 6:27 am #

      It looks like you do not have the dataset in the same directory as the code.

      • Avatar
        abbas June 22, 2018 at 4:17 pm #

        Jason, I have the code and the dataset in the same directory. I can access a test png image from the same directory, but I am unable to access the dataset images. I don’t know what’s wrong. Please help me solve the issue, because I can also access the flick_text dataset. The only issue I have is with the image dataset.

          • Avatar
            abbas June 25, 2018 at 2:03 pm #

            1) I have installed the latest environment, except for TensorFlow 1.5 because higher versions are not working for me.
            2) I have dataset and code in the same directory
            3) I ran the code from command line but still found no luck.
            4) I have exactly copied the code.
            5) I searched the error on stackoverflow but never found any authentic solution yet.

          • Avatar
            Jason Brownlee June 25, 2018 at 2:40 pm #

            If you type “ls” is the “Flicker8k_Dataset” directory in the current directory beside the code file/s?

          • Avatar
            abbas July 4, 2018 at 2:18 pm #

            I replaced the relative path(as in the tutorial) with absolute full path and it worked for me

          • Avatar
            Jason Brownlee July 4, 2018 at 2:56 pm #

            Glad to hear it.
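
For anyone hitting the same FileNotFoundError, a quick check along these lines may help. A relative path such as ‘Flicker8k_Dataset’ is resolved against the current working directory, which is not always the directory that holds the script (for example in a notebook or an IDE). The helper name `resolve_dataset_dir` is hypothetical; it is a sketch, not part of the tutorial.

```python
import os


def resolve_dataset_dir(name, base_dir=None):
    """Return (absolute path, exists?) for `name` relative to base_dir.

    base_dir defaults to the current working directory, which is what
    a bare relative path is resolved against.
    """
    base = base_dir if base_dir is not None else os.getcwd()
    path = os.path.abspath(os.path.join(base, name))
    return path, os.path.isdir(path)


path, exists = resolve_dataset_dir('Flicker8k_Dataset')
print('Working directory:', os.getcwd())
print('Looking for dataset at:', path)
print('Directory exists:', exists)
```

If the printed path is not where the dataset lives, either change the working directory or pass the full absolute path, as described above.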

          • Avatar
            abbas July 4, 2018 at 2:39 pm #

            Now i am facing an error while running the code “# define the model
            model = define_model(vocab_size, max_length)” in the progressive training section.I have installed pydot and graphviz libraries but still come up with the following error.

            FileNotFoundError Traceback (most recent call last)
            C:\anaconda3\lib\site-packages\pydot.py in create(self, prog, format)
            1877 shell=False,
            -> 1878 stderr=subprocess.PIPE, stdout=subprocess.PIPE)
            1879 except OSError as e:

            C:\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
            708 errread, errwrite,
            –> 709 restore_signals, start_new_session)
            710 except:

            C:\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
            996 os.fspath(cwd) if cwd is not None else None,
            –> 997 startupinfo)
            998 finally:

            FileNotFoundError: [WinError 2] The system cannot find the file specified

            During handling of the above exception, another exception occurred:

            Exception Traceback (most recent call last)
            in ()
            1 # define the model
            —-> 2 model = define_model(vocab_size, max_length)

            in define_model(vocab_size, max_length)
            20 # summarize model
            21 model.summary()
            —> 22 plot_model(model, to_file=’model.png’, show_shapes=True)
            23 return model

            C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
            131 ‘LR’ creates a horizontal plot.
            132 “””
            –> 133 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
            134 _, extension = os.path.splitext(to_file)
            135 if not extension:

            C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
            53 from ..models import Sequential
            —> 55 _check_pydot()
            56 dot = pydot.Dot()
            57 dot.set(‘rankdir’, rankdir)

            C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
            24 # Attempt to create an image of a blank graph
            25 # to check the pydot/graphviz installation.
            —> 26 pydot.Dot.create(pydot.Dot())
            27 except OSError:
            28 raise OSError(

            C:\anaconda3\lib\site-packages\pydot.py in create(self, prog, format)
            1881 raise Exception(
            1882 ‘”{prog}” not found in path.’.format(
            -> 1883 prog=prog))
            1884 else:
            1885 raise

            Exception: “dot.exe” not found in path.

          • Avatar
            Jason Brownlee July 4, 2018 at 2:57 pm #

            Try commenting out the call to plot_model().
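
An alternative to commenting the call out by hand is to check first whether the Graphviz `dot` executable can be found on the PATH, since that is what the error message complains about. This is only a sketch using the standard library; the helper name `graphviz_available` is hypothetical.

```python
import shutil


def graphviz_available(executable='dot'):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if graphviz_available():
    print('Graphviz found; plot_model() should be able to call it.')
else:
    print('Graphviz not found on PATH; skip plot_model() or install Graphviz.')
```

The call to plot_model() could then be placed inside the `if` branch so that the rest of define_model() still runs on machines without Graphviz.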

          • Avatar
            abbas July 7, 2018 at 1:53 pm #

            thanks jason! my training is in progress.

          • Avatar
            Jason Brownlee July 8, 2018 at 6:15 am #

            Glad to hear it.

          • Avatar
            abbas July 7, 2018 at 6:17 pm #

            In model evaluation section when i come to run the code
            ” filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
            model = load_model(filename)”
            I come up with the error
            “OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)”

            i want to ask where is the file ‘model-ep002-loss3.245-val_loss3.612.h5’??and how to select the file??should i pick up the file with least loss value???

          • Avatar
            Jason Brownlee July 8, 2018 at 6:18 am #

            You must change the filename to the model that you saved while training.

          • Avatar
            abbas July 9, 2018 at 2:16 pm #

            Jason, I trained the model for up to 20 epochs. Now please explain which model I should use for prediction. And if I should select from epochs 1-5, then why am I running it for 20 epochs?

          • Avatar
            Jason Brownlee July 10, 2018 at 6:40 am #

            The one with the lowest error on a validation set.
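
When the models were saved with the checkpoint naming scheme that embeds the validation loss in the filename, picking the best one can be automated. The sketch below parses filenames of that form; the helper `best_checkpoint` and all filenames except the ep002 one quoted above are hypothetical.

```python
import re

# matches names such as model-ep002-loss3.245-val_loss3.612.h5
PATTERN = re.compile(r'model-ep(\d+)-loss(\d+\.\d+)-val_loss(\d+\.\d+)\.h5$')


def best_checkpoint(filenames):
    """Return the filename with the lowest val_loss, or None if none match."""
    best = None
    for name in filenames:
        match = PATTERN.search(name)
        if match is None:
            continue  # ignore files that are not checkpoints
        val_loss = float(match.group(3))
        if best is None or val_loss < best[0]:
            best = (val_loss, name)
    return None if best is None else best[1]


# in practice this list could come from os.listdir('.')
saved = [
    'model-ep001-loss4.512-val_loss4.023.h5',
    'model-ep002-loss3.245-val_loss3.612.h5',
    'model-ep003-loss3.010-val_loss3.701.h5',
    'notes.txt',
]
chosen = best_checkpoint(saved)
print(chosen)
```

Models saved by the progressive loading loop do not carry a loss in their names, so each of those would need to be evaluated on a hold-out set instead.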

          • Avatar
            abbas July 24, 2018 at 2:12 pm #

            I just want to understand the whole pipeline. The CNN (VGG16) extracts the image features, which the model reduces to a fixed-length 256-element vector. The text is cleaned and preprocessed, and the RNN (LSTM) predicts the next word of the sequence.
            What is the strategy and intuition of the encoder/decoder?
            How are these two modalities (image and text) merged by the FF?

          • Avatar
            abbas July 31, 2018 at 3:26 pm #

            What alternative algorithms i can used for photo feature extraction or what extra modifications in the model is likely to perform better results?? or what extra building blocks needs to be added to the current tutorial for getting even refined results?

          • Avatar
            Jason Brownlee August 1, 2018 at 7:38 am #

            I have some suggestions here:

          • Avatar
            abbas August 5, 2018 at 3:55 am #

            A Dropout layer is usually used to get rid of over-fitting, while a Dense layer is usually used to change the dimensions.
            Why do dropout_1 and dropout_2 not change the dimensions when some of the connections are set to 0? What is the intuition behind the dropout_1 and dropout_2 layers? Please suggest some links or an explanation.

          • Avatar
            Jason Brownlee August 5, 2018 at 5:38 am #

            Not get rid of, but reduce the likelihood of overfitting.
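
On the question of dimensions, a toy version of dropout may make it clearer. Dropout zeroes individual values rather than removing them, so the output has the same length as the input. The sketch below assumes the common "inverted dropout" formulation, where the surviving values are scaled up by 1 / (1 - rate) during training; the function name `dropout` here is a plain Python stand-in, not the Keras layer.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
features = [1.0] * 8
dropped = dropout(features, rate=0.5, rng=rng)

print(len(features), len(dropped))  # same length before and after
print(dropped)                      # a mix of 0.0 and 2.0
```

Because only the values change and not the number of them, the layer summary shows the same output shape before and after a Dropout layer.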

            You can learn more about the intuitions for dropout here:

          • Avatar
            abbas November 11, 2018 at 3:06 pm #

            Hi Jason!
            I implemented the above-mentioned tutorial using the VGG16 CNN architecture. Please let me know of any code or tutorial that implements the Inception model for image captioning.

          • Avatar
            Jason Brownlee November 12, 2018 at 5:35 am #

            You can change the example to use inception if you wish.

    • Avatar
      Kanaan October 29, 2019 at 7:16 am #

      Dear abbas,
      How did you solve your problem? I have the same problem:
      PermissionError: [Errno 13] Permission denied: ‘Flickr8k_Dataset/Flicker8k_Dataset’

      • Avatar
        Jason Brownlee October 29, 2019 at 1:48 pm #

        Use the alternate download for the dataset listed in the tutorial.

  63. Avatar
    wasif May 24, 2018 at 10:56 pm #

    Hi Jason Brownlee! Good tutorial. I am not sure how the model guarantees that it generates semantically correct sentences. Please share your intuition or any available resource. For example, if there are three words in the vocabulary, “is, dog, running”, how can we guarantee that the model will generate a sentence with correct grammatical structure like ‘dog is running’? Thank you.

    • Avatar
      Jason Brownlee May 25, 2018 at 9:27 am #

      Perhaps you can run the generated sentences through another process that corrects grammar.

    • Avatar
      abbas October 27, 2018 at 3:21 pm #

      Sir where can i find the implemented tutorial for extracting features from images using inception v3?

      • Avatar
        Jason Brownlee October 28, 2018 at 6:07 am #

        You can remove the VGG and add the Inception model yourself.

        • Avatar
          abbas November 18, 2018 at 3:28 am #

          I am trying to train my model using the Inception model. While training I get the following error. How do I change the shape of the input?

          # train the model, run epochs manually and save after each epoch
          epochs = 5
          steps = len(train_descriptions)
          for i in range(epochs):
              # create the data generator
              generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
              # fit for one epoch
              model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
              # save model
              model.save('inception-model_' + str(i) + '.h5')

          Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)

          • Avatar
            Jason Brownlee November 18, 2018 at 6:46 am #

            It looks like your model and data have differing shapes; perhaps change the model or change the data.

          • Avatar
            abbas November 19, 2018 at 2:51 pm #

            do you have any working example for changing data dimensions?

          • Avatar
            Jason Brownlee November 20, 2018 at 6:31 am #

            You can learn about the reshape() function here:

          • Avatar
            abbas November 20, 2018 at 7:04 pm #

            my input has dimension of 4096 while its giving error that its 2048.

            Layer (type)             Output Shape      Param #   Connected to
            input_4 (InputLayer)     (None, 34)        0
            input_3 (InputLayer)     (None, 4096)      0
            embedding_2 (Embedding)  (None, 34, 256)   1940224   input_4[0][0]
            dropout_3 (Dropout)      (None, 4096)      0         input_3[0][0]
            dropout_4 (Dropout)      (None, 34, 256)   0         embedding_2[0][0]
            dense_4 (Dense)          (None, 256)       1048832   dropout_3[0][0]
            lstm_2 (LSTM)            (None, 256)       525312    dropout_4[0][0]
            add_2 (Add)              (None, 256)       0         dense_4[0][0]
            dense_5 (Dense)          (None, 256)       65792     add_2[0][0]
            dense_6 (Dense)          (None, 7579)      1947803   dense_5[0][0]
            Total params: 5,527,963
            Trainable params: 5,527,963
            Non-trainable params: 0

            Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)

          • Avatar
            Jason Brownlee November 21, 2018 at 7:50 am #

            Looks like there is a mismatch between your data and the model.
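
One way to read that error message: the model's photo input was declared with 4096 values, while the feature vectors being fed in have 2048 values each. A hedged sketch of a way to avoid hard-coding the size is to measure the extracted features and use that number when defining the input. The helper `feature_length` and the toy `features` dict below are hypothetical stand-ins for the features loaded from file, following the tutorial's layout where the vector is stored at index 0 for each photo.

```python
def feature_length(features):
    """Return the length of the feature vectors in a dict of id -> [vector]."""
    lengths = {len(vectors[0]) for vectors in features.values()}
    if len(lengths) != 1:
        raise ValueError('inconsistent feature lengths: %s' % sorted(lengths))
    return lengths.pop()


# toy stand-in for features extracted with a different CNN
features = {
    'photo_a': [[0.0] * 2048],
    'photo_b': [[0.0] * 2048],
}
feature_dim = feature_length(features)
print(feature_dim)

# In define_model(), the photo input would then be declared with
# shape=(feature_dim,) instead of a hard-coded (4096,).
```

This changes the model to fit the data; the other option is to change the data, that is, to extract features with the network the model was designed around.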

          • Avatar
            abbas November 21, 2018 at 3:04 pm #

            So then how do I make the data and the model match?

          • Avatar
            Jason Brownlee November 22, 2018 at 6:20 am #

            Sorry, I don’t understand, can you elaborate?

    • Avatar
      abbas August 23, 2019 at 4:27 pm #

      Jason, my model is not loading, and even the print command is not giving me any output.
      The following block of code is not producing any output. Where is the error?

      # load the model
      filename = ‘xraysmodel_8.h5’
      model = load_model(filename)
      # evaluate model
      evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

  64. Avatar
    Andreas May 26, 2018 at 7:27 am #

    Thanks for the great post.

    I trained the model when I save the image from “http://media.einfachtierisch.de/thumbnail/600/0/media.einfachtierisch.de/images/2017/07/glueckliche-freigaenger-katze-Shutterstock-Olga-Visav_504063007.jpg” then I am still getting the text

    startseq dog is running through the grass endseq

    What am I doing wrong?
    The test image appears as intended.

  65. Avatar
    Paul May 28, 2018 at 4:17 pm #

    Hi Jason
    It was a nice article.
    I trained the model for 12 epochs in my gpu.
    But the prediction was not so accurate.
    Most of the times I got the prediction with “man in blue shirt is riding his bohemian on the street” . with the keywords in this sentence.

    Help me out .

    • Avatar
      Jason Brownlee May 29, 2018 at 6:23 am #

      It needs far fewer epochs, try early stopping against a validation set.

    • Avatar
      Andi June 9, 2018 at 10:56 pm #

      Hi Paul, have you found a solution to this? I have a similar issue.

  66. Avatar
    Ravi June 4, 2018 at 10:21 pm #

    Hi jason,
    With progressive loading, we get 20 models. Which model should be chosen for prediction?

    • Avatar
      Jason Brownlee June 5, 2018 at 6:39 am #

      The one with the best skill on the hold out set, likely within epoch 1-5.

  67. Avatar
    Praharsha Singaraju June 5, 2018 at 5:12 pm #

    Hi jason,

    I got the following error when i ran the extract_features function.
    can you please help me fix it?

    field_value = self._fields.get(field)
    TypeError: descriptor ‘_fields’ for ‘OpDef’ objects doesn’t apply to ‘OpDef’ object

  68. Avatar
    Ananya June 14, 2018 at 2:26 pm #

    Hello Jason! I just wanted to know why aren’t we validating the trained model in progressive loading…

    • Avatar
      Jason Brownlee June 14, 2018 at 4:09 pm #

      You can, as I note in the tutorial. The progressive loading is just a small example to help those who don’t have enough RAM to run the main example.

  69. Avatar
    Ananya June 14, 2018 at 2:44 pm #

    I meant to ask, “Why can’t we simultaneously validate, as in the previous code wherein no progressive loading is used?”

  70. Avatar
    Malik June 15, 2018 at 12:41 pm #

    Finally someone who understands the importance of separating mathematics from ‘implementation’. The drawback most tutorials have is that they try to discuss both simultaneously and hence making things quite confusing. ‘Implementation’ requires a completely different approach from understanding the theory.

    Another wonderful thing about this tutorial is that you actually go through the preprocessing steps. This is where I usually get stuck because most university and online courses and tutorials do not discuss them at all.

  71. Avatar
    DIKSHA SINGLA June 16, 2018 at 5:59 pm #

    Traceback (most recent call last):
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\pydot.py”, line 1861, in create
    stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\subprocess.py”, line 709, in __init__
    restore_signals, start_new_session)
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\subprocess.py”, line 997, in _execute_child
    FileNotFoundError: [WinError 2] The system cannot find the file specified

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 26, in _check_pydot
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\pydot.py”, line 1867, in create
    raise OSError(*args)
    FileNotFoundError: [WinError 2] “dot.exe” not found in path.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File “C:\Users\hp\Desktop\iitp\caption_new\5.py”, line 163, in
    model = define_model(vocab_size, max_length)
    File “C:\Users\hp\Desktop\iitp\caption_new\5.py”, line 131, in define_model
    plot_model(model, to_file=’model.png’, show_shapes=True)
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 133, in plot_model
    dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 55, in model_to_dot
    File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 29, in _check_pydot
    pydot failed to call GraphViz.’
    OSError: pydot failed to call GraphViz.Please install GraphViz (https://www.graphviz.org/) and ensure that its executables are in the $PATH.

    • Avatar
      Jason Brownlee June 17, 2018 at 5:38 am #

      Looks like you need to install Graphviz and make sure its executables are on your PATH, or comment out the plotting of the model.

  72. Avatar
    shantanu singh June 19, 2018 at 3:31 pm #

    —-> 1 description = generate_desc(model, tokenizer, photo, max_length)
    2 print(description)

    in generate_desc(model, tokenizer, photo, max_length)
    10 sequence = tokenizer.texts_to_sequences([in_text])[0]
    11 sequence = pad_sequences([sequence], maxlen=max_length)
    —> 12 yhat = model.predict([photo,sequence], verbose=0)
    13 yhat = argmax(yhat)
    14 word = word_for_id(yhat, tokenizer)
    AttributeError: ‘dict’ object has no attribute ‘ndim’

    • Avatar
      Jason Brownlee June 20, 2018 at 6:21 am #

      Ensure that you copy all code for the example.

    • Avatar
      Devesh Pandey May 4, 2019 at 11:51 pm #

      @Shantanu Singh have you resolved your problem, cause I am facing the exact same problem

  73. Avatar
    vinay June 21, 2018 at 12:13 am #

    When i am training, i am getting an vocab length of 8359. It is less than what you are getting.
    Will it be a problem?

  74. Avatar
    mun June 24, 2018 at 6:43 am #

    Hello, i am stuck into this..’startseq’ and ‘endseq’ are not added in the Description.txt file but there is no error when i am running that module

    • Avatar
      Jason Brownlee June 24, 2018 at 7:37 am #

      We add them in the load_clean_descriptions() function after loading the data.
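
In other words, the tokens are not expected to appear in the saved descriptions file; they are wrapped around each description when it is loaded for training. A minimal sketch of that wrapping step, with a hypothetical helper name `wrap_description`:

```python
def wrap_description(tokens):
    """Join the description tokens and add the start and end markers."""
    return 'startseq ' + ' '.join(tokens) + ' endseq'


wrapped = wrap_description(['dog', 'is', 'running', 'through', 'the', 'grass'])
print(wrapped)
```

The markers tell the model where a caption begins and where generation should stop.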

  75. Avatar
    Ben June 24, 2018 at 8:22 am #

    So, I followed all the steps exactly as shown above, and after progressive loading, when the model is being trained, it keeps running epoch 1/1 over and over again and keeps saving different .h5 files. I stopped the process after 5 iterations with a loss of ~3.38, and when I generate captions, they are not even close. What should I do to improve my results? Should I let the model train for more iterations, or will that cause over-fitting?

    • Avatar
      Jason Brownlee June 25, 2018 at 6:16 am #

      The progressive loading example runs epochs manually, not the same epoch again and again.

      Perhaps test each saved model and use the one with the lowest loss to generate captions.

  76. Avatar
    Kingson June 28, 2018 at 4:11 am #

    Hi Jason,
    I am trying to create image captions for my own dataset. I have 4k images, each with a single caption. I am able to run the code and create a model for the Flickr8K dataset, and it works properly. But when I use my own dataset I am able to generate all the required files except the model. When I try to train the model it gives this error:
    Layer (type)             Output Shape      Param #   Connected to
    input_2 (InputLayer)     (None, 27)        0
    input_1 (InputLayer)     (None, 4096)      0
    embedding_1 (Embedding)  (None, 27, 256)   1058048   input_2[0][0]
    dropout_1 (Dropout)      (None, 4096)      0         input_1[0][0]
    dropout_2 (Dropout)      (None, 27, 256)   0         embedding_1[0][0]
    dense_1 (Dense)          (None, 256)       1048832   dropout_1[0][0]
    lstm_1 (LSTM)            (None, 256)       525312    dropout_2[0][0]
    add_1 (Add)              (None, 256)       0         dense_1[0][0]
    dense_2 (Dense)          (None, 256)       65792     add_1[0][0]
    dense_3 (Dense)          (None, 4133)      1062181   dense_2[0][0]
    Total params: 3,760,165
    Trainable params: 3,760,165
    Non-trainable params: 0
    Traceback (most recent call last):
    File “train2.py”, line 179, in
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (10931, 7, 7, 512)

    keras version is – 2.2.0

    How I can solve this error?
    Please help me out.

    • Avatar
      Jason Brownlee June 28, 2018 at 6:26 am #

      Looks like the dimensions of your data do not match the expected dimensions of the model. You can change the data or the model.

      • Avatar
        Kingson June 28, 2018 at 7:59 pm #

        Ok thanks Jason,
        I will try to change the model.

        • Avatar
          Gaurav Anand August 3, 2018 at 3:26 pm #

          Hi Kingson

          Were you able to get rid of the above problem? Since I am also getting the same error while training the model.

          Thanks in advance

    • Avatar
      Aksha Jadhav September 15, 2020 at 12:18 am #

      Hey, did you solve this error?
      I’m also stuck here. Please help.

  77. Avatar
    Satendra Varma July 3, 2018 at 2:11 pm #

    Hey Jason,

    Great article. I just wanted to ask where you start when developing code for such implementations. Do you refer to papers and code from scratch, or refer to material that explains implementation code in detail and translate it to Keras?


    • Avatar
      Jason Brownlee July 4, 2018 at 8:18 am #

      Start by understanding the principle of the approach (from multiple papers), then implement it using whatever tools, e.g. keras.

  78. Avatar
    Peter Bonac July 8, 2018 at 2:53 am #

    Hi Jason,

    Great tutorial. I am wondering what the best way is to limit the vocabulary size. As num_words does not influence tokenizer.fit_on_texts, are these changes correct:

    def create_tokenizer(descriptions):
        lines = to_lines(descriptions)
        tokenizer = Tokenizer(num_words = VOCAB_NUM_WORDS)
        tokenizer.fit_on_texts(lines)
        return tokenizer

    vocab_size = VOCAB_NUM_WORDS

    • Avatar
      Jason Brownlee July 8, 2018 at 6:24 am #

      Create a list of the n most frequent words you want to work with from the dataset, save them to file, then use them to filter the dataset prior to modeling.
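
A rough sketch of that filtering step, using only the standard library, might look like the following. It assumes the descriptions are held in a dict mapping each photo id to a list of caption strings, as in the tutorial; the helper names `top_words` and `filter_descriptions` and the toy captions are hypothetical.

```python
from collections import Counter


def top_words(descriptions, n):
    """Return the set of the n most frequent words across all captions."""
    counts = Counter()
    for desc_list in descriptions.values():
        for desc in desc_list:
            counts.update(desc.split())
    return {word for word, _ in counts.most_common(n)}


def filter_descriptions(descriptions, keep):
    """Drop every word that is not in `keep` from every caption."""
    return {
        key: [' '.join(w for w in desc.split() if w in keep) for desc in desc_list]
        for key, desc_list in descriptions.items()
    }


descriptions = {
    'a': ['dog is running', 'dog runs on grass'],
    'b': ['child is running on grass'],
}
keep = top_words(descriptions, n=5)
filtered = filter_descriptions(descriptions, keep)
print(sorted(keep))
print(filtered)
```

The tokenizer can then be fit on the filtered captions, so the vocabulary size follows from the data rather than from the num_words argument.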

      I have many examples of this on the blog, for example:

      • Avatar
        Peter Bonac July 8, 2018 at 2:14 pm #

        Thank you!

        One more question. For Progressive Loading it seems that the batch size is 1 from the data_generator. If I would like to create a batch i run into the problem of size defining as the “create_sequences” output is variable in size from:

        in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo)

        For example i can’t size define to something like:

        batch_features = np.zeros((batch_size, 17, 1280))
        batch_labels = np.zeros((batch_size, 17, 40000))

        How can I create a batch of “in_img, in_seq, out_word” (if each sequence will be a different length)? Is there an easy way to make a larger batch size? Again thank you for your help.

        • Avatar
          Jason Brownlee July 9, 2018 at 6:32 am #

          Not sure I follow.

          In progressive loading, the generator will release a batch of data. You can change the code to make this as few or as many samples as you wish.

          • Avatar
            Peter Bonac July 9, 2018 at 8:12 am #

            Sorry I didn’t explain well. From my understanding your “data_generator” releases data for 1 image into “model.fit_generator” at a time. I would like to change this to a batch of images.

            The problem I am having is if I try to use a code structure like below, I am not able to create the empty arrays (unless I pad each line to “max_length”, and make “batch_features = np.zeros((batch_size, max_length, NN_input_shape))”).

            # code structure
            import numpy as np

            def generator(features, labels, batch_size):
                # create empty arrays to contain a batch of features and labels
                batch_features = np.zeros((batch_size, 64, 64, 3))
                batch_labels = np.zeros((batch_size, 1))
                while True:
                    for i in range(batch_size):
                        # choose a random index in features
                        index = np.random.choice(len(features))
                        batch_features[i] = some_processing(features[index])
                        batch_labels[i] = labels[index]
                    yield batch_features, batch_labels

            Also, is there somewhere I can donate money to your site?

          • Avatar
            Jason Brownlee July 10, 2018 at 6:37 am #


            Yes, you can build up data as Python lists, then convert the lists to numpy arrays before you return them. It is a strategy I use all the time.
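A minimal sketch of the list-then-convert approach described in the reply above. It is not the tutorial's own generator: `batch_generator` and its `images_per_batch` parameter are hypothetical names, and `create_sequences` here is a simplified stand-in that takes already-encoded captions and pads and one-hot encodes with plain Python and numpy rather than Keras utilities. Because every prefix is padded to `max_length`, samples from several images can be collected in lists and stacked into arrays with a fixed second dimension.

```python
import numpy as np

def create_sequences(seq, photo, max_length, vocab_size):
    """Hypothetical stand-in: expand one encoded caption into
    (photo, padded prefix, one-hot next word) training samples."""
    X1, X2, y = [], [], []
    for i in range(1, len(seq)):
        prefix = seq[:i]
        # left-pad the prefix with zeros so every row has the same length
        padded = [0] * (max_length - len(prefix)) + prefix
        # one-hot encode the next word
        target = np.zeros(vocab_size)
        target[seq[i]] = 1.0
        X1.append(photo)
        X2.append(padded)
        y.append(target)
    return X1, X2, y

def batch_generator(captions, photos, max_length, vocab_size, images_per_batch):
    """Yield batches that contain the samples of several images at once."""
    keys = list(captions.keys())
    while True:
        X1, X2, y = [], [], []
        for n, key in enumerate(keys, start=1):
            for seq in captions[key]:
                a, b, c = create_sequences(seq, photos[key], max_length, vocab_size)
                X1.extend(a)
                X2.extend(b)
                y.extend(c)
            # release a batch every images_per_batch images, plus any remainder
            if n % images_per_batch == 0 or n == len(keys):
                yield [np.array(X1), np.array(X2)], np.array(y)
                X1, X2, y = [], [], []
```

The number of rows per batch still varies with caption length, but each array has a well-defined shape at the moment it is yielded, so no pre-allocated `np.zeros` buffer is needed.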

  79. Avatar
    Fathi July 11, 2018 at 12:01 pm #

    Here is the part of my code where I have a problem:

    size = 64
    img1 = load_img('00598546-9.jpg', target_size=(1, size, size))

    X1 = (TimeDistributed(Conv2D(32, (3,3), activation='relu'), input_shape=(None, size, size, 3)))(img1)

    Error message:
    Layer time_distributed_11 was called with an input that isn’t a symbolic tensor. Received type: . Full input: []. All inputs to the layer should be tensors.

    I’m looking to find the output X1 by using (img1) as an input but I get this error message.

    How can I use (img1) to find the output ?

  80. Avatar
    Maqsood July 16, 2018 at 6:24 pm #

    Hi Jason,

    I am trying to use the model you presented above for recognizing handwritten documents. In literature the feature extraction stage for OCR produces another 2-D matrix for each image (a feature vector is found for each column vector in the image). How can I then convert this 2-D matrix into a 1-D vector?

    • Avatar
      Jason Brownlee July 17, 2018 at 6:13 am #

      The vector output will be the probability of an image belonging to each output class.

  81. Avatar
    Shantanu Patil July 19, 2018 at 11:38 pm #

    After training it for two epochs, it gives the caption “man in red shirt standing on Street” for every other image I put in.

    • Avatar
      Jason Brownlee July 20, 2018 at 5:59 am #

      Sounds like it got stuck, try training it again?

      • Avatar
        Shantanu Patil July 21, 2018 at 10:35 pm #

        After training again for 6 epochs, with a loss of 3.3, it is captioning girls as boys and calling a bird a dog. Should I train it for 20 epochs?

        • Avatar
          Jason Brownlee July 22, 2018 at 6:23 am #

          No, the model does not need very much training.

          • Avatar
            Shantanu Patil July 25, 2018 at 11:52 pm #

            For new images it is not working. Can you send me a trained model? I tried up to 10 epochs and it is not working.

          • Avatar
            Jason Brownlee July 26, 2018 at 7:43 am #

            What do you mean it is not working?

    • Avatar
      Saurabh May 6, 2019 at 4:31 pm #

      Same problem for me. I’ve trained the model several times now, but it gives the same caption to all other images when I test it.

  82. Avatar
    Moha July 26, 2018 at 5:22 pm #

    Is the sequence length the number of words in a sequence?

    • Avatar
      Jason Brownlee July 27, 2018 at 5:48 am #

      Yes. Or rather, the maximum number of words that may appear in a sequence.

  83. Avatar
    Rishav July 27, 2018 at 8:45 pm #

    Hi Jason,

    Firstly, I would like to thank you for sharing your knowledge and helping everyone. I am new to this field. Could you please explain why you are removing all single-letter words?

    Yes, removing words containing numbers and removing punctuation does make sense. Even removing single-letter words makes sense, but by removing “a”, won’t it affect the formation of new sentences?

    • Avatar
      Jason Brownlee July 28, 2018 at 6:34 am #

      It does, but it makes the problem simpler to model with little effect on meaning.

  84. Avatar
    Gaurav Anand August 2, 2018 at 3:21 pm #

    Hello Jason

    I am facing the following error while training the model with progressive loading.
    Could you please help to fix this?

    ImportError Traceback (most recent call last)
    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in swig_import_helper()
    13 try:
    —> 14 return importlib.import_module(mname)
    15 except ImportError:

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py in import_module(name, package)
    125 level += 1
    –> 126 return _bootstrap._gcd_import(name[level:], package, level)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _load_unlocked(spec)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in module_from_spec(spec)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap_external.py in create_module(self, spec)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

    ImportError: DLL load failed: The specified module could not be found.

    During handling of the above exception, another exception occurred:

    ModuleNotFoundError Traceback (most recent call last)
    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py in ()
    —> 58 from tensorflow.python.pywrap_tensorflow_internal import *
    59 from tensorflow.python.pywrap_tensorflow_internal import __version__

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in ()
    16 return importlib.import_module(‘_pywrap_tensorflow_internal’)
    —> 17 _pywrap_tensorflow_internal = swig_import_helper()
    18 del swig_import_helper

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in swig_import_helper()
    15 except ImportError:
    —> 16 return importlib.import_module(‘_pywrap_tensorflow_internal’)
    17 _pywrap_tensorflow_internal = swig_import_helper()

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py in import_module(name, package)
    125 level += 1
    –> 126 return _bootstrap._gcd_import(name[level:], package, level)

    ModuleNotFoundError: No module named ‘_pywrap_tensorflow_internal’

    During handling of the above exception, another exception occurred:

    ImportError Traceback (most recent call last)
    in ()
    1 from numpy import array
    2 from pickle import load
    —-> 3 from keras.preprocessing.text import Tokenizer
    4 from keras.preprocessing.sequence import pad_sequences
    5 from keras.utils import to_categorical

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\__init__.py in ()
    1 from __future__ import absolute_import
    —-> 3 from . import utils
    4 from . import activations
    5 from . import applications

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\__init__.py in ()
    4 from . import data_utils
    5 from . import io_utils
    —-> 6 from . import conv_utils
    8 # Globally-importable utils.

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\conv_utils.py in ()
    7 from six.moves import range
    8 import numpy as np
    —-> 9 from .. import backend as K

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\backend\__init__.py in ()
    85 elif _BACKEND == ‘tensorflow’:
    86 sys.stderr.write(‘Using TensorFlow backend.\n’)
    —> 87 from .tensorflow_backend import *
    88 else:
    89 # Try and load external backend.

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py in ()
    5 import tensorflow as tf
    —-> 6 from tensorflow.python.framework import ops as tf_ops
    7 from tensorflow.python.training import moving_averages
    8 from tensorflow.python.ops import tensor_array_ops

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py in ()
    47 import numpy as np
    —> 49 from tensorflow.python import pywrap_tensorflow
    51 # Protocol buffers

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py in ()
    72 for some common reasons and solutions. Include the entire stack trace
    73 above this error message when asking for help.””” % traceback.format_exc()
    —> 74 raise ImportError(msg)
    76 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

    ImportError: Traceback (most recent call last):
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 14, in swig_import_helper
    return importlib.import_module(mname)
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py”, line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File “”, line 994, in _gcd_import
    File “”, line 971, in _find_and_load
    File “”, line 955, in _find_and_load_unlocked
    File “”, line 658, in _load_unlocked
    File “”, line 571, in module_from_spec
    File “”, line 922, in create_module
    File “”, line 219, in _call_with_frames_removed
    ImportError: DLL load failed: The specified module could not be found.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py”, line 58, in
    from tensorflow.python.pywrap_tensorflow_internal import *
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 17, in
    _pywrap_tensorflow_internal = swig_import_helper()
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 16, in swig_import_helper
    return importlib.import_module(‘_pywrap_tensorflow_internal’)
    File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py”, line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named ‘_pywrap_tensorflow_internal’

    Failed to load the native TensorFlow runtime.

    See https://www.tensorflow.org/install/install_sources#common_installation_problems

    for some common reasons and solutions. Include the entire stack trace
    above this error message when asking for help.

    • Avatar
      Jason Brownlee August 3, 2018 at 5:58 am #

      Sorry to hear that.

      Perhaps post your error to stackoverflow?

    • Avatar
      abbas August 3, 2018 at 1:41 pm #

      Downgrade your tensorflow to version 1.5; I hope it will work for you.

      • Avatar
        Gaurav Anand August 3, 2018 at 3:19 pm #

        Yes, it somehow worked after creating a new environment with the latest tensorflow version. However, it is now giving me another, possibly known, error. Please have a look.

        Layer (type)             Output Shape         Param #   Connected to
        input_4 (InputLayer)     (None, 30)           0
        input_3 (InputLayer)     (None, 7, 7, 512)    0
        embedding_2 (Embedding)  (None, 30, 256)      987392    input_4[0][0]
        dropout_3 (Dropout)      (None, 7, 7, 512)    0         input_3[0][0]
        dropout_4 (Dropout)      (None, 30, 256)      0         embedding_2[0][0]
        dense_4 (Dense)          (None, 7, 7, 256)    131328    dropout_3[0][0]
        lstm_2 (LSTM)            (None, 256)          525312    dropout_4[0][0]
        add_2 (Add)              (None, 7, 7, 256)    0         dense_4[0][0]
        dense_5 (Dense)          (None, 7, 7, 256)    65792     add_2[0][0]
        dense_6 (Dense)          (None, 7, 7, 3857)   991249    dense_5[0][0]

        ValueError Traceback (most recent call last)
        in ()
        178 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
        179 # fit for one epoch
        –> 180 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
        181 # save model
        182 model.save(‘model_’ + str(i) + ‘.h5’)

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
        89 warnings.warn(‘Update your ' + object_name +
        90 '
        call to the Keras 2 API: ‘ + signature, stacklevel=2)
        —> 91 return func(*args, **kwargs)
        92 wrapper._original_function = func
        93 return wrapper

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
        1413 use_multiprocessing=use_multiprocessing,
        1414 shuffle=shuffle,
        -> 1415 initial_epoch=initial_epoch)
        1417 @interfaces.legacy_generator_methods_support

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
        211 outs = model.train_on_batch(x, y,
        212 sample_weight=sample_weight,
        –> 213 class_weight=class_weight)
        215 outs = to_list(outs)

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
        1207 x, y,
        1208 sample_weight=sample_weight,
        -> 1209 class_weight=class_weight)
        1210 if self._uses_dynamic_learning_phase():
        1211 ins = x + y + sample_weights + [1.]

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
        785 feed_output_shapes,
        786 check_batch_axis=False, # Don’t enforce the batch size.
        –> 787 exception_prefix=’target’)
        789 # Generate sample-wise weight values given the sample_weight and

        c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
        125 ‘: expected ‘ + names[i] + ‘ to have ‘ +
        126 str(len(shape)) + ‘ dimensions, but got array ‘
        –> 127 ‘with shape ‘ + str(data_shape))
        128 if not check_batch_axis:
        129 data_shape = data_shape[1:]

        ValueError: Error when checking target: expected dense_6 to have 4 dimensions, but got array with shape (11, 1, 3857)

        I have made only one change, in line 114:
        inputs1 = Input(shape=(4096,)) >> inputs1 = Input(shape=(7, 7, 512,))
        Earlier it was giving a data structure mismatch error for inputs1, but now it gives the same kind of error at the 3rd dense layer.

        As I read in other comments, it is a common issue.
        Could you please share your opinion on how to get rid of this?

        Any external guide to data structure mismatches would be much appreciated.

  85. Avatar
    Moha August 3, 2018 at 7:51 pm #

    Some image captioning libraries (such as Im2txt) are able to provide a confidence score for their generated captions. This helps when a caption is wrong: the confidence at least tells us whether the model was ‘unsure’ about the text it generated. How would we go about adding something like that to this?

    • Avatar
      Jason Brownlee August 4, 2018 at 6:03 am #

      Good question. Perhaps contact the developers and ask their approach?
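One generic way to get a rough confidence value, offered here as a sketch and not as a description of what Im2txt does, is to record the probability the model assigned to each word it chose while generating the caption (in greedy decoding, the maximum of the softmax output at that step) and combine those probabilities. The function name `caption_confidence` is hypothetical, and the length normalisation is one design choice among several.

```python
import math

def caption_confidence(step_probs):
    """Combine per-word probabilities into one score in (0, 1].

    step_probs: the probability the model gave to each word it selected,
    one value per generated word.
    """
    # sum of log-probabilities is the log of the whole caption's probability
    log_prob = sum(math.log(p) for p in step_probs)
    # geometric mean per word, so longer captions are not penalised for length alone
    return math.exp(log_prob / len(step_probs))

confident = caption_confidence([0.9, 0.9, 0.9])
unsure = caption_confidence([0.9, 0.1, 0.9])
```

A caption with one very uncertain word scores noticeably lower than one where every word was predicted with high probability. The score reflects the model's own certainty, which is not the same as the caption being correct.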

  86. Avatar
    Moha August 3, 2018 at 7:53 pm #

    I have got to say. This is the best image captioning tutorial I have found online. Thank you for helping me understand it better.

  87. Avatar
    Mmed August 14, 2018 at 1:13 am #

    Would it make sense to monitor the accuracy and validation accuracy for image captioning?

    That is what I added to model.compile:

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    That gave for the first epoch an accuracy of 0.9463.
    And a validation accuracy of 0.9903.

    Doesn’t that seem too high though for the 1st epoch?

    • Avatar
      Jason Brownlee August 14, 2018 at 6:22 am #

      No, accuracy does not tell us much about the performance of the model. We must use a score like BLEU.

      • Avatar
        Mmed August 15, 2018 at 1:16 am #

        Thank you for your response, Dr. Brownlee. Okay, but does ‘accuracy’ mean anything? I mean, Keras is doing some calculations to get these numbers, right? Even if it does not help us learn about the model’s performance, does the accuracy metric represent anything?
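On what the number represents: with a softmax output and categorical cross-entropy, the Keras ‘accuracy’ metric is the fraction of samples where the highest-probability class matches the target class. In this model each sample is a single next-word prediction given the photo and the true preceding words, so the metric counts correct next words, not correct captions. The sketch below reproduces that calculation in plain Python; `next_word_accuracy` is a hypothetical helper, not part of Keras.

```python
def next_word_accuracy(predicted_probs, target_indices):
    """Fraction of samples whose most probable word equals the target word."""
    hits = 0
    for probs, target in zip(predicted_probs, target_indices):
        # index of the largest probability, i.e. the predicted word
        guess = max(range(len(probs)), key=probs.__getitem__)
        hits += int(guess == target)
    return hits / len(target_indices)

probs = [
    [0.1, 0.7, 0.2],  # predicts word 1
    [0.6, 0.3, 0.1],  # predicts word 0
    [0.2, 0.2, 0.6],  # predicts word 2
]
targets = [1, 2, 2]
score = next_word_accuracy(probs, targets)
```

Because each prediction is scored against the true prefix, this says little about captions the model generates on its own, where one early mistake changes every later input. That is the gap a caption-level score such as BLEU is meant to cover.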

  88. Avatar
    Naveen Kumar August 19, 2018 at 3:24 am #

    I am unable to get Progressive Loading to work. After the first epoch I get an error like this:

    5995/6000 [============================>.] – ETA: 2s – loss: 4.6600
    5996/6000 [============================>.] – ETA: 2s – loss: 4.6597
    5997/6000 [============================>.] – ETA: 1s – loss: 4.6597
    5998/6000 [============================>.] – ETA: 1s – loss: 4.6595
    5999/6000 [============================>.] – ETA: 0s – loss: 4.6595
    6000/6000 [==============================] – 3329s 555ms/step – loss: 4.6598

    Process finished with exit code -1073741819 (0xC0000005)

  89. Avatar
    Emil Lundh August 20, 2018 at 5:28 pm #

    5.5 million parameters… and 8000 examples? Clearly, the old rule that you need at least as many training examples as parameters doesn’t apply. Is there a way to think about this? Presumably I shouldn’t use my intuition from linear systems of equations?

    • Avatar
      Jason Brownlee August 21, 2018 at 6:12 am #

      Yes, the old ways of thinking do not apply.

      I have not seen a good conceptual model for thinking about highly over-specified models.

      Nevertheless, they are skillful and do generalize.

  90. Avatar
    Md. Zakir Hossain August 23, 2018 at 11:08 pm #

    Hi Jason,

    Many thanks for your kind help. When we use model.fit for training, we use training data as well as validation data. But when we use model.fit_generator (Progressive Loading), why are we not using validation data?

    • Avatar
      Jason Brownlee August 24, 2018 at 6:08 am #

      I added that progressive loading much later, as a simpler version for those that were having trouble. You can update it to use validation data if you wish.

  91. Avatar
    nehna August 28, 2018 at 7:13 pm #

    hi Jason

    Due to internet connectivity problems, my download was interrupted when I ran the feature_extraction.py code.

    Later I tried to run the code again, and it is not downloading and not showing an error either.
    Without the features.pkl file I can’t proceed further.
    Is there any other way to make it download?

    • Avatar
      Jason Brownlee August 29, 2018 at 8:08 am #

      No, sorry. You require the dataset to work through the example.

      • Avatar
        nehna September 1, 2018 at 12:52 pm #

        thank you jason

        i got the dataset

        but due to a memory error, I am using progressive loading

        I am getting a ValueError:

        ValueError: Error when checking input : expected input_1 to have 4 dimensions but got array with shape (28,4096)

        thank you Jason in advance

        • Avatar
          Jason Brownlee September 2, 2018 at 5:27 am #

          Perhaps ensure that you have copied the data exactly and that your libraries are up to date?

  92. Avatar
    Michael September 8, 2018 at 4:23 am #

    Hi Jason,
    thank you for this super tutorial.

    But I have a question :):

    My generated caption is for the sample picture:

    “startseq dog is running through the snow endseq”

    and not

    “startseq dog is running across the beach endseq”

    My BLEU score is also lower than in your tutorial.

    BLEU-1: 0.553073
    BLEU-2: 0.293371
    BLEU-3: 0.200420
    BLEU-4: 0.090321

    Do you have any idea why, or better, how I can improve my result?


  93. Avatar
    kalverk September 10, 2018 at 11:10 pm #


    Is the feature order directly tied to the caption? How much does the model rely on input’s order?

    Imagine if the extracted features of an image are [‘dog’, ‘water’, ‘blue’, ‘sand’] and the caption is “dog at the beach”, now this is correct and expected caption.

    Now the same image, but the features are [‘sand’, ‘water’, ‘dog’, ‘blue’], how different might the new caption be?

    Can we achieve the same caption with differently ordered features vector?

    • Avatar
      Jason Brownlee September 11, 2018 at 6:30 am #

      Yes, the order of the generated words is important for the design of this specific model.

  94. Avatar
    Jeff September 12, 2018 at 6:15 am #

    Jason – given the model architecture, if I use my own data, with only 1 caption per image, would it impact the quality of the outcome?

    • Avatar
      Jason Brownlee September 12, 2018 at 8:16 am #

      It will, perhaps some changes to the model configuration or training will be required. Experiment.

  95. Avatar
    nehna September 15, 2018 at 12:00 pm #

    hiii Jason

    your tutorial is super and its working fine


    i have a doubt !!

    when generating a new caption, will it generate captions only for images in the Flickr dataset, or for general images too (downloaded from Google)?

    thank you very much Jason

    • Avatar
      Jason Brownlee September 16, 2018 at 5:56 am #

      It will generate captions for any photo you provide.

      Remember, it is just an experiment, not an application.

      • Avatar
        nehna September 24, 2018 at 3:42 pm #

        hii Jason

        yeah, just to test how it generates captions for images other than those in the Flickr dataset

        but it gives appropriate captions only for images in the dataset. For all other images, it generates a caption that is in no way related to the image

        • Avatar
          Jason Brownlee September 25, 2018 at 6:17 am #

          Perhaps your model has overfit. You could try adding some regularization.

          • Avatar
            Priyam September 29, 2018 at 1:25 am #

            what should i do to add regularization

          • Avatar
            Jason Brownlee September 29, 2018 at 6:36 am #

            Try Dropout, weight noise, weight regularization, activation regularization, early stopping, etc.

          • Avatar
            Priyam October 1, 2018 at 3:56 am #

            On which part of the program should i apply the given techniques.
            And how to apply all of them?

          • Avatar
            Jason Brownlee October 1, 2018 at 6:30 am #

            I cannot know; I recommend testing a number of different approaches to discover what works.

            If this is a challenge, then I am currently writing a series of tutorials on this exact topic (e.g. how to improve model performance).

          • Avatar
            Priyam October 1, 2018 at 4:09 am #

            I am new to deep learning implementation. I don’t know where to insert the required techniques in the code you have written. Can you suggest a tutorial or some source for it?

  96. Avatar
    Mmed September 18, 2018 at 5:31 am #

    Hello Jason,

    Why is there ” + 1 ” every time you find the vocabulary size from the tokenizer?

    • Avatar
      Jason Brownlee September 18, 2018 at 6:25 am #

      To start numbering of tokenized words at “1” rather than “0”. We need room for the “0” value for “unknown word”.
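The effect of the “+ 1” can be seen in a small sketch. The Keras Tokenizer assigns word indices starting at 1 and never assigns index 0 to a word; 0 is also the value used for padding. The helper `build_word_index` below is a hypothetical pure-Python stand-in for that numbering, not the Keras implementation, and its tie-breaking order is an assumption.

```python
from collections import Counter

def build_word_index(lines):
    """Map each word to an integer, most frequent first, starting at 1."""
    counts = Counter(word for line in lines for word in line.split())
    # sort by descending count, then alphabetically to make ties deterministic
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    # index 0 is deliberately left unassigned
    return {word: i for i, word in enumerate(ordered, start=1)}

lines = ['startseq dog runs endseq', 'startseq dog sits endseq']
word_index = build_word_index(lines)
vocab_size = len(word_index) + 1

# a one-hot vector needs a slot for every index from 0 up to the largest one
largest_index = max(word_index.values())
one_hot = [0] * vocab_size
one_hot[largest_index] = 1
```

With five distinct words the largest index is 5, so a vector of length 5 (indices 0 to 4) would be one slot too short. The same size is what the Embedding and output layers need.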

  97. Avatar
    Mmed September 18, 2018 at 7:16 am #

    When would we encounter an unknown word if the vocabulary consists of all the words in the training data?

    • Avatar
      Jason Brownlee September 18, 2018 at 2:15 pm #

      There may be words in the test set not in the training set.

      There may be words in new data not in the training set.

      Does that help?

      • Avatar
        Mmed September 20, 2018 at 4:05 am #

        I am still a bit confused to be honest.

        1. The training vocab and the testing vocab are different, that I can see. Why would a trained model ever encounter a word only in the test set? Wouldn’t test set captions (and hence the words in the test) only be used when calculating BLEU scores?

        2. Could this ‘unknown word’ token ever be generated by the model?

        3. When adding new data with new words to the training data, why would you stick with the older vocabulary and not ‘evaluate’ the newer one?

        • Avatar
          Jason Brownlee September 20, 2018 at 8:08 am #

          Yes, it is an artefact of evaluating the model.

          In the future, you would finalize the model by training it on all available data and use it to generate captions.

          The model may still generate unknown word tokens if it gets confused.

  98. Avatar
    RD September 19, 2018 at 2:32 am #


    Thank you for this clear and thorough tutorial. I have three questions to make sure I understand things correctly:

    1. By removing single letters from the descriptions, the generated descriptions will never include/generate descriptions with ‘a’ or ‘I’. Is that correct?

    2. Because load_clean_descriptions filters by testing/training the tokenizer may be missing words that are in the test set but not the training set. Is this correct? And for fitting I understand you want to keep test/training data separate, but for the vocabulary ideally you would include the entire vocabulary from both the test and training set. Is this correct?

    3. If I understand correctly one could use an even larger vocabulary (would be a bigger model) but in principle there is no reason not to include a larger vocabulary?

    • Avatar
      Jason Brownlee September 19, 2018 at 6:26 am #

      Yes, you can add them back if you like. It just makes the vocab larger/model slower to train.

      Yes, train defines the vocab. Ideally you want your model to have all the words that may be seen.

      Yes, there is good reason to use a larger vocab, it will be more expressive, but I was trying to keep the example fast/simple.

  99. Avatar
    Vikas September 27, 2018 at 8:54 pm #

    I wanted to know what should be the target validation loss.
    Right now, I am getting the best validation loss of 3.86 after 5 epochs. However, you have a lower validation as well as training loss than mine just after two epochs.
    Is my model trained enough or should I train again?

  100. Avatar
    Priyam September 29, 2018 at 1:00 am #

    I want to test the model with more images.
    Can you tell me a source from which I can get images that will work well with the code? I tried some random images from Google, but the results were unsatisfactory. Also, please tell me the steps to add new images to train the model.

    • Avatar
      Jason Brownlee September 29, 2018 at 6:35 am #

      I expect there are other image captioning datasets you can use.

      Sorry, I cannot point you to them off the cuff.

  101. Avatar
    Omnia October 3, 2018 at 3:25 am #

    Hi Jason

    I really like this post, it helped a lot

    your post is better than my daily DL learning class

    I have a question

    I ran the code until fitting the model, actually till this line

    “Train on 306404 samples, validate on 50903 samples
    Epoch 1/20”

    until now it took 20 mins but nothing has appeared

    Does it take so much of time to run each epoch?

    Or am I doing something wrong?

    I really hope this code works properly for me so I can optimize it and see different results


  102. Avatar
    omnia October 3, 2018 at 6:17 am #

    Another question: if the features.pkl file was created the first time I ran the code and I have it in my directory,

    do I have to run these commands again when I run the code another time?

    # extract features from all images
    directory = 'Flicker8k_Dataset'
    features = extract_features(directory)
    print('Extracted Features: %d' % len(features))
    # save to file
    dump(features, open('features.pkl', 'wb'))

    Thanks again

    • Avatar
      Jason Brownlee October 3, 2018 at 6:23 am #

      Once you have created the features, you don’t need to create them again.
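One way to avoid commenting the extraction step in and out is to check for the saved file first. This is a minimal sketch, assuming the features were saved with pickle as in the snippet quoted above; `load_or_extract` and its `extract_fn` argument are hypothetical names, and in the tutorial's setting `extract_fn` would wrap the call to `extract_features(directory)`.

```python
import os
from pickle import dump, load

def load_or_extract(filename, extract_fn):
    """Load cached features if the file exists, otherwise compute and save them."""
    if os.path.exists(filename):
        with open(filename, 'rb') as f:
            return load(f)
    # slow path: run the extraction once and cache the result
    features = extract_fn()
    with open(filename, 'wb') as f:
        dump(features, f)
    return features
```

A file left over from an interrupted run would still be picked up by this check, so if loading fails it is worth deleting the file and extracting again.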

  103. Avatar
    Omnia October 3, 2018 at 7:20 am #

    Thanks a lot

    now it’s working properly

    will let you know what I will get at the end

    Thanks again, really appreciate your hard work, so happy that I understand your code very well

  104. Avatar
    Xuan October 3, 2018 at 8:35 pm #

    Hey Jason, thanks for the tutorial. The model trained fine for me but when I tried to generate caption for a single new image I encountered the following error:

    ValueError: Error when checking input: expected input_8 to have shape (110,) but got array with shape (34,)

  105. Avatar
    Diana October 8, 2018 at 6:25 am #

    Hi Jason! Thanks a lot for this!

    I tried your model with ResNet50 and got model-ep005-loss3.417-val_loss3.767.h5, so yours works a little bit better even when it comes to BLEU.

    I’m gonna try to reduce the vocabulary size and see what happens.

    • Avatar
      Jason Brownlee October 8, 2018 at 9:29 am #

      Nice work!

      • Avatar
        Diana October 8, 2018 at 11:47 am #

        I cannot clearly see how to ‘correct’ misspelling. I have already gone through the vocabulary and there are about 1000 misspelled words.

        Any thoughts on how I could go over that?

        • Avatar
          Jason Brownlee October 9, 2018 at 8:32 am #

          Perhaps remove or correct all captions with misspellings?

  106. Avatar
    Oliver October 10, 2018 at 9:30 pm #

    Getting the following error when calling train_features = load_photo_features(‘features.pkl’, train)

    The error occurs when trying to run all_features = load(open(filename, ‘rb’))

    UnpicklingError: pickle data was truncated

    Has anybody a solution to this?

  107. Avatar
    Jerome MASSOT October 11, 2018 at 5:54 pm #

    Hi Jason,
    I come back tonight with the question regarding the VGG16.layers.pop() method, which does not seem to work with Keras 2.2.2…
    Before and after pop() and reshaping the model, the light one has exactly the same architecture as the original one…
    The extracted features have dim = 1000, which causes me trouble with my Input = (4096,)…
    If I change the input to (1000,), the performance is low…
    Thanks for the help
    Best regards

    • Avatar
      Jason Brownlee October 12, 2018 at 6:35 am #

      Thanks, I’ll investigate.

    • Avatar
      Werner June 11, 2020 at 3:18 pm #

      I found the same. 1,000 dims.

  108. Avatar
    Prasanna Kumar Behera October 15, 2018 at 12:35 am #

    Hi Jason,

    My question is, are we retraining all the parameters of the VGG16 models in this example?

    If yes, why should we train it, since we are using an already trained model?
    If no, then which part of the code above prevents it, since we have not set layer.trainable = False for any layer?
    Please let me know when we should and should not train all the layers when using a pretrained model like VGG16.

    • Avatar
      Jason Brownlee October 15, 2018 at 7:28 am #

      No, we are not re-training the vgg, we are using the vgg to output features that are fed into the captioning model.

  109. Avatar
    Vidyush Bakshi October 15, 2018 at 11:56 pm #

    My BLEU scores with progressive loading
    BLEU-1: 0.547871
    BLEU-2: 0.293608
    BLEU-3: 0.196752
    BLEU-4: 0.086692

  110. Avatar
    Hassaan October 23, 2018 at 5:26 pm #

    Hi Jason. I am new to ML and you are the source that raised my interest in ML. I am following your tutorial above. I am confused about one concept, where you apply tokenization to the text. You mentioned that “The model will be provided one word and the photo and generate the next word. After that it recursively run to generate new sentence”. I am confused about what you are doing here. What is the purpose of doing that? Please explain that point in detail or suggest a source where I can get help.
    My second question: once we are done with that, will the model generate captions that are already in the dataset (I mean, will exactly the same captions be suggested for new unseen images), or can it produce new captions based on the image? Please explain in detail.

  111. Avatar
    Ahmed October 24, 2018 at 9:20 pm #

    ValueError: Error when checking input: expected input_2 to have shape (40,) but got array with shape (34,)

    I am getting that error, I am unable to figure it out. Can you please help me to get that?

  112. Avatar
    Ahmed October 25, 2018 at 8:25 pm #

    I am just confused about the max_length method.

    What is the purpose of that method? Why are we trying to find that? Please briefly explain it.

    • Avatar
      Jason Brownlee October 26, 2018 at 5:35 am #

      To find the number of words in the longest description.

      We need this so we can pad all other descriptions to that length (in terms of numbers of words).
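      A minimal sketch of the idea in this reply, using plain Python rather than the tutorial's Keras helpers. The toy captions and the helper name pad_left are assumptions made for illustration; Keras pad_sequences also pads at the front with zeros by default.

```python
# Toy descriptions; the tutorial builds the real ones from the Flickr8k captions.
descriptions = {
    "photo_a": ["startseq dog runs endseq", "startseq a dog runs on grass endseq"],
    "photo_b": ["startseq child plays endseq"],
}

def max_length(descriptions):
    """Return the number of words in the longest description."""
    all_desc = [d for descs in descriptions.values() for d in descs]
    return max(len(d.split()) for d in all_desc)

def pad_left(seq, length, value=0):
    """Left-pad a list of integer word ids with `value` up to `length` (hypothetical helper)."""
    return [value] * (length - len(seq)) + list(seq)

longest = max_length(descriptions)
padded = pad_left([4, 9, 2], longest)  # made-up word ids for a short caption
print(longest, padded)
```

      Every encoded caption prefix is padded to the same length so that the model's text input has a fixed shape.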

  113. Avatar
    Omnia October 26, 2018 at 1:52 am #

    hi Jason

    my BLEU scores are like this

    BLEU-1: 0.528302
    BLEU-2: 0.277568
    BLEU-3: 0.227300
    BLEU-4: 0.117189

    and here is my validation loss

    Train on 306404 samples, validate on 50903 samples
    Epoch 1/20
    306404/306404 [==============================] – 9983s 33ms/step – loss: 4.5003 – val_loss: 4.0387

    Epoch 00001: val_loss improved from inf to 4.03874, saving model to model-ep001-loss4.500-val_loss4.039.h5
    Epoch 2/20
    306404/306404 [==============================] – 9512s 31ms/step – loss: 3.8575 – val_loss: 3.8717

    Epoch 00002: val_loss improved from 4.03874 to 3.87171, saving model to model-ep002-loss3.857-val_loss3.872.h5
    Epoch 3/20
    306404/306404 [==============================] – 7866s 26ms/step – loss: 3.6712 – val_loss: 3.8360

    Epoch 00003: val_loss improved from 3.87171 to 3.83603, saving model to model-ep003-loss3.671-val_loss3.836.h5
    Epoch 4/20
    306404/306404 [==============================] – 10109s 33ms/step – loss: 3.5803 – val_loss: 3.8296

    Epoch 00004: val_loss improved from 3.83603 to 3.82960, saving model to model-ep004-loss3.580-val_loss3.830.h5
    Epoch 5/20
    306404/306404 [==============================] – 5384s 18ms/step – loss: 3.5246 – val_loss: 3.8364

    However, when I try to generate a description for a random image from the internet, the model does not seem to work properly.

    It gives me the same sentence for different kinds of images.

    any suggestion?

    • Avatar
      Jason Brownlee October 26, 2018 at 5:38 am #

      It suggests the model may be overfit. Perhaps try re-fitting the model, using a model fit over fewer epochs, or using some regularization.

      • Avatar
        Omnia October 26, 2018 at 10:19 am #

        I see


        I will try and post my experiment

    • Avatar
      PhyuPhyuKhaing July 5, 2020 at 9:11 pm #

      Hi Omnia,

      I am interested in your model’s result. May I know how you changed the model?

  114. Avatar
    Saifullah October 29, 2018 at 3:25 am #

    Hi Jason,

    Thanks for such nice work.
    I want to know how I print actual caption for the test image. If I am using a new image from the test set.

    • Avatar
      Jason Brownlee October 29, 2018 at 6:00 am #

      I show how to print a caption for a new image in the tutorial.

  115. Avatar
    Omnia October 30, 2018 at 2:21 pm #

    Hi Jason,

    in fitting the model, I’m not sure if my thought of input and output is correct

    here is the command
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, validation_data=([X1test, X2test], ytest))

    I understand that X1train contains the photo which should be the feature of the photo as integers, correct me if I’m wrong

    X2train is sequence text which is the ground truth captions corresponding to the photo

    I didn’t understand what is ytrain

    would you please explain it briefly

    another question is that how does the output penalize if it generates a wrong caption?


    • Avatar
      Jason Brownlee October 31, 2018 at 6:21 am #


      ytrain is the next word to be predicted by the model for each sample.
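      As a rough illustration of this reply (not the tutorial's exact code), the sketch below splits one encoded caption into input prefixes and next-word targets. The function name make_pairs and the word ids are made up for the example; in the tutorial the prefix is additionally padded and the target is one-hot encoded.

```python
def make_pairs(seq):
    """Split one encoded caption into (input prefix, next word) pairs."""
    pairs = []
    for i in range(1, len(seq)):
        # Everything before position i is the input; the word at i is the target.
        pairs.append((seq[:i], seq[i]))
    return pairs

# Hypothetical word ids for "startseq dog runs endseq".
encoded = [1, 7, 12, 2]
pairs = make_pairs(encoded)
for in_seq, out_word in pairs:
    print(in_seq, "->", out_word)
```

      Each prefix corresponds to a row of X2train and each target to a row of ytrain, paired with the same photo feature in X1train.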

  116. Avatar
    Mmed November 1, 2018 at 8:53 am #

    Dear Dr. Brownlee.

    The create_sequences() function that returns for us input-output pairs of training data makes teacher forcing possible in this example, right?

    • Avatar
      Jason Brownlee November 1, 2018 at 2:28 pm #

      I guess so, or more accurately, the way we use the sequences during training.

  117. Avatar
    Omnia November 2, 2018 at 10:35 am #

    Hi Jason

    Thanks a lot for your advice

    I’m using Pycharm

    I tried different types of regularization until I picked the best one, also different optimizers

    I got pretty good BLEU scores and predictions; the model was predicting everything in detail for Flickr images

    and good enough for some images from the internet

    Though for images from the internet, the model couldn’t clearly recognize cat from dog face, I’m still working on that

    These are my BLEU scores

    BLEU-1: 0.601031
    BLEU-2: 0.380297
    BLEU-3: 0.279632
    BLEU-4: 0.151589

    Thanks again

    • Avatar
      Jason Brownlee November 2, 2018 at 2:49 pm #

      Well done!

    • Avatar
      abbas November 18, 2018 at 3:51 am #

      Omnia! Can you please share the code using the Inception model? If yes, then let me know. I would also like to check your results.

    • Avatar
      Ajay January 1, 2019 at 11:43 pm #

      Hi Omnia! Can you tell me which regularization technique you used that helped improve the BLEU score?

    • Avatar
      Saurabh May 6, 2019 at 4:47 pm #

      Hi Omnia, can you share the approach you used for regularization at saurabh18@somaiya.edu?


  118. Avatar
    Vishwa Dadhania November 14, 2018 at 11:02 pm #

    Hi Jason,
    Thank you for an amazing tutorial. I learnt many things here. esp. progressive loading. So here I have one query as you explain in “progressive loading” section:
    “Finally, we can use the fit_generator() function on the model to train the model with this data generator.

    In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one with the lowest loss that you can then use in the next section.”

    I have already got all 20 models from 20 epochs by training the “training” dataset. Now how do I check which model is best using the development set?? Because we have not included development set in the fit_generator(). So how to choose the best model from 20 saved models ? Should I apply evaluate() function on development set for each model?? It would be great if you could give me some idea/hint further on this!! Thanks.

    • Avatar
      Jason Brownlee November 15, 2018 at 5:31 am #

      Good question.

      Evaluate each of the saved models on a validation dataset and use the one with the best performance. Probably around epoch 3-4.

      • Avatar
        Ajay January 1, 2019 at 11:04 pm #

        Hi Jason, can you tell me what you mean by 3-4 epochs? I assume that evaluating the model will just go through all the images once, generate descriptions for them, and then calculate the BLEU score from that. So, which 3-4 epochs are you speaking about?

        • Avatar
          Jason Brownlee January 2, 2019 at 6:36 am #

          I meant that the best performing model was found after the completion of 3 or 4 epochs.
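          The selection step described in this thread can be sketched as below. This is only an outline under stated assumptions: pick_best_model is a hypothetical helper, and the losses are invented numbers standing in for whatever score you compute for each saved file on the development set (for example, by loading the model and evaluating it).

```python
def pick_best_model(model_files, score_fn):
    """Return (best_file, best_score), where a lower score is better."""
    scores = {name: score_fn(name) for name in model_files}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Invented validation losses, one per saved epoch file; replace with real evaluations.
fake_losses = {
    "model_0.h5": 4.00,
    "model_1.h5": 3.90,
    "model_2.h5": 3.80,
    "model_3.h5": 3.85,
}

best_file, best_loss = pick_best_model(list(fake_losses), fake_losses.get)
print(best_file, best_loss)
```

          If you score models by BLEU instead of loss, higher is better, so use max in place of min.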

  119. Avatar
    Omnia November 16, 2018 at 4:31 am #

    Hi Jason

    If I want to generate descriptions for the test images, how do I pass the photo features

    (which we already have extracted) to the generate_desc model?

    As in the following command, we are passing a single extracted feature for a given image

    photo = extract_features(‘cat.jpg’)

    description = generate_desc(model, tokenizer, photo, max_length)

    I want to generate a description for the test image using test features without using the

    function extract feature again, could you please suggest any way to do it?


    • Avatar
      Jason Brownlee November 16, 2018 at 6:18 am #

      The example at the end of the tutorial shows you how to generate a description for one photo.

  120. Avatar
    Omnia November 16, 2018 at 8:51 am #


    That’s what I meant to say

    in the example, it shows how to generate for one photo but with using the extract feature function,

    If we already extracted features for the test images

    why do we need to use extract_features again to generate a description

    can’t we use our saved features in the test?

    • Avatar
      Jason Brownlee November 16, 2018 at 1:57 pm #

      Yes, if you have already extracted the feature, then you can pass the extracted feature directly to generate_desc().
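      A small sketch of that suggestion, assuming the saved features file is a pickled dictionary keyed by image id (as the features[‘…’] lookups elsewhere in these comments suggest). The helper name load_feature and the demo file are made up for illustration.

```python
import os
import pickle
import tempfile

def load_feature(filename, image_id):
    """Load a pickled {image_id: feature} dictionary and return one entry."""
    with open(filename, "rb") as f:
        all_features = pickle.load(f)
    return all_features[image_id]

# Demo with a tiny stand-in file so the sketch runs without the real features.pkl.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features_demo.pkl")
    with open(path, "wb") as f:
        pickle.dump({"example_id": [0.1, 0.2, 0.3]}, f)
    photo = load_feature(path, "example_id")

print(photo)
# With the real file, the looked-up feature would replace the extract_features() call:
# description = generate_desc(model, tokenizer, photo, max_length)
```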

  121. Avatar
    Sapar November 18, 2018 at 10:12 am #

    This is a very good work.
    What Machine learning techniques do you use in this work?

    Thank you.

  122. Avatar
    Chen Mei November 26, 2018 at 3:51 pm #

    Anyone received this problem during test phase?

    OSError: Unable to open file (unable to open file: name = ‘model-ep001-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

    • Avatar
      Jason Brownlee November 27, 2018 at 6:31 am #

      You must change the code to load the file that you saved.

  123. Avatar
    Sunny December 7, 2018 at 5:57 pm #

    Hi Mr.Jason,

    I am a computer science and engineering student. My teammates and I are doing the same project. Please reply:
    1. Can we develop this using MATLAB?
    2. Can we use the same code in our project for reference purposes using Python?
    3. In how many months can we complete it?
    4. Which would you suggest, Python or MATLAB?

    • Avatar
      Jason Brownlee December 8, 2018 at 7:00 am #

      Sorry, I don’t have examples in matlab, I can’t give you good advice.

    • Avatar
      Mohankumar Balasubramaniyam May 3, 2019 at 12:40 am #

      Hi I am also facing the same issue. Can you tell what you did to overcome the problem @harsha

  124. Avatar
    Caner December 20, 2018 at 12:57 am #

    Hi Jason. Thank you for this tutorial. I want to develop text-to-image model. Does it work if I change input and output elements? or What would you suggest ?

    • Avatar
      Jason Brownlee December 20, 2018 at 6:28 am #

      I don’t have a tutorial on text to image at this stage, I hope to cover it in the future – then I can give you good advice.

  125. Avatar
    Ajay December 26, 2018 at 11:56 pm #

    Hi Jason, Why have you taken the maximum size of the sentence to be 34, when the maximum length of a sentence is 33?

  126. Avatar
    Ajay December 28, 2018 at 10:28 pm #

    Is there any reason for selection of this particular RNN architecture? Is it giving any benefit?

  127. Avatar
    Ajay December 28, 2018 at 11:10 pm #

    Hello Jason, Can you explain what is the role of mask_zero inside the embedding layer?

    • Avatar
      Jason Brownlee December 29, 2018 at 5:51 am #

      We zero pad inputs to the same length; the zero mask ignores those padded inputs, i.e. it is an efficiency.

      • Avatar
        Ajay December 30, 2018 at 7:41 pm #

        Can you elaborate on your answer, I didn’t get anything.

      • Avatar
        Ajay January 1, 2019 at 11:17 pm #

        Hi Jason! I’m waiting for you to elaborate on zero mask. Didn’t get anything from your comment.
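        To make the masking idea concrete, here is a plain-Python illustration (not Keras internals). With mask_zero=True, the Keras Embedding layer treats index 0 as padding and passes a mask to later layers that support it, such as LSTM, so those time steps are skipped. The word ids below are made up.

```python
# A caption prefix padded at the front with zeros (0 = padding, not a real word).
padded = [0, 0, 0, 5, 9, 2]

# The mask marks which positions hold real words.
mask = [token != 0 for token in padded]

# Only the unmasked positions would be processed by a mask-aware layer.
real_tokens = [token for token, keep in zip(padded, mask) if keep]

print(mask)
print(real_tokens)
```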

  128. Avatar
    Ajay December 29, 2018 at 6:42 pm #

    Hi Jason, Inside the data_generator() function why have you used the while 1: loop?

    Can you email all the previous answers that I’ve asked?

    • Avatar
      Jason Brownlee December 30, 2018 at 5:38 am #

      Because it is a generator that will yield each loop when called.

      You can learn more about python generators here:

      • Avatar
        Ajay December 30, 2018 at 7:47 pm #

        I learned that on the repetitive calling of the generator function, the execution starts where it previously left off.

        So, in

        Shouldn’t this be :

        Where am I getting wrong?

        • Avatar
          Ajay December 30, 2018 at 8:58 pm #

          Shouldn’t this be :

          def data_generator(tokenizer,train_descs,train_features,maxlen):

          for ids, descs in train_descs.items():
          feature = train_features[ids][0]
          feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)
          generator = data_generator(tokenizer,train_descs,train_features,maxlen)

          Where am I getting wrong?

        • Avatar
          Jason Brownlee December 31, 2018 at 6:09 am #

          It looks like you are calling the generator from within the data_generator function.

          • Avatar
            Ajay January 1, 2019 at 5:05 pm #

            Sorry the last line in the second code snippet is outside the function.

            def data_generator(tokenizer,train_descs,train_features,maxlen):
                for ids, descs in train_descs.items():
                    feature = train_features[ids][0]
                    feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)

            generator = data_generator(tokenizer,train_descs,train_features,maxlen)

            As we know, data_generator yields one example at a time, and each time the generator is called, execution resumes where it previously left off. So, since the “for ids, descs in train_descs.items():” loop is still mid-way, it should keep looping and yielding more sequences until it ends.

            So, my question is: if the loop can continue until all the “train_descs.items()” are encountered, why do we need the “while 1:” loop there?

            I want to know where I am going wrong, kindly let me know.

          • Avatar
            Jason Brownlee January 2, 2019 at 6:34 am #

            Good question. To loop over the entire dataset as many times as we need (e.g. until the number of epochs is exhausted).
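            The difference can be seen with two toy generators. This is a simplified sketch with made-up names (one_pass, endless), not the tutorial's data_generator: without the outer loop the generator is exhausted after one pass over the data, while with it the generator keeps yielding for as long as the caller asks.

```python
from itertools import islice

def one_pass(items):
    """Yield each item once, then stop."""
    for item in items:
        yield item

def endless(items):
    """Yield the items over and over, like a generator wrapped in `while 1:`."""
    while 1:
        for item in items:
            yield item

photos = ["a", "b", "c"]

# Ask each generator for 7 values.
single = list(islice(one_pass(photos), 7))
looping = list(islice(endless(photos), 7))

print(single)
print(looping)
```

            The first generator can supply only three values, whereas the second restarts from the beginning of the data each time the inner loop finishes.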

  129. Avatar
    Ajay December 29, 2018 at 8:14 pm #

    On running this:

    # test the data generator
    generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
    inputs, outputs = next(generator)

    I’m getting :

    (5, 4096)
    (47, 33)

    whereas your output is
    (47, 4096)
    (47, 34)
    (47, 7579)

    Am I getting wrong somewhere?

    Also, can you explain these dimensions?

    • Avatar
      Jason Brownlee December 30, 2018 at 5:39 am #

      Perhaps ensure that you copied all of the code and that your Keras and Tensorflow are up to date.

      • Avatar
        Ajay December 30, 2018 at 7:37 pm #

        Can you explain what the 47 in the dimension is? I mean, if data_generator is outputting one example at a time, then instead of 47, shouldn’t it be 1? Can you explain the dimensions to me?

        • Avatar
          Jason Brownlee December 31, 2018 at 6:07 am #

          I believe I explain this in the post:

          Running this sanity check will show what one batch worth of sequences looks like, in this case 47 samples to train on for the first photo.

      • Avatar
        Ajay December 30, 2018 at 7:40 pm #

        I am coding the stuff myself and rectified something and now the output is :

        (47, 4096)
        (47, 33)

        Even now, print(outputs.shape) is giving me (7266,).

        Still, I want to ask: if data_generator is outputting 1 example at a time, then why is 47 the first dimension?

        • Avatar
          Ajay December 30, 2018 at 7:51 pm #

          Aah!! Finally, I got it right. Thanks. It was a small glitch.

        • Avatar
          Jason Brownlee December 31, 2018 at 6:08 am #

          Perhaps confirm that you are using Keras 2.2.4 or better, the output should have 47 samples worth of output as well.

  130. Avatar
    Ajay December 29, 2018 at 8:56 pm #

    Hi Jason, you have set steps_per_epoch=len(descriptions) and passed it into model.fit_generator(). As far as I’ve read, steps_per_epoch signifies the total no. of batches needed for an epoch to finish. See this:


  131. Avatar
    Ajay December 30, 2018 at 7:34 pm #

    I want to clarify how data_generator is feeding the data to fit_generator. I mean, is it giving it one training example at a time or some batch of training examples at a time.

    • Avatar
      Jason Brownlee December 31, 2018 at 6:06 am #

      It releases one batch of samples per loop.

      • Avatar
        Ajay January 1, 2019 at 5:15 pm #

        epochs = 20
        steps = len(train_descriptions)
        for i in range(epochs):
            # create the data generator
            generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
            # fit for one epoch
            model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
            # save model
            model.save('model_' + str(i) + '.h5')

        steps_per_epoch represents the no. of batches that will be trained in one epoch.
        As you said before, data_generator feeds a batch of examples to fit_generator, so in one batch, let’s say, x examples are sent for training. This should mean that the number of batches for one training epoch is (total no. of training examples)/(1 batch size). On running the snippet below, the total no. of training examples comes out to be 6000.


        So, steps_per_epoch should be 6000/(1 batch size), but in your code steps_per_epoch = len(train_descriptions)

        Why have you set it so large?
        Are you forcing fit_generator to train over one example at a time even though data_generator is generating a batch of training example at a time?
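        One way to read this, based on the sanity-check output quoted earlier in the comments (one batch of 47 samples for the first photo): the generator appears to yield one batch per photo, so the number of batches in one pass equals the number of photos. The toy generator below is a hypothetical stand-in that only mimics that shape.

```python
def batches_per_photo(descriptions):
    """Yield one batch (a list of samples) per photo."""
    for photo_id, captions in descriptions.items():
        batch = []
        for caption in captions:
            words = caption.split()
            # One sample per (prefix, next word) split of each caption.
            for i in range(1, len(words)):
                batch.append((photo_id, words[:i], words[i]))
        yield batch

toy = {
    "p1": ["startseq dog runs endseq", "startseq dog endseq"],
    "p2": ["startseq cat sits endseq"],
}

batches = list(batches_per_photo(toy))
steps = len(toy)  # number of photos, analogous to len(train_descriptions)

print(steps, [len(b) for b in batches])
```

        Under that reading, the 6000 counted above are photos rather than individual training samples, and each step consumes one photo's worth of samples, with a batch size that varies from photo to photo.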

  132. Avatar
    Ajay December 30, 2018 at 9:21 pm #

    Hi Jason, does max_length = 34 or any other bigger value have any effect on model performing well?

  133. Avatar
    Ajay December 31, 2018 at 4:47 pm #

    Hi Jason, I’m running


    and getting this error.

    KeyError Traceback (most recent call last)
    —-> 1 evaluate_model(mapping,tokenizer,maxlen,model,features)

    in evaluate_model(mapping, tokenizer, maxlen, model, feature_vector)
    33 for ids,descs in mapping.items():#1
    34 count += 1
    —> 35 pred_caption = generate_desc(feature_vector[ids],tokenizer,model,maxlen)#caption string returned
    36 for desc in descs:#2
    37 reference.append(desc.split())

    KeyError: ‘2258277193_586949ec62’

    When I searched for this image on my PC, I found that its id and its descriptions are present in the



    However, the image is not present in Flicker8k_Dataset.

    Why is the image missing from Flicker8k_Dataset?

    • Avatar
      Ajay December 31, 2018 at 10:53 pm #

      I redownloaded the dataset and searched the above-mentioned image in it and guess what, It was NOT present in that too !!

      • Avatar
        Ajay January 1, 2019 at 4:02 am #

        Also, when I loaded features from features.pkl and then ran
        print(features[‘2258277193_586949ec62’]) then it gave me

        KeyError Traceback (most recent call last)
        —-> 1 features[‘2258277193_586949ec62’]

        KeyError: ‘2258277193_586949ec62’

        From this, It seems like 2258277193_586949ec62.jpg was never present in the dataset.
        But, its description is present in Flickr8k.lemma.token.txt.

        Can you share the dataset?

        • Avatar
          Jason Brownlee January 1, 2019 at 6:28 am #

          Perhaps ignore that token then?

          • Avatar
            geeta gupta March 13, 2022 at 4:07 pm #

            I am also getting KeyError: ‘2258277193_586949ec62’

            how to resolve this. and how to get this image?

          • Avatar
            James Carmichael March 14, 2022 at 12:04 pm #

            Hello Geeta…Please specify which code listing you are working with so that we can better assist you.
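            One way to act on the “ignore that token” suggestion is to drop any description whose image id has no extracted features before evaluating. The sketch below uses toy data and a hypothetical helper name, drop_missing; it is not part of the tutorial code.

```python
def drop_missing(descriptions, features):
    """Keep only descriptions whose image id also has features; report the rest."""
    kept = {k: v for k, v in descriptions.items() if k in features}
    missing = sorted(set(descriptions) - set(features))
    return kept, missing

# Toy stand-ins for the loaded descriptions and photo features.
descriptions = {
    "img_1": ["startseq dog runs endseq"],
    "img_2": ["startseq cat sits endseq"],
}
features = {"img_1": [0.5, 0.1]}

kept, missing = drop_missing(descriptions, features)
print(list(kept), missing)
```

            Printing the missing ids first makes it easy to confirm that only the expected image is being skipped.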

    • Avatar
      Jason Brownlee January 1, 2019 at 6:13 am #

      Perhaps you skipped a step, are you able to confirm that you have all of the steps/code?

      Are you able to confirm that your Python and libraries are up to date?

      • Avatar
        Ajay January 10, 2019 at 12:31 am #

        I resolved the problem by putting an appropriate image that relates well to its description in Flickr8k.lemma.token.txt. The above image was really missing from the image directory. I reckon that anyone must have faced the same problem as mine.

  134. Avatar
    Ajay December 31, 2018 at 11:55 pm #

    Can you check this on your system? Also check that it is present in Flickr8k.lemma.token.txt.

    • Avatar
      Jason Brownlee January 1, 2019 at 6:16 am #

      The example in the blog post works perfectly for me and tens of thousands of readers, I suspect there is something going on with your local version.

  135. Avatar
    Ajay January 2, 2019 at 12:52 am #

    Hi Jason!! Can you have a look at my model image


    I cannot understand that there is an “input_3” and an “input_2” layer. Is there any problem if there is no “input_1” present in it?

    • Avatar
      Jason Brownlee January 2, 2019 at 6:37 am #

      Sorry, I don’t have the capacity to debug your model or model diagrams.

      • Avatar
        Ajay January 10, 2019 at 12:27 am #

        Could you just take a look and tell me whether the missing input_1 signifies anything bad?

  136. Avatar
    Kashish January 4, 2019 at 6:23 pm #

    ValueError: Error when checking input: expected input_3 to have shape (34,) but got array with shape (30,)
    Sir, while evaluating the model, I’m getting this error. How should I get rid of it? I typed the code exactly and I’m also using the same dataset as mentioned above.

  137. Avatar
    Ajay January 10, 2019 at 12:33 am #

    Hi Jason !! Can you tell me about other regularization techniques to improve the model?

    Can you suggest adding anything to improve accuracy?

  138. Avatar
    Ajay January 10, 2019 at 12:40 am #

    I tried my model using my MacBook Air Webcam and it gave pretty bad results and the captions that it generated were from the training dataset.

    Where am I going wrong? I am ready to try all the possibilities to improve my model. What can I do?

  139. Avatar
    Ajay January 17, 2019 at 11:22 pm #

    I am a student. My model is taking an hour or more to train. After training, I’m not getting the desired results. So, I don’t want to retrain it and sit back and watch. There is a high chance that it may not work well again. I think AWS requires some bucks for this.

    I’m using MacBook Air.

    8 GB RAM
    i5 5th gen processor

    Is there any “free” source to train the model which will take lesser time.

    • Avatar
      Jason Brownlee January 18, 2019 at 5:38 am #

      Yes, fit a smaller model on less data as a prototype, then scale up once you find a good config.

  140. Avatar
    Al Krinker January 23, 2019 at 3:30 am #

    Hi Jason,

    Like many mentioned, it is a very comprehensive tutorial on caption generation. Progressive loading is a big plus.

    Do you plan to cover or have some ideas on the topic of using image and description to search for similar images? Example of what I am trying to do: I already implemented a CNN CBIR model that extracts features and clusters them; when a new image comes in, its features are extracted and the nearest neighbors are given as similar image suggestions. This works fine, but I would like to enhance it by adding the image description to the mix, so that when I give a picture of a steering wheel and specify “car part”, I will be given a list of images of steering wheels, and not all circular options like bike wheels, for example.

    I thought to use lucene to help with image description search first and then use CNN to find similar images, but not sure if it is the best approach to take as lucene search might throw out images that are relevant, but not well described.

    • Avatar
      Jason Brownlee January 23, 2019 at 8:51 am #

      Very cool idea. I have not tried this but I believe it would be straight-forward to implement.

      Let me know how you go.

      • Avatar
        Al Krinker January 24, 2019 at 6:46 am #

        I don’t think that coming up with a model that combines the text with image features will be straightforward or perform well, as opposed to having Elasticsearch in the mix, where I can take advantage of the text search that Elastic provides out of the box. However, Elastic falls short on image search (I tried LIRE before and the results were really bad compared to the ConvNet approach).

        if you have any other ideas or suggestions, I am all ears 🙂

  141. Avatar
    cleansky February 6, 2019 at 4:56 am #

    Thanks for the nice tutorial.

    It is interesting to see that even though the LSTM and CNN have no connection, the decoder may produce proper caption words.

    How does the LSTM choose words without any information about the image? What is the implicit mechanism in this architecture?

    Any comment is welcome.

    • Avatar
      Jason Brownlee February 6, 2019 at 8:01 am #

      It has the extracted features from the image as input. They are abstract, but it finds meaning in them.

  142. Avatar
    Sanjay February 11, 2019 at 2:08 am #

    Hey Jason!
    Amazing tutorial. Great Learning experience from start till end.

    I wanted to ask you what do you mean exactly in the last section of the article under ‘Extensions’ section by,
    ‘Pre-trained Word Vectors. The model learned the word vectors as part of fitting the model. Better performance may be achieved by using word vectors either pre-trained on the training dataset or trained on a much larger corpus of text, such as news articles or Wikipedia.’

    Thank You!

  143. Avatar
    boumelha adaam February 14, 2019 at 1:58 am #

    hey Jason , thank you for this amazing article .

    i wanted to ask you about an issue i ve faced in the generate_desc function , i am getting in the model.predict line this error :

    ValueError: Error when checking input: expected input_2 to have shape (74,) but got array with shape (34,).

    any solutions please!

    thank you !!

  144. Avatar
    O Lokesh February 17, 2019 at 12:53 am #

    sir i couldn’t download the datasets after filling the form .please let me know if there is another way

    • Avatar
      Jason Brownlee February 17, 2019 at 6:33 am #

      You should be sent an email with the link after completing the form, I believe.

  145. Avatar
    Karan February 18, 2019 at 12:38 am #

    Sir, I am also unable to download either the dataset or the text files. I am getting a 404 error.

  146. Avatar
    sonali verma February 19, 2019 at 7:13 pm #

    respected sir,
    I am not able to download the dataset from the link that is provided for Flickr 8k.
    It is showing
    The requested URL /HockenmaierGroup/Framing_Image_Description/Flickr8k_Dataset.zip was not found on this server.

    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

    • Avatar
      Jason Brownlee February 20, 2019 at 7:57 am #

      It looks like they have taken the site down, it says: “Proper NLP home page coming soon.”

      I will prepare a workaround ASAP.


      I have added direct download links to the post.

  147. Avatar
    Vedic Mishra February 20, 2019 at 9:27 pm #

    In the function, create_sequences(), the pad_sequences was generating a list so big that the 17 GB Kaggle RAM was crashing. So I tried appending directly to numpy arrays instead of creating it as a list first. However, now it is taking infinite time to execute. Is there any alternative to this function or any way to increase the rate. Please help
    P.S Thanks for uploading the dataset, I spent days searching for it on the internet.

  148. Avatar
    Mohammad Anas February 22, 2019 at 2:06 am #

    i used progressive loading and after execution there were 20 models one for each epoch.
    but further sections are using one single file for model . but i have 20 models.how to proceed?

    • Avatar
      Jason Brownlee February 22, 2019 at 6:22 am #

      Choose the model with the lowest validation error, you might need to evaluate each.

      If that is a pain, use any model, e.g. from epoch 4.

      • Avatar
        Mohammad Anas February 22, 2019 at 7:58 am #

        Thank you very much.

  149. Avatar
    Mohammad Anas February 22, 2019 at 7:55 am #

    i have developed a deep learning model with a .csv file as training data.
    file contains a column with text data and during execution of the code
    could not convert string to float: ‘Moong(Green Gram)’
    this error is being displayed.
    what should i do?

    • Avatar
      Jason Brownlee February 22, 2019 at 2:44 pm #

      I’m not sure what the cause might be, sorry. Perhaps try debugging the data loading/transforming part of your code?

  150. Avatar
    Abhishek Verma February 22, 2019 at 4:59 pm #

    Hi Jason, I am unable to have access to the Flickr 8k dataset after filling the form. The link shows this:

    Please help me with it. Thank you!

    Not Found
    The requested URL /HockenmaierGroup/Framing_Image_Description/Flickr8k_Dataset.zip was not found on this server.

    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

    • Avatar
      Jason Brownlee February 23, 2019 at 6:28 am #

      Yes, they have recently removed it.

      I have added direct download links above in the dataset section.

  151. Avatar
    Akhil February 22, 2019 at 7:31 pm #

    Jason, we tried to complete the model generation using progressive loading, 19 epochs in total. Now we are getting outputs, but the accuracy is very poor. Are there any suggestions to improve the accuracy? Please help.

    • Avatar
      Jason Brownlee February 23, 2019 at 6:30 am #

      Please don’t use accuracy, instead use BLEU scores – perhaps re-read the post!

  152. Avatar
    James February 23, 2019 at 8:30 pm #

    How to train this model using mscoco dataset?

    • Avatar
      Jason Brownlee February 24, 2019 at 9:06 am #

      Sorry, I don’t have an example of training with MSCOCO. Thanks for the suggestion.

  153. Avatar
    Aron February 24, 2019 at 11:30 pm #

    Great article!
    Comprehensive, well-written and well-explained. I used the progressive loading approach and ran the scripts in Google Colab. Everything worked fine (got some errors along the process every now and then, but managed to solve them all). I am currently extracting features using VGG16, VGG19, ResNet50 and Inception and hopefully will make a comparison between them. Thanks for this great post!

    • Avatar
      Aron February 25, 2019 at 12:18 am #

      I wanted to get your opinion on this. Since I used progressive loading, I do not have a measure for the loss function on the validation dataset, so I took the models and evaluated the BLEU scores directly. However, it’s not straightforward to decide which model gives the best performance and when exactly the model starts to overfit.

      I calculated the mean squared error between some “ideal” BLEU scores taken from Marc Tanti et al. (BLEU-1 = 0.6, BLEU-2 = 0.413, BLEU-3 = 0.273, BLEU-4 = 0.178) and the BLEU scores I’ve obtained for my models. The best performing one was actually model_0.h5, which was the model saved after the first epoch. However, I don’t know if the mean squared error is actually very indicative or relevant in this case, but I didn’t know what else to use. From the limited amount of research I’ve done online, I tend to believe that BLEU-4 is a bit more important than the rest of the BLEU scores, but I am not sure. Do you have any suggestions?

      Thank you for your time!

      • Avatar
        Jason Brownlee February 25, 2019 at 6:46 am #

        Perhaps look at the loss or the learning curve of loss across all saved models?

    • Avatar
      Jason Brownlee February 25, 2019 at 6:44 am #


      Well done! Let me know what works well/best.

  154. Avatar
    Abhishek Verma February 25, 2019 at 7:46 pm #

    Why did you increment vocab_size by 1 ??

    • Avatar
      Jason Brownlee February 26, 2019 at 6:16 am #

      To start words at index 1 and keep index 0 reserved; Keras never assigns 0 to a word and uses it for padding.
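
      A small sketch of why the extra slot is needed, in plain Python rather than the tutorial's Keras code. The two captions and the one_hot helper are made up for illustration; the only assumption carried over is that word indices start at 1 and 0 stays free (Keras uses it for padding).

```python
# Made-up captions; indices are assigned from 1, leaving 0 unused by any word.
captions = ["startseq dog runs endseq", "startseq man sits endseq"]

word_index = {}
for caption in captions:
    for word in caption.split():
        if word not in word_index:
            word_index[word] = len(word_index) + 1

# One more than the number of words, so index 0 has a slot too.
vocab_size = len(word_index) + 1


def one_hot(index, size):
    """Return a list of zeros with a 1 at position index (illustrative helper)."""
    vec = [0] * size
    vec[index] = 1
    return vec


largest = max(word_index.values())
encoded = one_hot(largest, vocab_size)  # fits: positions 0..largest
print(vocab_size, encoded)
```

      Without the +1, the word with the largest index would fall outside the output vector.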

  155. Avatar
    Hassan February 26, 2019 at 10:35 pm #

    Hy Jason!
    Thanks for great article.
    I tried to run the model with progressive loading. My code runs without errors, but my model generates just 3 or 4 kinds of captions for every image. It seems the model is being trained on just 3 or 4 captions. I followed your code exactly.
    Any suggestions to improve my results?

    (PS: I am testing on the same images, on which model has been trained . . .but still result is worse)

    • Avatar
      Jason Brownlee February 27, 2019 at 7:29 am #

      Sorry to hear that, some ideas:

      Perhaps try re-fitting the model?
      Perhaps try using a different final model?
      Perhaps there was a typo in your code or you skipped a line?

  156. Avatar
    Hassan February 28, 2019 at 9:53 pm #

    Hy Jason !
    I don’t understand why you reshaped the image into 4 dimensions.

    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

    What is the purpose of reshaping the input images to that shape? Please help me out.

    • Avatar
      Jason Brownlee March 1, 2019 at 6:18 am #

      The model expects an array of samples as input, e.g. 1 sample, and each image has rows, cols and channels.

  157. Avatar
    erebus March 1, 2019 at 3:10 pm #

    Hi Jason, how can I continue your code with beam search algorithm? Because I want to show all the captions per image. Thanks!

  158. Avatar
    Rijoan March 7, 2019 at 5:27 am #

    I have problems in this section

    The complete updated example with progressive loading (use of the data generator) for training the caption generation model is listed below.

    my output is shown below :

    Requirement already satisfied: pydot in c:\users\rijoanrabbi\anaconda3\lib\site-packages (1.4.1)
    Requirement already satisfied: pyparsing>=2.1.4 in c:\users\rijoanrabbi\anaconda3\lib\site-packages (from pydot) (2.2.0)
    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Layer (type) Output Shape Param # Connected to
    input_9 (InputLayer) (None, 34) 0
    input_8 (InputLayer) (None, 4096) 0
    embedding_3 (Embedding) (None, 34, 256) 1940224 input_9[0][0]
    dropout_5 (Dropout) (None, 4096) 0 input_8[0][0]
    dropout_6 (Dropout) (None, 34, 256) 0 embedding_3[0][0]
    dense_7 (Dense) (None, 256) 1048832 dropout_5[0][0]
    lstm_3 (LSTM) (None, 256) 525312 dropout_6[0][0]
    add_3 (Add) (None, 256) 0 dense_7[0][0]
    dense_8 (Dense) (None, 256) 65792 add_3[0][0]
    dense_9 (Dense) (None, 7579) 1947803 dense_8[0][0]
    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    ImportError Traceback (most recent call last)
    in ()
    163 # define the model
    –> 164 model = define_model(vocab_size, max_length)
    165 # train the model, run epochs manually and save after each epoch
    166 epochs = 20

    in define_model(vocab_size, max_length)
    130 # summarize model
    131 model.summary()
    –> 132 plot_model(model, to_file=’model.png’, show_shapes=True)
    133 return model

    ~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
    130 ‘LR’ creates a horizontal plot.
    131 “””
    –> 132 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
    133 _, extension = os.path.splitext(to_file)
    134 if not extension:

    ~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
    53 from ..models import Sequential
    —> 55 _check_pydot()
    56 dot = pydot.Dot()
    57 dot.set(‘rankdir’, rankdir)

    ~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
    18 if pydot is None:
    19 raise ImportError(
    —> 20 ‘Failed to import pydot. ‘
    21 ‘Please install pydot. ‘
    22 ‘For example with pip install pydot.’)

    ImportError: Failed to import pydot. Please install pydot. For example with pip install pydot.

    • Avatar
      Jason Brownlee March 7, 2019 at 6:59 am #

      You can comment out the plot_model() call if you like.

    • Avatar
      Akshat Jadhav September 15, 2020 at 12:21 am #

      Hii…..How did u solve this error?
      I m also stuck here….I need ur help

  159. Avatar
    Md.Rijoan March 8, 2019 at 3:39 pm #

    How many epochs does this training take? It takes 6 hours per epoch, so I am concerned about how much time it will take in total.

    my laptop configuration
    Ram: 4Gb
    graphics: 2Gb

    another thanks for your previous reply 🙂

    • Avatar
      Jason Brownlee March 9, 2019 at 6:21 am #

      Typically good results (low loss) can be seen in the first few epochs.

      • Avatar
        Aman September 11, 2019 at 10:09 am #

        Jason, please tell us how many epochs this training takes. Every epoch takes 3 hours. Are 5 epochs enough or not?

  160. Avatar
    RT March 8, 2019 at 7:20 pm #

    Hi Jason
    Awesome tutorial
    Can you please guide me on how to call fit_generator like the same way we call model.fit(….)

    filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

    checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    i.e. along with callbacks , save best only and include a tensorboard callback to it too!

    It’d be of great help.

    Thank you!

    • Avatar
      Jason Brownlee March 9, 2019 at 6:23 am #

      You can call fit_generator() in an identical way to calling fit().

      What problem are you having exactly?

      • Avatar
        RT March 9, 2019 at 6:12 pm #

        In Including a tensorboard callback

        • Avatar
          RT March 10, 2019 at 2:16 am #






          ValueError Traceback (most recent call last)
          in ()
          14 #model.fit_generator(generator_train,steps_per_epoch=64,epochs=20,verbose=2,validation_data=next(generator_validtn),validation_steps=64,callbacks=[checkpoint])#tf.keras.callbacks.TensorBoard()
          —> 16 model.fit_generator(generator_train,steps_per_epoch=32,epochs=20,verbose=2,callbacks=[checkpoint],validation_data=generator_test,validation_steps=32)

          /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
          89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
          90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
          —> 91 return func(*args, **kwargs)
          92 wrapper._original_function = func
          93 return wrapper

          /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
          1416 use_multiprocessing=use_multiprocessing,
          1417 shuffle=shuffle,
          -> 1418 initial_epoch=initial_epoch)
          1420 @interfaces.legacy_generator_methods_support

          /usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
          215 outs = model.train_on_batch(x, y,
          216 sample_weight=sample_weight,
          –> 217 class_weight=class_weight)
          219 outs = to_list(outs)

          /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
          1209 x, y,
          1210 sample_weight=sample_weight,
          -> 1211 class_weight=class_weight)
          1212 if self._uses_dynamic_learning_phase():
          1213 ins = x + y + sample_weights + [1.]

          /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
          749 feed_input_shapes,
          750 check_batch_axis=False, # Don’t enforce the batch size.
          –> 751 exception_prefix=’input’)
          753 if y is not None:

          /usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
          136 ‘: expected ‘ + names[i] + ‘ to have shape ‘ +
          137 str(shape) + ‘ but got array with shape ‘ +
          –> 138 str(data_shape))
          139 return data

          ValueError: Error when checking input: expected input_7 to have shape (4096,) but got array with shape (1536,)

        • Avatar
          Jason Brownlee March 10, 2019 at 8:16 am #

          I don’t believe callbacks make sense or can be used the same way when running epochs manually as we do in the progressive loading section.

          • Avatar
            RT March 10, 2019 at 11:20 pm #

            Yeah, But I need a tensorboard callback for this, So how should I proceed with it?

          • Avatar
            Jason Brownlee March 11, 2019 at 6:51 am #

            No, sorry.

  161. Avatar
    Md.Rijoan March 10, 2019 at 8:55 am #

    Thanks all.

    Finally I can run this on my laptop; training just 6 epochs took 48 hours.

    i have some issues with library function and parameter name or rename problem with direction setup.

  162. Avatar
    martino March 16, 2019 at 12:53 am #

    Hi Jason thanks for the excellent write up.

    My workstation has 4 Titan XP GPUs and 128 GB RAM (Ubuntu 14.04) and stalls when training, right after showing “Epoch 1/20”.

    If I run the version with progressive loading, it does proceed with training, but is too slow to be practical.

    Please let me know if you have any suggestions!

  163. Avatar
    sadam March 21, 2019 at 8:13 pm #

    Hy Jason!
    Thanks for great article.
    I am trying to change the learning rate to 1e-5, so I changed the code like this:

    # compile model
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(loss=’categorical_crossentropy’, optimizer=’opt’)
    # summarize model
    return model
    but I received this error

    ValueError: Unknown optimizer: opt
    so how can I change it .

    another thing:

    I tried to extract the features by using resnet50 so I only change a little in the code like this :

    from keras.applications.resnet50 import ResNet50
    from keras.applications.resnet50 import preprocess_input, decode_predictions
    # load the model
    model = ResNet50()

    but it is not working , can you help me to change it

    • Avatar
      Jason Brownlee March 22, 2019 at 8:26 am #

      Yes, don’t quote it like ‘opt’, just pass the variable itself: optimizer=opt

      Why is resnet not working?

  164. Avatar
    sadam March 22, 2019 at 10:17 pm #

    Hy Jason!
    Thanks it is working.

    but when I am trying to train the model with resnet50 I received this error:

    ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (2048,)

    Does every model have a different output shape, and what shape should I use for resnet50 and inceptionv3?

    another thing:

    to extract the features by using inceptionv3 should i only change a little in the code like this :

    from keras.applications.inception_v3 import InceptionV3

    # load the model
    model = InceptionV3()

    and thank you so much for your help

    • Avatar
      Jason Brownlee March 23, 2019 at 9:28 am #

      You can change the shape of your input to match the model or change the model to match the shape of your data.

      For example, you can specify the input shape for the pre-trained model and use average pooling on the output layer.

      • Avatar
        abbas July 27, 2019 at 5:48 pm #

        jason i am facing the same error while trying inceptionv3
        ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (2048,)

        please let me know how to specify the input shape for my inceptionv3?
        please write code if possible.

        • Avatar
          Jason Brownlee July 28, 2019 at 6:41 am #

          Perhaps start with vgg. Once you have it working, you can try adapting it to use another model.

  165. Avatar
    Saddam March 23, 2019 at 11:53 pm #

    Hy jason!
    Really thank you I changed only the shape and it is working but I found that vgg16 is better than inceptionv3 , is it true?

    • Avatar
      Jason Brownlee March 24, 2019 at 7:06 am #

      It can be; it depends on the specifics of the application.

    • Avatar
      Edward October 22, 2020 at 10:29 pm #

      Saddam can you please share your code or github link for inceptionv3 and resnet50? link: edwardsharma1311@gmail.com

  166. Avatar
    Saddam March 24, 2019 at 7:53 pm #

    Thank you so much my dear teacher

  167. Avatar
    Habeb March 26, 2019 at 2:53 am #

    Hi Jason, thank you for this article. I want to ask what the best learning rate and regularization are for this code, because I tried different learning rates but the results were not good.

  168. Avatar
    Anjali March 28, 2019 at 6:04 pm #

    Hi Jason. Extremely Thank You for this fascinating article. Now I’ve chosen this for my Masters Project. Sir can you give me the Data Flow Diagram for the same upto 2 levels. Actually i’ve done but still want to clarify. So can you please try it.

    • Avatar
      Jason Brownlee March 29, 2019 at 8:27 am #

      Sorry, I cannot prepare a data flow diagram for you. You have everything you need to create one yourself.

    • Avatar
      Harshit Mittal March 5, 2020 at 4:27 pm #

      heyy Can u plz send your data flow diagram. I too just want to clarify

  169. Avatar
    Habeb March 28, 2019 at 6:11 pm #

    Hi Jason, when I try to decrease the size of the Dense layer from 256 to 128, I get this error:

    ValueError: Operands could not be broadcast together with shapes (128,) (256,).

    what is the reason for this error

  170. Avatar
    Habeb March 28, 2019 at 7:09 pm #

    another question :
    I run your code and in epoch 3 I received this result

    BLEU-1: 0.566754
    BLEU-2: 0.310778
    BLEU-3: 0.210816
    BLEU-4: 0.095132

    Why didn’t I get results like yours, especially for BLEU-4?

  171. Avatar
    Habeb March 28, 2019 at 8:28 pm #

    another question :

    to add one lstm more I only add

    se3 = LSTM(256, return_sequences=True)(se2)

    se4 = LSTM(256)(se3)

    is this true or no? and thank you for your help

  172. Avatar
    Yash Dwivedi March 30, 2019 at 7:34 pm #

    Hey, amazing work
    When i train my network iam getting this error:-

    if len(set(self.inputs)) != len(self.inputs):

    TypeError: unhashable type: ‘numpy.ndarray’

    What is the problem?

    • Avatar
      Jason Brownlee March 31, 2019 at 9:28 am #

      That is very odd, I have not seen that before, sorry.

  173. Avatar
    Audi April 1, 2019 at 5:21 am #

    Hey Jason,
    Can I train the following model on Flickr30K with 16GB of RAM?

  174. Avatar
    Anjali April 6, 2019 at 3:21 am #

    hii Jason, I’m using progressive loading. And the last sequence of code is as follows…..

    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.
    Epoch 1/1
    2019-04-05 16:17:05.604626: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-04-05 16:17:05.824794: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1796570000 Hz
    2019-04-05 16:17:05.834826: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6e06d60 executing computations on platform Host. Devices:
    2019-04-05 16:17:05.834922: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
    6000/6000 [==============================] – 5512s 919ms/step – loss: 4.7276
    Epoch 1/1
    6000/6000 [==============================] – 5896s 983ms/step – loss: 3.9618
    Epoch 1/1
    6000/6000 [==============================] – 5610s 935ms/step – loss: 3.7152
    Epoch 1/1
    2800/6000 [============>. . . . . . . . . . . . . . . . . ] – ETA- 52:12 loss:3.58

    And it is still loading. So my question is when do i want to close the terminal as it is taking hours to load?

    • Avatar
      Jason Brownlee April 6, 2019 at 6:51 am #

      It is not loading, the code is running and the model is being fit.

      You can probably kill it after 5-10 epochs.

  175. Avatar
    Anjali April 6, 2019 at 1:34 pm #

    Thank you…! Now I’m facing another issue. While running the evaluation code, I got some error.

    Using TensorFlow backend.
    Traceback (most recent call last):
    File “./evaluate.py”, line 7, in
    from nltk.translate.bleu_score import corpus_bleu
    ImportError: No module named nltk.translate.bleu_score

    • Avatar
      Anjali April 6, 2019 at 2:06 pm #

      Yes, Ive fixed it myself. And the BLEU Scores i got is:

      BLEU-1: 0.555265
      BLEU-2: 0.311960
      BLEU-3: 0.217599
      BLEU-4: 0.103576

    • Avatar
      Jason Brownlee April 7, 2019 at 5:25 am #

      It looks like you might need to install the nltk library.

  176. Avatar
    Anjali P Kaimal April 7, 2019 at 4:03 pm #

    Yes. Thank You Jason. But now when I try to get caption for another image, the first caption is displaying. I’ve tried several images but the caption is not changing. Why is it so?

    • Avatar
      Jason Brownlee April 8, 2019 at 5:54 am #

      Perhaps your model is overfit?

      You could try fitting the model again, or using a model from an earlier step in the training process?

  177. Avatar
    Anjali April 8, 2019 at 1:02 am #

    Yeah, now it is working. But the captions are not accurate. Jason, what about trying the MS-COCO dataset instead of Flickr? As the number of images and captions is much higher, is there any chance of getting accurate results?

    • Avatar
      Jason Brownlee April 8, 2019 at 5:55 am #

      I suspect your model is overfit.

      You can explore another dataset, let me know how you go.

  178. Avatar
    samjava April 8, 2019 at 6:24 am #

    Ever since started studying books authored by you, my zeal for machine learning grew up fast

  179. Avatar
    Alka April 8, 2019 at 9:39 pm #

    Hi Jason,
    I am using the InceptionV3 model here. Everything is going great, but when I test the trained model with an image I get an error:

    File “\Anaconda3\lib\site-packages\keras\engine\training_utils.py”, line 138, in standardize_input_data

    ValueError: Error when checking input: expected input_12 to have shape (2048,) but got array with shape (1000,)

  180. Avatar
    CJ April 9, 2019 at 12:50 pm #

    Hi Jason.

    This is such a great blog! So glad to see you still actively responding.

    I’m a noob to machine learning, so forgive my simplistic understanding. I have a question about how to alter the “style” of the captions (style transfer). For example, how would I go about changing the existing captions to match the linguistic style of a child, or of different people (ie, Donald Trump, Snoop Dog, etc.), using text files with speech samples of various people?

    Based on your article here, you stated that I could use Pre-trained Word Vectors, or use my own vocabulary file and mapping to integers function during training..?? Can you explain that a little more and perhaps point me in the right direction for applying these methods?

    Any recommendation would be appreciated. Thanks in advance.

    • Avatar
      Jason Brownlee April 9, 2019 at 2:42 pm #

      Good question, you might need to first translate the text examples in the training dataset, then use that as the training dataset for fitting the caption model.

  181. Avatar
    Alka April 9, 2019 at 7:54 pm #

    Hi Jason,

    i solved my previous problem but now i got stuck with the result

    startseq man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man

    every time i generate the caption .

    • Avatar
      Jason Brownlee April 10, 2019 at 6:10 am #

      Perhaps your model is overfit or underfit?

      Perhaps try fitting it again?

    • Avatar
      HomaK February 28, 2022 at 5:18 am #

      Hi Alka
      I also have same problem. Could you resolve it?

  182. Avatar
    Anjali April 11, 2019 at 2:13 pm #

    Hi Jason. I’m now going to switch to the Flickr30K dataset in order to check for any change in accuracy. I have a doubt: Flickr8K provides 6K train images, so is there any chance of accuracy increasing as the number of train images increases?

  183. Avatar
    Shayak April 12, 2019 at 12:54 pm #

    Hi Jason,
    Thank you so much for your great work, which is really helpful to understand image captioning work from scratch.I have prepared own dataset containing Nepalese socio-cultural images (total 400 images with 3 captions per image).
    1. It works fine on the training set but generates jumbled sentences on the test set. Why does this happen, and how can I generate relevant and grammatically correct captions?
    2. What is the role of accuracy? I think accuracy refers to VGG16’s accuracy, so is it relevant to calculate here?
    3. After 40 epochs, there is no significant decrease in loss. Is there any way to reduce the loss?
    4. I have changed dropout from 0.5 to other values, but there is no significant change in the results. Is it necessary to use dropout on a small dataset as well?

    Best regards

  184. Avatar
    Anjali April 13, 2019 at 1:37 am #

    Hii Jason, I’m using 3000 development images and 25381 train images as I’m using flickr30K dataset. So how many epochs we need?

    • Avatar
      Jason Brownlee April 13, 2019 at 6:33 am #

      It is an intractable question.

      Train until the model achieves a good fit.

  185. Avatar
    Anjali April 13, 2019 at 11:33 pm #

    While extracting the features of images, i got an error.

    terminate called after throwing an instance of ‘std::bad_alloc’
    what(): std::bad_alloc
    Aborted (core dumped)

    I waited one whole day for the execution, but it ended up like this. What could be the reason?

    • Avatar
      Jason Brownlee April 14, 2019 at 5:48 am #

      Sorry to hear that.

      It sounds like the process ran out of memory; std::bad_alloc is thrown when a memory allocation fails.

      Perhaps try searching/posting about it on stackoverflow?

  186. Avatar
    Xu Zhang April 17, 2019 at 5:01 am #

    Such a great post! I learned a lot from it.

    When you train your model with progressive loading, what is the reason that you used a for loop and train your model with one epoch?

    for i in range(epochs):
        # create the data generator
        generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
        # fit for one epoch
        model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
        # save model
        model.save('model_' + str(i) + '.h5')

    Many thanks.

    • Avatar
      Jason Brownlee April 17, 2019 at 7:05 am #

      Because in this tutorial I want to save the model manually after each epoch.

  187. Avatar
    Donny April 19, 2019 at 5:23 am #

    Hi Jason,

    This is an excellent tutorial and really thankful for it. May I check with you, why did you use 256 for your dense_1 and lstm_1 layers? Are there any considerations how one might choose this number?

  188. Avatar
    madi April 19, 2019 at 4:45 pm #

    Hi Jason
    can you plz tell me what basically this error is ?
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.multiarray failed to import

    The above exception was the direct cause of the following exception:

    SystemError Traceback (most recent call last)
    ~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

    SystemError: returned a result with an error set

    ImportError Traceback (most recent call last)
    ImportError: numpy.core._multiarray_umath failed to import

    ImportError Traceback (most recent call last)
    ImportError: numpy.core.umath failed to import

  189. Avatar
    habib April 20, 2019 at 3:05 am #

    hi jason
    how can i use beam search with this code

  190. Avatar
    Arelis April 20, 2019 at 10:03 pm #

    Hello Jason. I have a problem with the file of features
    features = {k: all_features[k] for k in dataset}
    IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
    Could you help, please

    Thank you

  191. Avatar
    Anjali April 22, 2019 at 12:25 pm #

    While executing the code for generating new caption I got an error.

    File “./ur_caption.py”, line 73, in
    description = generate_desc(model, tokenizer, photo, max_length)
    File “./ur_caption.py”, line 49, in generate_desc
    yhat = model.predict([photo,sequence], verbose=0)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/training.py”, line 1149, in predict
    x, _, _ = self._standardize_user_data(x)
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/training.py”, line 751, in _standardize_user_data
    File “/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.py”, line 138, in standardize_input_data
    ValueError: Error when checking input: expected input_2 to have shape (74,) but got array with shape (34,)

    How to resolve this?

  192. Avatar
    habib April 26, 2019 at 10:07 pm #

    hi Jason :
    for vgg16 we put model = Model(inputs=model.inputs, outputs=model.layers[-1].output)
    for inception v2, is it the same, or should I change it to outputs=model.layers[-2]

    • Avatar
      Jason Brownlee April 27, 2019 at 6:30 am #

      It might be different considering the architecture of the model. Perhaps use the API to create average pooling layer on the output and add output layers to it?

  193. Avatar
    sadam nagi April 29, 2019 at 9:02 pm #

    hi Jason :
    I want to ask you about why I got like this result:

    with model-ep001-loss4.514-val_loss4.070.h5 : I got this result

    BLEU-1: 0.568186
    BLEU-2: 0.308237
    BLEU-3: 0.208821
    BLEU-4: 0.094350

    but with model-ep002-loss3.878-val_loss3.897.h5 : I got this result

    BLEU-1: 0.431618
    BLEU-2: 0.224081
    BLEU-3: 0.148920
    BLEU-4: 0.059321

  194. Avatar
    sadam nagi April 30, 2019 at 8:10 pm #

    thank you so much

  195. Avatar
    sadam nagi April 30, 2019 at 8:43 pm #

    hi jason :
    With progressive loading, how can I change the batch size to 16 or any other number?
    Thank you for your help.

  196. Avatar
    madi April 30, 2019 at 9:14 pm #

    Hi Jason
    while progressive loading i am having this error
    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 3857
    Description Length: 30
    Layer (type) Output Shape Param # Connected to
    input_7 (InputLayer) (None, 30) 0
    input_6 (InputLayer) (None, 4096) 0
    embedding_3 (Embedding) (None, 30, 256) 987392 input_7[0][0]
    dropout_5 (Dropout) (None, 4096) 0 input_6[0][0]
    dropout_6 (Dropout) (None, 30, 256) 0 embedding_3[0][0]
    dense_7 (Dense) (None, 256) 1048832 dropout_5[0][0]
    lstm_3 (LSTM) (None, 256) 525312 dropout_6[0][0]
    add_3 (Add) (None, 256) 0 dense_7[0][0]
    dense_8 (Dense) (None, 256) 65792 add_3[0][0]
    dense_9 (Dense) (None, 3857) 991249 dense_8[0][0]
    Total params: 3,618,577
    Trainable params: 3,618,577
    Non-trainable params: 0
    WARNING:tensorflow:From C:\Users\Dell\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.
    Epoch 1/1
    ValueError Traceback (most recent call last)
    168 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
    169 # fit for one epoch
    –> 170 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    171 # save model
    172 model.save(‘model_’ + str(i) + ‘.h5’)

    ~\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
    89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
    90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
    —> 91 return func(*args, **kwargs)
    92 wrapper._original_function = func
    93 return wrapper

    ~\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    1416 use_multiprocessing=use_multiprocessing,
    1417 shuffle=shuffle,
    -> 1418 initial_epoch=initial_epoch)
    1420 @interfaces.legacy_generator_methods_support

    ~\Anaconda3\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215 outs = model.train_on_batch(x, y,
    216 sample_weight=sample_weight,
    –> 217 class_weight=class_weight)
    219 outs = to_list(outs)

    ~\Anaconda3\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
    1209 x, y,
    1210 sample_weight=sample_weight,
    -> 1211 class_weight=class_weight)
    1212 if self._uses_dynamic_learning_phase():
    1213 ins = x + y + sample_weights + [1.]

    ~\Anaconda3\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    749 feed_input_shapes,
    750 check_batch_axis=False, # Don’t enforce the batch size.
    –> 751 exception_prefix=’input’)
    753 if y is not None:

    ~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    126 ‘: expected ‘ + names[i] + ‘ to have ‘ +
    127 str(len(shape)) + ‘ dimensions, but got array ‘
    –> 128 ‘with shape ‘ + str(data_shape))
    129 if not check_batch_axis:
    130 data_shape = data_shape[1:]

    ValueError: Error when checking input: expected input_6 to have 2 dimensions, but got array with shape (15, 7, 7, 512)

  197. Avatar
    habib May 3, 2019 at 4:47 am #

    hi jason :

    I am a little confused. After training I have 20 models (one per epoch). Should I evaluate all of them and keep the one with the highest accuracy? If so, that will take a long time.
    Thank you for your help.

  198. Avatar
    habib May 3, 2019 at 7:38 am #

    hi jason :
    Thank you for your answer.

    So I should choose the model with the highest BLEU score, because we are using BLEU?

  199. Avatar
    ThiLee May 12, 2019 at 2:52 am #

    Hi Jason,
    Thank you very much for your amazing article
    I have a problem:
    tokenizer = load(open('tokenizer.pkl', 'rb'))
    FileNotFoundError: [Errno 2] No such file or directory: 'tokenizer.pkl'
    I don’t know how the file is organized.
    Could you help me, please!

  200. Avatar
    Sravan Malla May 14, 2019 at 8:12 am #

    Can we use batch_size with progressive loading?

    like instead of steps_per_epoch = len(train_descriptions)
    can we set it to len(train_descriptions) / batch_size, with batch_size = 32?

    How do we ensure generator is yielding o/p in batches?

    currently if my understanding is correct, in our case the generator is yielding single output so we had steps_per_epoch = len(train_descriptions). am i right?

    • Avatar
      Jason Brownlee May 14, 2019 at 2:26 pm #

      Yes, you have complete control over the data generator.

      You can load and yield any number of samples you wish.

      • Avatar
        Sravan Malla May 14, 2019 at 3:03 pm #

        So in this case, where our data generator yields a single sample every time, we can’t use
        steps_per_epoch = len(train_descriptions) / batch_size (with batch_size = 32)? Am I right?

        • Avatar
          Jason Brownlee May 15, 2019 at 8:09 am #

          The steps per epoch should be the total samples divided by the batch size, perhaps as you have listed.

          • Avatar
            Sravan Malla May 15, 2019 at 3:07 pm #

            Taking this as reference I built data generator for Neural machine translation to load larger data and one-hot encode the targets and train the model without facing Memory Error, just an extension to your tutorial at https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/.

            There, the data generator I built yields one output, i.e. a single trainX, trainY pair, for every yield.

            So when calling fit_generator I tried “steps_per_epoch = total_samples/32 (batch_size)” and then evaluated the model saved after 1 epoch. The results were surprising: almost the same output for any input, and none of it made sense. This may be because not all the records were passed through the model: the generator I coded yields one record at a time, and the number of steps I asked for was smaller, i.e. divided by the batch size.

            So I changed it back to steps_per_epoch = total samples. The model saved after 1 epoch now gives some sensible outputs; they may not be very accurate, as it is just 1 epoch, but it takes a good amount of time to train.

          • Avatar
            Jason Brownlee May 16, 2019 at 6:23 am #

            Well done!
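
The exchange above comes down to one rule: steps_per_epoch counts how many times the generator yields during one epoch, so it only equals samples / batch_size when each yield really carries batch_size samples. Below is a minimal sketch in plain Python. The helpers `batched` and `steps_for` are hypothetical names introduced for illustration and are not part of the tutorial or of Keras.

```python
import math


def batched(samples, batch_size):
    """Group single samples into lists of at most batch_size samples."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, smaller batch when the total is not a multiple
        yield batch


def steps_for(total_samples, samples_per_yield):
    """Number of generator yields needed to see every sample once."""
    return math.ceil(total_samples / samples_per_yield)


# 6000 is the size of the training split reported elsewhere in this thread.
print(steps_for(6000, 1))           # generator yields one photo per step
print(steps_for(6000, 32))          # generator yields 32 photos per step
print(list(batched(range(10), 4)))  # how ten samples fall into batches of 4
```

A real Keras training generator would also loop forever and yield NumPy arrays rather than lists; the sketch only shows the counting. If the generator still yields one photo at a time, dividing the step count by 32 means most of the data is never seen in an epoch, which matches the behaviour described above.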

    • Avatar
      Abhi February 18, 2020 at 4:25 am #

      Sir, I’m getting a memory error while running the updated code too. What can I do? My Jupyter notebook also stops responding when I run that code. Please give me a solution.

      • Avatar
        Jason Brownlee February 18, 2020 at 6:24 am #

        Run from the command line, not a notebook.

        Use progressive loading.

  201. Avatar
    Boubacar May 29, 2019 at 8:26 am #

    hello Jason!
    I used progressive loading and I got:

    BLEU-1: 0.536255
    BLEU-2: 0.289525
    BLEU-3: 0.201866
    BLEU-4: 0.096334
    Can you explain what I can do to improve this?

    Secondly, when I generate a caption for a new photograph, for example a photo of a dog at the beach, I get the description “startseq two black dogs are playing in the water endseq”. As you can see, it’s not a good description.


  202. Avatar
    Saurabh Shinde June 3, 2019 at 1:52 am #

    Hello Jason,
    I am doing this experiment with Flickr30k dataset. When training the same decoder architecture and evaluating on test data, the model performance is decreasing when compared to Flickr8k. The BLEU scores for F30k are worse than F8k. What should be done to solve this problem?

    1) Limit the vocabulary? (F8k had around 8000 words and F30k has 18000 words)
    2) Add another LSTM layer? (but doubles the training time)
    3) Increase/Decrease word vector dimension?
    4) Change no. of units in LSTM?
    5) Add more Dense layers?

    What should be done to achieve similar performance of F8k on F30k?

    • Avatar
      Jason Brownlee June 3, 2019 at 6:43 am #

      Perhaps try each approach and evaluate the effect on model skill.

      • Avatar
        Saurabh Shinde June 5, 2019 at 2:13 am #

        Also, I have a doubt. In create_sequences function, we are not passing the vocab_size, but it is used in the to_categorical method. How is it one-hot encoding it if vocab_size variable is not passed in the function?

        • Avatar
          Jason Brownlee June 5, 2019 at 8:49 am #

          Looks like a bug, create_sequences uses the vocab size.

          I will schedule time to update the code.


          Update: Fixed.

        • Avatar
          JN June 10, 2019 at 11:05 pm #

          Hi Saurabh,

          We might not need to pass the vocab_size as argument for the create_sequences function because the vocab_size is a global variable so it can be used inside the create_sequences function. Is that right, Jason Brownlee?

          • Avatar
            Jason Brownlee June 11, 2019 at 7:53 am #

            Yes, but that was not the intent. I like to pass things around.

          • Avatar
            Saurabh Shinde June 23, 2019 at 12:46 am #

            Yes, I thought of that too, because of vocab_size being a global variable. Anyways, thanks!

  203. Avatar
    CharlesYuan June 6, 2019 at 2:27 pm #

    I tried to download the dataset, but it failed twice after downloading about 873 MB.

    • Avatar
      Jason Brownlee June 7, 2019 at 7:45 am #

      I’m sorry to hear that.

      Perhaps try downloading from a different computer, at a different time of day, or via a different internet connection?

  204. Avatar
    JN June 10, 2019 at 11:00 pm #

    Hi Jason,
    Thank you for your great tutorial. I’ve been trying to understand your code line by line. However, I can’t figure out this line
    vocab_size = len(tokenizer.word_index) + 1
    I thought the vocab_size should be the same as the tokenizer.word_index. Why did you add an extra 1?

    • Avatar
      Jason Brownlee June 11, 2019 at 7:52 am #

      Good question.

      We add 1 because the integer “0” is reserved (e.g. for padding) and is never assigned to a word in our vocab.

      Therefore, integers assigned to words in our vocab start at 1, not 0.
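
A small sketch may make the off-by-one concrete. It uses plain Python rather than Keras: `build_word_index` is a hypothetical stand-in that, like the Keras Tokenizer, numbers words starting at 1, and `one_hot` is a hypothetical stand-in for one-hot encoding. Because the largest word index equals the number of words, an encoded vector needs one extra slot for index 0.

```python
from collections import Counter


def build_word_index(texts):
    """Assign integers to words, most frequent first, starting at 1."""
    counts = Counter(word for text in texts for word in text.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}


def one_hot(index, size):
    """Return a list of zeros of length `size` with a 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


captions = ["startseq dog runs endseq", "startseq dog sits endseq"]
word_index = build_word_index(captions)

vocab_size = len(word_index) + 1           # +1 for the reserved index 0
largest_index = max(word_index.values())   # equals len(word_index)

print(word_index)
print(one_hot(largest_index, vocab_size))  # fits
# one_hot(largest_index, len(word_index)) would raise IndexError
```

Without the extra slot, the word with the highest index could not be one-hot encoded.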

  205. Avatar
    CharlesYuan June 11, 2019 at 9:22 pm #

    Finally I successfully downloaded the dataset file. It looks like the sample code is wrong:

    # extract features from all images
    directory = 'Flicker8k_Dataset'

    It should be 'Flickr8k', not 'Flicker8k':

    # extract features from all images
    directory = 'Flickr8k_Dataset'

  206. Avatar
    CharlesYuan June 12, 2019 at 11:42 am #

    Just let you know,

    I tried the example.jpg and got “two dogs are running in the snow” 🙂

    • Avatar
      CharlesYuan June 12, 2019 at 11:53 am #

      I tried two more images, one a picture of a bird and the other “55470226_52ff517151”, but the returned description is always “man in red shirt is standing on the street”.

      • Avatar
        Jason Brownlee June 12, 2019 at 2:24 pm #

        Perhaps try a different fit of the model?

        • Avatar
          CharlesYuan June 13, 2019 at 11:59 am #

          Do you mean try different pictures?

          If so, I have tried about ten different pictures (man, dog or child), and the model returned

          “man in red shirt is standing on the street” 9 times and “two dogs are running in the grass” once.

          • Avatar
            Jason Brownlee June 13, 2019 at 2:34 pm #

            I’m suggesting perhaps try refitting the model/try different checkpoints of saved model weights.

            Your chosen model may have overfit.

    • Avatar
      Jason Brownlee June 12, 2019 at 2:24 pm #

      Ha, nice!

      The image has a reflection that might make it a complex example.

  207. Avatar
    Ajay Dabas June 16, 2019 at 4:50 pm #

    Hi Jason, thanks for this awesome tutorial. I’d like to share my work which is highly inspired by this tutorial.

    Github repo: https://github.com/dabasajay/Image-Caption-Generator

    I’ve tried InceptionV3 and VGG16 as Encoder and two types of RNN as Decoder making a combination of 4 image captioning models and compared results. I also implemented BEAM search algorithm and compared results with simple argmax.

    Please have a look, thank you.

    Ajay Dabas
    Github: https://github.com/dabasajay

  208. Avatar
    saddam June 17, 2019 at 3:07 am #

    Hello Jason,
    I want to ask why, for image captioning, we use several metrics (BLEU-1, BLEU-2, BLEU-3, BLEU-4). Why four BLEU scores rather than one, and what are the differences between them?
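
As general background on the metric: BLEU-n scores n-gram overlap up to length n. BLEU-1 looks only at single words, while BLEU-4 also requires matching 2-, 3- and 4-word phrases, so the scores normally fall from BLEU-1 to BLEU-4. The sketch below is a simplified, single-reference version written for illustration (the function names and the example sentences are made up here); libraries such as NLTK handle multiple references and whole corpora.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / total


def cumulative_bleu(reference, candidate, max_n):
    """Geometric mean of the 1..max_n precisions times the brevity penalty."""
    precisions = [modified_precision(reference, candidate, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "two dogs are playing in the water".split()
candidate = "two dogs are running in the water".split()
for n in range(1, 5):
    print("BLEU-%d: %.3f" % (n, cumulative_bleu(reference, candidate, n)))
```

One wrong word barely affects BLEU-1 here but breaks every 4-word phrase, so BLEU-4 drops to zero. Reporting all four scores shows both word choice and phrase-level fluency.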

  209. Avatar
    Ritik June 20, 2019 at 2:39 pm #

    Hi Jason

    I am trying to replace the RNN model used for the language model by a CNN language model.

    I have understood the concepts but I am not able to figure out how to code it.

    If you don’t mind , please help me with the above

  210. Avatar
    saddam June 22, 2019 at 3:19 am #

    hi jason :

    i used vgg16 ,resnet50 and inception v3 in my model and the results like this :

    vgg16 :

    Bleu_1: 0.661
    Bleu_2: 0.486
    Bleu_3: 0.350
    Bleu_4: 0.252

    resnet50 :

    Bleu_1: 0.682
    Bleu_2: 0.511
    Bleu_3: 0.375
    Bleu_4: 0.274

    inception v3 :

    Bleu_1: 0.646
    Bleu_2: 0.470
    Bleu_3: 0.339
    Bleu_4: 0.248

    When I tested images with these three models, I found that:
    resnet50 generates better sentences than vgg16,
    but inception v3 generates better sentences than both vgg16 and resnet50.
    So why is the BLEU score for inception v3 lower than for vgg16 and resnet50? I am really confused.

    • Avatar
      Jason Brownlee June 22, 2019 at 6:46 am #

      Nice work!

      Perhaps the BLEU score is not capturing what you noticed in the generated sentence structure? It is a very simple score.

    • Avatar
      abbas July 27, 2019 at 5:09 pm #

      Saddam, please share your code with me. I also want to try the inception model. Thanks in advance.

    • Avatar
      Phyu Phyu Khaing October 24, 2019 at 8:09 pm #

      Dear Saddam,

      How did you create the model to get those results?
      I don’t get results like those.
      Can you please tell me how you built the network to achieve the scores you obtained?
      If that is okay, please let me know.
      My email address is phyukhaing7@gmail.com

      Best Regards,

      • Avatar
        Jason Brownlee October 25, 2019 at 6:39 am #

        I ran the code as described in the tutorial.

        Are you able to confirm your libraries are up to date?

        • Avatar
          Phyu Phyu Khaing October 25, 2019 at 8:01 pm #

          Yes, my library is up to date.

          I don’t get Saddam’s result.
          I would like to get the best result.
          How should I change the model?

    • Avatar
      Phelan December 22, 2019 at 12:33 am #

      Hi saddam, Jason:
      For VGG16, I got poor results (especially BLEU-4), not as high as yours:
      BLEU-1: 0.487551
      BLEU-2: 0.259738
      BLEU-3: 0.179878
      BLEU-4: 0.085398

      Is there any modification compared to this article?

      • Avatar
        Jason Brownlee December 22, 2019 at 6:15 am #

        Perhaps try fitting the model a few times and compare results?

  211. Avatar
    Saddam June 22, 2019 at 1:05 pm #

    I want to compare the results of those three models and write a discussion about them, so I trained all three for 32 epochs and got those results. Based on the BLEU scores, can I say, for example, that resnet50 gave me the best results?

    • Avatar
      Jason Brownlee June 23, 2019 at 5:29 am #

      It is a good idea to average the results of a neural network over multiple runs, if possible.
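
A minimal sketch of that averaging, using only the standard library. The three numbers are BLEU-1 values that different commenters report in this thread for the tutorial model; they are used here purely as stand-ins for repeated runs, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return the mean and sample standard deviation of repeated runs."""
    return mean(scores), stdev(scores)


# Stand-ins for the BLEU-1 score of the same configuration fit three times.
bleu1_runs = [0.536255, 0.487551, 0.535757]

average, spread = summarize(bleu1_runs)
print("BLEU-1: %.3f (+/- %.3f) over %d runs" % (average, spread, len(bleu1_runs)))
```

If the gap between two models is smaller than the spread across runs of one model, the ranking between them is not reliable.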

      • Avatar
        Saddam June 23, 2019 at 2:31 pm #

        Thank you so much

  212. Avatar
    Saurabh Shinde June 23, 2019 at 12:45 am #

    Hi Saddam,
    Can you please tell me what parameters did you use in your network to achieve those scores you obtained?

    You can email me at this address: saurabh18@somaiya.edu

    • Avatar
      Jason Brownlee June 23, 2019 at 5:37 am #

      All of the parameters are listed in the code directly.

      What parameters are you having problems with exactly?

      • Avatar
        Saurabh Shinde June 23, 2019 at 6:50 pm #

        Hi Jason,

        I just want to know: if I change the dataset from 8K to 30K, should I change the sequence model architecture as well? I tried training with the same architecture on 30K and it was overfitting, and every new image is given the same caption.

        • Avatar
          Saurabh Shinde June 23, 2019 at 6:58 pm #

          Also, in the Show and Tell paper by Vinyals et al., the number of LSTM units used is 512, and as mentioned in this paper (https://arxiv.org/pdf/1805.09137.pdf) the embedding size and vocab_size used are 512 as well.

          512 vocab_size seems less or maybe having less vocabulary gives better results sometimes?

          I am confused.

        • Avatar
          Jason Brownlee June 24, 2019 at 6:22 am #

          More data may help if you don’t want to tune the model.

          • Avatar
            Saurabh Shinde July 5, 2019 at 10:05 pm #

            Hi Jason,

            Sorry for the late reply. I tried Ajay’s code and added support for the xception, mobilenet and resnet50 models.

            Now it is giving proper captions.

            Also, I wanted to know if we can give a different max_length value other than what we found in the dataset?

            For example, in Flickr8k, max_length of a caption is 34. What if I want to set it to some lower number. How can I do it?

            As mentioned in this (https://arxiv.org/pdf/1805.09137.pdf):

            “Following are a few key hyperparameters that we retained across various models. These could be helpful for attempting to reproduce our results.”

            RNN Size: 512
            Batch size: 16
            Learning Rate: 4e-4
            Learning Rate Decay: 50% every 50000 iterations
            RNN Sequence max_length: 16
            Dropout in RNN: 50%
            Gradient clip: 0.1%

          • Avatar
            Jason Brownlee July 6, 2019 at 8:38 am #

            Yes, you can change the length, I would encourage you to explore changing many aspects of the model configuration.

          • Avatar
            Saurabh Shinde July 6, 2019 at 3:56 pm #

            For example, in Flickr8k, the max_length found is 34. But if we set it to 16, wouldn’t it throw an error saying “expected input_shape is (34, ) but got (16, )” or something like that?

            How to solve this problem?

            In Decoder Model,
            Input_layer_1 = Input_Shape((max_length, ))

          • Avatar
            Jason Brownlee July 7, 2019 at 7:48 am #

            You must change the expectation of the model.

          • Avatar
            Saurabh Shinde July 7, 2019 at 3:47 pm #

            So, if i change max_len to 16, what should be done in order to handle the captions whose length is greater than 16?

            Should i clip each caption to length 16? But, won’t it result in loss of information?
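
Two simple options for captions longer than the new limit can be sketched in plain Python. This is an illustration under stated assumptions, not the tutorial's method: `truncate_caption` and `drop_long_captions` are hypothetical helpers, and captions are assumed to be wrapped in the startseq/endseq tokens used in the tutorial.

```python
def truncate_caption(caption, max_words, end_token="endseq"):
    """Cut a caption to max_words, keeping the end token as the last word."""
    words = caption.split()
    if len(words) <= max_words:
        return caption
    return " ".join(words[:max_words - 1] + [end_token])


def drop_long_captions(captions, max_words):
    """Alternative: keep only the captions that already fit."""
    return [c for c in captions if len(c.split()) <= max_words]


captions = [
    "startseq dog runs endseq",
    "startseq a small brown dog runs across the wet sand endseq",
]
print([truncate_caption(c, 6) for c in captions])
print(drop_long_captions(captions, 6))
```

Truncation does lose the tail of long captions, while dropping them loses whole training examples; which costs less depends on how many captions exceed the limit. Keras's pad_sequences also truncates to maxlen, by default from the front of the sequence (truncating='pre').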

          • Avatar
            Saurabh Shinde July 10, 2019 at 2:45 pm #

            Thanks Jason!

            I’ll try using this and test whether the results are improving or not.

  213. Avatar
    Ketan Dhakate June 23, 2019 at 3:45 pm #

    I am getting the same caption for different test images.

    • Avatar
      Jason Brownlee June 24, 2019 at 6:21 am #

      The model may have overfit, perhaps try fitting it again or choosing a different set of weights/final model?

  214. Avatar
    Ankit Rathi June 29, 2019 at 5:17 pm #

    Thank you, Sir, for this great tutorial. I implemented the image captioning in Google Colab. Can you please upload a tutorial, with code, on the attention mechanism used in image captioning?

  215. Avatar
    Nivethan nivan July 4, 2019 at 7:41 pm #

    Dear Jason,

    It’s a very good tutorial. I loved it and got it working. I have two questions.

    1. How can we make the network predict the exact captions that we used for training?


    Original: a dog is playing with the ball
    predicted should be: a dog is playing with the ball
    (not some random/ something close to the original)

    2. How to stop the longer prediction?

    Example (this is what currently happens):

    Original: a dog is playing with the ball
    Predicted: a dog is playing with the ball ball ball ball ball ball ball ball ball ball ball ball

    A clear explanation would be helpful.

    Thanks a lot
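
On the second question, a caption that trails off into “ball ball ball” usually means the generation loop keeps asking for words after the sentence is finished. This is an assumption, since the commenter's code is not shown. A minimal sketch of a greedy loop that stops at the end token or at max_length follows; `predict_next_word` is a hypothetical callable standing in for the model call and the index-to-word lookup.

```python
def generate_caption(predict_next_word, max_length,
                     start_token="startseq", end_token="endseq"):
    """Greedy decoding: add one word at a time until the end token appears."""
    words = [start_token]
    for _ in range(max_length):
        next_word = predict_next_word(words)
        if next_word is None:       # the model produced no known word
            break
        words.append(next_word)
        if next_word == end_token:  # sentence finished: stop here
            break
    return " ".join(words)


def scripted_predictor(script):
    """Fake model that replays a fixed word list, for demonstration only."""
    def predict(words_so_far):
        position = len(words_so_far) - 1
        return script[position] if position < len(script) else None
    return predict


script = "a dog is playing with the ball endseq ball ball ball".split()
print(generate_caption(scripted_predictor(script), max_length=34))
```

If the end token never appears in the output, it is worth checking that every training caption was wrapped in the start and end tokens before the tokenizer was fit.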

  216. Avatar
    Ajay July 13, 2019 at 9:25 pm #

    Since LSTM cells produce sequential information, and the sequence in the above model is “words in the caption”.

    You have used LSTM layer before the features from CNN are used to generate the caption.

    We know that it is the decoder network which is generating the words of the caption but I’m unable to understand the role of LSTM.

    Which of these: LSTM or the decoder ?
    is generating the sequence.

    If decoder is generating the sequence then what is the role of the LSTM layer?

  217. Avatar
    Ajay July 13, 2019 at 11:51 pm #

    Hi Jason !

    I read about your merge model that you described in some other post and you used it here.

    But I don’t understand why you used the LSTM (which makes no use of the image features) and the CNN separately.

    I mean, your lstm layer is just using word sequences( vectors) to generate the next word without using the image features.

    Image features are only added when you combine the LSTM and CNN vectors. Before combining, LSTM is not aware of the image because it is not using the image features. How is the LSTM generating the right captions when you are not even using the image features during LSTM network?

    • Avatar
      Jason Brownlee July 14, 2019 at 8:13 am #

      “why” questions might not be tractable at this time. We don’t have good theories on why many of these models work so well. But they do – so we use them.

      Same with drugs issued by doctors. No idea why they work, but they do – so we use them.

  218. Avatar
    Ajay July 14, 2019 at 12:04 am #

    Here is what I’ve understood from your model:

    1. The network is assuming that the caption for every image will use at max 34 words because the longest caption is of 34 words.

    2. The embedding layer is taking a word index and outputting a 256 long vector.

    3. After the dropout, the LSTM layer is used “consisting of 34 LSTM cells”.

    4. Each LSTM cell is producing a new word which is a 256 long vector.

    Correct me on the above points if I’m wrong.

    1. I’m not able to understand how the output dimension of LSTM is (None,256).
    2. How is the LSTM using the image information because CNN output is not being fed to the LSTM layer?
    3. What is the output of dense_3 layer.
    4. From which layer are we extracting the word sequences?

    • Avatar
      Ajay July 14, 2019 at 12:20 am #

      Got the answer of the above 3rd question that the output of dense_3 is the next word.

      1. But, does output of LSTM hold some significance or does it just compute some 256 long vector which will later be used(combined) with a CNN?

      2. Kindly explain how the 34 LSTM cells are taking input. I reckon you are giving the input sequence of words to these 34 LSTMs(if the length of the input sequence is less than 34 then these are padded). Each word is a 256-dimensional vector.

      Correct me if I’m wrong in the 2nd part.

    • Avatar
      Jason Brownlee July 14, 2019 at 8:15 am #

      Seems reasonable.

      Perhaps study this post and the associated papers:

      • Avatar
        Ajay July 16, 2019 at 6:44 pm #

        I’ve read about those architectures but still had some doubts and I’ve asked them on the post above.

        Kindly answer them.

  219. Avatar
    Nasif Mahbub July 22, 2019 at 7:06 am #

    This is a wonderful tutorial. Thank you Jason Brownlee. Would love to see a tutorial on attention mechanism applied on an image caption generator (preferably this one).

  220. Avatar
    Nasif Mahbub July 23, 2019 at 5:23 am #

    I’ve encountered a problem regarding feature extraction. In both the training phase and test phase when using the VGG16 feature extractor model, it is necessary to download VGG16 weight model which is 500+ MB. So, I followed the link they used to download the weight model and downloaded it.

    Then instead of using “VGG16()” I used:

    But then it shows the error:
    “raise ValueError(‘Cannot create group in read only mode.’)
    ValueError: Cannot create group in read only mode.”

    But for the same image it works perfectly when I use “VGG16()” and they download it from the link below:


    Downloading the model each time is not actually practical. Is there any workaround?

    • Avatar
      Jason Brownlee July 23, 2019 at 8:16 am #

      I believe you can specify the path to the model via the API, meaning you can download it once and reuse it each time.

      E.g. the “weights” argument when loading the model.

  221. Avatar
    Ankit Rathi July 25, 2019 at 1:59 am #

    Hi Jason,

    I tried to apply the above code and generated captions with a BLEU-1 score of 0.52. But when I looked at the predicted captions, the same sentence was repeated for multiple images. For example, if a dog appears in the image, the model generates the caption “dog is running through the grass” for multiple images. How can I make the model more accurate so that it does not repeat the same caption?



    • Avatar
      Jason Brownlee July 25, 2019 at 7:55 am #

      Perhaps try fitting the model again and selecting a final model with lower loss on the holdout dataset?

      • Avatar
        Ankit Rathi July 26, 2019 at 1:38 am #

        Ok, I will try. Thank you for your quick response.

  222. Avatar
    Aman August 7, 2019 at 1:41 am #

    Hello Jason. I have a problem. I tried many times but I cannot fix it. Please tell me the solution.

    Total params: 134,260,544
    Trainable params: 134,260,544
    Non-trainable params: 0
    FileNotFoundError Traceback (most recent call last)
    39 # extract features from all images
    40 directory = ‘Flickr8k_Dataset’
    —> 41 features = extract_features(directory)
    42 print(‘Extracted Features: %d’ % len(features))
    43 # save to file

    in extract_features(directory)
    18 # extract features from each photo
    19 features = dict()
    —> 20 for name in listdir(directory):
    21 # load an image from file
    22 filename = directory + ‘\\Users\Admin’ + name

    FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flickr8k_Dataset’
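
The error above says that the folder name is being looked up relative to the current working directory and is not found there. A small sketch that checks the folder up front and builds each file path with pathlib instead of string concatenation follows. `list_image_paths` is a hypothetical helper, and the folder name is the one from the traceback.

```python
from pathlib import Path


def list_image_paths(directory):
    """Return the .jpg files in a folder, failing early with a clear message."""
    folder = Path(directory)
    if not folder.is_dir():
        # resolve() shows the absolute location that was actually checked
        raise FileNotFoundError("Image folder not found: %s" % folder.resolve())
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".jpg")


try:
    paths = list_image_paths("Flickr8k_Dataset")
    print("Found %d images" % len(paths))
except FileNotFoundError as err:
    print(err)
```

The printed absolute path shows where Python is really looking, which is often not the folder the notebook or script was assumed to run from.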

  223. Avatar
    Aman ullah August 22, 2019 at 1:12 am #

    What’s wrong? Please guide me.

    filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
    model = load_model(filename)
    # evaluate model
    evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

    OSError Traceback (most recent call last)
    1 filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
    —-> 2 model = load_model(filename)
    3 # evaluate model
    4 evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

    D:\anconda3\lib\site-packages\keras\engine\saving.py in load_model(filepath, custom_objects, compile)
    415 model = None
    416 opened_new_file = not isinstance(filepath, h5py.Group)
    –> 417 f = h5dict(filepath, ‘r’)
    418 try:
    419 model = _deserialize_model(f, custom_objects, compile)

    D:\anconda3\lib\site-packages\keras\utils\io_utils.py in __init__(self, path, mode)
    184 self._is_file = False
    185 elif isinstance(path, str):
    –> 186 self.data = h5py.File(path, mode=mode)
    187 self._is_file = True
    188 elif isinstance(path, dict):

    D:\anconda3\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    392 fid = make_fid(name, mode, userblock_size,
    393 fapl, fcpl=make_fcpl(track_order=track_order),
    –> 394 swmr=swmr)
    396 if swmr_support:

    D:\anconda3\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    168 if swmr and swmr_support:
    169 flags |= h5f.ACC_SWMR_READ
    –> 170 fid = h5f.open(name, flags, fapl=fapl)
    171 elif mode == ‘r+’:
    172 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

    h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

    h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

    h5py\h5f.pyx in h5py.h5f.open()

    OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

  224. Avatar
    Aman August 22, 2019 at 1:44 am #

    filename = ‘model-ep002-loss4.6485-val_loss3.1147’
    model = load_model(filename)
    This code generates an error:

    OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss4.6485-val_loss3.1147’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

    What’s wrong?
    Please tell me.

    • Avatar
      Jason Brownlee August 22, 2019 at 6:30 am #

      Ensure the file exists in the same directory as your .py file.

  225. Avatar
    Aman August 22, 2019 at 1:45 am #

    thanks jason great tutorial.

  226. Avatar
    Aman August 22, 2019 at 6:31 pm #

    What’s wrong?

    File “D:\anconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

    File “”, line 3, in
    from keras.preprocessing.text import Tokenizer

    File “D:\anconda3\lib\site-packages\keras\__init__.py”, line 3, in
    from . import utils

    File “D:\anconda3\lib\site-packages\keras\utils\__init__.py”, line 5, in
    from . import io_utils

    File “D:\anconda3\lib\site-packages\keras\utils\io_utils.py”, line 13, in
    import h5py

    File “D:\anconda3\lib\site-packages\h5py\__init__.py”, line 49, in
    from ._hl.files import (

    File “D:\anconda3\lib\site-packages\h5py\_hl\files.py”, line 13
    swmr_support = True
    IndentationError: unexpected indent

    • Avatar
      Jason Brownlee August 23, 2019 at 6:23 am #

      Looks like you have not preserved the indenting when you copied the code.

  227. Avatar
    Prashant Verma September 4, 2019 at 5:56 pm #

    Hi Jason,
    I am getting error while executing tokenizer

    Traceback (most recent call last):
    File “tokenizer.py”, line 65, in
    tokenizer = load(open(‘tokenizer.pkl’, ‘rb’))
    _pickle.UnpicklingError: invalid load key, ‘f’.

  228. Avatar
    Aman September 12, 2019 at 12:17 am #

    Jason Brownlee great tutorial…
    After epoch 4 the validation loss did not improve.

    Epoch 00004: val_loss improved from 3.85954 to 3.84653, saving model to model-ep004-loss3.609-val_loss3.847.h5
    Epoch 5/20
    – 8447s – loss: 3.5627 – val_loss: 3.8547

    Epoch 00005: val_loss did not improve from 3.84653
    Epoch 6/20
    – 8873s – loss: 3.5319 – val_loss: 3.8623

    Epoch 00006: val_loss did not improve from 3.84653
    Epoch 7/20
    – 8550s – loss: 3.5109 – val_loss: 3.8818

    Epoch 00007: val_loss did not improve from 3.84653
    Epoch 8/20
    – 11736s – loss: 3.5004 – val_loss: 3.8942

    Epoch 00008: val_loss did not improve from 3.84653
    Epoch 9/20
    – 8654s – loss: 3.4960 – val_loss: 3.9139

    Epoch 00009: val_loss did not improve from 3.84653
    Epoch 10/20

    • Avatar
      Aman September 12, 2019 at 12:37 am #

      Jason, after epoch 4 the value did not improve. Can I stop the program right now? Are 10 epochs enough to train this model?

    • Avatar
      Jason Brownlee September 12, 2019 at 5:17 am #

      Perhaps choose the model with the lowest validation loss.
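
Since the checkpoint filenames in this thread carry the validation loss, picking that model can be automated. A minimal sketch, assuming the filename pattern seen in the comments (model-epNNN-lossX-val_lossY.h5); `best_checkpoint` is a hypothetical helper name.

```python
import re

# Matches names such as model-ep004-loss3.609-val_loss3.847.h5
CHECKPOINT_PATTERN = re.compile(
    r"model-ep(\d+)-loss([\d.]+)-val_loss([\d.]+)\.h5$")


def best_checkpoint(filenames):
    """Return the filename with the lowest val_loss, or None if none match."""
    best_name, best_val = None, float("inf")
    for name in filenames:
        match = CHECKPOINT_PATTERN.search(name)
        if match is None:
            continue  # skip files that are not checkpoints
        val_loss = float(match.group(3))
        if val_loss < best_val:  # ties keep the earlier file
            best_name, best_val = name, val_loss
    return best_name


saved = [
    "model-ep001-loss4.544-val_loss4.103.h5",
    "model-ep002-loss3.907-val_loss3.933.h5",
    "model-ep004-loss3.609-val_loss3.847.h5",
    "tokenizer.pkl",
]
print(best_checkpoint(saved))
```

In practice the list could come from os.listdir on the training folder. Once the validation loss has stopped improving for several epochs, as in the log above, later checkpoints are unlikely to be selected.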

  229. Avatar
    Aman September 16, 2019 at 1:35 am #

    Dataset: 6000
    Descriptions: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Dataset: 1000
    Descriptions: test=1000
    Photos: test=1000
    BLEU-1: 0.535757
    BLEU-2: 0.282579
    BLEU-3: 0.192619
    BLEU-4: 0.089498

    But when I looked at the predicted captions, the same sentence was repeated for multiple images. For example, if a dog appears in the image, the model generates the caption “dog is running through the grass” for multiple images.

    • Avatar
      Jason Brownlee September 16, 2019 at 6:38 am #

      Perhaps try fitting the model again, or use a different checkpoint?

      • Avatar
        Aman September 16, 2019 at 8:18 pm #

        Thanks Jason Brownlee, I will try.
        Great tutorial. Thanks for the quick response.

        • Avatar
          Jason Brownlee September 17, 2019 at 6:27 am #


        • Avatar
          Videl June 13, 2020 at 5:51 am #

          Hi Aman. Has your problem been solved? If yes, how?
          I am facing the same issue. The same caption is being generated for the pictures.

  230. Avatar
    Narayan Iyer September 20, 2019 at 11:26 am #

    Thanks for the great tutorial! I have a doubt regarding the LSTM layer. Do we have as many memory units as the number of dimensions in the embedding matrix, and the dimension of input size to each unit is the size of the vocabulary? And is the output of each of the 256 unit 1×1, such that all outputs taken together is a 256 dimensional vector? If so, why do we need an LSTM? Can we not simply use 256 FC layers with each layer taking one dimension of all caption words generated?

    • Avatar
      Jason Brownlee September 20, 2019 at 1:39 pm #

      No, the number of nodes in a layer is unrelated to the number of inputs to the layer.

      LSTM is needed to process the sequence of inputs (words generated so far).

      • Avatar
        Narayan Iyer September 22, 2019 at 7:24 am #

        So the embeddings for the prefix generated so far is fed all at once only to the first LSTM unit? Which means the input size for the first LSTM is |vocab-size| x |Dimensionality of embedding space|.

        • Avatar
          Jason Brownlee September 22, 2019 at 9:38 am #

          The LSTM receives one word at a time: yes, a sequence of n words where each word is an m-sized vector.

          • Avatar
            Narayan Iyer October 12, 2019 at 2:57 pm #

            And the caption prefix is used as input only to the first LSTM cell?

          • Avatar
            Jason Brownlee October 13, 2019 at 8:25 am #


  231. Avatar
    Vivek October 4, 2019 at 10:59 pm #

    Thanks for the great tutorial!

  232. Avatar
    Phyu Phyu Khaing October 22, 2019 at 3:30 pm #

    Thanks for your great tutorial.

    Now, I am learning about image captioning.
    When I train the model, I get lower accuracy with more epochs.
    Is that right?
    If you have any attention example, let me know.

    Best Regards

    • Avatar
      Jason Brownlee October 23, 2019 at 6:31 am #

      Yes, you may need to use early stopping.

      • Avatar
        Phyu Phyu Khaing October 23, 2019 at 2:00 pm #

        Thanks a lot, Sir,

      • Avatar
        Phyu Phyu Khaing October 23, 2019 at 2:05 pm #

        When I train the model, I got four-loss files. When I test with that loss files, I got the following results.

        For model-ep001-loss4.544-val_loss4.103.h5 file, the results are:

        BLEU-1: 0.523776
        BLEU-2: 0.283197
        BLEU-3: 0.197006
        BLEU-4: 0.094024

        For model-ep002-loss3.907-val_loss3.933.h5 file, the results are:

        BLEU-1: 0.457893
        BLEU-2: 0.246748
        BLEU-3: 0.172982
        BLEU-4: 0.080477

        For model-ep003-loss3.721-val_loss3.874.h5 file, the results are:

        BLEU-1: 0.550797
        BLEU-2: 0.302675
        BLEU-3: 0.206961
        BLEU-4: 0.094808

        For model-ep005-loss3.577-val_loss3.874.h5 file, the results are:

        BLEU-1: 0.475467
        BLEU-2: 0.245151
        BLEU-3: 0.164806
        BLEU-4: 0.072125

        I don’t get your results.
        My results are less.
        What can be my faults?

        I also have some doubts.
        My previous understanding is that the less the loss, the better the performance.
        But I don’t get like that.
        Please explain to me.

        Best Regards,

        • Avatar
          miaoyi October 23, 2019 at 2:39 pm #

          I got the same model as above ‘model-ep005-loss3.577-val_loss3.874.h5 file’
          My results are less too.

        • Avatar
          Jason Brownlee October 24, 2019 at 5:33 am #

          Perhaps try fitting the model a few times?

          Loss is a good guide, but in this case we are selecting a model based on BLEU on the hold out dataset.
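
Selecting by BLEU rather than by loss can be sketched with the scores reported in the comment above. The helper name `best_checkpoint` is hypothetical; the numbers are copied from that comment.

```python
# BLEU-1..4 scores reported in the comment above, keyed by checkpoint file name
results = {
    "model-ep001-loss4.544-val_loss4.103.h5": [0.523776, 0.283197, 0.197006, 0.094024],
    "model-ep002-loss3.907-val_loss3.933.h5": [0.457893, 0.246748, 0.172982, 0.080477],
    "model-ep003-loss3.721-val_loss3.874.h5": [0.550797, 0.302675, 0.206961, 0.094808],
    "model-ep005-loss3.577-val_loss3.874.h5": [0.475467, 0.245151, 0.164806, 0.072125],
}

def best_checkpoint(scores, n=4):
    """Return the file name with the highest BLEU-n score."""
    return max(scores, key=lambda name: scores[name][n - 1])

print(best_checkpoint(results, n=4))
```

On these numbers the epoch 3 checkpoint wins on BLEU-4, even though the epoch 5 checkpoint has a lower training loss and the same validation loss.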

          • Avatar
            miaoyi October 24, 2019 at 1:42 pm #

            thanks a lot

  233. Avatar
    Kanaan October 29, 2019 at 6:06 am #

    Dear Dr.Jason,
    I have an error , I hope you can help me to fix it :

    PermissionError : [Errno 13] Permission Denied: Flickr8k_Dataset/Flicker8k_Dataset’

    • Avatar
      Jason Brownlee October 29, 2019 at 1:45 pm #

      Yes, I provide new links for the dataset in the tutorial. See the section titled “UPDATE (Feb/2019)”

      • Avatar
        Kanaan October 29, 2019 at 7:58 pm #

        I already used the alternate download link, both Flickr8k_Dataset and Flickr8k_text already downloaded but still showing the same error :
        PermissionError : [Errno 13] Permission Denied: Flickr8k_Dataset/Flicker8k_Dataset’

        and features.pkl file can not be created !

        • Avatar
          Jason Brownlee October 30, 2019 at 6:03 am #

          That is a strange error, I thought it was your web browser.

          What is reporting that error exactly? Python?
          If Python could not find the file, it would say “not found”, not “permission denied”.

  234. Avatar
    Kanaan October 30, 2019 at 4:24 am #

    Dear Dr.Jason,

    I have fixed my error. I hope you can fix it in the given code:
    in line no. 40, directory = ‘Flickr8k_Dataset’ should be changed to directory = ‘Flickr8k_Dataset/Flicker8k_Dataset’

    • Avatar
      Jason Brownlee October 30, 2019 at 6:06 am #

      Thanks, but that sounds specific to the way you have unzipped the dataset and placed it in your code directory.
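
The error in this thread is consistent with the dataset having been unzipped into a nested folder, so that the loop tried to open a sub-directory as if it were an image (on Windows this is typically reported as a permission error). A defensive sketch, assuming a hypothetical helper `list_image_files` that is not part of the tutorial, is to keep only regular image files when listing the directory:

```python
import os
import tempfile

def list_image_files(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Return paths of image files directly inside `directory`,
    skipping sub-directories such as a nested dataset folder."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(extensions):
            paths.append(path)
    return paths

# demo with a throwaway directory that mimics the nested unzip layout
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "Flicker8k_Dataset")
    os.mkdir(nested)
    open(os.path.join(nested, "example.jpg"), "wb").close()
    top_level = list_image_files(root)   # nested folder is skipped
    inner = list_image_files(nested)     # the image is found here
    print(len(top_level), len(inner))    # 0 1
```

An empty result for the top-level folder is a hint that the directory variable should point one level deeper.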

  235. Avatar
    Kanaan October 30, 2019 at 7:52 am #

    yes of course, sometime busy minds do such a mistake !

  236. Avatar
    Kanaan October 30, 2019 at 8:13 am #

    Dear Jason,
    Once again I faced this error:

    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    Traceback (most recent call last):
    File “test4.py”, line 181, in
    model = define_model(vocab_size, max_length)
    File “test4.py”, line 138, in define_model
    plot_model(model, to_file=’model.png’, show_shapes=True)
    File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 240, in plot_model
    expand_nested, dpi)
    File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 79, in model_to_dot
    File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 22, in _check_pydot
    ‘Failed to import pydot. ‘
    ImportError: Failed to import pydot. Please install pydot. For example with pip install pydot.

    I can install pydot but where I have to put it?

    • Avatar
      Jason Brownlee October 30, 2019 at 1:56 pm #

      You can comment out the call to “plot_model()”

  237. Avatar
    Kanaan October 30, 2019 at 9:31 pm #

    Dear Jason,
    for time saving, instead of ‘epochs = 20’ if i make it ‘epochs = 5’ what will be the problem?

    • Avatar
      Jason Brownlee October 31, 2019 at 5:28 am #

      No problem. Perhaps test it?

      • Avatar
        Kanaan October 31, 2019 at 5:33 pm #

        I don’t know why it is breaking in epoch 4 and it is not continuing :

        Total params: 5,527,963
        Trainable params: 5,527,963
        Non-trainable params: 0
        C:\Users\Excellence\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
        “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
        Epoch 1/1
        6000/6000 [==============================] – 1953s 325ms/step – loss: 4.6870
        Epoch 1/1
        6000/6000 [==============================] – 1902s 317ms/step – loss: 3.8931
        Epoch 1/1
        6000/6000 [==============================] – 2424s 404ms/step – loss: 3.6361
        Epoch 1/1
        4834/6000 [=======================>……] – ETA: 10:46 – loss: 3.4892
        (base) C:\Users\Excellence>

        • Avatar
          Jason Brownlee November 1, 2019 at 5:26 am #

          Perhaps you are running out of memory?

          Try running on EC2 with more RAM?
          Try fitting on less data?

  238. Avatar
    ishritam October 31, 2019 at 5:12 am #

    Hi Jason,
    Yet another great explanation. I am about to start this project. I would love to see you implement attention mechanism for this.
    In case if I have missed your blog on applying attention mechanism in Image Captioning, please share me at E-mail: ishritam.ml@gmail.com

    Thank you:)

  239. Avatar
    Hany November 1, 2019 at 2:27 am #

    Hi Jason,

    Thanks for another great tutorial.
    I tried your code and get one error, which is:

    ValueError: Error when checking input: expected input_10 to have shape (4096,) but got array with shape (1000,) #and sometimes I get input_1

    So I changed inputs1 shape in define_model() to 1000 instead of 4096, then the code worked fine.

    Here is the losses values I reached model-ep004-loss3.402-val_loss3.741
    and here are my Blue scores:
    BLEU-1: 0.528189
    BLEU-2: 0.286299
    BLEU-3: 0.198691
    BLEU-4: 0.093021

    which is obviously lower than yours.

    The problem is now the captioning is very poor, here are some examples:

    – “young girl in blue and blue and blue shirt is jumping into pool”, there is neither a girl nor a pool in the image. By the way, this image is from the dataset.
    – “man in red shirt is jumping into the water”, this was an image for just a beach with no people.
    – “dog is running through the grass”, this was the dog example photo.

    Would you please advise on how to increase the accuracy of the generated captions? Thank you

    P.S.: I tried refitting the model but got the same loss value for the same number of epochs.

  240. Avatar
    Arpit Jain November 14, 2019 at 7:00 am #

    Hi Jason,

    Thanks for the blog. Its wonderful.

    I was following it and got stuck at the fit_generator() part.


    # train the model, run epochs manually and save after each epoch
    epochs = 20
    steps = len(train_descriptions)
    for i in range(epochs):
        # create the data generator
        generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
        # fit for one epoch
        model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
        # save model
        model.save('model_' + str(i) + '.h5')


    This part of the code throws me the following error.

    could not broadcast input array from shape (48,4096) into shape (48)

    The shape (48,4096) here is of image_input.

    I do not understand why is it behaving like this.

    Please help me.

    • Avatar
      Jason Brownlee November 14, 2019 at 8:08 am #

      Sorry to hear that, I have some suggestions here:

      • Avatar
        Arpit Jain November 16, 2019 at 6:05 pm #

        Thanks for the response.

        I tried the code on Google Colab with the usual approach and not the progressive loading, and it runs perfectly.

        • Avatar
          Jason Brownlee November 17, 2019 at 7:13 am #

          Well done!

        • Avatar
          Jay Trivedi November 21, 2020 at 6:48 am #

          I tried with the regular approach but It doesn’t work due to full ram consumption. Runtime automatically restarts after RAM is full

          • Avatar
            Jason Brownlee November 21, 2020 at 7:38 am #

            Try the progressive loading example above.

    • Avatar
      abdo helmy February 7, 2020 at 8:12 pm #

      I know it’s a bit late, but I had the same problem while using tensorflow.keras.
      When I used the keras library it worked fine,
      so I’m pretty sure the problem is that one of the transformations works differently in your Keras version than in the one used in the tutorial.

      • Avatar
        Jason Brownlee February 8, 2020 at 7:07 am #

        Great tip!

        Yes, use standalone Keras, not tf.keras.

    • Avatar
      Kevin Jivani June 27, 2020 at 4:23 pm #

      First of all Jason, this blog has the best demonstration of image caption generation I have got on the internet. And after reading your other posts this blog has been the first stop for anything I want to learn in deep learning or machine learning. I want to heartily thank you for such amazing work which is of great help for undergraduate students like me.

      I know it is very late but it is for others who may face this problem in future.

      If you want to use tf.keras with progressive loading here is the trick that worked for me:

      in function data_generator(), change

      yield [[in_img, in_seq], out_word]

      to

      yield [in_img, in_seq], out_word

      • Avatar
        Jason Brownlee June 28, 2020 at 5:42 am #

        Thanks for your kind words and for sharing this tip, Kevin!
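
The difference between the two lines in the tip is only the outer container: without the outer brackets, Python yields a tuple of (inputs, target) instead of a list. A toy generator, with made-up sample values and a hypothetical name, shows the structure that the tip reports as working with tf.keras:

```python
def toy_data_generator(samples):
    """Loop forever over (image_features, input_sequence, next_word) samples,
    yielding an (inputs, target) tuple as in the modified line of the tip."""
    while True:
        for in_img, in_seq, out_word in samples:
            yield [in_img, in_seq], out_word

samples = [([0.1, 0.9], [0, 0, 1], 2), ([0.3, 0.7], [0, 1, 2], 3)]
batch = next(toy_data_generator(samples))
inputs, target = batch
print(type(batch).__name__, len(inputs), target)  # tuple 2 2
```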

  241. Avatar
    Jay November 23, 2019 at 6:39 am #

    I was trying get my hands dirty on the code. My extracted features are in the form [1, 2560, 8] to use this in the first encoder what should I do?
    When I put inputs1 = Input(shape=(2560, 8, )), I get an error
    ValueError: Error when checking target: expected dense_28 to have 3 dimensions, but got array with shape (47, 7579)

  242. Avatar
    Natalie Jones November 26, 2019 at 10:31 am #

    Great tutorial. I am having trouble when loading the photo features.

    features = {k: all_features[k] for k in dataset}

    gives me this error:
    KeyError: ‘2657663775_bc98bf67ac’

    Do you have any suggestions as to why this is? Thanks.

  243. Avatar
    Maya December 9, 2019 at 8:26 am #

    Thank you so much for this amazing tutorial.. I’m facing a problem when I’m trying to run this line:

    model = define_model(vocab_size, max_length)

    it gives me the following error:

    TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not ‘function’.

  244. Avatar
    Harshit Parikh December 13, 2019 at 8:24 pm #

    Hello Sir, I am getting the accuracy of around 35% even after training the model for 20 epochs and there isn’t any sudden change after training another 5 epochs. Do you recommend any changes or maybe what might be the error? The BLEU scores you mentioned below are coming almost the same what you got.

    Please let me know if any changes are recommended from your side. Thank you.

    • Avatar
      Jason Brownlee December 14, 2019 at 6:16 am #

      Accuracy is a bad measure on this dataset. Focus on the BLEU scores.
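
One way to see why BLEU suits captioning better than accuracy: a caption can be reasonable without matching any reference word for word. The sketch below computes only the clipped unigram precision at the heart of BLEU-1, without the brevity penalty, so it is a simplification and not the full metric; the captions are made up.

```python
from collections import Counter

def modified_unigram_precision(candidate, references):
    """Clipped unigram precision, the core of BLEU-1 (no brevity penalty)."""
    cand_counts = Counter(candidate.split())
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    return clipped / max(1, sum(cand_counts.values()))

refs = ["dog runs through the grass", "a dog is running on grass"]
score = modified_unigram_precision("dog is running through the grass", refs)
print(score)  # 1.0
```

The candidate matches neither reference exactly, yet every one of its words is supported by some reference, so the score is high.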

  245. Avatar
    Shyam Yadav December 14, 2019 at 5:07 am #

    Hello Sir,

    Thank you for posting this code and explaining it such easily. I had one doubt that I had trained this model for 20 epochs, and I just got an accuracy of around 35%. After that, I trained the model for another 5 epochs, but there weren’t any considerable changes in the accuracy of the model.

    What changes do you recommend in the model?

    Thank you

    • Avatar
      Jason Brownlee December 14, 2019 at 6:26 am #

      Do not use accuracy for photo captioning; use BLEU instead.

      • Avatar
        Shyam Yadav December 15, 2019 at 4:29 am #

        How would using BLEU instead of accuracy differ and what are its advantages? What is a good BLEU score?

  246. Avatar
    Shyam Yadav December 16, 2019 at 4:37 pm #

    And if we want to improve the BLEU score, what changes do you recommend?

    • Avatar
      Jason Brownlee December 17, 2019 at 6:29 am #

      Some of the suggestions here will help:

      • Avatar
        Shyam Yadav December 17, 2019 at 6:13 pm #

        There isn’t anything in here to help me improve the BLEU-4 score in this code.

        • Avatar
          Jason Brownlee December 18, 2019 at 6:01 am #

          I disagree, there are tutorials on diagnosing learning dynamics with learning curves, then tutorials on fixing each issue, such as regularization for overfitting and ensembles for better prediction.

          • Avatar
            Shyam Yadav December 19, 2019 at 5:00 am #

            But what about just improving the BLEU score explicitly?

          • Avatar
            Jason Brownlee December 19, 2019 at 6:34 am #

            BLEU is improved by improving the model.

  247. Avatar
    Rohit Halder December 18, 2019 at 5:46 pm #

    Sir, I was following your steps, the test results seemed to be horrible. Predominantly, it is generating a particular type of sentence, whenever a human is detected. Or a different sentence whenever a dog is detected. Which model should be used instead of VGG16 to improve the performance?
    And I played around with the structure of the model, and no of epochs, still there were no noticeable changes. What are your suggestions?
    Thank you.

    • Avatar
      Jason Brownlee December 19, 2019 at 6:23 am #

      Perhaps try training the model again?
      Perhaps use a different final model?
      Perhaps try adding regularization to reduce overfitting?

  248. Avatar
    Shyam Yadav December 19, 2019 at 8:02 pm #

    What do you suggest doing to improve the model?

  249. Avatar
    Phelan December 21, 2019 at 2:23 pm #

    Hi Jason,

    Thank for your great explanation.
    I really love your articles.

    I have a question:
    total Vocabulary Size: 8,763 in token file
    Why we not use this vocabulary for training instead of Vocabulary on train dataset (Vocabulary Size: 7,579) ?

    IMO, It will cover more words which not appear on training dataset

    • Avatar
      Jason Brownlee December 22, 2019 at 6:07 am #


      It is good practice to use the vocab in the training set only and pretend we don’t have access to the test set until after the model is prepared.

      The reason for this is to develop a robust and independent estimate of the model performance when making predictions on new data – unseen during training.

      • Avatar
        Phelan December 23, 2019 at 12:39 am #

        I see your point, thank you

  250. Avatar
    Shyam Yadav January 4, 2020 at 2:28 am #

    Tune Model. The configuration of the model was not tuned on the problem. Explore alternate configurations and see if you can achieve better performance.

    How do we tune the model?

    • Avatar
      Jason Brownlee January 4, 2020 at 8:37 am #

      Change something, like the learning rate, run an experiment and summarize the performance. If it is better, keep it. Repeat with other configurations.
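
The change, measure, keep-if-better loop can be sketched as follows. The `fake_evaluate` function is a hypothetical stand-in for "train the model and compute BLEU on the dev set", and its formula is invented purely so the example runs.

```python
def tune(candidates, evaluate, baseline):
    """Try each candidate configuration and keep it only if it scores higher."""
    best_config, best_score = baseline, evaluate(baseline)
    for config in candidates:
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# hypothetical stand-in for training the model and scoring it on held-out data
def fake_evaluate(config):
    return 1.0 - abs(config["learning_rate"] - 0.001) * 100

grid = [{"learning_rate": lr} for lr in (0.01, 0.001, 0.0001)]
best, score = tune(grid, fake_evaluate, baseline={"learning_rate": 0.005})
print(best)
```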

  251. Avatar
    Mehul Kohli January 5, 2020 at 4:08 am #

    Hi Jason!

    Nice article, and very well explained. I just had a doubt regarding the keras tokenizer used. In the code, the tokenizer is fit on the train_descriptions, and when the create_sequences is called for the test data, the same tokenizer which was fit on the train data is used. I had a doubt that how does the tokenizer work with the data which it was not fitted on? I’m kinda new to this, so I’m not really sure about that. Could you please briefly explain this?


    • Avatar
      Jason Brownlee January 5, 2020 at 7:09 am #


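      The tokenizer is fit on the training data and is then used to prepare data as input to the model. If it is used on test data that has words not seen during training, those words are not in its vocabulary: with the default Keras Tokenizer settings (no oov_token) they are dropped from the encoded sequence rather than mapped to an index, and 0 is reserved for padding.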
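
How unseen words end up being represented depends on how the tokenizer is configured. The plain-Python sketch below is not the Keras Tokenizer itself; the function names are hypothetical, and it only illustrates two common choices: skipping unseen words, or mapping them to a dedicated out-of-vocabulary index.

```python
def fit_word_index(texts):
    """Assign integer ids starting at 1 (0 is left free for padding),
    most frequent words first."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

def encode(text, word_index, oov_index=None):
    """Map words to ids; unseen words are skipped unless oov_index is given."""
    ids = []
    for word in text.lower().split():
        if word in word_index:
            ids.append(word_index[word])
        elif oov_index is not None:
            ids.append(oov_index)
    return ids

train = ["dog runs on grass", "dog jumps"]
word_index = fit_word_index(train)
skipped = encode("dog chases ball", word_index)
marked = encode("dog chases ball", word_index, oov_index=len(word_index) + 1)
print(skipped, marked)  # [1] [1, 6, 6]
```

Either way, the integer assigned to a known word such as "dog" is the same for training and test text, because a single word index is reused.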

      • Avatar
        Mehul Kohli January 5, 2020 at 4:28 pm #

        Oh I get it now! Thanks.

        And one more query: why weren’t the training sequences normalized here? Like what if we just normalize by dividing it by the vocab size? I’m training the model at the moment without normalizing them. Just curious what would happen if we used normalized sequence data. I’m guessing it should atleast result in faster convergence of the model, if not significantly improve results.

        • Avatar
          Jason Brownlee January 6, 2020 at 7:09 am #

          If by normalize you mean scale to the range 0-1, then this is not needed. Words are encoded as integer values, then mapped to word vectors.
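
A small sketch of why scaling the integers is unnecessary: each integer is only used as a row index into an embedding table, so its magnitude never enters the computation. The table values below are made up; in the model they would be learned.

```python
def embed(sequence, embedding_table):
    """Look up one vector per integer id; the id is only a row index."""
    return [embedding_table[word_id] for word_id in sequence]

# made-up 2-dimensional vectors for a vocabulary of 3 words plus padding id 0
embedding_table = [
    [0.0, 0.0],    # 0: padding
    [0.2, -0.1],   # 1
    [0.7, 0.3],    # 2
    [-0.4, 0.9],   # 3
]
vectors = embed([0, 3, 1], embedding_table)
print(vectors)
```

Dividing the ids by the vocabulary size would turn them into fractions that can no longer be used as indices.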

          • Avatar
            Mehul Kohli January 6, 2020 at 5:35 pm #

            Okay, thank you so much.

  252. Avatar
    Samriddhi January 5, 2020 at 12:45 pm #

    Hi Jason.
    I cannot find the features.pkl file. Getting the error No such file or directory: ‘features.pkl’. Please guide me.

    • Avatar
      Jason Brownlee January 6, 2020 at 7:06 am #

      You must create and save that file as an earlier step in the tutorial.

  253. Avatar
    Samriddhi January 5, 2020 at 1:20 pm #

    Hi Jason
    What does this part of the code do? Is it one hot encoding being done here-

    • Avatar
      Jason Brownlee January 6, 2020 at 7:08 am #

      Encodes words as integers, then defines the input and output sequences for a given sample.
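
The step described here can be sketched in plain Python: one encoded caption is split into several (prefix, next word) pairs. The left padding and one-hot output used below are assumptions about the original code, which is not shown in the question, and the ids are made up.

```python
def make_pairs(encoded, max_length, vocab_size):
    """Split one encoded caption into (padded prefix, one-hot next word) pairs."""
    pairs = []
    for i in range(1, len(encoded)):
        prefix, next_word = encoded[:i], encoded[i]
        padded = [0] * (max_length - len(prefix)) + prefix   # pad on the left
        one_hot = [0] * vocab_size
        one_hot[next_word] = 1
        pairs.append((padded, one_hot))
    return pairs

# made-up ids: 1 = startseq, 2 = dog, 3 = runs, 4 = endseq
pairs = make_pairs([1, 2, 3, 4], max_length=5, vocab_size=5)
for padded, one_hot in pairs:
    print(padded, one_hot)
```

A caption of four tokens produces three training samples, which is why the number of samples is much larger than the number of captions.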

  254. Avatar
    Sayak Paul January 23, 2020 at 6:03 pm #

    Has anyone got the same captions no matter what the supplied image is? My BLEU score is not that bad:

    BLEU-1: 0.555407
    BLEU-2: 0.284509
    BLEU-3: 0.181447
    BLEU-4: 0.076735

    For feature extraction I used the following method:

    • Avatar
      Jason Brownlee January 24, 2020 at 7:44 am #

      Perhaps try re-fitting your model?

    • Avatar
      Narendhiran April 15, 2020 at 11:22 pm #

      I’m also stuck with same problem.

  255. Avatar
    Phelan January 30, 2020 at 5:49 pm #

    Hi Jason,

    I tried that model with MDI dataset.
    But result very bad.
    I checked and have found some points:
    – vocabulary size: huge more than 21000 => more difficult.
    – many name such as character name in movies or place, … such as Harry Potter, …
    – there are many frame same with 1 caption (now I use single frame for 1 caption and ignore other which same caption)

    Do you have any suggestion for this situation?

    Many thanks!

    • Avatar
      Jason Brownlee January 31, 2020 at 7:40 am #

      I’m not familiar with that dataset.

      Try a range of models.
      Try aggressively reducing the vocab.
      Try tuning the model.
      Try transfer learning.

  256. Avatar
    Vinayak Nayak February 19, 2020 at 11:44 pm #

    Hi Jason!

    Thanks for this elaborate post. It is really informative and details every small step which I found quite useful in building an Image Caption Generator in PyTorch.

    I used an architecture similar to yours and also used 200 dimensional Glove Vectors as word embeddings. However, after training the model, I can observe that loss is going down yet the caption predicted for every image is the same.

    I ran my project in google colab and here’s the notebook link for the same:


    I would be highly grateful to you if you could look at it once and let me know where I’m going wrong.

    I have subscribed to your mailing list and I love the tips you’ve given in the pdf for ml_performance_improvement_cheatsheet.
    Thanks for your help!:)

  257. Avatar
    Vinayak Nayak February 20, 2020 at 12:37 pm #

    Thanks Jason.

  258. Avatar
    Anan February 20, 2020 at 1:22 pm #

    The model is showing bad results even after 100 epochs.Any suggestions?

    • Avatar
      Jason Brownlee February 21, 2020 at 8:16 am #

      Perhaps re-run and stop after just a few epochs?

      • Avatar
        Anan February 21, 2020 at 5:18 pm #

        Tried that too but the loss rate gets increased after some epochs , the loss rate isn’t going below 3.

  259. Avatar
    Sakib Hossain February 26, 2020 at 8:36 pm #

    ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
    While downloading the VGG model

    • Avatar
      Jason Brownlee February 27, 2020 at 5:44 am #

      Sounds like you are having internet issues.

      Perhaps try again?
      Perhaps try from a different internet connection?
      Perhaps try from a different computer?

  260. Avatar
    Ali Raza February 28, 2020 at 12:20 am #

    Hi. I am using the above code for medical image captioning, but unfortunately I get the same caption for every image I give to the model. The loss value on my dataset is 0.75, so I don’t think it is an overfitting issue or that my model is poorly trained. Can you please help me with this?


  261. Avatar
    Shyam Yadav February 28, 2020 at 3:27 am #

    What are the real world applications of this device?

  262. Avatar
    Sakib Hossain March 9, 2020 at 3:26 pm #

    Traceback (most recent call last):
    File “E:/MS Final/features.pkl”, line 41, in
    features = extract_features(directory)
    File “E:/MS Final/features.pkl”, line 23, in extract_features
    image = load_img(filename, target_size=(224, 224))
    File “C:\Users\hso\AppData\Local\Programs\Python\Python36\lib\site-packages\keras_preprocessing\image\utils.py”, line 110, in load_img
    img = pil_image.open(path)
    File “C:\Users\hso\AppData\Local\Programs\Python\Python36\lib\site-packages\PIL\Image.py”, line 2809, in open
    fp = builtins.open(filename, “rb”)
    PermissionError: [Errno 13] Permission denied: ‘Flickr8k_Dataset/Flicker8k_Dataset’

    Please help what is the problem ?

    • Avatar
      Jason Brownlee March 10, 2020 at 5:36 am #

      Looks like you don’t have permission to open the files.

      Perhaps change the permission?

  263. Avatar
    Sakib Hossain March 10, 2020 at 5:56 pm #

    ============= RESTART: E:/Project Study/Fitted Model – Final.py =============
    Using TensorFlow backend.
    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Traceback (most recent call last):
    File “E:/Project Study/Fitted Model – Final.py”, line 154, in
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)
    File “E:/Project Study/Fitted Model – Final.py”, line 109, in create_sequences
    return array(X1), array(X2), array(y)
    MemoryError: Unable to allocate 8.65 GiB for an array with shape (306404, 7579) and data type float32

    Can you please help ?

    • Avatar
      Jason Brownlee March 11, 2020 at 5:21 am #

      Perhaps try the progressive loading version listed above.
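
The size in the error message can be checked by hand: it is the one-hot encoded output array, with one row per training sample and one column per vocabulary word. The figures below are taken from the traceback above.

```python
rows, cols = 306404, 7579        # shape reported in the MemoryError above
bytes_per_value = 4              # float32
total_bytes = rows * cols * bytes_per_value
gib = total_bytes / 2**30
print(f"{gib:.2f} GiB")          # 8.65 GiB
```

Progressive loading avoids this by building the array for a small number of photos at a time instead of all at once.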

  264. Avatar
    Dharmi March 15, 2020 at 2:45 am #

    During data preprocessing,i am getting an error while using the directory variable and filename. What has to be done over there?

    • Avatar
      Jason Brownlee March 15, 2020 at 6:20 am #

      Ensure you have the code and data in the same directory and run code from the command line.

  265. Avatar
    Jerry March 17, 2020 at 2:28 am #

    Hi Jason,
    Thank you so much for your super super super helpful post. I appreciate your effort!

    I got my best training result is ep005-loss3.515-val_loss3.829.

    I tried to use the trained model to generate a caption for a picture, in which a guy with a naked upper body is doing a chest push in the gym. The generated caption is “startseq two children are playing in the snow endseq”.

    Do you have any idea how to improve the model?Thank you.

    • Avatar
      Jason Brownlee March 17, 2020 at 8:20 am #

      You’re welcome!

      Nice work.

      Perhaps try using a different final model?
      Perhaps try training again?
      Perhaps try tuning the model?

      Ideas here:

      • Avatar
        Jerry March 19, 2020 at 11:48 am #

        Hi Jason,

        Thank you for your feedback!

        May I ask one more question? In my mind, usually, what we need for developing and evaluating a model is a training dataset and a testing dataset. So what is the purpose of the “Flickr_8k.devImages.txt” dataset?

        And I find that the size of the “Flickr_8k.devImages.txt” is equal to the size of “Flickr_8k.testImages.txt”. Is this quantitative equivalent a coincidence or a necessity?

        Thank you!

  266. Avatar
    Nafees Dipta March 18, 2020 at 11:34 am #

    Hello Jason,
    Thank for your post. What if each image has multiple descriptions in csv file like flicker30k dataset?

    • Avatar
      Jason Brownlee March 18, 2020 at 1:09 pm #

      You’re welcome.

      You can train the model using each description for the same image input.
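
In other words, every description of an image becomes its own training sample that shares the same photo features. A minimal sketch, with made-up data and a hypothetical helper name:

```python
def expand_samples(descriptions, features):
    """Pair every caption of an image with that image's feature vector."""
    samples = []
    for image_id, captions in descriptions.items():
        for caption in captions:
            samples.append((features[image_id], caption))
    return samples

# made-up data: two images, one with two captions and one with a single caption
descriptions = {"img1": ["dog runs", "a dog on grass"], "img2": ["child plays"]}
features = {"img1": [0.1, 0.9], "img2": [0.4, 0.6]}
samples = expand_samples(descriptions, features)
print(len(samples))  # 3
```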

    • Avatar
      Jerry March 19, 2020 at 11:58 am #

      Hi Nafees,

      Can you manage to train the model using the Flickr30k dataset? I tried it and got a “killed: 9” output, which means the training is too massive for the memory of my laptop. If you are able to come over that, could you please share how you did it?


  267. Avatar
    Ali March 28, 2020 at 6:36 pm #

    Dear Jason
    I am running a visual question answering task. It is seems similar to the image captioning problem. Take as input : image features (which I have saved in a h5py file) and question tokens (which I have pickled) and outputs are the answers (the whole answer is considered a target , so 3129 answers –one word or more – and 3129 labels in my case)
    I am using the Keras sequence utility to create the generator.
    I am getting a dimension error in the output layer when the model tries to start training.
    I have copied my getitem function in the generator and also a sample of my model.
    Would you please have a look my code and help figure out the problem?
    Best wishes

    Epoch 1/1
    Traceback (most recent call last):
    File “”, line 32, in
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\legacy\interfaces.py”, line 91, in wrapper
    return func(*args, **kwargs)
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 1732, in fit_generator
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training_generator.py”, line 220, in fit_generator
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 1508, in train_on_batch
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 621, in _standardize_user_data
    File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training_utils.py”, line 145, in standardize_input_data

    ValueError: Error when checking target: expected output to have shape (3129,) but got array with shape (1,)

    # this is the getitem function
    The __getitem__ of my generator look like this:
    def __getitem__(self, index):
        'Generate one batch of data'

        imfeatures = np.empty((self.batch_size, 2048))
        question_tokens = np.empty((self.batch_size, 14))
        answers = np.empty((self.batch_size, 3129))

        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # self.T.append(indexes)
        list_IDs_temp = [self.list_IDs[k] for k in indexes]

        # Generate data
        for i, k in enumerate(list_IDs_temp):
            temp = self.Features['image_features'][k]

        return [imfeatures, question_tokens], answers

    #And this is what my model looks like:

    ImInput = Input(shape=(2048,),name=’image_input’)
    QInput = Input(shape=(14,),name=’question’)

    # some dense layers and dropouts

    #Then the layers are merged

    M =Multiply()[ImInput,QInput]
    #Some dense layers and dropouts


    model = Model([ImInput,X],output)
    model.compile(optimizer=’RMSprop’,loss=’categorical_crossentropy’,metrics = [‘accuracy’])

    verbose =1,

  268. Avatar
    Manju Patil March 28, 2020 at 9:35 pm #

    In Train With Progressive Loading
    I get the following error. Please Help

    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7579
    Description Length: 34
    Model: “model_4”
    Layer (type) Output Shape Param # Connected to
    input_8 (InputLayer) (None, 34) 0
    input_7 (InputLayer) (None, 4096) 0
    embedding_4 (Embedding) (None, 34, 256) 1940224 input_8[0][0]
    dropout_7 (Dropout) (None, 4096) 0 input_7[0][0]
    dropout_8 (Dropout) (None, 34, 256) 0 embedding_4[0][0]
    dense_10 (Dense) (None, 256) 1048832 dropout_7[0][0]
    lstm_4 (LSTM) (None, 256) 525312 dropout_8[0][0]
    add_4 (Add) (None, 256) 0 dense_10[0][0]
    dense_11 (Dense) (None, 256) 65792 add_4[0][0]
    dense_12 (Dense) (None, 7579) 1947803 dense_11[0][0]
    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    “dot” with args [‘-Tps’, ‘C:\\Users\\MANJUP~1\\AppData\\Local\\Temp\\tmp1f9neqi1’] returned code: 1

    stdout, stderr:
    b”‘F:\\New’ is not recognized as an internal or external command,\r\noperable program or batch file.\r\n”

    AssertionError Traceback (most recent call last)
    161 # define the model
    –> 162 model = define_model(vocab_size, max_length)
    163 # train the model, run epochs manually and save after each epoch
    164 epochs = 20

    in define_model(vocab_size, max_length)
    128 # summarize model
    129 model.summary()
    –> 130 plot_model(model, to_file=’model.png’, show_shapes=True)
    131 return model

    F:\New folder\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir, expand_nested, dpi)
    238 “””
    239 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir,
    –> 240 expand_nested, dpi)
    241 _, extension = os.path.splitext(to_file)
    242 if not extension:

    F:\New folder\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir, expand_nested, dpi, subgraph)
    77 from ..models import Sequential
    —> 79 _check_pydot()
    80 if subgraph:
    81 dot = pydot.Cluster(style=’dashed’, graph_name=model.name)

    F:\New folder\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
    26 # Attempt to create an image of a blank graph
    27 # to check the pydot/graphviz installation.
    —> 28 pydot.Dot.create(pydot.Dot())
    29 except OSError:
    30 raise OSError(

    F:\New folder\lib\site-packages\pydot.py in create(self, prog, format, encoding)
    1943 print(message)
    -> 1945 assert process.returncode == 0, process.returncode
    1947 return stdout_data

    AssertionError: 1

  269. Avatar
    Ramita Shrestha March 29, 2020 at 2:12 pm #

    Hello Jason,
    If we need to find accuracy from this then what to do? Is there any code ?

  270. Avatar
    Hesam March 31, 2020 at 2:41 am #

    This article is awesome Jason!
    here is what i got “dog is running through the water”

  271. Avatar
    Jamie April 6, 2020 at 8:55 am #

    Hi Jason,

    I’m wondering why the tokenizer is fit on the training, validation, and test data separately. I understand that we don’t want sequences from one set to leak into another, but wouldn’t we want the integer representation of each word to be the same across all sets, and then get the sequences from that representation?

    If the model is learning relationships between integers in sequences, and the integers have totally different meaning (and thus expected placement in sequences) between train and test, then wouldn’t that be bad?

    Would your answer change if word2vec was used since the location of the word in the vector space has meaning?

    Thank you for the great tutorial.


    • Avatar
      Jason Brownlee April 6, 2020 at 9:21 am #

      We create a single tokenizer from the training dataset.
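To illustrate the point in this reply, here is a minimal pure-Python sketch: the word-to-integer mapping is built once from the training descriptions and then reused unchanged for any other split. It is a simplified stand-in for the Keras Tokenizer, not the tutorial's code. The helper names `fit_word_index` and `texts_to_sequences` are hypothetical, and this sketch simply drops words that were not seen in training.

```python
from collections import Counter

def fit_word_index(train_texts):
    """Build a word -> integer map from training text only (0 is left free for padding)."""
    counts = Counter(word for text in train_texts for word in text.lower().split())
    # most frequent words get the smallest indices; ties are broken alphabetically
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}

def texts_to_sequences(word_index, texts):
    """Encode texts with an existing map; words unseen in training are skipped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

train = ['startseq dog runs through water endseq', 'startseq dog plays endseq']
test = ['startseq cat runs endseq']

word_index = fit_word_index(train)          # fitted on the training split only
train_seqs = texts_to_sequences(word_index, train)
test_seqs = texts_to_sequences(word_index, test)  # same integers, "cat" is dropped
print(test_seqs)
```

Because the same mapping encodes every split, a given integer always refers to the same word, which is the consistency the question asks about.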

      • Avatar
        Jamie April 8, 2020 at 5:31 pm #

        Sorry, I misunderstood. I thought it would be a problem if the dev and test sets have longer sequences or different words, but I guess that’s a part of it!

        When I try to train the model I get the error: “NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.”

        This is in TensorFlow 2.1 so I’m importing from tensorflow.keras and using fit instead of fit_generator. I tried directly copying your complete progressive loading example (and making those two changes) and still have the error.

        Thank you for your help.

  272. Avatar
    coder April 10, 2020 at 9:34 pm #

    # extract features from each photo in the directory
    def extract_features(directory):
        # load the model
        model = VGG16()
        # re-structure the model
        model = Model(inputs=model.inputs, outputs=model.layers[-1].output)
        # summarize
        # extract features from each photo
        features = dict()
        for name in listdir(directory):
            # load an image from file
            filename = directory + '/' + name
            image = load_img(filename, target_size=(224, 224))
            # convert the image pixels to a numpy array
            image = img_to_array(image)
            # reshape data for the model
            image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
            # prepare the image for the VGG model
            image = preprocess_input(image)
            # get features
            feature = model.predict(image, verbose=0)
            # get image id
            image_id = name.split('.')[0]
            # store feature
            features[image_id] = feature
            print('>%s' % name)
        return features

    # extract features from all images
    directory = 'C:\Flicker8k_Dataset'
    features = extract_features(directory)
    print('Extracted Features: %d' % len(features))
    # save to file
    dump(features, open('features.pkl', 'wb'))

    When I run this code, after loading all the images it shows this error:

    in extract_features(directory)
    13 # load an image from file
    14 filename = directory + ‘/’ + name
    —> 15 image = load_img(filename, target_size=(224, 224))
    16 # convert the image pixels to a numpy array
    17 image = img_to_array(image)

    ~\.conda\envs\tensorflow\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
    108 raise ImportError(‘Could not import PIL.Image. ‘
    109 ‘The use of load_img requires PIL.’)
    –> 110 img = pil_image.open(path)
    111 if color_mode == ‘grayscale’:
    112 if img.mode != ‘L’:

    ~\.conda\envs\tensorflow\lib\site-packages\PIL\Image.py in open(fp, mode)
    2894 warnings.warn(message)
    2895 raise UnidentifiedImageError(
    -> 2896 “cannot identify image file %r” % (filename if filename else fp)
    2897 )

    UnidentifiedImageError: cannot identify image file ‘C:\\Flicker8k_Dataset/Flicker8k_Dataset – Shortcut.lnk’

    Can you please help me?
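The traceback above shows `load_img` being handed a Windows shortcut (`.lnk`) that sits in the image folder. One possible workaround, sketched below with the standard library only, is to keep just the regular files whose names end in an image extension before looping over them. The helper name `list_image_files` and the extension list are assumptions made for illustration; they are not part of the tutorial.

```python
import os

# assumed set of extensions; adjust to match the dataset
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def list_image_files(directory):
    """Return sorted full paths of regular files that look like images."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # skip sub-directories and non-image files such as Windows shortcuts (.lnk)
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return paths
```

Inside `extract_features`, the loop could then iterate over `list_image_files(directory)` instead of `listdir(directory)`, so stray files never reach the image loader.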

  273. Avatar
    Anand Menon April 20, 2020 at 7:39 am #

    Jason, Minor typo. The text for this line, “Running this example first loads the 6,000 photo identifiers in the test dataset”, should say “train dataset” instead of test dataset.

  274. Avatar
    Mohit joshi April 23, 2020 at 11:07 pm #

    I am getting this error, please help me if you can 🙂

    NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

    model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    12 # save model
    13 model.save(“models/model_” + str(i) + ‘.h5’)

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
    322 ‘in a future version’ if date is None else (‘after %s’ % date),
    323 instructions)
    –> 324 return func(*args, **kwargs)
    325 return tf_decorator.make_decorator(
    326 func, new_func, ‘deprecated’,

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    1304 use_multiprocessing=use_multiprocessing,
    1305 shuffle=shuffle,
    -> 1306 initial_epoch=initial_epoch)
    1308 @deprecation.deprecated(

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817 max_queue_size=max_queue_size,
    818 workers=workers,
    –> 819 use_multiprocessing=use_multiprocessing)
    821 def evaluate(self,

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    233 max_queue_size=max_queue_size,
    234 workers=workers,
    –> 235 use_multiprocessing=use_multiprocessing)
    237 total_samples = _get_total_number_of_samples(training_data_adapter)

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_training_inputs(model, x, y, batch_size, epochs, sample_weights, class_weights, steps_per_epoch, validation_split, validation_data, validation_steps, shuffle, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    591 max_queue_size=max_queue_size,
    592 workers=workers,
    –> 593 use_multiprocessing=use_multiprocessing)
    594 val_adapter = None
    595 if validation_data:

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_inputs(model, mode, x, y, batch_size, epochs, sample_weights, class_weights, shuffle, steps, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    704 max_queue_size=max_queue_size,
    705 workers=workers,
    –> 706 use_multiprocessing=use_multiprocessing)
    708 return adapter

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, workers, use_multiprocessing, max_queue_size, **kwargs)
    766 if standardize_function is not None:
    –> 767 dataset = standardize_function(dataset)
    769 if kwargs.get(“shuffle”, False) and self.get_size() is not None:

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in standardize_function(dataset)
    682 return x, y
    683 return x, y, sample_weights
    –> 684 return dataset.map(map_fn, num_parallel_calls=dataset_ops.AUTOTUNE)
    686 if mode == ModeKeys.PREDICT:

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in map(self, map_func, num_parallel_calls)
    1589 else:
    1590 return ParallelMapDataset(
    -> 1591 self, map_func, num_parallel_calls, preserve_cardinality=True)
    1593 def flat_map(self, map_func):

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in __init__(self, input_dataset, map_func, num_parallel_calls, use_inter_op_parallelism, preserve_cardinality, use_legacy_function)
    3924 self._transformation_name(),
    3925 dataset=input_dataset,
    -> 3926 use_legacy_function=use_legacy_function)
    3927 self._num_parallel_calls = ops.convert_to_tensor(
    3928 num_parallel_calls, dtype=dtypes.int32, name=”num_parallel_calls”)

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
    3145 with tracking.resource_tracker_scope(resource_tracker):
    3146 # TODO(b/141462134): Switch to using garbage collection.
    -> 3147 self._function = wrapper_fn._get_concrete_function_internal()
    3149 if add_to_graph:

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _get_concrete_function_internal(self, *args, **kwargs)
    2393 “””Bypasses error checking when getting a graph function.”””
    2394 graph_function = self._get_concrete_function_internal_garbage_collected(
    -> 2395 *args, **kwargs)
    2396 # We’re returning this concrete function to someone, and they may keep a
    2397 # reference to the FuncGraph without keeping a reference to the

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
    2387 args, kwargs = None, None
    2388 with self._lock:
    -> 2389 graph_function, _, _ = self._maybe_define_function(args, kwargs)
    2390 return graph_function

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _maybe_define_function(self, args, kwargs)
    2702 self._function_cache.missed.add(call_context_key)
    -> 2703 graph_function = self._create_graph_function(args, kwargs)
    2704 self._function_cache.primary[cache_key] = graph_function
    2705 return graph_function, args, kwargs

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
    2591 arg_names=arg_names,
    2592 override_flat_arg_shapes=override_flat_arg_shapes,
    -> 2593 capture_by_value=self._capture_by_value),
    2594 self._function_attributes,
    2595 # Tell the ConcreteFunction to clean up its graph once it goes out of

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    976 converted_func)
    –> 978 func_outputs = python_func(*func_args, **func_kwargs)
    980 # invariant: func_outputs contains only Tensors, CompositeTensors,

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in wrapper_fn(*args)
    3138 attributes=defun_kwargs)
    3139 def wrapper_fn(*args): # pylint: disable=missing-docstring
    -> 3140 ret = _wrapper_helper(*args)
    3141 ret = structure.to_tensor_list(self._output_structure, ret)
    3142 return [ops.convert_to_tensor(t) for t in ret]

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in _wrapper_helper(*args)
    3080 nested_args = (nested_args,)
    -> 3082 ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
    3083 # If func returns a list of tensors, nest.flatten() and
    3084 # ops.convert_to_tensor() would conspire to attempt to stack

    ~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\autograph\impl\api.py in wrapper(*args, **kwargs)
    235 except Exception as e: # pylint:disable=broad-except
    236 if hasattr(e, ‘ag_error_metadata’):
    –> 237 raise e.ag_error_metadata.to_exception(e)
    238 else:
    239 raise

    NotImplementedError: in converted code:

    C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py:677 map_fn
    C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py:2410 _standardize_tensors
    C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py:513 standardize_input_data
    data = [np.asarray(d) for d in data]
    C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\numpy\core\_asarray.py:85 asarray
    return array(a, dtype, copy=False, order=order)
    C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:728 __array__
    ” array.”.format(self.name))

    NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

  275. Avatar
    Rishal April 27, 2020 at 8:09 am #

    Hi Jason,

    Thank you for this article. Amazing implementation and explanation of the entire process.

    I had a question regarding improving the BLEU score for this task:

    You said we should try out using different Pre-Trained Image models like InceptionV3 and also try to add regularization.

    Where exactly according to your expertise should these changes be implemented? Should we make changes in extract_features() or in define_model() or both to get better results?

    Also, is there a faster way to know if our new model will do better as these models take a couple of hours to run?

    Thank you

  276. Avatar
    Mohit joshi May 1, 2020 at 8:57 pm #

    Hey Jason, I just want to ask why you implemented this with VGG16 when more accurate and efficient pre-trained models, such as Inception V3 and ResNet, are available. I would like to know your reasons for choosing the VGG16 model.

    • Avatar
      Jason Brownlee May 2, 2020 at 5:43 am #

      It is a simple model that is easy to understand and works well in many cases.

  277. Avatar
    Krish May 4, 2020 at 6:31 pm #

    Hey Jason, Thanks for creating such a wonderful article

    With standalone Keras the code runs fine

    I am now using TensorFlow 2.1, and when I try to run the code I get the error below:

    NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

    I have gone through the link you posted, but it did not help.

    Please have a look at it; if required, I can send you the code.

    • Avatar
      Jason Brownlee May 5, 2020 at 6:21 am #

      You can run the example using Keras 2.3 on top of TensorFlow 2.1 directly. No need to change the code.

  278. Avatar
    Krish May 6, 2020 at 12:00 am #

    Hey Jason,

    I tried using Keras (2.3) with TensorFlow (2.1) and I am facing the issue below:

    NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

  279. Avatar
    Pranshi Garg May 14, 2020 at 11:03 pm #

    PermissionError Traceback (most recent call last)
    1 directory =”D:\Flickr8k_Dataset”
    —-> 2 features = extract_features(directory)
    3 print(‘Extracted Features: %d’ % len(features))
    4 # save to file
    5 dump(features, open(r’features.pkl’, ‘rb’))

    in extract_features(directory)
    13 # load an image from file
    14 filename = directory + ‘/’ + name
    —> 15 image = load_img(filename, target_size=(224, 224))
    16 # convert the image pixels to a numpy array
    17 image = img_to_array(image)

    ~\anaconda3\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
    108 raise ImportError(‘Could not import PIL.Image. ‘
    109 ‘The use of load_img requires PIL.’)
    –> 110 img = pil_image.open(path)
    111 if color_mode == ‘grayscale’:
    112 if img.mode != ‘L’:

    ~\anaconda3\lib\site-packages\PIL\Image.py in open(fp, mode)
    2808 if filename:
    -> 2809 fp = builtins.open(filename, “rb”)
    2810 exclusive_fp = True

    PermissionError: [Errno 13] Permission denied: ‘D:\\Flickr8k_Dataset/Flicker8k_Dataset’

    Can you please help me?

    • Avatar
      Jason Brownlee May 15, 2020 at 6:02 am #

      Looks like you don’t have permission to access your own files on your own workstation!

      • Avatar
        Pranshi Garg May 16, 2020 at 8:16 am #

        How do I change that?

        • Avatar
          Jason Brownlee May 16, 2020 at 10:13 am #

          It will be specific to your workstation.

          If you are not the admin of your workstation, perhaps contact the admin.

          Or, perhaps try downloading the data set again and save it in a different location on your workstation.

  280. Avatar
    Ali May 19, 2020 at 3:19 am #

    Sir, I used your code to develop a desktop app. Now I want to create an Android app. Is that possible? If so, how?

    • Avatar
      Jason Brownlee May 19, 2020 at 6:10 am #

      I don’t know about creating Android apps, I teach machine learning.

  281. Avatar
    Karan Aryan May 19, 2020 at 8:23 pm #

    Hi Jason, I am using TensorFlow 2.0, and I receive errors when running the code.
    I tried both methods and get an error with each. Can you please help me out?

    1) With the normal model.fit() method:
    Epoch 1/20
    UnboundLocalError Traceback (most recent call last)
    in ()
    4 checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1, save_best_only=True, mode=’min’)
    5 # fit model
    —-> 6 model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, batch_size = 3, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    1 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    857 logs = tmp_logs # No error, now safe to assign to logs.
    858 callbacks.on_train_batch_end(step, logs)
    –> 859 epoch_logs = copy.copy(logs)
    861 # Run validation.

    UnboundLocalError: local variable ‘logs’ referenced before assignment

    2) When running the code for the progressive loading method, I receive this error:

    WARNING:tensorflow:Model was constructed with shape (None, 2048) for input Tensor(“input_11:0”, shape=(None, 2048), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
    ValueError Traceback (most recent call last)
    in ()
    1 for i in range(epochs):
    2 generator = data_generator(train_descriptions, train_features, wordtoix, max_length, number_pics_per_batch)
    —-> 3 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    4 model.save(‘./model_weights/model_’ + str(i) + ‘.h5’)

    12 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966 except Exception as e: # pylint:disable=broad-except
    967 if hasattr(e, “ag_error_metadata”):
    –> 968 raise e.ag_error_metadata.to_exception(e)
    969 else:
    970 raise

    ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function *
    outputs = self.distribute_strategy.run(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
    return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:541 train_step **
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1807 _minimize
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
    filtered_grads_and_vars = _filter_grads(grads_and_vars)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
    ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: [‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

    To summarize, the main errors are:
    a) Normal model.fit() method
    – UnboundLocalError: local variable ‘logs’ referenced before assignment

    b) Progressive loading method
    – ValueError: No gradients provided for any variable: [‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

    • Avatar
      Jason Brownlee May 20, 2020 at 6:24 am #

      I’m sorry to hear that, I have not seen this error. Perhaps this will help:

    • Avatar
      Werner June 10, 2020 at 12:09 pm #

      I got the same error. I also went through the checklist and have imported all packages, copied code correctly, etc. It only happens with the low_RAM version. With the original it didn’t, but I ran out of RAM.

      • Avatar
        Rishab July 17, 2020 at 3:30 am #

        Hey, I also got the same error. I want to train the model on my local machine, where I am using TensorFlow 2.2.0 and Keras 2.3.0. Were you able to solve it?
        I also tried running it on Colab, and it seems to work fine there; Colab uses TensorFlow 2.2.0 and Keras 2.3.1. I am wary of reinstalling Keras on my local machine in case it breaks my setup. If you found a solution to this error, please let me know.

        • Avatar
          Saicharan August 6, 2020 at 4:57 pm #

          I am also getting the same error, and I am running the code in Google Colab. Did you find a solution for it?

          • Avatar
            Jason Brownlee August 7, 2020 at 6:23 am #

            Try running on your own workstation or on an AWS EC2 instance.

          • Avatar
            Aayush Jain August 24, 2020 at 11:02 pm #

            For anyone who is getting this error on Google Colab, I have a temporary fix for it. Simply downgrade the versions of Keras and TensorFlow. Use pip for this.
            Run the following commands:

            pip uninstall keras
            pip install keras==2.3.1
            pip uninstall tensorflow
            pip install tensorflow==2.2

            After running the commands above in separate cells, restart your runtime and the error should be resolved.

          • Avatar
            Jason Brownlee August 25, 2020 at 6:41 am #

            Thanks for sharing!

    • Avatar
      Abhinav Kumar June 24, 2020 at 12:59 am #

      I am also getting same error.

  282. Avatar
    Harshit Bargali May 31, 2020 at 2:50 am #

    I have used your code only in training and evaluating. But somehow I got this error while evaluating, and I am not sure why it happened.

    C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
    “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
    Traceback (most recent call last):

    File “”, line 1, in
    runfile(‘C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py’, wdir=’C:/Users/Harshit/Desktop/Flickr8k_dataset’)

    File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 827, in runfile
    execfile(filename, namespace)

    File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 110, in execfile
    exec(compile(f.read(), filename, ‘exec’), namespace)

    File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 164, in
    evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

    File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 119, in evaluate_model
    yhat = generate_desc(model, tokenizer, photos[key], max_length)

    File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 98, in generate_desc
    yhat = model.predict([photo,sequence], verbose=0)

    File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training.py”, line 1441, in predict
    x, _, _ = self._standardize_user_data(x)

    File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training.py”, line 579, in _standardize_user_data

    File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training_utils.py”, line 145, in standardize_input_data

    ValueError: Error when checking input: expected input_6 to have shape (28,) but got array with shape (34,)
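One plausible cause of this shape error, offered as an assumption since the thread does not confirm it, is that the evaluation script pads sequences to a different `max_length` (34) than the one the saved model was trained with (28). The sketch below is a pure-Python illustration of fixed-length left padding that mimics the default behaviour of Keras `pad_sequences`; `pad_sequence` is a hypothetical helper, not a Keras function.

```python
def pad_sequence(sequence, max_length):
    """Left-pad with zeros, or keep only the last max_length items, to reach a fixed length."""
    trimmed = sequence[-max_length:]
    return [0] * (max_length - len(trimmed)) + trimmed

trained_max_length = 4            # must match the value used when the model was defined
print(pad_sequence([3, 5], trained_max_length))
print(pad_sequence([1, 2, 3, 4, 5], trained_max_length))
```

Whatever length the model's text input was built with is the length every sequence must have at prediction time, so the same `max_length` value should be reused in both scripts.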

  283. Avatar
    Aniket Gupta June 9, 2020 at 12:44 am #

    I have trained the model for 20 epochs after which the accuracy is around 0.5 and validation set accuracy is 0.3.
    How will this be able to generate good captions?

    • Avatar
      Jason Brownlee June 9, 2020 at 6:05 am #

      You can ignore accuracy, it is a poor metric for this task. We instead use BLEU.

      • Avatar
        Aniket Gupta June 9, 2020 at 9:12 pm #

        Will we get better predictions if we use GloVe vectors as they cover a wider corpus and account for relations between words?

        • Avatar
          Jason Brownlee June 10, 2020 at 6:13 am #

          It depends on the specifics of the model and the dataset. Try it and see.

  284. Avatar
    LIN June 9, 2020 at 9:13 pm #

    hi, Jason
    I’m a bit confused. One picture in the dataset corresponds to five sentences. Does that mean that, during training, the label for each picture is chosen as one of these five sentences?

    • Avatar
      Jason Brownlee June 10, 2020 at 6:14 am #

      We can train on one description or on every description. Using all of them gives the model lots of ways to “think” about an image, which may or may not be useful.
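A small sketch may make this concrete. Assuming descriptions are stored as a dictionary mapping an image identifier to its list of captions, every caption of every image can be expanded into (prefix, next word) training samples, so an image with five captions simply contributes five groups of samples. The function name `make_training_pairs` is hypothetical, and the words are left as strings rather than integers to keep the example short.

```python
def make_training_pairs(descriptions):
    """Expand every caption of every image into (image_id, input_words, next_word) samples."""
    samples = []
    for image_id, captions in descriptions.items():
        for caption in captions:
            words = caption.split()
            # each prefix of the caption is used to predict the word that follows it
            for i in range(1, len(words)):
                samples.append((image_id, words[:i], words[i]))
    return samples

descriptions = {
    'img1': ['startseq dog runs endseq', 'startseq brown dog endseq'],
}
samples = make_training_pairs(descriptions)
for sample in samples:
    print(sample)
```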

      • Avatar
        LIN June 10, 2020 at 12:58 pm #

        got it, thanks

  285. Avatar
    Videl June 12, 2020 at 5:22 am #

    Hi, Jason.
    That’s a great article and it helped me a lot.
    I just want to tell you that model.layers.pop() does not remove the layer from the model.
    The features that I extracted are 1000 dimensional, not 4096.
    (I do not know how it is working for you. Maybe it worked fine with older Keras.)



    • Avatar
      Jason Brownlee June 12, 2020 at 6:19 am #

      The tutorial uses the Keras API directly, it looks like you are trying to change it to use tf.keras.

  286. Avatar
    Phuc June 12, 2020 at 2:14 pm #

    Hi Jason,
    I am working on an image captioning project and have read some papers on the topic. However, I don’t know which evaluation they used, and I am confused between corpus_bleu and sentence_bleu. The output of image captioning is one sentence per image, so I think we should calculate sentence_bleu for each image and then take the average, but I saw that you use corpus_bleu.
    Can you tell me which evaluation is suitable? Which was used in the papers?
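The two approaches in this question can give different numbers. NLTK's `corpus_bleu` pools the n-gram match counts and candidate lengths over all sentences before dividing, while averaging `sentence_bleu` weights every sentence equally. The pure-Python sketch below shows the difference for clipped unigram precision only. It is a simplified illustration with no brevity penalty and no higher-order n-grams, so it is not a BLEU implementation, and all function names are hypothetical.

```python
from collections import Counter

def clipped_matches(candidate, references):
    """Return (clipped unigram matches, candidate length) for one candidate."""
    cand_counts = Counter(candidate)
    max_ref = Counter()
    for ref in references:
        for word, n in Counter(ref).items():
            max_ref[word] = max(max_ref[word], n)
    matches = sum(min(n, max_ref[word]) for word, n in cand_counts.items())
    return matches, len(candidate)

def sentence_level_average(candidates, references_list):
    """Average of the per-sentence unigram precisions."""
    scores = []
    for cand, refs in zip(candidates, references_list):
        matches, total = clipped_matches(cand, refs)
        scores.append(matches / total)
    return sum(scores) / len(scores)

def corpus_level(candidates, references_list):
    """Pool matches and lengths over all sentences before dividing."""
    pairs = [clipped_matches(c, r) for c, r in zip(candidates, references_list)]
    return sum(m for m, _ in pairs) / sum(t for _, t in pairs)

candidates = [['dog', 'runs'],
              ['a', 'man', 'sits', 'on', 'a', 'bench', 'outside', 'today']]
references_list = [[['dog', 'runs', 'fast']],
                   [['a', 'man', 'sits', 'on', 'the', 'grass']]]

print(sentence_level_average(candidates, references_list))
print(corpus_level(candidates, references_list))
```

In this toy case the short, perfectly matched caption lifts the sentence-level average, while the pooled score is dominated by the longer caption.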

  287. Avatar
    Videl June 13, 2020 at 4:29 am #

    Hi again Jason,

    So I have trained the model on my own dataset (Bengali language). I got the best val loss as 2.99.
    Now the problem is, when I run the generate_desc function I get the following error:

    ValueError: Data cardinality is ambiguous:
    x sizes: 4096, 33
    Please provide data which shares the same first dimension.

    There was no such issue during training at all. What could possibly cause this?


    • Avatar
      Videl June 13, 2020 at 5:35 am #

      Sorry the error is gone. But the same caption is being generated for all the images.

    • Avatar
      Jason Brownlee June 13, 2020 at 6:11 am #

      Well done!

      Perhaps you will need to adapt the code, I cannot diagnose the error off the cuff.

    • Avatar
      Gaurav September 20, 2021 at 4:18 am #


      How did you fix the data cardinality issue? I have it too

      ValueError: Data cardinality is ambiguous:
      x sizes: 2048, 1
      Make sure all arrays contain the same number of samples.

      • Avatar
        Moda February 25, 2022 at 8:55 am #

        I reshaped the feature vector. In my case the feature length for every image is 4096, and the vector passed to “generate_desc” has shape (4096,), which is ambiguous for the model because it does not look like the features of a single sample/photo.

        The fix:
        yhat = model.predict([np.reshape(photo, (1, 4096)),sequence], verbose=0)

  288. Avatar
    Adam July 2, 2020 at 3:06 pm #

    Hey Jason, thank you for an amazing tutorial.

    I have one question though,

    I tried using the above given VGG16 model, the highest BLEU score I reached is 0.35
    And using Xception, I got 0.37

    How can I increase this?

  289. Avatar
    Dorsa July 5, 2020 at 6:47 am #

    Hi, I am getting this error. Can anyone help me, please?

    FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flickr8k_Dataset’

    • Avatar
      Jason Brownlee July 5, 2020 at 7:08 am #

      Ensure you download the dataset and run the code from the same directory as the unzipped dataset, e.g. from the command line.

  290. Avatar
    Jay July 7, 2020 at 4:41 am #

    Hey Jason, thanks for sharing such an awesome tutorial. I enjoyed reading it now soon will try to implement it.
    But just out of curiosity, is there any tutorial on generating images from captions (i.e. text-to-image synthesis)?

  291. Avatar
    mohit July 28, 2020 at 9:39 pm #

    Hey Jason, I am getting this error:
    KeyError Traceback (most recent call last)
    in ()
    142 print(‘Descriptions: train=%d’ % len(train_descriptions))
    143 # photo features
    –> 144 train_features = load_photo_features(‘features.pkl’, train)
    145 print(‘Photos: train=%d’ % len(train_features))
    146 # prepare tokenizer

    1 frames
    in (.0)
    64 all_features = load(open(filename, ‘rb’))
    65 # filter features
    —> 66 features = {k: all_features[k] for k in dataset}
    67 return features

    KeyError: ‘2855417531_521bf47b50’
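This KeyError indicates that an identifier in the training set has no entry in the loaded features dictionary, for example because feature extraction did not process that photo (an assumption, since the thread does not say). A minimal sketch of a more forgiving filter is shown below; `select_features` is a hypothetical variant of the filtering step in `load_photo_features`, and it reports the missing identifiers instead of failing on the first one.

```python
def select_features(all_features, dataset):
    """Keep features for identifiers in dataset; also return identifiers with no features."""
    features = {k: all_features[k] for k in dataset if k in all_features}
    missing = sorted(k for k in dataset if k not in all_features)
    return features, missing

# toy stand-ins for the unpickled features and the list of training identifiers
all_features = {'img_a': [0.1, 0.2], 'img_b': [0.3, 0.4]}
dataset = ['img_a', 'img_c']

features, missing = select_features(all_features, dataset)
print('kept:', sorted(features), 'missing:', missing)
```

If the missing list is long, re-running feature extraction over the complete image folder is probably the better fix than skipping photos.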

  292. Avatar
    Manan Kumawat July 30, 2020 at 2:49 am #

    Hi jason,

    The AWS EC2 instance doesn’t have sufficient memory to extract and install CUDA for TensorFlow.

  293. Avatar
    Anubhav August 1, 2020 at 3:24 am #

    Hi Jason,

    Thanks for another great article. You have done it again. I’d not be far off saying that you are one of my inspirations on my journey to Machine Learning. I have followed your articles for quite a while, ranging from small queries to entire topics like this one.

    I was curious about how this model performs on the Flickr30K dataset. Unfortunately, using the same hyper-parameters led the model to output just one sentence overall. The sentence is “man in blue shirt is sitting on the ground with his arms crossed end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end”. (The max length for this dataset was 74.)

    I have used the progressive loading using generators as instructed by your tutorial. Could you point out any possible reasons why this would be happening?

    PS: I haven’t run the model on Flickr8k to confirm that my implementation is correct or not.


    • Avatar
      Jason Brownlee August 1, 2020 at 6:15 am #


      Perhaps the model is overfit. Try tuning the learning hyperparameters, or perhaps even try a larger model.

  294. Avatar
    siddhant kandge August 5, 2020 at 1:56 am #

    I have copied and pasted the same code, but with the Inception model. It is giving me a mismatch error.
    Can you please solve my issue?

  295. Avatar
    Ranganadh August 6, 2020 at 7:58 pm #

    I am getting this error. Please help me to resolve it:
    ValueError: could not broadcast input array from shape (47,1000) into shape (47)

  296. Avatar
    Ranganadh August 7, 2020 at 8:48 pm #

    I am getting this one. Please help:

    ValueError: No gradients provided for any variable: [’embedding_8/embeddings:0′, ‘dense_24/kernel:0’, ‘dense_24/bias:0’, ‘lstm_8/lstm_cell_8/kernel:0’, ‘lstm_8/lstm_cell_8/recurrent_kernel:0’, ‘lstm_8/lstm_cell_8/bias:0’, ‘dense_25/kernel:0’, ‘dense_25/bias:0’, ‘dense_26/kernel:0’, ‘dense_26/bias:0’].

  297. Avatar
    Akash K August 18, 2020 at 3:11 am #

    I’m getting a strange error.

    While training X1train, X2train and ytrain using “create_sequences” function, the following error is popping up.

    Traceback (most recent call last):

    File “”, line 1, in
    X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)

    File “”, line 21, in create_sequences
    return array(X1), array(X2), array(y)

    TypeError: array() argument 1 must be a unicode character, not list

    Could you please help me with this?

      • Avatar
        Akash K August 18, 2020 at 6:22 am #

        Are you able to confirm that your libraries are up to date (e.g, check version numbers)? – Yes, they’re up to date.

        Are you able to confirm that you copied all of the code exactly (preserving white space)? – Definitely

        Are you able to confirm that you saved any required data files in the same folder as the code? – Yes

        Have you tried running the code from the command line, not a notebook or an IDE? – Yes sir.

        Have you tried searching for a similar error in the comments or on StackOverflow? – No specific solution found on StackOverflow or Github either.

        Please Help!

        • Avatar
          Jason Brownlee August 18, 2020 at 1:25 pm #

          I found the issue and updated the tutorial.

          • Avatar
            Akash K August 18, 2020 at 2:01 pm #

            Hi Jason,

            Just curious about what the issue was. Do you mind sharing the details?

            BTW, thank you so much for such a beautiful piece of code.

          • Avatar
            Jason Brownlee August 19, 2020 at 5:54 am #

            The preparation of the VGG model required modification due to an API change. E.g. the pop() function on the layers no longer did anything.

  298. Avatar
    Ram August 19, 2020 at 2:10 am #

    When defining model with code

    model = define_model(vocab_size, max_length)

    Got error

    Traceback (most recent call last):

    File “”, line 1, in
    model = define_model(vocab_size, max_length)

    File “”, line 3, in define_model
    inputs1 = input(shape=[4096,])

    TypeError: raw_input() got an unexpected keyword argument ‘shape’

    Help with this please.

  299. Avatar
    Anisa August 25, 2020 at 4:44 pm #

    I am facing Name Error for the below code while fitting the model. Please help me resolve this.
    # define the captioning model
    def define_model(vocab_size, max_length):

    # define checkpoint callback
    filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
    checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

    # fit model
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    NameError Traceback (most recent call last)
    1 # fit model
    —-> 2 model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    NameError: name ‘model’ is not defined

    Please help me fix this .

      • Avatar
        Anisa August 26, 2020 at 6:02 pm #

        Thanks Jason, I tried that.
        I’m sorry to say, but stuck with this now:

        MemoryError Traceback (most recent call last)
        152 print(‘Description Length: %d’ % max_length)
        153 # prepare sequences
        –> 154 X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)
        156 # dev dataset

        in create_sequences(tokenizer, max_length, descriptions, photos, vocab_size)
        107 X2.append(in_seq)
        108 y.append(out_seq)
        –> 109 return array(X1), array(X2), array(y)
        111 # define the captioning model

        MemoryError: Unable to allocate 4.68 GiB for an array with shape (306404, 4096) and data type float32

        How can this be solved?

        • Avatar
          Jason Brownlee August 27, 2020 at 6:12 am #

          It looks like you ran out of memory.

          Try running on a machine with more memory.
          Try running on AWS EC2.
          Try using a smaller model or less data.
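The 4.68 GiB figure in the MemoryError above can be checked by hand: it is the number of array elements multiplied by the size of one float32 value. The sketch below redoes that arithmetic; the helper name `array_gib` is hypothetical and used only for illustration.

```python
# Estimate the memory needed for a dense array.
# The shape and item size below are taken from the MemoryError message above.

def array_gib(shape, bytes_per_item):
    """Return the size in GiB of a dense array with the given shape."""
    n_items = 1
    for dim in shape:
        n_items *= dim
    return n_items * bytes_per_item / 2**30

# 306,404 rows, each holding a 4,096-element float32 (4-byte) feature vector
photo_gib = array_gib((306404, 4096), 4)
print('Photo feature array: %.2f GiB' % photo_gib)
```

The photo features are only one of the three arrays that create_sequences() returns, so the total requirement is higher. This is why the progressive loading approach, which builds the arrays for one photo at a time, can help on machines with limited RAM.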

  300. Avatar
    JG August 29, 2020 at 6:35 pm #

    Hi Jason:

    Awesome tutorial! It’s an incredible achievement of Computer Vision and NLP techniques!
    Thank you Jason it’s very inspirational!

    I share my comments and results.

    1) Image caption HORIZON:
    – When will there be a tutorial that gives voice to the predicted captions of the images?
    – I guess these techniques can also be applied as an alternative to multi-label image classification, can’t they?
    – It could also be useful to make explicit the image pattern recognition obtained during the “solo” computer vision training, by labelling the images through captioning them. Do you agree?

    – There are only 8,091 images in the image dataset folder, but we get 8,092 from the text image descriptions. The missing image is: “2258277193_586949ec62.jpg”
    – I realise that you do not apply “stopwords” in the text cleaning process. Why?
    – Your manual data generator does not allow setting up parallel processing (workers/threads arguments) while fitting the Keras model (model.fit_generator).

    – I reduced the vocabulary size by filtering out words that occur fewer than a minimum number of times (e.g. 5 times). I got worse results using a threshold of 10, but faster code.
    – I applied GloVe pre-trained word vectors within the Embedding layer. The results are better.
    – I tried applying Conv1D and MaxPool1D layers after the embedding layer, but I got worse results, probably because I cannot use mask_zero=True, as Conv1D does not support it.
    – I realised there are alternatives to the “texts_to_sequences” word encoding, such as “texts_to_matrix” or “one_hot”, but like you I did not apply either.
    – I have 16 GB of RAM on my Mac, so I can run the whole-dataset training, but surprisingly the code was quicker using your data generator: 11 minutes/epoch vs 15 minutes/epoch. I guess this is because it works around the RAM limit.
    – I also applied validation data when training with the generator, so I can directly use the best model from the callbacks list (for progressive training).
    – In addition to the VGG16 model, I applied the “InceptionV3” application. The results are better than with VGG16.
    – I completed the code by including the InceptionV3 class prediction in the output. I can then plot the example images with X-axis = image caption prediction and Y-axis = InceptionV3 class prediction, which gives a more complete view of the results obtained using computer vision alone plus image captioning.
    – I selected a lower learning rate for training, to get lower losses.
    – Anyway, I think the dog image example is a very simple case because, when I try other example images, the captions are sometimes very funny and you get a completely crazy caption!

    – I believe the most singular contribution of my code experiment was replacing the “add” layer (where the two models merge) with “Multiply” or “Subtract” layers. I got the best results with the “Subtract” layer (subtracting the outputs of the NLP model from those of the image features model). I do not know why the model performs better with the subtract layer.
    – For the example image I got: “black and white dog is running on the beach”, alongside “Border_Collie” from the InceptionV3 image prediction, with BLEU scores very similar to yours.
    BLEU-1 = 0.570
    BLEU-2 = 0.343
    BLEU-3 = 0.133
    BLEU-4 = 0.133

    Jason, your tutorial collection is astonishing, and you are a great teacher. Thank you.

    • Avatar
      Jason Brownlee August 30, 2020 at 6:34 am #

      Thanks JG.

      Images in a database can be described, then humans can search the database for images that match their free form requirements.

      Stop words increase the complexity of the problem and don’t add a lot of semantic meaning. You can add them back if you like.

      Very cool experiments, thank you so much for sharing!
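The vocabulary reduction JG describes (keeping only words that occur at least a minimum number of times) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not JG's actual code: the function name `filter_rare_words`, the image ids and the captions are all made up for the example.

```python
from collections import Counter

def filter_rare_words(descriptions, min_count):
    """Drop words that occur fewer than min_count times across all captions."""
    counts = Counter(word
                     for caps in descriptions.values()
                     for cap in caps
                     for word in cap.split())
    keep = {word for word, count in counts.items() if count >= min_count}
    filtered = {}
    for key, caps in descriptions.items():
        filtered[key] = [' '.join(w for w in cap.split() if w in keep) for cap in caps]
    return filtered, keep

# toy data standing in for the image id -> list of captions mapping (hypothetical ids)
descriptions = {
    'img1': ['dog runs on beach', 'dog plays on sand'],
    'img2': ['man walks dog on beach'],
}
filtered, vocab = filter_rare_words(descriptions, min_count=2)
print(sorted(vocab))
print(filtered['img2'])
```

A higher threshold shrinks the output layer and speeds up training, at the cost of captions that can no longer use the rarer words, which is consistent with the trade-off JG reports.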

  301. Avatar
    Dominique August 31, 2020 at 3:43 pm #

    Dear Jason,

    I have just finished reading your book «Deep Learning for Natural Language Processing». It’s excellent and contains tons of information. Thank you for this journey into NLP.

    I have posted my review on your book here: http://questioneurope.blogspot.com/2020/08/deep-learning-for-natural-language.html


  302. Avatar
    sami September 3, 2020 at 1:05 am #

    hey jason,
    I am having a problem,
    TypeError Traceback (most recent call last)
    in ()
    168 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
    169 # fit for one epoch
    –> 170 Model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    171 # save model
    172 model.save(‘model_’ + str(i) + ‘.h5’)

    /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
    89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
    90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
    —> 91 return func(*args, **kwargs)
    92 wrapper._original_function = func
    93 return wrapper

    TypeError: fit_generator() missing 1 required positional argument: ‘generator’

    can you suggest a solution please

  303. Avatar
    Ali September 8, 2020 at 9:42 pm #

    Hi. I have a question: how can we add an attention mechanism to this code or to this model?


  304. Avatar
    Prashant Aryal September 9, 2020 at 1:25 am #

    Hello Jason,
    Would you please tell me how to do this on Google Colab? I am having a hard time with the “extract_features” function in the preparation section.

  305. Avatar
    Drishti September 12, 2020 at 2:41 am #

    Hey I’m having this problem. Can you help me?

    Traceback (most recent call last):
    File “load.py”, line 72, in
    train_features = load_features(‘features.pkl’,train)
    File “load.py”, line 33, in load_features
    features = {k: all_features[k] for k in dataset}
    File “load.py”, line 33, in
    features = {k: all_features[k] for k in dataset}
    KeyError: ‘3356642567_f1d92cb81b.jpg’
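A KeyError like the one above means that an identifier in the dataset list has no matching key in the loaded features. One possible cause, assuming the feature keys were stored without a file extension (as with `name.split('.')[0]` in the feature extraction code), is an identifier that still carries `.jpg`. The sketch below is a hypothetical diagnostic helper, not part of the tutorial: it normalises the ids and reports any that are still missing instead of failing on the first one.

```python
def select_features(all_features, dataset):
    """Return features for the requested ids plus a list of ids that were not found."""
    found, missing = {}, []
    for key in dataset:
        image_id = key.split('.')[0]  # drop a trailing extension such as .jpg
        if image_id in all_features:
            found[image_id] = all_features[image_id]
        else:
            missing.append(key)
    return found, missing

# toy ids and vectors standing in for the contents of features.pkl
all_features = {'111_aaa': [0.1, 0.2], '222_bbb': [0.3, 0.4]}
dataset = ['111_aaa.jpg', '222_bbb', '333_ccc']
found, missing = select_features(all_features, dataset)
print(sorted(found))
print(missing)
```

Ids that remain in `missing` after normalisation usually point to photos whose features were never extracted, for example because the extraction run was interrupted or used a different image folder.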

  306. Avatar
    Reshma Jindal September 13, 2020 at 5:53 am #

    epochs = 20
    steps = len(train_descriptions)
    for i in range(epochs):
        # create the data generator
        generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
        # fit for one epoch
        model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
        # save model
        model.save('model_' + str(i) + '.h5')

    In the above code, epochs is set twice: epochs=20 and epochs=1 (in fit_generator). Can you describe what each of them means, with a rough example that shows the significance of both?

    • Avatar
      Jason Brownlee September 13, 2020 at 6:14 am #

      There’s no contradiction. We are enumerating the epochs manually.

      E.g. the first is our manual outer loop, the second is the inner call to the Keras API to run one epoch.

      • Avatar
        pankaj October 8, 2020 at 7:02 pm #

        While running, I got files named model_2.h5 and so on, up to model_19.h5,
        but none of the files were named like model-ep002-loss3.245-val_loss3.612.h5. Where can I find this file?

        • Avatar
          Jason Brownlee October 9, 2020 at 6:41 am #

          Your file names will be different. Use the files you have.
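The relationship between the two epoch settings discussed in this thread can be illustrated without Keras. In the sketch below, `ToyModel` is a hypothetical stand-in that only counts the epochs it is asked to run; it is not the Keras API.

```python
class ToyModel:
    """Stand-in for a Keras model; it only counts the epochs it is asked to run."""

    def __init__(self):
        self.epochs_run = 0

    def fit(self, data, epochs=1):
        # a real model would update its weights here
        self.epochs_run += epochs

    def save(self, filename):
        # a real model would write itself to disk here
        return filename

model = ToyModel()
epochs = 20          # outer, manual loop: how many passes we want in total
saved = []
for i in range(epochs):
    model.fit(data=None, epochs=1)                       # inner call: exactly one pass
    saved.append(model.save('model_' + str(i) + '.h5'))  # one file per pass

print(model.epochs_run)
print(saved[0], saved[-1])
```

Because the loop saves after every pass, the files are named model_0.h5 through model_19.h5. Names such as model-ep002-loss3.245-val_loss3.612.h5 come from the ModelCheckpoint callback used in the non-progressive version of the code.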

  307. Avatar
    Drishti September 14, 2020 at 7:57 pm #

    Hello once again
    When I train the model with progressive loading, I get a NaN loss in the very first epoch. Why would that be?

  308. Avatar
    Todd September 21, 2020 at 3:11 am #

    Hello, first, thanks for this great article! I’ve learnt a lot from your code.

    I tried it on my laptop, but I get an error when fitting the model (something like “failed to call GraphViz”). I have it installed, but the error persists.

    Just wondering if I can download the final model somewhere so I can try it with my own pictures?


    • Avatar
      Jason Brownlee September 21, 2020 at 8:13 am #

      Yes, you can install pydot and pygraphviz, or comment out the call to the plot_model – which is not needed to complete the tutorial.

      • Avatar
        todd September 21, 2020 at 5:37 pm #

        Thanks Jason, that worked!

      • Avatar
        Todd September 22, 2020 at 3:58 am #

        Also, I’m wondering whether it is possible to use an algorithm like YOLO to detect the objects in a picture, and then create captions from these object tags?
        For example, if we detect a man, a dog, and grass, we may infer the caption ‘a man playing with a dog in a park’.

        • Avatar
          Jason Brownlee September 22, 2020 at 6:54 am #

          Yes, you can use two models in that way.

          • Avatar
            Maverick December 2, 2023 at 6:33 am #

            Hey, can you please help me execute the code for image captioning using the object detection model YOLO?

          • Avatar
            James Carmichael December 2, 2023 at 11:33 am #

            Hi Maverick…The following resource may be of interest to you:


          • Avatar
            Maverick December 3, 2023 at 5:49 am #

            Hey James..I have implemented object detection model but I am not able to incorporate image captioning model with YOLO to generate caption. Can you please help me with this.

  309. Avatar
    Ashutosh Garg September 30, 2020 at 8:14 am #

    Hi, thanks for the article.

    I followed it and trained the model but at the time of inferencing and generating new captions, I get the error


    I am running it on Google Colab with a GPU enabled; on a CPU, it might take centuries to run.

    Please help if possible.

  310. Avatar
    pankaj October 8, 2020 at 7:08 pm #

    Hi Jason,
    Thank you for the wonderful explanation.
    I am stuck at this point:
    # fit for one epoch
    model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    This line gives the error below:

    File “F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py”, line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)

    ValueError: in user code:

    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:806 train_function *
    return step_function(self, iterator)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:796 step_function **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1211 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2585 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2945 _call_for_each_replica
    return fn(*args, **kwargs)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:789 run_step **
    outputs = model.train_step(data)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:757 train_step
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:2737 _minimize
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:562 _aggregate_gradients
    filtered_grads_and_vars = _filter_grads(grads_and_vars)
    F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:1271 _filter_grads
    ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: [’embedding_5/embeddings:0′, ‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

  311. Avatar
    Priyanka Digambar Pawar October 9, 2020 at 8:25 pm #

    I am facing this error:

    ValueError: No model found in config file.
    in ()

    179 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)

    180 # fit for one epoch

    –> 181 model.fit_generator(generator, epochs=1, steps_per_epoch=len(train_descriptions), verbose=1)
    182 # save model
    183 model.save(‘model_’ + str(i) + ‘.h5’)

    12 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    971 except Exception as e: # pylint:disable=broad-except
    972 if hasattr(e, “ag_error_metadata”):
    –> 973 raise e.ag_error_metadata.to_exception(e)
    974 else:
    975 raise

    How should i solve this?

  312. Avatar
    Manish October 10, 2020 at 11:59 am #

    Hello jason,

    I’ve been trying to compile the code for extracting the features from the image. I am getting the following error:

    PermissionError Traceback (most recent call last)
    30 # extract features from all images
    31 directory = ‘Flickr8k_Dataset’
    —> 32 features = extract_features(directory)
    33 print(‘Extracted Features: %d’ % len(features))
    34 # save to file

    in extract_features(directory)
    12 # load an image from file
    13 filename = directory + ‘/’ + name
    —> 14 image = load_img(filename, target_size=(224, 224))
    15 # convert the image pixels to a numpy array
    16 image = img_to_array(image)

    ~\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\preprocessing\image.py in load_img(path, grayscale, color_mode, target_size, interpolation)
    299 “””
    300 return image.load_img(path, grayscale=grayscale, color_mode=color_mode,
    –> 301 target_size=target_size, interpolation=interpolation)

    ~\AppData\Local\Programs\Python\Python37\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
    111 raise ImportError(‘Could not import PIL.Image. ‘
    112 ‘The use of load_img requires PIL.’)
    –> 113 with open(path, ‘rb’) as f:
    114 img = pil_image.open(io.BytesIO(f.read()))
    115 if color_mode == ‘grayscale’:

    PermissionError: [Errno 13] Permission denied: ‘Flickr8k_Dataset/Flicker8k_Dataset’

    I have tried changing the permissions for the folder by giving it full access, but the error persists. I ran the next part of the code, which extracts the descriptions of the images, and it ran without any errors. I’m working with a Jupyter notebook in Visual Studio Code.

    Thank you!

    • Avatar
      Jason Brownlee October 10, 2020 at 1:54 pm #

      Looks like you do not have permission on your workstation to access the dataset.

      Maybe talk to your admin or check the help documentation for your operating system.

      • Avatar
        Manish October 11, 2020 at 11:22 am #

        I run this code on my personal machine, so I don’t know what you meant by admin. Do you have any suggestions for software where I can run this code?

        • Avatar
          Jason Brownlee October 12, 2020 at 6:37 am #

          The error suggests you do not have permission to access files on your machine, which suggested to me that you were using a work machine controlled by someone else.

          If you have control over your machine, give yourself permission to access the files, or place the files in a location where you can access them with permissions.

          Sorry, I don’t know anything about Windows permission administration; I have not used the operating system.

    • Avatar
      Manish October 15, 2020 at 4:25 am #

      Hello Jason,

      Im getting the following error:

      UnidentifiedImageError: cannot identify image file

      Any idea how can I solve this?

      • Avatar
        Jason Brownlee October 15, 2020 at 6:19 am #

        I have not seen this error before, sorry.

      • Avatar
        K.Apoorva January 13, 2021 at 8:03 pm #

        same error, any idea how to resolve?

  313. Avatar
    Md Shihab Uddin October 11, 2020 at 6:05 am #

    Hello, I get this error. How can I solve it?

    ValueError: Input 0 of layer dense_18 is incompatible with the layer: expected axis -1 of input shape to have value 4096 but received input with shape [1, 1000]

  314. Avatar
    Sam Ogbonnaya October 16, 2020 at 1:44 am #


    You noted the following in your article regarding evaluating the models:

    “Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.”

    I can understand how the final model during training i.e. weights, loss etc may vary as you’ve stated. However, when I run the same test image through the trained model, I obtain a different prediction every time – is this due to the same reason as above? or some other reason – shouldn’t the final output always be the same for the same image?

    Would appreciate your explanation.

    I’m using the evaluation procedure from here: https://www.tensorflow.org/tutorials/text/image_captioning#caption


    • Avatar
      Jason Brownlee October 16, 2020 at 5:56 am #

      A trained model should make the same prediction each time. If it does not, check your code – perhaps you are accidentally training or there is a bug.

      • Avatar
        Sam Ogbonnaya October 17, 2020 at 2:16 am #

        Thanks. I’ve spotted the issue.

        The evaluation procedure from the TensorFlow tutorial uses the line below to convert the probabilities to an integer. Because it samples at random, the prediction changes from run to run. By changing the evaluation method to argmax instead, I get consistent predictions.

        From the TensorFlow tutorial:
        predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()

        Changed to:
        predicted_id = tf.expand_dims(tf.argmax(predictions, -1), 0).numpy()[0][0]
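The difference between the two decoding choices can be seen in a small pure-Python sketch. It is a simplification under stated assumptions: it works on a toy probability list rather than the logits that `tf.random.categorical` expects, and the helper names `sample_id` and `greedy_id` are hypothetical.

```python
import random

def sample_id(probs, rng):
    """Draw a word index at random, in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def greedy_id(probs):
    """Always pick the index of the most probable word (argmax)."""
    return max(range(len(probs)), key=lambda i: probs[i])

probs = [0.1, 0.6, 0.3]  # toy next-word distribution
rng = random.Random()

# repeat each decoding rule many times and collect the distinct indices it returns
sampled = {sample_id(probs, rng) for _ in range(1000)}
greedy = {greedy_id(probs) for _ in range(1000)}
print(sampled)  # typically contains several different indices
print(greedy)   # always the single most probable index
```

Sampling gives more varied captions, while argmax gives the same caption for the same photo every time, which is the behaviour expected when evaluating a trained model.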

  315. Avatar
    Manish October 21, 2020 at 4:52 am #

    Hello Jason,

    I’ve trained the model with progressive loading for 3 epochs. When I used a new image to generate a caption, it gave me an accurate caption. But every time I use an image that has a beach in it, I get the same caption, “man in red shirt is standing on the beach”, even if there is no man in the image. I tried refitting the model, but the issue is the same. Do you have any suggestions on how to improve the accuracy?

  316. Avatar
    Moksh Grover October 27, 2020 at 5:39 am #

    model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

    It’s showing that my GPU is performing tasks, but training is stuck at the first epoch and isn’t showing any progress.
    I have an RTX 2070 Max-Q GPU and an i7 processor.
    Can anyone help me out?

  317. Avatar
    Mahavirbha October 29, 2020 at 1:08 am #

    I’m getting this error after completing 1 epoch while running the progressive loading code example.
    ValueError: Failed to find data adapter that can handle input: ,
    what should I do? please help me

  318. Avatar
    Rohit Kushwaha November 10, 2020 at 7:49 pm #

    hello sir, while running the below code after some execution i am getting the error:

    # extract features from each photo in the directory
    def extract_features(directory):
        # load the model
        model = VGG16()
        # re-structure the model
        model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
        # summarize

        # extract features from each photo
        features = dict()
        for name in listdir(directory):
            # load an image from file
            filename = directory + '/' + name
            image = load_img(filename, target_size=(224, 224))
            # convert the image pixels to a numpy array
            image = img_to_array(image)
            # reshape data for the model
            image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
            # prepare the image for the VGG model
            image = preprocess_input(image)
            # get features
            feature = model.predict(image, verbose=0)
            # get image id
            image_id = name.split('.')[0]
            # store feature
            features[image_id] = feature
            print('>%s' % name)
        return features

    # extract features from all images
    directory = '/content/drive/My Drive/Flickr_Data/Images/'
    features = extract_features(directory)
    print('Extracted Features: %d' % len(features))
    # save to file
    dump(features, open('features.pkl', 'wb'))

    This is the error that appears after it has processed some of my image files. I have not been able to figure it out. Please help me!

    UnidentifiedImageError Traceback (most recent call last)
    in ()
    32 # extract features from all images
    33 directory = ‘/content/drive/My Drive/Flickr_Data/Images/’
    —> 34 features = extract_features(directory)
    35 print(‘Extracted Features: %d’ % len(features))
    36 # save to file

    3 frames
    /usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
    2860 warnings.warn(message)
    2861 raise UnidentifiedImageError(
    -> 2862 “cannot identify image file %r” % (filename if filename else fp)
    2863 )

    UnidentifiedImageError: cannot identify image file

  319. Avatar
    Jay Trivedi November 21, 2020 at 10:24 am #

    I was unable to train the model normal way even using AWS m5.2xlarge (32gig). So I tried the generator variant but it shows the error when I’m fitting it

    WARNING:tensorflow:From project1.py:339: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use the Model. fit, which supports generators.
    Traceback (most recent call last):
    File “project1.py”, line 339, in
    model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py”, line 324, in new_func
    return func(*args, **kwargs)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 1829, in fit_generator
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 108, in _method_wrapper
    return method(self, *args, **kwargs)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 1098, in fit
    tmp_logs = train_function(iterator)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 780, in __call__
    result = self._call(*args, **kwds)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 697, in _initialize
    *args, **kwds))
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 3075, in _create_graph_function
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py”, line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
    File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py”, line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
    ValueError: in user code:

    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:806 train_function *
    return step_function(self, iterator)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:796 step_function **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
    return fn(*args, **kwargs)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:789 run_step **
    outputs = model.train_step(data)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:757 train_step
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:2737 _minimize
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:562 _aggregate_gradients
    filtered_grads_and_vars = _filter_grads(grads_and_vars)
    /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1271 _filter_grads
    ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: [’embedding/embeddings:0′, ‘dense/kernel:0’, ‘dense/bias:0’, ‘lstm/lstm_cell/kernel:0’, ‘lstm/lstm_cell/recurrent_kernel:0’, ‘lstm/lstm_cell/bias:0’, ‘dense_1/kernel:0’, ‘dense_1/bias:0’, ‘dense_2/kernel:0’, ‘dense_2/bias:0’].

    Please give me a solution…….. I have to meet deadlines

    • Avatar
      Jason Brownlee November 21, 2020 at 1:04 pm #

      Sorry to hear that you are having trouble, I can confirm the code works as described. Here are some suggestions:

      • Avatar
        Venkata naresh suddula December 7, 2020 at 6:45 am #

        You can modify the yield statement of the data generator from this:

        yield [[in_img, in_seq], out_word]

        to this:

        yield ([in_img, in_seq], out_word)
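The change described above makes the generator yield a tuple of (inputs, target) rather than a list. As far as I understand it, newer TensorFlow versions treat a yielded list as inputs only, so no target reaches the loss, which would explain the “No gradients provided for any variable” error. The sketch below shows only the shape of what is yielded; the sample values are toy placeholders, not real features or sequences.

```python
def data_generator(samples):
    """Loop forever, yielding one (inputs, target) pair per step."""
    while True:
        for in_img, in_seq, out_word in samples:
            # a tuple of (inputs, target); inputs is a list with one entry per model input
            yield ([in_img, in_seq], out_word)

# toy placeholders standing in for photo features, padded sequences and one-hot words
samples = [('img_a', 'seq_a', 'word_a'), ('img_b', 'seq_b', 'word_b')]
batch = next(data_generator(samples))
inputs, target = batch
print(type(batch).__name__, len(inputs), target)
```

The inner list still groups the two model inputs (photo features and word sequence); only the outer container changes from a list to a tuple.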

    • Avatar
      Jay Trivedi November 27, 2020 at 10:06 am #

      How can I get the Flickr30k dataset? I searched a lot but couldn’t find it. The link they provide after filling out the form is broken and returns a forbidden error. Does anyone have the zip file that I can download directly?

      • Avatar
        Jason Brownlee November 27, 2020 at 1:08 pm #

        Sorry, I only have a copy of the Flickr8k Dataset

        • Avatar
          Jay Trivedi November 27, 2020 at 4:23 pm #

          I found one on GitHub but I have to create text files by converting them from CSV

      • Avatar
        vishal venkat December 15, 2020 at 4:10 am #

        Hi, don’t worry, it is very simple.

        This is the link for the dataset:


        Go through that link and download the 2 GB data file …

        • Avatar
          Jason Brownlee December 15, 2020 at 6:29 am #

          The dataset download is linked directly in the tutorial.

  320. Avatar
    Andrew November 29, 2020 at 7:38 am #

    Hi Jason. This tutorial is fantastic thank you. You also have the patience of a saint responding to all of these. Keep up the great work!

    I’ve adapted this tutorial to run on some images that are pretty difficult to classify anyway (they are similar).

    I’m getting an output but it is the same words for every image. Does this mean the model is overfit or underfit? What are the most likely parameters I should look at changing?


  321. Avatar
    Venkata naresh suddula December 7, 2020 at 6:36 am #

    Hi Jason, I got this error when running the progressive loading example on my laptop, which has 8 GB of RAM:

    ValueError: No gradients provided for any variable: ['embedding_1/embeddings:0', 'dense_3/kernel:0', 'dense_3/bias:0', 'lstm_1/lstm_cell_1/kernel:0', 'lstm_1/lstm_cell_1/recurrent_kernel:0', 'lstm_1/lstm_cell_1/bias:0', 'dense_4/kernel:0', 'dense_4/bias:0', 'dense_5/kernel:0', 'dense_5/bias:0'].

    Can you please suggest a solution?

    • Avatar
      Jason Brownlee December 7, 2020 at 7:38 am #

      Perhaps try the “progressive loading” section of the tutorial.

  322. Avatar
    Venkata naresh suddula December 7, 2020 at 6:46 am #

    Thank you very much, Jason, for such an awesome project.

  323. Avatar
    Waqas December 20, 2020 at 9:23 pm #

    Is this project independent of tensorflow version?

    • Avatar
      Jason Brownlee December 21, 2020 at 6:39 am #

      Yes, it works with many different versions, although I recommend using the latest version.

  324. Avatar
    Waqas December 21, 2020 at 12:59 am #

    Please Help
    ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

  325. Avatar
    Ahsan December 21, 2020 at 8:55 pm #

    Is there a newer (optimized) version of this program? This one has low accuracy. I need help with how to fine-tune the model, how to reduce the vocabulary, and how to use Inception instead of VGG16. Some snippets would be really helpful, especially a code snippet for using Inception instead of VGG16. Thanks, any help would be appreciated. This is a great post, thanks.

    • Avatar
      Jason Brownlee December 22, 2020 at 6:43 am #

      Thanks for the suggestion, I will look into preparing an updated version.

  326. Avatar
    Md. Ajwad Akil December 22, 2020 at 9:20 pm #

    Hello, I followed every single step but I am getting this error when I fit the model:

    ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

    I don’t understand the reason, since these were supplied as the inputs and outputs to the model:
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)

    So what is the problem here?
    I am running on Google Colab.

    • Avatar
      Jason Brownlee December 23, 2020 at 5:33 am #

      Perhaps try running on your workstation instead of colab? or on AWS EC2?

  327. Avatar
    Md. Ajwad Akil December 22, 2020 at 9:23 pm #

    Sorry, the error did not come through for some reason. Here it is again:

    ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

    • Avatar
      Sahil December 23, 2020 at 7:29 am #

      Yes, I faced the same error. You may be seeing it because you are using the latest TensorFlow (v2.4), whereas this code only works with TensorFlow 1.x.
      To overcome this problem I used TensorFlow 1.13.1 and Keras 2.2.4.
      We really need an updated version that works on TensorFlow 2.

      • Avatar
        Jason Brownlee December 23, 2020 at 8:27 am #

        This is incorrect.

        All code examples have been updated and tested on TensorFlow 2.

        I have updated the progressive loading example and changed


        I have also re-run all examples this morning with the latest version of Keras and Tensorflow without incident.

        Please check your library versions using the script in the above tutorial, and ensure you have copied the code correctly.

  328. Avatar
    AndyTown December 23, 2020 at 6:29 pm #

    hey Jason, I followed your tutorial to train a model using progressive loading on the Flickr 30k dataset. I’m using several VMs to train the model using 10 epochs and batch size of 3.

    Train time ETA is 48 hours. Even if I decrease number of epochs or change batch size, ETA stays the same! Any explanation or tips to decrease training time?

    Thank you in advance for any help you can offer!

  329. Avatar
    Montaser December 29, 2020 at 1:57 pm #

    Hello, I am having some trouble with the model.fit_generator method.

    # train the model, run epochs manually and save after each epoch
    epochs = 20
    steps = len(train_descriptions)
    for i in range(epochs):
        # create the data generator
        generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
        # fit for one epoch
        model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
        # save model
        model.save('/content/drive/Shareddrives/AITrust/model_' + str(i) + '.h5')

    while running this cell, the following output is generated

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
    warnings.warn('Model.fit_generator is deprecated and '
    TypeError Traceback (most recent call last)
    in ()
    6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
    7 # fit for one epoch
    ----> 8 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    9 # save model
    10 model.save('/content/drive/Shareddrives/AITrust/model_' + str(i) + '.h5')

    9 frames
    /usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
    312 if dtype is None:
    313 dtype = array(fill_value).dtype
    --> 314 a = empty(shape, dtype, order)
    315 multiarray.copyto(a, fill_value, casting='unsafe')
    316 return a

    TypeError: 'function' object cannot be interpreted as an integer

    I am using tensorflow 2.4.0 and keras 2.4.3. I have also changed
    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
    out_seq = to_categorical([out_seq], num_classes=vocab_size)

    • Avatar
      Jason Brownlee December 30, 2020 at 6:33 am #

      Sorry to hear that.

      Perhaps try copying the complete code example from the end of the section?

  330. Avatar
    Arun Kumar December 31, 2020 at 9:24 pm #

    I have data in a CSV file consisting of descriptions and image paths. Which tokenizer should I use?

    • Avatar
      Jason Brownlee January 1, 2021 at 5:26 am #

      Perhaps use the tokenizer from the tutorial above as a starting point.

  331. Avatar
    Ayush Gupta January 11, 2021 at 1:46 am #

    InvalidArgumentError: 2 root error(s) found.
    (0) Invalid argument: Matrix size-incompatible: In[0]: [47,1000], In[1]: [4096,256]
    [[node model_2/dense_6/MatMul (defined at :170) ]]
    Sanity is not maintained in the code. How do I resolve this?

  332. Avatar
    harsha February 1, 2021 at 6:00 am #

    Dataset: 6000
    Descriptions: train=6000
    Photos: train=6000
    Vocabulary Size: 7507


    TypeError Traceback (most recent call last)

    in ()
    30 print('Vocabulary Size: %d' % vocab_size)
    31 # determine the maximum sequence length
    ---> 32 max_length = max_length(train_descriptions)
    33 print('Description Length: %d' % max_length)
    34 # prepare sequences

    TypeError: 'int' object is not callable

    Why am I getting this error even though I have copy-pasted the code? How can I resolve this issue?

  333. Avatar
    pakshal February 23, 2021 at 7:32 pm #

    # define the model
    model = define_model(vocab_size, max_length)
    # train the model, run epochs manually and save after each epoch
    epochs = 20
    steps = len(train_descriptions)
    for i in range(epochs):
        # create the data generator
        generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
        # fit for one epoch
        # save model
        model.save('model_' + str(i) + '.h5')

    Model: "model_4"
    Layer (type)             Output Shape      Param #   Connected to
    input_10 (InputLayer)    [(None, 34)]      0
    input_9 (InputLayer)     [(None, 4096)]    0
    embedding_4 (Embedding)  (None, 34, 256)   1940224   input_10[0][0]
    dropout_8 (Dropout)      (None, 4096)      0         input_9[0][0]
    dropout_9 (Dropout)      (None, 34, 256)   0         embedding_4[0][0]
    dense_12 (Dense)         (None, 256)       1048832   dropout_8[0][0]
    lstm_4 (LSTM)            (None, 256)       525312    dropout_9[0][0]
    add_4 (Add)              (None, 256)       0         dense_12[0][0]
    dense_13 (Dense)         (None, 256)       65792     add_4[0][0]
    dense_14 (Dense)         (None, 7579)      1947803   dense_13[0][0]
    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    Epoch 1/20
    InvalidArgumentError Traceback (most recent call last)
    in ()
    9 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
    10 # fit for one epoch
    ---> 11 model.fit(generator,epochs=20,verbose=1,steps_per_epoch=steps)
    12 # save model
    13 model.save('model_' + str(i) + '.h5')

    6 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
    58 ctx.ensure_initialized()
    59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    ---> 60 inputs, attrs, num_outputs)
    61 except core._NotOkStatusException as e:
    62 if name is not None:

    InvalidArgumentError: 2 root error(s) found.
    (0) Invalid argument: Can not squeeze dim[2], expected a dimension of 1, got 7579
    [[node categorical_crossentropy/remove_squeezable_dimensions/Squeeze (defined at :11) ]]
    (1) Invalid argument: Can not squeeze dim[2], expected a dimension of 1, got 7579
    [[node categorical_crossentropy/remove_squeezable_dimensions/Squeeze (defined at :11) ]]
    0 successful operations.
    0 derived errors ignored. [Op:__inference_train_function_36733]

    Function call stack:
    train_function -> train_function

    sir please help

      • Avatar
        pakshal February 24, 2021 at 5:46 pm #

        Sir, all the libraries seem to be up to date because I use Google Colab. Please help me, I have been stuck on this for 15 days.

        • Avatar
          Jason Brownlee February 25, 2021 at 5:26 am #

          The above tutorial works on the latest version of libraries.

          I don’t know about colab, sorry.

          Perhaps try running on your own machine or on AWS EC2 where you can control the environment.

          • Avatar
            pakshal February 26, 2021 at 6:13 am #

            No sir, you use “model.fit_generator”, but as of the latest update we have model.fit. That’s why most students are facing this error.

  334. Avatar
    pakshal February 24, 2021 at 6:02 pm #

    InvalidArgumentError Traceback (most recent call last)
    in ()
    9 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
    10 # fit for one epoch
    ---> 11 model.fit(generator,epochs=1,verbose=1,steps_per_epoch=steps)
    12 # save model
    13 model.save('model_' + str(i) + '.h5')

  335. Avatar
    pakshal February 24, 2021 at 6:31 pm #

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
    warnings.warn('Model.fit_generator is deprecated and '
    ValueError Traceback (most recent call last)
    in ()
    6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
    7 # fit for one epoch
    ----> 8 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    9 # save model
    10 model.save('model_' + str(i) + '.h5')

  336. Avatar
    MBA ASPIRANT February 27, 2021 at 6:51 pm #

    Thank you so much, Jason, for such an awesome project.

  337. Avatar
    MA Kabir Arif March 4, 2021 at 5:40 pm #

    Sir, thanks for your explanation. Can I use this embedding (cnn-rnn) for custom Stack GAN training?

    • Avatar
      Jason Brownlee March 5, 2021 at 5:31 am #

      Perhaps try it and see?

      • Avatar
        MA Kabir Arif March 11, 2021 at 2:23 am #

        Okay sir. I will try it soon & let you know the results.

  338. Avatar
    Shivaprasad Satla March 9, 2021 at 8:45 pm #

    I am getting a memory error. How can I overcome it?

  339. Avatar
    Ryo March 12, 2021 at 2:12 pm #

    Could you tell me what is “FF” short for in the graph?

    • Avatar
      Jason Brownlee March 13, 2021 at 5:24 am #

      FF == Feed-forward, e.g. dense layers used to interpret input and make a prediction.

  340. Avatar
    PAKSHAL SHETH March 14, 2021 at 5:51 pm #

    Can we run the code with 8 GB of RAM without using the progressive loading code?

  341. Avatar
    Bellev March 18, 2021 at 6:37 am #

    Hello, I think the line:
    max_length = max_length(train_descriptions)

    found in several places in the text contains an error – how can the variable and the function have the same name?

    That apart, this is a brilliant tutorial, thank you!

    • Avatar
      Jason Brownlee March 19, 2021 at 6:09 am #

      Yes, that does look like a bad idea. It can work fine if the function is called before the variable is defined – as it is in this case.
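
To illustrate the point above with a small sketch: the line works the first time because max_length still refers to the function when it is called, but afterwards the name refers to an integer. Running the same line again (for example, by re-running a notebook cell) then raises the "'int' object is not callable" TypeError reported elsewhere in this thread. The simplified max_length function and the toy descriptions dictionary below are stand-ins for illustration, and get_max_length is a hypothetical new name, not one used in the tutorial.

```python
def max_length(descriptions):
    """Length in words of the longest description (simplified stand-in)."""
    return max(len(d.split()) for lines in descriptions.values() for d in lines)

descriptions = {
    "img1": ["startseq dog runs endseq"],
    "img2": ["startseq cat endseq"],
}

# First call works: the name still points at the function.
max_length = max_length(descriptions)

# Second call fails: the name now points at an int.
try:
    max_length = max_length(descriptions)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# One fix: give the function and the variable different names.
def get_max_length(descriptions):
    """Same logic as above under a name that the variable does not reuse."""
    return max(len(d.split()) for lines in descriptions.values() for d in lines)

longest = get_max_length(descriptions)
longest = get_max_length(descriptions)  # safe to repeat
```

In a notebook, restarting the runtime and running the cells once in order also avoids the error, because the function definition is restored before the line is executed.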

  342. Avatar
    Jens Br April 5, 2021 at 12:52 am #


    When I try out the code, I get the following error:

    File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/plaidml/keras/backend.py", line 1529, in rnn
    raise NotImplementedError('rnn is not implemented with mask support')
    NotImplementedError: rnn is not implemented with mask support

    Can you tell me where the problem is?

    many thanks

  343. Avatar
    Rohit Kushwaha April 15, 2021 at 1:36 pm #

    Hello Sir, I want to plot the training loss against the testing or dev loss, but I have not been able to do so with model.fit_generator. I have to use model.fit_generator because I run into RAM issues with model.fit.
    Please tell me how to plot the training and testing curves.

    Thanking you in advance

    • Avatar
      Jason Brownlee April 16, 2021 at 5:29 am #

      You may have to write some custom code, I’m not sure off the cuff sorry.
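
One possible approach, sketched under assumptions: both model.fit and model.fit_generator return a History object whose history attribute is a dict of per-epoch metric lists, and validation entries such as val_loss appear only if validation data is passed to the call. When training runs one epoch at a time inside a loop, those dicts can be merged and plotted afterwards. The helper name merge_history and the numbers below are hypothetical; the fake dicts stand in for what model.fit(...).history would return.

```python
def merge_history(total, epoch_history):
    """Append one fit() call's per-epoch metric lists onto a running record."""
    for name, values in epoch_history.items():
        total.setdefault(name, []).extend(values)
    return total

# Stand-ins for the .history dict returned by three one-epoch fit() calls.
fake_runs = [
    {"loss": [4.5], "val_loss": [3.8]},
    {"loss": [3.7], "val_loss": [3.7]},
    {"loss": [3.6], "val_loss": [3.6]},
]

record = {}
for run in fake_runs:
    # In real training this would be: run = model.fit(..., epochs=1).history
    merge_history(record, run)

print(record)

# Plotting, if matplotlib is installed:
# from matplotlib import pyplot
# pyplot.plot(record["loss"], label="train")
# pyplot.plot(record["val_loss"], label="validation")
# pyplot.legend()
# pyplot.show()
```

Keeping the record as a plain dict also makes it easy to save it alongside each model file and compare runs later.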

  344. Avatar
    srz April 30, 2021 at 9:18 am #

    Hey Jason,
    Thank you for all the posts. They’re heavily informative.
    How can I use checkpoints whilst using progressive loading?
    I mean the model.fit is inside a loop
    so I’m confused.
    Thanks again

    • Avatar
      Jason Brownlee May 1, 2021 at 6:00 am #

      You’re welcome.

      Perhaps experiment with using a callback to save checkpoints?

  345. Avatar
    srz May 4, 2021 at 4:10 pm #

    Okay, thanks a lot. I worked it out.
    One more thing: I tried a batch size of 64. The accuracy increased, but the BLEU scores dropped compared to a batch size of 6000.
    Could you please guide me to the possible reason?

    • Avatar
      Jason Brownlee May 5, 2021 at 6:07 am #

      I would expect accuracy is not an appropriate measure to use on this dataset, ignore it.

      • Avatar
        srz May 6, 2021 at 7:33 am #

        yes, I read that in your post. Should I not be using batch training then? Since the higher the batch size the lower my bleu score goes. Just wanted to know why is this happening. couldnt get an answer anywhere.

        • Avatar
          Jason Brownlee May 7, 2021 at 6:23 am #

          Perhaps try different configurations and compare the result.

  346. Avatar
    SK May 13, 2021 at 2:51 pm #


    I am trying to run the code in the “Prepare Photo Data” section, and when I run it I get the following error saying that model.fit() requires model.compile(). However, there is no model compilation in this code fragment.

    Traceback (most recent call last):
    File "extract_image_features.py", line 39, in
    features = compute_features(IMAGE_DIR)
    File "extract_image_features.py", line 31, in compute_features
    feature = model.fit(img, verbose=0)
    File "/Users/saratk/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1032, in fit
    File "/Users/saratk/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2592, in _assert_compile_was_called
    raise RuntimeError('You must compile your model before '
    RuntimeError: You must compile your model before training/testing. Use
    model.compile(optimizer, loss).

    I have

    tensorflow: 2.4.1
    keras: 2.4.0


  347. Avatar
    Ankit Nigam May 17, 2021 at 12:03 am #

    Hi Jason,

    I am running above example and from section – Photo and Caption Dataset, getting the below error

    NotFoundError Traceback (most recent call last)
    1 #extract features from all images
    2 directory='Flicker8k_Dataset'
    ----> 3 features=extract_features(directory)
    5 print ("Extracted features : ", len(features))

    in extract_features(directory)
    35 #get features
    ---> 36 feature=model.predict(image, verbose=0)
    38 #get image id

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
    1627 for step in data_handler.steps():
    1628 callbacks.on_predict_batch_begin(step)
    -> 1629 tmp_batch_outputs = self.predict_function(iterator)
    1630 if data_handler.should_sync:
    1631 context.async_wait()

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    826 tracing_count = self.experimental_get_tracing_count()
    827 with trace.Trace(self._name) as tm:
    --> 828 result = self._call(*args, **kwds)
    829 compiler = "xla" if self._experimental_compile else "nonXla"
    830 new_tracing_count = self.experimental_get_tracing_count()

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    893 # If we did not create any variables the trace we have is good enough.
    894 return self._concrete_stateful_fn._call_flat(
    --> 895 filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
    897 def fn_with_cond(inner_args, inner_kwds, inner_filtered_flat_args):

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
    1917 # No tape is watching; skip to running the function.
    1918 return self._build_call_outputs(self._inference_function.call(
    -> 1919 ctx, args, cancellation_manager=cancellation_manager))
    1920 forward_backward = self._select_forward_and_backward_functions(
    1921 args,

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    558 inputs=args,
    559 attrs=attrs,
    --> 560 ctx=ctx)
    561 else:
    562 outputs = execute.execute_with_cancellation(

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
    58 ctx.ensure_initialized()
    59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    ---> 60 inputs, attrs, num_outputs)
    61 except core._NotOkStatusException as e:
    62 if name is not None:

    NotFoundError: No algorithm worked!
    [[node model_1/block1_conv1/Relu (defined at :36) ]] [Op:__inference_predict_function_1219]

    Function call stack:

    Could you please give some pointers to resolve the same


  348. Avatar
    Ankit Nigam May 17, 2021 at 12:45 am #


    Please ignore the above issue.

    It is resolved now.


  349. Avatar
    Sam May 17, 2021 at 6:04 am #

    hi Jason, here is the link to my code, see the last cell
    I have done the same as mentioned in the blog, can you plz help, error coming in the last step only


    the error coming is : ValueError: Layer model_3 expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

  350. Avatar
    Adiya May 24, 2021 at 12:42 pm #

    Hi Jason Sir, I was hoping you could help me. I am using the progressive loading method to get the best model, and I get the following error stack:

    File "Basic_model.py", line 357, in
    model = define_model(vocab_size, max_length)
    File "Basic_model.py", line 310, in define_model
    se3 = LSTM(256)(se2)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
    inputs, input_masks, args, kwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 863, in _infer_output_signature
    outputs = call_fn(inputs, *args, **kwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent_v2.py", line 1157, in call
    inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 859, in _process_inputs
    initial_state = self.get_initial_state(inputs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 643, in get_initial_state
    inputs=None, batch_size=batch_size, dtype=dtype)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2507, in get_initial_state
    self, inputs, batch_size, dtype))
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2987, in _generate_zero_filled_state_for_cell
    return _generate_zero_filled_state(batch_size, cell.state_size, dtype)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 3003, in _generate_zero_filled_state
    return nest.map_structure(create_zeros, state_size)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in map_structure
    structure[0], [func(*x) for x in entries],
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in
    structure[0], [func(*x) for x in entries],
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 3000, in create_zeros
    return array_ops.zeros(init_state_size, dtype=dtype)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2819, in wrapped
    tensor = fun(*args, **kwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2868, in zeros
    output = _constant_if_small(zero, shape, dtype, name)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2804, in _constant_if_small
    if np.prod(shape) < 1000:
    File "", line 6, in prod
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3031, in prod
    keepdims=keepdims, initial=initial, where=where)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 855, in __array__
    " a NumPy call, which is not supported".format(self.name))
    NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

  351. Avatar
    Prabir May 25, 2021 at 1:44 am #

    Hi Jason,

    I have followed the steps you mentioned in your blog. After some trials, I am able to run the entire project.
    But the issue I am facing is that for whatever image I provide, my model always says either ‘A man in red t-shirt standing on street’ or ‘Two dogs are playing on ground’.

    I am unable to understand which section is creating this problem. I am very new to this area. Can you please suggest which areas might cause this problem?

    I can share my entire code, if you want to check.

    • Avatar
      Jason Brownlee May 25, 2021 at 6:10 am #

      You may need to re-fit the model or tune the model hyperparameters.

  352. Avatar
    Azaz Ur Rehman Butt June 3, 2021 at 7:55 am #

    Hi Jason!

    First of all, thanks for this fruitful tutorial, it has helped me a lot and I’ve learnt a lot from this. I want to make an extension by changing the CNN model from VGG16 to InceptionV3. Will it be a better choice? The output of Inception V3 is (None, 2048). Please help me in this regard.

    Also, I want to use BERT Transformers Model for word embeddings. Will it be a better choice?

    • Avatar
      Jason Brownlee June 4, 2021 at 6:44 am #

      You’re welcome.

      It may. Perhaps test it and discover if these changes result in better performance.

  353. Avatar
    Mukesh June 22, 2021 at 8:58 pm #

    Can you please upload the model that you trained and used in the example? My system is not really powerful and the progressive loading method is giving out really bad models. So I would really appreciate it if you could share your model ‘model-ep002-loss3.245-val_loss3.612.h5’.

    Thank You!

  354. Avatar
    Zulfikri Mirza July 2, 2021 at 7:01 pm #

    Hi Jason, nice article, and thank you for giving us something to learn!
    A few questions though:
    1. Can this be used for captioning video?
    2. How many photos does it need in a sequence to make a caption?

    thank you so much
    bless you !

    • Avatar
      Jason Brownlee July 3, 2021 at 6:10 am #

      Perhaps you can apply the method to frames of a video.

      You may need to experiment to discover how much data is required and the best model for your specific dataset.

  355. Avatar
    Zulfikri Mirza July 3, 2021 at 5:50 pm #

    I currently have 23k frames taken from around 200 videos. I applied the tutorial to them, and training and evaluating the model works. I haven’t tried the model on video yet, since I didn’t see any description in your article of how many frames it takes to make a caption.

    Any suggestion or clue as to where in your code I should look in order to modify it, if my data is video and I want the system to produce a caption after looking at a few frames?

    Once again Thanks a lot !

    • Avatar
      Jason Brownlee July 4, 2021 at 6:01 am #

      The example expects images, so perhaps you can provide video frames, or a subset of video frames to the model for prediction.

      • Avatar
        Zulfikri Mirza July 4, 2021 at 5:37 pm #

        I did use this example with my video frame data and my own tokens, but I haven’t tried the model on video yet, since I didn’t see how many images (frames from my video) it takes in sequence to make a prediction, and I am a little confused about that part.

        Training the model on my own data (frames from my video) already works fine.

  356. Avatar
    sam July 15, 2021 at 2:35 am #

    Thanks a lot for your tutorial, it helped me a lot. But when I print the actual and predicted sentences, I get the same sentence every time, taken from the test descriptions. I need to get new captions generated by the saved .h5 model. Thanks a lot.

    • Avatar
      Jason Brownlee July 15, 2021 at 5:33 am #

      Perhaps re-fitting the model or using a different saved model as the final model?

      • Avatar
        sam July 29, 2021 at 11:49 am #

        Thanks a lot for replying. Yes, the model needs more training, but I’m using the COCO dataset. I used the same model with 100 epochs and a learning rate of 3e-4, deleted the first dropout layer, and got this result:
        1 loss: 4.5090 – accuracy: 0.2570 – val_loss: 3.8487 – val_accuracy: 0.3179
        2 loss: 3.7380 – accuracy: 0.3306 – val_loss: 3.6578 – val_accuracy: 0.3398
        3 loss: 3.5954 – accuracy: 0.3449 – val_loss: 3.6016 – val_accuracy: 0.3481
        4 loss: 3.5336 – accuracy: 0.3519 – val_loss: 3.5808 – val_accuracy: 0.3527
        5 loss: 3.4969 – accuracy: 0.3563 – val_loss: 3.5751 – val_accuracy: 0.3555
        6 loss: 3.4720 – accuracy: 0.3595 – val_loss: 3.5733 – val_accuracy: 0.3574
        7 loss: 3.4528 – accuracy: 0.3619 – val_loss: 3.5809 – val_accuracy: 0.3585
        8 loss: 3.4388 – accuracy: 0.3638 – val_loss: 3.5888 – val_accuracy: 0.3595
        9 loss: 3.4283 – accuracy: 0.3655 – val_loss: 3.5965 – val_accuracy: 0.3606
        10 loss: 3.4207 – accuracy: 0.3666 – val_loss: 3.6070 – val_accuracy: 0.3612
        11 loss: 3.4155 – accuracy: 0.3675 – val_loss: 3.6225 – val_accuracy: 0.3615
        12 loss: 3.4112 – accuracy: 0.3688 – val_loss: 3.6362 – val_accuracy: 0.3622
        13 loss: 3.4085 – accuracy: 0.3696 – val_loss: 3.6467 – val_accuracy: 0.3623
        14 loss: 3.4024 – accuracy: 0.3697 – val_loss: 3.6510 – val_accuracy: 0.3617
        15 loss: 3.3850 – accuracy: 0.3702 – val_loss: 3.6485 – val_accuracy: 0.3622
        16 loss: 3.3729 – accuracy: 0.3709 – val_loss: 3.6563 – val_accuracy: 0.3622

        Should I wait, or does the model need to change?

  357. Avatar
    Mr B August 1, 2021 at 8:12 pm #

    Hi Just wondering if anyone came across this error

    TypeError: Dimension value must be integer or None or have an __index__ method, got value ” with type ”

  358. Avatar
    Peter Sun September 1, 2021 at 7:05 pm #

    Hi, Jason, I followed through the tutorial from beginning to end. So my best model was the one with val-loss 3.882, and test BLEU-1 score was 0.5461!
    I have learned that a BLEU-1 score of 0.5 is state-of-the-art performance, but when I try generating captions for new images of people and animals from the internet, the phrase “man in black shirt is sitting on the sidewalk” keeps coming up for random images. Does this mean that the model does not recognize these images at all?

    • Avatar
      Jason Brownlee September 2, 2021 at 5:07 am #

      Perhaps the model has overfit. You could try using a different model saved during training, or try re-training the model.

  359. Avatar
    Peter Sun September 3, 2021 at 12:36 pm #

    All right! Thanks for the reply! I’ll try using different options!

  360. Avatar
    Naqqash Dilshad September 9, 2021 at 1:55 pm #

    Hi Dr. Brownlee

    Would you please guide us on how to perform image captioning for a custom dataset? The problem is assigning unique identifiers to the labels of each image. Any kind of help will be appreciated…. thank you

    • Avatar
      Adrian Tam September 11, 2021 at 6:05 am #

      Yes, that’s a boring part but you must spend time to do this tagging before you can do anything else.

  361. Avatar
    Karlo September 18, 2021 at 5:38 am #

    While running the code in Google Colab, my runtime stops working with this message: “Your session crashed after using all available RAM”. Does anyone know what might be the reason for it or how to fix it?

    • Avatar
      Adrian Tam September 19, 2021 at 6:22 am #

      You exhausted the memory. You either need to use a paid version of Colab, or use another way to run your code.
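
      One such way is the progressive loading approach that the tutorial added in its Apr/2018 update for machines with little RAM: produce training samples one photo at a time instead of building every input-output pair in memory at once. The following is a simplified pure-Python sketch of that idea, not the tutorial's code. The names pair_generator and word_to_id and the tiny sample data are assumptions, and padding and one-hot encoding are left out.

```python
def pair_generator(descriptions, word_to_id):
    """Lazily yield (photo_id, input_sequence, next_word) training pairs.

    Only the pairs for the caption currently being processed exist in
    memory, instead of one large array holding every pair in the dataset.
    """
    for photo_id, captions in descriptions.items():
        for caption in captions:
            # Encode the caption as integers, skipping unknown words.
            seq = [word_to_id[w] for w in caption.split() if w in word_to_id]
            # Each prefix of the caption predicts the word that follows it.
            for i in range(1, len(seq)):
                yield photo_id, seq[:i], seq[i]


# Hypothetical sample data, for illustration only.
descriptions = {"p1": ["startseq dog runs endseq"]}
word_to_id = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}

pairs = list(pair_generator(descriptions, word_to_id))
for pair in pairs:
    print(pair)
```

      In a real training loop the generator would be consumed batch by batch rather than collected into a list, which is what keeps the memory footprint small.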

  362. Avatar
    Meena Vinaykumar September 19, 2021 at 4:29 pm #


    I have been given the Flickr dataset without separate train, test and val datasets. How do I do the training? Please help.
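
    If the predefined split files are missing, one option is to create the split yourself from the image identifiers. Below is a minimal sketch under stated assumptions: the function name split_identifiers, the fixed seed and the split sizes are all illustrative choices, and the identifiers are placeholders rather than real Flickr file names.

```python
import random


def split_identifiers(identifiers, n_val, n_test, seed=1):
    """Shuffle image identifiers reproducibly and split them three ways."""
    ids = sorted(set(identifiers))  # de-duplicate and fix the starting order
    if n_val + n_test >= len(ids):
        raise ValueError("validation and test sizes leave no training data")
    random.Random(seed).shuffle(ids)  # same seed gives the same split
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Placeholder identifiers standing in for the dataset's image names.
ids = [f"img_{i:04d}" for i in range(100)]
train, val, test = split_identifiers(ids, n_val=10, n_test=10)
print(len(train), len(val), len(test))
```

    Writing each list to its own text file, one identifier per line, gives files that can stand in for the train, validation and test lists that the tutorial loads.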

  363. Avatar
    Mandar September 23, 2021 at 4:23 am #

    Hi Adrian,
    I am getting the error ‘TypeError: ‘int’ object is not callable’ for the line:

    # determine the maximum sequence length
    max_length = max_length(train_descriptions)

    What might be causing this?

    • Avatar
      Adrian Tam September 23, 2021 at 5:35 am #

      Is max_length a variable or a function? You are reusing the same name for two purposes.
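
      To make the name clash concrete, here is a small sketch. The max_length function is a stand-in written for this example (it measures the longest description in words), and the sample descriptions are invented. Once the name max_length is rebound to the integer result, for instance by running the same notebook cell twice, the next call fails with the TypeError quoted above.

```python
def max_length(descriptions):
    """Return the length in words of the longest description."""
    lines = [d for descs in descriptions.values() for d in descs]
    return max(len(d.split()) for d in lines)


# Invented sample data, for illustration only.
train_descriptions = {
    "img1": ["startseq dog runs endseq"],
    "img2": ["startseq man in black shirt is sitting endseq"],
}


def reproduce_error():
    """Show what happens when one name holds the function, then its result."""
    fn = max_length               # the name refers to the function
    fn = fn(train_descriptions)   # now the same name refers to an int
    try:
        fn(train_descriptions)    # calling an int raises TypeError
    except TypeError as err:
        return str(err)


# Fix: keep the function and its result under different names.
max_len = max_length(train_descriptions)
print(max_len)
print(reproduce_error())
```

      Restarting the runtime, or renaming the variable as above, removes the clash.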

  364. Avatar
    AdeN September 25, 2021 at 5:13 pm #

    Thank you for the great tutorial. It’s beneficial for me. 🙂
    I tried using another model, progressive loading, and added a data validation set using the generator.

  365. Avatar
    João Gondim September 27, 2021 at 2:29 am #

    Hi! Thank you so much for this post!

    First, I’m using this code to run some tests on a translated Flickr8k dataset, and I intend to publish my findings later. How can I cite you and your website? Would a standard LaTeX citation of the site be OK?

    Second, as noted in the article, val_loss reaches its minimum in the very first epochs, but when comparing BLEU scores, the latest epoch showed better numbers. Why do you think this might be happening?

  366. Avatar
    Muhammad Kamran October 29, 2021 at 5:33 am #

    Hey Jason, is there any work on image captioning using conventional machine learning? I am working on report generation for medical images. Can you suggest some literature that I should review for my thesis?

  367. Avatar
    Saurabh Sarkar November 4, 2021 at 4:08 pm #

    Hello Jason,

    Thanks for this article. I am getting an error when I try to pretrain my Inception v3 model using the Flickr8k dataset that I have:

    # InceptionV3 without the classifier head, pretrained on ImageNet
    image_model = og_tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')

    new_input = image_model.input
    hidden_layer = image_model.layers[-1].output

    image_features_extract_model = og_tf.keras.Model(new_input, hidden_layer)

    # extract features for each batch of images
    for img, path in img_data:
        fv = image_features_extract_model(img)

    Below is the ERROR:
    NotFoundError Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_81952/3499706949.py in
    ----> 1 for img,path in img_data:
    2 fv = image_features_extract_model(img)

    ~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in __next__(self)
    759 def __next__(self):
    760 try:
    --> 761 return self._next_internal()
    762 except errors.OutOfRangeError:
    763 raise StopIteration

    ~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in _next_internal(self)
    745 self._iterator_resource,
    746 output_types=self._flat_output_types,
    --> 747 output_shapes=self._flat_output_shapes)
    749 try:

    ~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py in iterator_get_next(iterator, output_types, output_shapes, name)
    2725 return _result
    2726 except _core._NotOkStatusException as e:
    -> 2727 _ops.raise_from_not_ok_status(e, name)
    2728 except _core._FallbackException:
    2729 pass

    ~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\framework\ops.py in raise_from_not_ok_status(e, name)
    6895 message = e.message + (" name: " + name if name is not None else "")
    6896 # pylint: disable=protected-access
    -> 6897 six.raise_from(core._status_to_exception(e.code, message), None)
    6898 # pylint: enable=protected-access

    ~\miniconda3\envs\tensor\lib\site-packages\six.py in raise_from(value, from_value)

    NotFoundError: NewRandomAccessFile failed to Create/Open: \Images\1000268201_693b08cb0e.jpg : The system cannot find the path specified.
    ; No such process
    [[{{node ReadFile}}]] [Op:IteratorGetNext]

    Any idea why the system is not able to access the path?

    • Avatar
      Adrian Tam November 7, 2021 at 7:39 am #

      No idea. Are you mixing up the path separators “\” and “/”?
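
      One thing that may be worth checking: the path in the error message, \Images\1000268201_693b08cb0e.jpg, starts with a backslash, so Windows looks for it at the root of the current drive rather than inside the project folder. That is only a guess at the cause. The sketch below shows one way to build the paths with pathlib and confirm the files exist before handing them to the dataset pipeline. The function name resolve_image_paths and the temporary demo directory are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def resolve_image_paths(image_dir, filenames):
    """Split filenames into (found, missing) absolute paths under image_dir."""
    base = Path(image_dir).expanduser().resolve()
    found, missing = [], []
    for name in filenames:
        path = base / name  # pathlib inserts the right separator for the OS
        (found if path.is_file() else missing).append(str(path))
    return found, missing


# Demo with a throwaway directory standing in for the image folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.jpg").touch()
    found, missing = resolve_image_paths(tmp, ["a.jpg", "b.jpg"])

print("found:", len(found), "missing:", len(missing))
```

      If any path lands in the missing list, the base directory passed in is the first thing to check.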

  368. Avatar
    Shivam Patel November 7, 2021 at 11:29 pm #

    Hello sir, I am getting the same caption for all new images, which is “startseq man in blue shirt is standing on the street endseq”.

    What is the problem, and how can I fix it?

    • Avatar
      Adrian Tam November 14, 2021 at 11:58 am #

      Was there a problem in training? I believe the model has degenerated, but I am not sure what caused it.

  369. Avatar
    HomaK February 28, 2022 at 2:00 am #

    Hi Jason
    Thank you for your perfect article.
    I successfully trained and evaluated the model, but I get this result:
    startseq rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed

    How can I fix it?

    • Avatar
      James Carmichael February 28, 2022 at 11:58 am #

      Hi HomaK…Thanks for asking.

      I’m eager to help, but I just don’t have the capacity to debug code for you.

      I am happy to make some suggestions:

      Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
      Consider cutting the problem back to just one or a few simple examples.
      Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
      Consider posting your question and code to StackOverflow.

  370. Avatar
    HomaK February 28, 2022 at 4:28 pm #

    Thanks a lot, dear Jason, for your answer.
    I will follow your recommendations and post the results.
    Your website is like a book from which I learn many things, even from the comments.
    Best wishes

  371. Avatar
    Vahid March 29, 2022 at 1:51 am #

    Is it possible for you to share the h5 file?

    • Avatar
      James Carmichael March 29, 2022 at 9:58 am #

      Hi Vahid…We do not share h5 files, however you may feel free to create one from the source code we provide.

  372. Avatar
    generator energy May 11, 2022 at 3:33 am #

    Such great information. This is really very helpful for bloggers.

    • Avatar
      James Carmichael May 13, 2022 at 12:41 am #

      Thank you for the feedback!

  373. Avatar
    Didem Damka May 16, 2022 at 7:07 am #

    Hi James. Thank you for this beautiful tutorial.

    I am trying to use the same code with the Flickr30K dataset, and I am also computing BLEU and CIDEr scores. It works fine with Flickr8K. I split the Flickr30K dataset into 29000 train, 1000 validation and 1000 test images and trained the model, but the model generates the same two sentences for every image in the list. Why does this happen, and how can I fix it? Also, the BLEU scores are higher than on Flickr8K, but the CIDEr score is too low. I tried to reduce the vocabulary size as in the paper “What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?”, but it doesn’t work. Thank you so much.

    • Avatar
      James Carmichael May 16, 2022 at 8:50 am #

      Hi Didem…I have never encountered this issue. The following may help by providing another approach:


      • Avatar
        Didem Damka May 16, 2022 at 8:32 pm #

        OK. I will try this approach. Thank you so much.

        • Avatar
          Nani May 18, 2022 at 6:54 pm #

          Hey, how can we identify the colors of the objects present in the image through captions?

  374. Avatar
    JayC July 12, 2022 at 10:24 am #

    Thank you for this excellent example of image captioning!

    I am currently working on a project where I need to caption images of playing cards. Importantly, the model needs to capture the ORDER of the playing cards (from left to right).

    If I train a CNN LSTM, as in your example, and the captions are correctly formatted (left-right), will this model capture such a spatial relationship?

    I.e. is an image captioning model the correct approach for this task?

    • Avatar
      James Carmichael July 13, 2022 at 7:49 am #

      Hi JayC…You are very welcome! Could you explain further what you mean by “capture such a spatial relationship”?

      • Avatar
        JayC July 14, 2022 at 11:31 am #

        Sorry, my question was not very clear.

        I train a model, like you described above, on images of playing cards. Each image is captioned, describing the cards in strictly left-to-right order. E.g.

        5 of Diamonds — 3 of Clubs — Ace of Hearts

        I use the trained model to make a prediction on a new image of 3 playing cards. Will the predicted caption have the correct left-to-right ordering?

        I.e. can the image captioning approach you describe learn (in this case left-to-right) spatial relationships?

        • Avatar
          James Carmichael July 15, 2022 at 8:32 am #

          Hi JayC…The answer is yes. I would recommend that you proceed with the model for your application and let us know your findings.

  375. Avatar
    Xuan December 11, 2022 at 8:02 am #

    Hi Jason, this is a very well-written tutorial on caption generation, thank you! All procedures, including data preparation, model architecture, training and evaluation, are thoroughly explained in detail using simple terms. For me, it has been a good refresher on the encoder-decoder architecture. I also appreciate the provided code, from which I learned a lot, especially the code for recursively generating output text.

    Now looking back from the end of 2022, I’m curious whether the following could increase the performance of the caption generator:
    • Plug in the photo feature extractor and let it be fine-tuned along with training the decoder
    • Use a transformer for the decoder

    • Avatar
      James Carmichael December 11, 2022 at 9:31 am #

      Great feedback Xuan! We greatly appreciate your support!

  376. Avatar
    Mazen February 16, 2023 at 3:08 am #

    Hi Jason, Hi all
    Thanks Jason for this very helpful tutorial!

    Could anyone please send a link for downloading their trained model and tokenizer? Then we could directly try the last part (Generate New Captions) without training and saving.

    • Avatar
      Mazen February 28, 2023 at 8:41 pm #

      So, I searched for pre-trained models and found and tried this one, which looks very impressive (not only) for captioning: https://github.com/salesforce/LAVIS

      Just wanted to share this with all here, as I always benefit from this very helpful website.

  377. Avatar
    tounes February 20, 2023 at 9:51 am #

    Hello, can you please tell me whether this code is up to date with the latest developments in the image captioning field, or whether there are other resources that are up to date for 2023?
    Please reply to me.
    Thank you.

    • Avatar
      James Carmichael February 20, 2023 at 10:00 am #

      Hi tounes…Our content is up to date with stable library levels. Are you having any particular issues with executing the code that we can assist you with?
