How to Develop a Deep Learning Photo Caption Generator from Scratch

By Jason Brownlee on December 23, 2020 in Deep Learning for Natural Language Processing 1,196

Develop a Deep Learning Model to Automatically
Describe Photographs in Python with Keras, Step-by-Step.

Caption generation is a challenging artificial intelligence problem where a textual description must be generated for a given photograph.

It requires both methods from computer vision to understand the content of the image and a language model from the field of natural language processing to turn the understanding of the image into words in the right order. Recently, deep learning methods have achieved state-of-the-art results on examples of this problem.

Deep learning methods have demonstrated state-of-the-art results on caption generation problems. What is most impressive about these methods is a single end-to-end model can be defined to predict a caption, given a photo, instead of requiring sophisticated data preparation or a pipeline of specifically designed models.

In this tutorial, you will discover how to develop a photo captioning deep learning model from scratch.

After completing this tutorial, you will know:

How to prepare photo and text data for training a deep learning model.
How to design and train a deep learning caption generation model.
How to evaluate a train caption generation model and use it to caption entirely new photographs.

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Nov/2017: Added note about a bug introduced in Keras 2.1.0 and 2.1.1 that impacts the code in this tutorial.
Update Dec/2017: Updated a typo in the function name when explaining how to save descriptions to file, thanks Minel.
Update Apr/2018: Added a new section that shows how to train the model using progressive loading for workstations with minimum RAM.
Update Feb/2019: Provided direct links for the Flickr8k_Dataset dataset, as the official site was taken down.
Update Jun/2019: Fixed typo in dataset name. Fixed minor bug in create_sequences().
Update Aug/2020: Update code for API changes in Keras 2.4.3 and TensorFlow 2.3.
Update Dec/2020: Added a section for checking library version numbers.
Update Dec/2020: Updated progressive loading to fix error “ValueError: No gradients provided for any variable“.

How to Develop a Deep Learning Caption Generation Model in Python from Scratch
Photo by Living in Monrovia, some rights reserved.

Tutorial Overview

This tutorial is divided into 6 parts; they are:

Photo and Caption Dataset
Prepare Photo Data
Prepare Text Data
Develop Deep Learning Model
Train With Progressive Loading (NEW)
Evaluate Model
Generate New Captions

Python Environment

This tutorial assumes you have a Python SciPy environment installed, ideally with Python 3.

You must have Keras installed with the TensorFlow backend. The tutorial also assumes you have the libraries NumPy and NLTK installed.

If you need help with your environment, see this tutorial:

How to Setup a Python Environment for Deep Learning with Anaconda

I recommend running the code on a system with a GPU. You can access GPUs cheaply on Amazon Web Services. Learn how in this tutorial:

How to Setup Amazon AWS EC2 GPUs to Train Keras Models (step-by-step)

Before we move on, let’s check your deep learning library version.

Run the following script and check your version numbers:

# tensorflow version
import tensorflow
print('tensorflow: %s' % tensorflow.__version__)
# keras version
import keras
print('keras: %s' % keras.__version__)

# tensorflow version

import tensorflow

print('tensorflow: %s' % tensorflow.__version__)

# keras version

import keras

print('keras: %s' % keras.__version__)

Running the script should show the same library version numbers or higher.

tensorflow: 2.4.0
keras: 2.4.3

1 2	tensorflow: 2.4.0 keras: 2.4.3

Let’s dive in.

Need help with Deep Learning for Text Data?

Take my free 7-day email crash course now (with code).

Click to sign-up and also get a free PDF Ebook version of the course.

Photo and Caption Dataset

A good dataset to use when getting started with image captioning is the Flickr8K dataset.

The reason is because it is realistic and relatively small so that you can download it and build models on your workstation using a CPU.

The definitive description of the dataset is in the paper “Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics” from 2013.

The authors describe the dataset as follows:

We introduce a new benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events.

…

The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations.

— Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, 2013.

The dataset is available for free. You must complete a request form and the links to the dataset will be emailed to you. I would love to link to them for you, but the email address expressly requests: “Please do not redistribute the dataset“.

You can use the link below to request the dataset (note, this may not work any more, see below):

Dataset Request Form

Within a short time, you will receive an email that contains links to two files:

Flickr8k_Dataset.zip (1 Gigabyte) An archive of all photographs.
Flickr8k_text.zip (2.2 Megabytes) An archive of all text descriptions for photographs.

UPDATE (Feb/2019): The official site seems to have been taken down (although the form still works). Here are some direct download links from my datasets GitHub repository:

Download the datasets and unzip them into your current working directory. You will have two directories:

Flickr8k_Dataset: Contains 8092 photographs in JPEG format.
Flickr8k_text: Contains a number of files containing different sources of descriptions for the photographs.

The dataset has a pre-defined training dataset (6,000 images), development dataset (1,000 images), and test dataset (1,000 images).

One measure that can be used to evaluate the skill of the model are BLEU scores. For reference, below are some ball-park BLEU scores for skillful models when evaluated on the test dataset (taken from the 2017 paper “Where to put the Image in an Image Caption Generator“):

BLEU-1: 0.401 to 0.578.
BLEU-2: 0.176 to 0.390.
BLEU-3: 0.099 to 0.260.
BLEU-4: 0.059 to 0.170.

We describe the BLEU metric more later when we work on evaluating our model.

Next, let’s look at how to load the images.

Prepare Photo Data

We will use a pre-trained model to interpret the content of the photos.

There are many models to choose from. In this case, we will use the Oxford Visual Geometry Group, or VGG, model that won the ImageNet competition in 2014. Learn more about the model here:

Very Deep Convolutional Networks for Large-Scale Visual Recognition

Keras provides this pre-trained model directly. Note, the first time you use this model, Keras will download the model weights from the Internet, which are about 500 Megabytes. This may take a few minutes depending on your internet connection.

We could use this model as part of a broader image caption model. The problem is, it is a large model and running each photo through the network every time we want to test a new language model configuration (downstream) is redundant.

Instead, we can pre-compute the “photo features” using the pre-trained model and save them to file. We can then load these features later and feed them into our model as the interpretation of a given photo in the dataset. It is no different to running the photo through the full VGG model; it is just we will have done it once in advance.

This is an optimization that will make training our models faster and consume less memory.

We can load the VGG model in Keras using the VGG class. We will remove the last layer from the loaded model, as this is the model used to predict a classification for a photo. We are not interested in classifying images, but we are interested in the internal representation of the photo right before a classification is made. These are the “features” that the model has extracted from the photo.

Keras also provides tools for reshaping the loaded photo into the preferred size for the model (e.g. 3 channel 224 x 224 pixel image).

Below is a function named extract_features() that, given a directory name, will load each photo, prepare it for VGG, and collect the predicted features from the VGG model. The image features are a 1-dimensional 4,096 element vector.

The function returns a dictionary of image identifier to image features.

# extract features from each photo in the directory
def extract_features(directory):
	# load the model
	model = VGG16()
	# re-structure the model
	model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
	# summarize
	print(model.summary())
	# extract features from each photo
	features = dict()
	for name in listdir(directory):
		# load an image from file
		filename = directory + '/' + name
		image = load_img(filename, target_size=(224, 224))
		# convert the image pixels to a numpy array
		image = img_to_array(image)
		# reshape data for the model
		image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
		# prepare the image for the VGG model
		image = preprocess_input(image)
		# get features
		feature = model.predict(image, verbose=0)
		# get image id
		image_id = name.split('.')[0]
		# store feature
		features[image_id] = feature
		print('>%s' % name)
	return features

# extract features from each photo in the directory

def extract_features(directory):

# load the model

model = VGG16()

# re-structure the model

model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

# summarize

print(model.summary())

# extract features from each photo

features = dict()

for name in listdir(directory):

# load an image from file

filename = directory + '/' + name

image = load_img(filename, target_size=(224, 224))

# convert the image pixels to a numpy array

image = img_to_array(image)

# reshape data for the model

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model

image = preprocess_input(image)

# get features

feature = model.predict(image, verbose=0)

# get image id

image_id = name.split('.')[0]

# store feature

features[image_id] = feature

print('>%s' % name)

return features

We can call this function to prepare the photo data for testing our models, then save the resulting dictionary to a file named ‘features.pkl‘.

The complete example is listed below.

from os import listdir
from pickle import dump
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.models import Model

# extract features from each photo in the directory
def extract_features(directory):
	# load the model
	model = VGG16()
	# re-structure the model
	model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
	# summarize
	print(model.summary())
	# extract features from each photo
	features = dict()
	for name in listdir(directory):
		# load an image from file
		filename = directory + '/' + name
		image = load_img(filename, target_size=(224, 224))
		# convert the image pixels to a numpy array
		image = img_to_array(image)
		# reshape data for the model
		image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
		# prepare the image for the VGG model
		image = preprocess_input(image)
		# get features
		feature = model.predict(image, verbose=0)
		# get image id
		image_id = name.split('.')[0]
		# store feature
		features[image_id] = feature
		print('>%s' % name)
	return features

# extract features from all images
directory = 'Flickr8k_Dataset'
features = extract_features(directory)
print('Extracted Features: %d' % len(features))
# save to file
dump(features, open('features.pkl', 'wb'))

from os import listdir

from pickle import dump

from keras.applications.vgg16 import VGG16

from keras.preprocessing.image import load_img

from keras.preprocessing.image import img_to_array

from keras.applications.vgg16 import preprocess_input

from keras.models import Model

# extract features from each photo in the directory

def extract_features(directory):

# load the model

model = VGG16()

# re-structure the model

model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

# summarize

print(model.summary())

# extract features from each photo

features = dict()

for name in listdir(directory):

# load an image from file

filename = directory + '/' + name

image = load_img(filename, target_size=(224, 224))

# convert the image pixels to a numpy array

image = img_to_array(image)

# reshape data for the model

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model

image = preprocess_input(image)

# get features

feature = model.predict(image, verbose=0)

# get image id

image_id = name.split('.')[0]

# store feature

features[image_id] = feature

print('>%s' % name)

return features

# extract features from all images

directory = 'Flickr8k_Dataset'

features = extract_features(directory)

print('Extracted Features: %d' % len(features))

# save to file

dump(features, open('features.pkl', 'wb'))

Running this data preparation step may take a while depending on your hardware, perhaps one hour on the CPU with a modern workstation.

At the end of the run, you will have the extracted features stored in ‘features.pkl‘ for later use. This file will be about 127 Megabytes in size.

Prepare Text Data

The dataset contains multiple descriptions for each photograph and the text of the descriptions requires some minimal cleaning.

If you are new to cleaning text data, see this post:

How to Clean Text for Machine Learning with Python

First, we will load the file containing all of the descriptions.

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

filename = 'Flickr8k_text/Flickr8k.token.txt'
# load descriptions
doc = load_doc(filename)

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

filename = 'Flickr8k_text/Flickr8k.token.txt'

# load descriptions

doc = load_doc(filename)

Each photo has a unique identifier. This identifier is used on the photo filename and in the text file of descriptions.

Next, we will step through the list of photo descriptions. Below defines a function load_descriptions() that, given the loaded document text, will return a dictionary of photo identifiers to descriptions. Each photo identifier maps to a list of one or more textual descriptions.

# extract descriptions for images
def load_descriptions(doc):
	mapping = dict()
	# process lines
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		if len(line) < 2:
			continue
		# take the first token as the image id, the rest as the description
		image_id, image_desc = tokens[0], tokens[1:]
		# remove filename from image id
		image_id = image_id.split('.')[0]
		# convert description tokens back to string
		image_desc = ' '.join(image_desc)
		# create the list if needed
		if image_id not in mapping:
			mapping[image_id] = list()
		# store description
		mapping[image_id].append(image_desc)
	return mapping

# parse descriptions
descriptions = load_descriptions(doc)
print('Loaded: %d ' % len(descriptions))

# extract descriptions for images

def load_descriptions(doc):

mapping = dict()

# process lines

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

if len(line) < 2:

continue

# take the first token as the image id, the rest as the description

image_id, image_desc = tokens[0], tokens[1:]

# remove filename from image id

image_id = image_id.split('.')[0]

# convert description tokens back to string

image_desc = ' '.join(image_desc)

# create the list if needed

if image_id not in mapping:

mapping[image_id] = list()

# store description

mapping[image_id].append(image_desc)

return mapping

# parse descriptions

descriptions = load_descriptions(doc)

print('Loaded: %d ' % len(descriptions))

Next, we need to clean the description text. The descriptions are already tokenized and easy to work with.

We will clean the text in the following ways in order to reduce the size of the vocabulary of words we will need to work with:

Convert all words to lowercase.
Remove all punctuation.
Remove all words that are one character or less in length (e.g. ‘a’).
Remove all words with numbers in them.

Below defines the clean_descriptions() function that, given the dictionary of image identifiers to descriptions, steps through each description and cleans the text.

import string

def clean_descriptions(descriptions):
	# prepare translation table for removing punctuation
	table = str.maketrans('', '', string.punctuation)
	for key, desc_list in descriptions.items():
		for i in range(len(desc_list)):
			desc = desc_list[i]
			# tokenize
			desc = desc.split()
			# convert to lower case
			desc = [word.lower() for word in desc]
			# remove punctuation from each token
			desc = [w.translate(table) for w in desc]
			# remove hanging 's' and 'a'
			desc = [word for word in desc if len(word)>1]
			# remove tokens with numbers in them
			desc = [word for word in desc if word.isalpha()]
			# store as string
			desc_list[i] =  ' '.join(desc)

# clean descriptions
clean_descriptions(descriptions)

import string

def clean_descriptions(descriptions):

# prepare translation table for removing punctuation

table = str.maketrans('', '', string.punctuation)

for key, desc_list in descriptions.items():

for i in range(len(desc_list)):

desc = desc_list[i]

# tokenize

desc = desc.split()

# convert to lower case

desc = [word.lower() for word in desc]

# remove punctuation from each token

desc = [w.translate(table) for w in desc]

# remove hanging 's' and 'a'

desc = [word for word in desc if len(word)>1]

# remove tokens with numbers in them

desc = [word for word in desc if word.isalpha()]

# store as string

desc_list[i] = ' '.join(desc)

# clean descriptions

clean_descriptions(descriptions)

Once cleaned, we can summarize the size of the vocabulary.

Ideally, we want a vocabulary that is both expressive and as small as possible. A smaller vocabulary will result in a smaller model that will train faster.

For reference, we can transform the clean descriptions into a set and print its size to get an idea of the size of our dataset vocabulary.

# convert the loaded descriptions into a vocabulary of words
def to_vocabulary(descriptions):
	# build a list of all description strings
	all_desc = set()
	for key in descriptions.keys():
		[all_desc.update(d.split()) for d in descriptions[key]]
	return all_desc

# summarize vocabulary
vocabulary = to_vocabulary(descriptions)
print('Vocabulary Size: %d' % len(vocabulary))

# convert the loaded descriptions into a vocabulary of words

def to_vocabulary(descriptions):

# build a list of all description strings

all_desc = set()

for key in descriptions.keys():

[all_desc.update(d.split()) for d in descriptions[key]]

return all_desc

# summarize vocabulary

vocabulary = to_vocabulary(descriptions)

print('Vocabulary Size: %d' % len(vocabulary))

Finally, we can save the dictionary of image identifiers and descriptions to a new file named descriptions.txt, with one image identifier and description per line.

Below defines the save_descriptions() function that, given a dictionary containing the mapping of identifiers to descriptions and a filename, saves the mapping to file.

# save descriptions to file, one per line
def save_descriptions(descriptions, filename):
	lines = list()
	for key, desc_list in descriptions.items():
		for desc in desc_list:
			lines.append(key + ' ' + desc)
	data = '\n'.join(lines)
	file = open(filename, 'w')
	file.write(data)
	file.close()

# save descriptions
save_descriptions(descriptions, 'descriptions.txt')

# save descriptions to file, one per line

def save_descriptions(descriptions, filename):

lines = list()

for key, desc_list in descriptions.items():

for desc in desc_list:

lines.append(key + ' ' + desc)

data = '\n'.join(lines)

file = open(filename, 'w')

file.write(data)

file.close()

# save descriptions

save_descriptions(descriptions, 'descriptions.txt')

Putting this all together, the complete listing is provided below.

import string

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# extract descriptions for images
def load_descriptions(doc):
	mapping = dict()
	# process lines
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		if len(line) < 2:
			continue
		# take the first token as the image id, the rest as the description
		image_id, image_desc = tokens[0], tokens[1:]
		# remove filename from image id
		image_id = image_id.split('.')[0]
		# convert description tokens back to string
		image_desc = ' '.join(image_desc)
		# create the list if needed
		if image_id not in mapping:
			mapping[image_id] = list()
		# store description
		mapping[image_id].append(image_desc)
	return mapping

def clean_descriptions(descriptions):
	# prepare translation table for removing punctuation
	table = str.maketrans('', '', string.punctuation)
	for key, desc_list in descriptions.items():
		for i in range(len(desc_list)):
			desc = desc_list[i]
			# tokenize
			desc = desc.split()
			# convert to lower case
			desc = [word.lower() for word in desc]
			# remove punctuation from each token
			desc = [w.translate(table) for w in desc]
			# remove hanging 's' and 'a'
			desc = [word for word in desc if len(word)>1]
			# remove tokens with numbers in them
			desc = [word for word in desc if word.isalpha()]
			# store as string
			desc_list[i] =  ' '.join(desc)

# convert the loaded descriptions into a vocabulary of words
def to_vocabulary(descriptions):
	# build a list of all description strings
	all_desc = set()
	for key in descriptions.keys():
		[all_desc.update(d.split()) for d in descriptions[key]]
	return all_desc

# save descriptions to file, one per line
def save_descriptions(descriptions, filename):
	lines = list()
	for key, desc_list in descriptions.items():
		for desc in desc_list:
			lines.append(key + ' ' + desc)
	data = '\n'.join(lines)
	file = open(filename, 'w')
	file.write(data)
	file.close()

filename = 'Flickr8k_text/Flickr8k.token.txt'
# load descriptions
doc = load_doc(filename)
# parse descriptions
descriptions = load_descriptions(doc)
print('Loaded: %d ' % len(descriptions))
# clean descriptions
clean_descriptions(descriptions)
# summarize vocabulary
vocabulary = to_vocabulary(descriptions)
print('Vocabulary Size: %d' % len(vocabulary))
# save to file
save_descriptions(descriptions, 'descriptions.txt')

import string

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# extract descriptions for images

def load_descriptions(doc):

mapping = dict()

# process lines

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

if len(line) < 2:

continue

# take the first token as the image id, the rest as the description

image_id, image_desc = tokens[0], tokens[1:]

# remove filename from image id

image_id = image_id.split('.')[0]

# convert description tokens back to string

image_desc = ' '.join(image_desc)

# create the list if needed

if image_id not in mapping:

mapping[image_id] = list()

# store description

mapping[image_id].append(image_desc)

return mapping

def clean_descriptions(descriptions):

# prepare translation table for removing punctuation

table = str.maketrans('', '', string.punctuation)

for key, desc_list in descriptions.items():

for i in range(len(desc_list)):

desc = desc_list[i]

# tokenize

desc = desc.split()

# convert to lower case

desc = [word.lower() for word in desc]

# remove punctuation from each token

desc = [w.translate(table) for w in desc]

# remove hanging 's' and 'a'

desc = [word for word in desc if len(word)>1]

# remove tokens with numbers in them

desc = [word for word in desc if word.isalpha()]

# store as string

desc_list[i] = ' '.join(desc)

# convert the loaded descriptions into a vocabulary of words

def to_vocabulary(descriptions):

# build a list of all description strings

all_desc = set()

for key in descriptions.keys():

[all_desc.update(d.split()) for d in descriptions[key]]

return all_desc

# save descriptions to file, one per line

def save_descriptions(descriptions, filename):

lines = list()

for key, desc_list in descriptions.items():

for desc in desc_list:

lines.append(key + ' ' + desc)

data = '\n'.join(lines)

file = open(filename, 'w')

file.write(data)

file.close()

filename = 'Flickr8k_text/Flickr8k.token.txt'

# load descriptions

doc = load_doc(filename)

# parse descriptions

descriptions = load_descriptions(doc)

print('Loaded: %d ' % len(descriptions))

# clean descriptions

clean_descriptions(descriptions)

# summarize vocabulary

vocabulary = to_vocabulary(descriptions)

print('Vocabulary Size: %d' % len(vocabulary))

# save to file

save_descriptions(descriptions, 'descriptions.txt')

Running the example first prints the number of loaded photo descriptions (8,092) and the size of the clean vocabulary (8,763 words).

Loaded: 8,092
Vocabulary Size: 8,763

1 2	Loaded: 8,092 Vocabulary Size: 8,763

Finally, the clean descriptions are written to ‘descriptions.txt‘.

Taking a look at the file, we can see that the descriptions are ready for modeling. The order of descriptions in your file may vary.

2252123185_487f21e336 bunch on people are seated in stadium
2252123185_487f21e336 crowded stadium is full of people watching an event
2252123185_487f21e336 crowd of people fill up packed stadium
2252123185_487f21e336 crowd sitting in an indoor stadium
2252123185_487f21e336 stadium full of people watch game
...

2252123185_487f21e336 bunch on people are seated in stadium

2252123185_487f21e336 crowded stadium is full of people watching an event

2252123185_487f21e336 crowd of people fill up packed stadium

2252123185_487f21e336 crowd sitting in an indoor stadium

2252123185_487f21e336 stadium full of people watch game

...

Develop Deep Learning Model

In this section, we will define the deep learning model and fit it on the training dataset.

This section is divided into the following parts:

Loading Data.
Defining the Model.
Fitting the Model.
Complete Example.

Loading Data

First, we must load the prepared photo and text data so that we can use it to fit the model.

We are going to train the data on all of the photos and captions in the training dataset. While training, we are going to monitor the performance of the model on the development dataset and use that performance to decide when to save models to file.

The train and development dataset have been predefined in the Flickr_8k.trainImages.txt and Flickr_8k.devImages.txt files respectively, that both contain lists of photo file names. From these file names, we can extract the photo identifiers and use these identifiers to filter photos and descriptions for each set.

The function load_set() below will load a pre-defined set of identifiers given the train or development sets filename.

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

Now, we can load the photos and descriptions using the pre-defined set of train or development identifiers.

Below is the function load_clean_descriptions() that loads the cleaned text descriptions from ‘descriptions.txt‘ for a given set of identifiers and returns a dictionary of identifiers to lists of text descriptions.

The model we will develop will generate a caption given a photo, and the caption will be generated one word at a time. The sequence of previously generated words will be provided as input. Therefore, we will need a ‘first word’ to kick-off the generation process and a ‘last word‘ to signal the end of the caption.

We will use the strings ‘startseq‘ and ‘endseq‘ for this purpose. These tokens are added to the loaded descriptions as they are loaded. It is important to do this now before we encode the text so that the tokens are also encoded correctly.

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

Next, we can load the photo features for a given dataset.

Below defines a function named load_photo_features() that loads the entire set of photo descriptions, then returns the subset of interest for a given set of photo identifiers.

This is not very efficient; nevertheless, this will get us up and running quickly.

# load photo features
def load_photo_features(filename, dataset):
	# load all features
	all_features = load(open(filename, 'rb'))
	# filter features
	features = {k: all_features[k] for k in dataset}
	return features

# load photo features

def load_photo_features(filename, dataset):

# load all features

all_features = load(open(filename, 'rb'))

# filter features

features = {k: all_features[k] for k in dataset}

return features

We can pause here and test everything developed so far.

The complete code example is listed below.

from pickle import load

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# load photo features
def load_photo_features(filename, dataset):
	# load all features
	all_features = load(open(filename, 'rb'))
	# filter features
	features = {k: all_features[k] for k in dataset}
	return features

# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# photo features
train_features = load_photo_features('features.pkl', train)
print('Photos: train=%d' % len(train_features))

from pickle import load

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

# load photo features

def load_photo_features(filename, dataset):

# load all features

all_features = load(open(filename, 'rb'))

# filter features

features = {k: all_features[k] for k in dataset}

return features

# load training dataset (6K)

filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'

train = load_set(filename)

print('Dataset: %d' % len(train))

# descriptions

train_descriptions = load_clean_descriptions('descriptions.txt', train)

print('Descriptions: train=%d' % len(train_descriptions))

# photo features

train_features = load_photo_features('features.pkl', train)

print('Photos: train=%d' % len(train_features))

Running this example first loads the 6,000 photo identifiers in the training dataset. These features are then used to filter and load the cleaned description text and the pre-computed photo features.

We are nearly there.

Dataset: 6,000
Descriptions: train=6,000
Photos: train=6,000

Dataset: 6,000

Descriptions: train=6,000

Photos: train=6,000

The description text will need to be encoded to numbers before it can be presented to the model as in input or compared to the model’s predictions.

The first step in encoding the data is to create a consistent mapping from words to unique integer values. Keras provides the Tokenizer class that can learn this mapping from the loaded description data.

Below defines the to_lines() to convert the dictionary of descriptions into a list of strings and the create_tokenizer() function that will fit a Tokenizer given the loaded photo description text.

# convert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
	all_desc = list()
	for key in descriptions.keys():
		[all_desc.append(d) for d in descriptions[key]]
	return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
	lines = to_lines(descriptions)
	tokenizer = Tokenizer()
	tokenizer.fit_on_texts(lines)
	return tokenizer

# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)

# convert a dictionary of clean descriptions to a list of descriptions

def to_lines(descriptions):

all_desc = list()

for key in descriptions.keys():

[all_desc.append(d) for d in descriptions[key]]

return all_desc

# fit a tokenizer given caption descriptions

def create_tokenizer(descriptions):

lines = to_lines(descriptions)

tokenizer = Tokenizer()

tokenizer.fit_on_texts(lines)

return tokenizer

# prepare tokenizer

tokenizer = create_tokenizer(train_descriptions)

vocab_size = len(tokenizer.word_index) + 1

print('Vocabulary Size: %d' % vocab_size)

We can now encode the text.

Each description will be split into words. The model will be provided one word and the photo and generate the next word. Then the first two words of the description will be provided to the model as input with the image to generate the next word. This is how the model will be trained.

For example, the input sequence “little girl running in field” would be split into 6 input-output pairs to train the model:

X1,		X2 (text sequence), 						y (word)
photo	startseq, 									little
photo	startseq, little,							girl
photo	startseq, little, girl, 					running
photo	startseq, little, girl, running, 			in
photo	startseq, little, girl, running, in, 		field
photo	startseq, little, girl, running, in, field, endseq

X1, X2 (text sequence), y (word)

photo startseq, little

photo startseq, little, girl

photo startseq, little, girl, running

photo startseq, little, girl, running, in

photo startseq, little, girl, running, in, field

photo startseq, little, girl, running, in, field, endseq

Later, when the model is used to generate descriptions, the generated words will be concatenated and recursively provided as input to generate a caption for an image.

The function below named create_sequences(), given the tokenizer, a maximum sequence length, and the dictionary of all descriptions and photos, will transform the data into input-output pairs of data for training the model. There are two input arrays to the model: one for photo features and one for the encoded text. There is one output for the model which is the encoded next word in the text sequence.

The input text is encoded as integers, which will be fed to a word embedding layer. The photo features will be fed directly to another part of the model. The model will output a prediction, which will be a probability distribution over all words in the vocabulary.

The output data will therefore be a one-hot encoded version of each word, representing an idealized probability distribution with 0 values at all word positions except the actual word position, which has a value of 1.

# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, descriptions, photos, vocab_size):
	X1, X2, y = list(), list(), list()
	# walk through each image identifier
	for key, desc_list in descriptions.items():
		# walk through each description for the image
		for desc in desc_list:
			# encode the sequence
			seq = tokenizer.texts_to_sequences([desc])[0]
			# split one sequence into multiple X,y pairs
			for i in range(1, len(seq)):
				# split into input and output pair
				in_seq, out_seq = seq[:i], seq[i]
				# pad input sequence
				in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
				# encode output sequence
				out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
				# store
				X1.append(photos[key][0])
				X2.append(in_seq)
				y.append(out_seq)
	return array(X1), array(X2), array(y)

# create sequences of images, input sequences and output words for an image

def create_sequences(tokenizer, max_length, descriptions, photos, vocab_size):

X1, X2, y = list(), list(), list()

# walk through each image identifier

for key, desc_list in descriptions.items():

# walk through each description for the image

for desc in desc_list:

# encode the sequence

seq = tokenizer.texts_to_sequences([desc])[0]

# split one sequence into multiple X,y pairs

for i in range(1, len(seq)):

# split into input and output pair

in_seq, out_seq = seq[:i], seq[i]

# pad input sequence

in_seq = pad_sequences([in_seq], maxlen=max_length)[0]

# encode output sequence

out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

# store

X1.append(photos[key][0])

X2.append(in_seq)

y.append(out_seq)

return array(X1), array(X2), array(y)

We will need to calculate the maximum number of words in the longest description. A short helper function named max_length() is defined below.

# calculate the length of the description with the most words
def max_length(descriptions):
	lines = to_lines(descriptions)
	return max(len(d.split()) for d in lines)

# calculate the length of the description with the most words

def max_length(descriptions):

lines = to_lines(descriptions)

return max(len(d.split()) for d in lines)

We now have enough to load the data for the training and development datasets and transform the loaded data into input-output pairs for fitting a deep learning model.

Defining the Model

We will define a deep learning based on the “merge-model” described by Marc Tanti, et al. in their 2017 papers:

For a gentle introduction to this architecture, see the post:

Caption Generation with the Inject and Merge Architectures for the Encoder-Decoder Model

The authors provide a nice schematic of the model, reproduced below.

Schematic of the Merge Model For Image Captioning

We will describe the model in three parts:

Photo Feature Extractor. This is a 16-layer VGG model pre-trained on the ImageNet dataset. We have pre-processed the photos with the VGG model (without the output layer) and will use the extracted features predicted by this model as input.
Sequence Processor. This is a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) recurrent neural network layer.
Decoder (for lack of a better name). Both the feature extractor and sequence processor output a fixed-length vector. These are merged together and processed by a Dense layer to make a final prediction.

The Photo Feature Extractor model expects input photo features to be a vector of 4,096 elements. These are processed by a Dense layer to produce a 256 element representation of the photo.

The Sequence Processor model expects input sequences with a pre-defined length (34 words) which are fed into an Embedding layer that uses a mask to ignore padded values. This is followed by an LSTM layer with 256 memory units.

Both the input models produce a 256 element vector. Further, both input models use regularization in the form of 50% dropout. This is to reduce overfitting the training dataset, as this model configuration learns very fast.

The Decoder model merges the vectors from both input models using an addition operation. This is then fed to a Dense 256 neuron layer and then to a final output Dense layer that makes a softmax prediction over the entire output vocabulary for the next word in the sequence.

The function below named define_model() defines and returns the model ready to be fit.

# define the captioning model
def define_model(vocab_size, max_length):
	# feature extractor model
	inputs1 = Input(shape=(4096,))
	fe1 = Dropout(0.5)(inputs1)
	fe2 = Dense(256, activation='relu')(fe1)
	# sequence model
	inputs2 = Input(shape=(max_length,))
	se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
	se2 = Dropout(0.5)(se1)
	se3 = LSTM(256)(se2)
	# decoder model
	decoder1 = add([fe2, se3])
	decoder2 = Dense(256, activation='relu')(decoder1)
	outputs = Dense(vocab_size, activation='softmax')(decoder2)
	# tie it together [image, seq] [word]
	model = Model(inputs=[inputs1, inputs2], outputs=outputs)
	model.compile(loss='categorical_crossentropy', optimizer='adam')
	# summarize model
	print(model.summary())
	plot_model(model, to_file='model.png', show_shapes=True)
	return model

# define the captioning model

def define_model(vocab_size, max_length):

# feature extractor model

inputs1 = Input(shape=(4096,))

fe1 = Dropout(0.5)(inputs1)

fe2 = Dense(256, activation='relu')(fe1)

# sequence model

inputs2 = Input(shape=(max_length,))

se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)

se2 = Dropout(0.5)(se1)

se3 = LSTM(256)(se2)

# decoder model

decoder1 = add([fe2, se3])

decoder2 = Dense(256, activation='relu')(decoder1)

outputs = Dense(vocab_size, activation='softmax')(decoder2)

# tie it together [image, seq] [word]

model = Model(inputs=[inputs1, inputs2], outputs=outputs)

model.compile(loss='categorical_crossentropy', optimizer='adam')

# summarize model

print(model.summary())

plot_model(model, to_file='model.png', show_shapes=True)

return model

To get a sense for the structure of the model, specifically the shapes of the layers, see the summary listed below.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_2 (InputLayer)             (None, 34)            0
____________________________________________________________________________________________________
input_1 (InputLayer)             (None, 4096)          0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 34, 256)       1940224     input_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 4096)          0           input_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 34, 256)       0           embedding_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 256)           1048832     dropout_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 256)           525312      dropout_2[0][0]
____________________________________________________________________________________________________
add_1 (Add)                      (None, 256)           0           dense_1[0][0]
                                                                   lstm_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 256)           65792       add_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 7579)          1947803     dense_2[0][0]
====================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
____________________________________________________________________________________________________

____________________________________________________________________________________________________

Layer (type) Output Shape Param # Connected to

====================================================================================================

input_2 (InputLayer) (None, 34) 0

____________________________________________________________________________________________________

input_1 (InputLayer) (None, 4096) 0

____________________________________________________________________________________________________

embedding_1 (Embedding) (None, 34, 256) 1940224 input_2[0][0]

____________________________________________________________________________________________________

dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]

____________________________________________________________________________________________________

dropout_2 (Dropout) (None, 34, 256) 0 embedding_1[0][0]

____________________________________________________________________________________________________

dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]

____________________________________________________________________________________________________

lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]

____________________________________________________________________________________________________

add_1 (Add) (None, 256) 0 dense_1[0][0]

lstm_1[0][0]

____________________________________________________________________________________________________

dense_2 (Dense) (None, 256) 65792 add_1[0][0]

____________________________________________________________________________________________________

dense_3 (Dense) (None, 7579) 1947803 dense_2[0][0]

====================================================================================================

Total params: 5,527,963

Trainable params: 5,527,963

Non-trainable params: 0

____________________________________________________________________________________________________

We also create a plot to visualize the structure of the network that better helps understand the two streams of input.

Plot of the Caption Generation Deep Learning Model

Fitting the Model

Now that we know how to define the model, we can fit it on the training dataset.

The model learns fast and quickly overfits the training dataset. For this reason, we will monitor the skill of the trained model on the holdout development dataset. When the skill of the model on the development dataset improves at the end of an epoch, we will save the whole model to file.

At the end of the run, we can then use the saved model with the best skill on the training dataset as our final model.

We can do this by defining a ModelCheckpoint in Keras and specifying it to monitor the minimum loss on the validation dataset and save the model to a file that has both the training and validation loss in the filename.

# define checkpoint callback
filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

# define checkpoint callback

filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

We can then specify the checkpoint in the call to fit() via the callbacks argument. We must also specify the development dataset in fit() via the validation_data argument.

We will only fit the model for 20 epochs, but given the amount of training data, each epoch may take 30 minutes on modern hardware.

# fit model
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

1 2	# fit model model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

Complete Example

The complete example for fitting the model on the training data is listed below.

from numpy import array
from pickle import load
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.layers import Dropout
from keras.layers.merge import add
from keras.callbacks import ModelCheckpoint

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# load photo features
def load_photo_features(filename, dataset):
	# load all features
	all_features = load(open(filename, 'rb'))
	# filter features
	features = {k: all_features[k] for k in dataset}
	return features

# covert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
	all_desc = list()
	for key in descriptions.keys():
		[all_desc.append(d) for d in descriptions[key]]
	return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
	lines = to_lines(descriptions)
	tokenizer = Tokenizer()
	tokenizer.fit_on_texts(lines)
	return tokenizer

# calculate the length of the description with the most words
def max_length(descriptions):
	lines = to_lines(descriptions)
	return max(len(d.split()) for d in lines)

# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, descriptions, photos, vocab_size):
	X1, X2, y = list(), list(), list()
	# walk through each image identifier
	for key, desc_list in descriptions.items():
		# walk through each description for the image
		for desc in desc_list:
			# encode the sequence
			seq = tokenizer.texts_to_sequences([desc])[0]
			# split one sequence into multiple X,y pairs
			for i in range(1, len(seq)):
				# split into input and output pair
				in_seq, out_seq = seq[:i], seq[i]
				# pad input sequence
				in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
				# encode output sequence
				out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
				# store
				X1.append(photos[key][0])
				X2.append(in_seq)
				y.append(out_seq)
	return array(X1), array(X2), array(y)

# define the captioning model
def define_model(vocab_size, max_length):
	# feature extractor model
	inputs1 = Input(shape=(4096,))
	fe1 = Dropout(0.5)(inputs1)
	fe2 = Dense(256, activation='relu')(fe1)
	# sequence model
	inputs2 = Input(shape=(max_length,))
	se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
	se2 = Dropout(0.5)(se1)
	se3 = LSTM(256)(se2)
	# decoder model
	decoder1 = add([fe2, se3])
	decoder2 = Dense(256, activation='relu')(decoder1)
	outputs = Dense(vocab_size, activation='softmax')(decoder2)
	# tie it together [image, seq] [word]
	model = Model(inputs=[inputs1, inputs2], outputs=outputs)
	model.compile(loss='categorical_crossentropy', optimizer='adam')
	# summarize model
	print(model.summary())
	plot_model(model, to_file='model.png', show_shapes=True)
	return model

# train dataset

# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# photo features
train_features = load_photo_features('features.pkl', train)
print('Photos: train=%d' % len(train_features))
# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# determine the maximum sequence length
max_length = max_length(train_descriptions)
print('Description Length: %d' % max_length)
# prepare sequences
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)

# dev dataset

# load test set
filename = 'Flickr8k_text/Flickr_8k.devImages.txt'
test = load_set(filename)
print('Dataset: %d' % len(test))
# descriptions
test_descriptions = load_clean_descriptions('descriptions.txt', test)
print('Descriptions: test=%d' % len(test_descriptions))
# photo features
test_features = load_photo_features('features.pkl', test)
print('Photos: test=%d' % len(test_features))
# prepare sequences
X1test, X2test, ytest = create_sequences(tokenizer, max_length, test_descriptions, test_features, vocab_size)

# fit model

# define the model
model = define_model(vocab_size, max_length)
# define checkpoint callback
filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
# fit model
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

from numpy import array

from pickle import load

from keras.preprocessing.text import Tokenizer

from keras.preprocessing.sequence import pad_sequences

from keras.utils import to_categorical

from keras.utils import plot_model

from keras.models import Model

from keras.layers import Input

from keras.layers import Dense

from keras.layers import LSTM

from keras.layers import Embedding

from keras.layers import Dropout

from keras.layers.merge import add

from keras.callbacks import ModelCheckpoint

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

# load photo features

def load_photo_features(filename, dataset):

# load all features

all_features = load(open(filename, 'rb'))

# filter features

features = {k: all_features[k] for k in dataset}

return features

# covert a dictionary of clean descriptions to a list of descriptions

def to_lines(descriptions):

all_desc = list()

for key in descriptions.keys():

[all_desc.append(d) for d in descriptions[key]]

return all_desc

# fit a tokenizer given caption descriptions

def create_tokenizer(descriptions):

lines = to_lines(descriptions)

tokenizer = Tokenizer()

tokenizer.fit_on_texts(lines)

return tokenizer

# calculate the length of the description with the most words

def max_length(descriptions):

lines = to_lines(descriptions)

return max(len(d.split()) for d in lines)

# create sequences of images, input sequences and output words for an image

def create_sequences(tokenizer, max_length, descriptions, photos, vocab_size):

X1, X2, y = list(), list(), list()

# walk through each image identifier

for key, desc_list in descriptions.items():

# walk through each description for the image

for desc in desc_list:

# encode the sequence

seq = tokenizer.texts_to_sequences([desc])[0]

# split one sequence into multiple X,y pairs

for i in range(1, len(seq)):

# split into input and output pair

in_seq, out_seq = seq[:i], seq[i]

# pad input sequence

in_seq = pad_sequences([in_seq], maxlen=max_length)[0]

# encode output sequence

out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

# store

X1.append(photos[key][0])

X2.append(in_seq)

y.append(out_seq)

return array(X1), array(X2), array(y)

# define the captioning model

def define_model(vocab_size, max_length):

# feature extractor model

inputs1 = Input(shape=(4096,))

fe1 = Dropout(0.5)(inputs1)

fe2 = Dense(256, activation='relu')(fe1)

# sequence model

inputs2 = Input(shape=(max_length,))

se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)

se2 = Dropout(0.5)(se1)

se3 = LSTM(256)(se2)

# decoder model

decoder1 = add([fe2, se3])

decoder2 = Dense(256, activation='relu')(decoder1)

outputs = Dense(vocab_size, activation='softmax')(decoder2)

# tie it together [image, seq] [word]

model = Model(inputs=[inputs1, inputs2], outputs=outputs)

model.compile(loss='categorical_crossentropy', optimizer='adam')

# summarize model

print(model.summary())

plot_model(model, to_file='model.png', show_shapes=True)

return model

# train dataset

# load training dataset (6K)

filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'

train = load_set(filename)

print('Dataset: %d' % len(train))

# descriptions

train_descriptions = load_clean_descriptions('descriptions.txt', train)

print('Descriptions: train=%d' % len(train_descriptions))

# photo features

train_features = load_photo_features('features.pkl', train)

print('Photos: train=%d' % len(train_features))

# prepare tokenizer

tokenizer = create_tokenizer(train_descriptions)

vocab_size = len(tokenizer.word_index) + 1

print('Vocabulary Size: %d' % vocab_size)

# determine the maximum sequence length

max_length = max_length(train_descriptions)

print('Description Length: %d' % max_length)

# prepare sequences

X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)

# dev dataset

# load test set

filename = 'Flickr8k_text/Flickr_8k.devImages.txt'

test = load_set(filename)

print('Dataset: %d' % len(test))

# descriptions

test_descriptions = load_clean_descriptions('descriptions.txt', test)

print('Descriptions: test=%d' % len(test_descriptions))

# photo features

test_features = load_photo_features('features.pkl', test)

print('Photos: test=%d' % len(test_features))

# prepare sequences

X1test, X2test, ytest = create_sequences(tokenizer, max_length, test_descriptions, test_features, vocab_size)

# fit model

# define the model

model = define_model(vocab_size, max_length)

# define checkpoint callback

filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

# fit model

model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

Running the example first prints a summary of the loaded training and development datasets.

Dataset: 6,000
Descriptions: train=6,000
Photos: train=6,000
Vocabulary Size: 7,579
Description Length: 34
Dataset: 1,000
Descriptions: test=1,000
Photos: test=1,000

Dataset: 6,000

Descriptions: train=6,000

Photos: train=6,000

Vocabulary Size: 7,579

Description Length: 34

Dataset: 1,000

Descriptions: test=1,000

Photos: test=1,000

After the summary of the model, we can get an idea of the total number of training and validation (development) input-output pairs.

Train on 306,404 samples, validate on 50,903 samples

1	Train on 306,404 samples, validate on 50,903 samples

The model then runs, saving the best model to .h5 files along the way.

On my run, the best validation results were saved to the file:

model-ep002-loss3.245-val_loss3.612.h5

This model was saved at the end of epoch 2 with a loss of 3.245 on the training dataset and a loss of 3.612 on the development dataset

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Let me know what you get in the comments below.

If you ran the example on AWS, copy the model file back to your current working directory. If you need help with commands on AWS, see the post:

10 Command Line Recipes for Deep Learning on Amazon Web Services

Did you get an error like:

Memory Error

1	Memory Error

If so, see the next section.

Train With Progressive Loading

Note: If you had no problems in the previous section, please skip this section. This section is for those who do not have enough memory to train the model as described in the previous section (e.g. cannot use AWS EC2 for whatever reason).

The training of the caption model does assume you have a lot of RAM.

The code in the previous section is not memory efficient and assumes you are running on a large EC2 instance with 32GB or 64GB of RAM. If you are running the code on a workstation of 8GB of RAM, you cannot train the model.

A workaround is to use progressive loading. This was discussed in detail in the second-last section titled “Progressive Loading” in the post:

How to Prepare a Photo Caption Dataset for Training a Deep Learning Model

I recommend reading that section before continuing.

If you want to use progressive loading, to train this model, this section will show you how.

The first step is we must define a function that we can use as the data generator.

We will keep things very simple and have the data generator yield one photo’s worth of data per batch. This will be all of the sequences generated for a photo and its set of descriptions.

The function below data_generator() will be the data generator and will take the loaded textual descriptions, photo features, tokenizer and max length. Here, I assume that you can fit this training data in memory, which I believe 8GB of RAM should be more than capable.

How does this work? Read the post I just mentioned above that introduces data generators.

# data generator, intended to be used in a call to model.fit_generator()
def data_generator(descriptions, photos, tokenizer, max_length, vocab_size):
	# loop for ever over images
	while 1:
		for key, desc_list in descriptions.items():
			# retrieve the photo feature
			photo = photos[key][0]
			in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo, vocab_size)
			yield [in_img, in_seq], out_word

# data generator, intended to be used in a call to model.fit_generator()

def data_generator(descriptions, photos, tokenizer, max_length, vocab_size):

# loop for ever over images

while 1:

for key, desc_list in descriptions.items():

# retrieve the photo feature

photo = photos[key][0]

in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo, vocab_size)

yield [in_img, in_seq], out_word

You can see that we are calling the create_sequence() function to create a batch worth of data for a single photo rather than an entire dataset. This means that we must update the create_sequences() function to delete the “iterate over all descriptions” for-loop.

The updated function is as follows:

# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, desc_list, photo, vocab_size):
	X1, X2, y = list(), list(), list()
	# walk through each description for the image
	for desc in desc_list:
		# encode the sequence
		seq = tokenizer.texts_to_sequences([desc])[0]
		# split one sequence into multiple X,y pairs
		for i in range(1, len(seq)):
			# split into input and output pair
			in_seq, out_seq = seq[:i], seq[i]
			# pad input sequence
			in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
			# encode output sequence
			out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
			# store
			X1.append(photo)
			X2.append(in_seq)
			y.append(out_seq)
	return array(X1), array(X2), array(y)

# create sequences of images, input sequences and output words for an image

def create_sequences(tokenizer, max_length, desc_list, photo, vocab_size):

X1, X2, y = list(), list(), list()

# walk through each description for the image

for desc in desc_list:

# encode the sequence

seq = tokenizer.texts_to_sequences([desc])[0]

# split one sequence into multiple X,y pairs

for i in range(1, len(seq)):

# split into input and output pair

in_seq, out_seq = seq[:i], seq[i]

# pad input sequence

in_seq = pad_sequences([in_seq], maxlen=max_length)[0]

# encode output sequence

out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

# store

X1.append(photo)

X2.append(in_seq)

y.append(out_seq)

return array(X1), array(X2), array(y)

We now have pretty much everything we need.

Note, this is a very basic data generator. The big memory saving it offers is to not have the unrolled sequences of train and test data in memory prior to fitting the model, that these samples (e.g. results from create_sequences()) are created as needed per photo.

Some off-the-cuff ideas for further improving this data generator include:

Randomize the order of photos each epoch.
Work with a list of photo ids and load text and photo data as needed to cut even further back on memory.
Yield more than one photo’s worth of samples per batch.

I have experienced with these variations myself in the past. Let me know if you do and how you go in the comments.

You can sanity check a data generator by calling it directly, as follows:

# test the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
inputs, outputs = next(generator)
print(inputs[0].shape)
print(inputs[1].shape)
print(outputs.shape)

# test the data generator

generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)

inputs, outputs = next(generator)

print(inputs[0].shape)

print(inputs[1].shape)

print(outputs.shape)

Running this sanity check will show what one batch worth of sequences looks like, in this case 47 samples to train on for the first photo.

(47, 4096)
(47, 34)
(47, 7579)

(47, 4096)

(47, 34)

(47, 7579)

Finally, we can use the fit_generator() function on the model to train the model with this data generator.

In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one we the lowest loss that you can then use in the next section.

The code to train the model with the data generator is as follows:

# train the model, run epochs manually and save after each epoch
epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
	# create the data generator
	generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
	# fit for one epoch
	model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
	# save model
	model.save('model_' + str(i) + '.h5')

# train the model, run epochs manually and save after each epoch

epochs = 20

steps = len(train_descriptions)

for i in range(epochs):

# create the data generator

generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)

# fit for one epoch

model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)

# save model

model.save('model_' + str(i) + '.h5')

That’s it. You can now train the model using progressive loading and save a ton of RAM. This may also be a lot slower.

The complete updated example with progressive loading (use of the data generator) for training the caption generation model is listed below.

from numpy import array
from pickle import load
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.layers import Dropout
from keras.layers.merge import add
from keras.callbacks import ModelCheckpoint

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# load photo features
def load_photo_features(filename, dataset):
	# load all features
	all_features = load(open(filename, 'rb'))
	# filter features
	features = {k: all_features[k] for k in dataset}
	return features

# covert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
	all_desc = list()
	for key in descriptions.keys():
		[all_desc.append(d) for d in descriptions[key]]
	return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
	lines = to_lines(descriptions)
	tokenizer = Tokenizer()
	tokenizer.fit_on_texts(lines)
	return tokenizer

# calculate the length of the description with the most words
def max_length(descriptions):
	lines = to_lines(descriptions)
	return max(len(d.split()) for d in lines)

# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, desc_list, photo, vocab_size):
	X1, X2, y = list(), list(), list()
	# walk through each description for the image
	for desc in desc_list:
		# encode the sequence
		seq = tokenizer.texts_to_sequences([desc])[0]
		# split one sequence into multiple X,y pairs
		for i in range(1, len(seq)):
			# split into input and output pair
			in_seq, out_seq = seq[:i], seq[i]
			# pad input sequence
			in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
			# encode output sequence
			out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
			# store
			X1.append(photo)
			X2.append(in_seq)
			y.append(out_seq)
	return array(X1), array(X2), array(y)

# define the captioning model
def define_model(vocab_size, max_length):
	# feature extractor model
	inputs1 = Input(shape=(4096,))
	fe1 = Dropout(0.5)(inputs1)
	fe2 = Dense(256, activation='relu')(fe1)
	# sequence model
	inputs2 = Input(shape=(max_length,))
	se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
	se2 = Dropout(0.5)(se1)
	se3 = LSTM(256)(se2)
	# decoder model
	decoder1 = add([fe2, se3])
	decoder2 = Dense(256, activation='relu')(decoder1)
	outputs = Dense(vocab_size, activation='softmax')(decoder2)
	# tie it together [image, seq] [word]
	model = Model(inputs=[inputs1, inputs2], outputs=outputs)
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam')
	# summarize model
	model.summary()
	plot_model(model, to_file='model.png', show_shapes=True)
	return model

# data generator, intended to be used in a call to model.fit_generator()
def data_generator(descriptions, photos, tokenizer, max_length, vocab_size):
	# loop for ever over images
	while 1:
		for key, desc_list in descriptions.items():
			# retrieve the photo feature
			photo = photos[key][0]
			in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo, vocab_size)
			yield [in_img, in_seq], out_word

# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# photo features
train_features = load_photo_features('features.pkl', train)
print('Photos: train=%d' % len(train_features))
# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# determine the maximum sequence length
max_length = max_length(train_descriptions)
print('Description Length: %d' % max_length)

# define the model
model = define_model(vocab_size, max_length)
# train the model, run epochs manually and save after each epoch
epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
	# create the data generator
	generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
	# fit for one epoch
	model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
	# save model
	model.save('model_' + str(i) + '.h5')

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

from numpy import array

from pickle import load

from keras.preprocessing.text import Tokenizer

from keras.preprocessing.sequence import pad_sequences

from keras.utils import to_categorical

from keras.utils import plot_model

from keras.models import Model

from keras.layers import Input

from keras.layers import Dense

from keras.layers import LSTM

from keras.layers import Embedding

from keras.layers import Dropout

from keras.layers.merge import add

from keras.callbacks import ModelCheckpoint

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

# load photo features

def load_photo_features(filename, dataset):

# load all features

all_features = load(open(filename, 'rb'))

# filter features

features = {k: all_features[k] for k in dataset}

return features

# covert a dictionary of clean descriptions to a list of descriptions

def to_lines(descriptions):

all_desc = list()

for key in descriptions.keys():

[all_desc.append(d) for d in descriptions[key]]

return all_desc

# fit a tokenizer given caption descriptions

def create_tokenizer(descriptions):

lines = to_lines(descriptions)

tokenizer = Tokenizer()

tokenizer.fit_on_texts(lines)

return tokenizer

# calculate the length of the description with the most words

def max_length(descriptions):

lines = to_lines(descriptions)

return max(len(d.split()) for d in lines)

# create sequences of images, input sequences and output words for an image

def create_sequences(tokenizer, max_length, desc_list, photo, vocab_size):

X1, X2, y = list(), list(), list()

# walk through each description for the image

for desc in desc_list:

# encode the sequence

seq = tokenizer.texts_to_sequences([desc])[0]

# split one sequence into multiple X,y pairs

for i in range(1, len(seq)):

# split into input and output pair

in_seq, out_seq = seq[:i], seq[i]

# pad input sequence

in_seq = pad_sequences([in_seq], maxlen=max_length)[0]

# encode output sequence

out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

# store

X1.append(photo)

X2.append(in_seq)

y.append(out_seq)

return array(X1), array(X2), array(y)

# define the captioning model

def define_model(vocab_size, max_length):

# feature extractor model

inputs1 = Input(shape=(4096,))

fe1 = Dropout(0.5)(inputs1)

fe2 = Dense(256, activation='relu')(fe1)

# sequence model

inputs2 = Input(shape=(max_length,))

se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)

se2 = Dropout(0.5)(se1)

se3 = LSTM(256)(se2)

# decoder model

decoder1 = add([fe2, se3])

decoder2 = Dense(256, activation='relu')(decoder1)

outputs = Dense(vocab_size, activation='softmax')(decoder2)

# tie it together [image, seq] [word]

model = Model(inputs=[inputs1, inputs2], outputs=outputs)

# compile model

model.compile(loss='categorical_crossentropy', optimizer='adam')

# summarize model

model.summary()

plot_model(model, to_file='model.png', show_shapes=True)

return model

# data generator, intended to be used in a call to model.fit_generator()

def data_generator(descriptions, photos, tokenizer, max_length, vocab_size):

# loop for ever over images

while 1:

for key, desc_list in descriptions.items():

# retrieve the photo feature

photo = photos[key][0]

in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo, vocab_size)

yield [in_img, in_seq], out_word

# load training dataset (6K)

filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'

train = load_set(filename)

print('Dataset: %d' % len(train))

# descriptions

train_descriptions = load_clean_descriptions('descriptions.txt', train)

print('Descriptions: train=%d' % len(train_descriptions))

# photo features

train_features = load_photo_features('features.pkl', train)

print('Photos: train=%d' % len(train_features))

# prepare tokenizer

tokenizer = create_tokenizer(train_descriptions)

vocab_size = len(tokenizer.word_index) + 1

print('Vocabulary Size: %d' % vocab_size)

# determine the maximum sequence length

max_length = max_length(train_descriptions)

print('Description Length: %d' % max_length)

# define the model

model = define_model(vocab_size, max_length)

# train the model, run epochs manually and save after each epoch

epochs = 20

steps = len(train_descriptions)

for i in range(epochs):

# create the data generator

generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)

# fit for one epoch

model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)

# save model

model.save('model_' + str(i) + '.h5')

Perhaps evaluate each saved model and choose the one final model with the lowest loss on a holdout dataset. The next section may help with this.

Did you use this new addition to the tutorial?
How did you go?

Evaluate Model

Once the model is fit, we can evaluate the skill of its predictions on the holdout test dataset.

We will evaluate a model by generating descriptions for all photos in the test dataset and evaluating those predictions with a standard cost function.

First, we need to be able to generate a description for a photo using a trained model.

This involves passing in the start description token ‘startseq‘, generating one word, then calling the model recursively with generated words as input until the end of sequence token is reached ‘endseq‘ or the maximum description length is reached.

The function below named generate_desc() implements this behavior and generates a textual description given a trained model, and a given prepared photo as input. It calls the function word_for_id() in order to map an integer prediction back to a word.

# map an integer to a word
def word_for_id(integer, tokenizer):
	for word, index in tokenizer.word_index.items():
		if index == integer:
			return word
	return None

# generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
	# seed the generation process
	in_text = 'startseq'
	# iterate over the whole length of the sequence
	for i in range(max_length):
		# integer encode input sequence
		sequence = tokenizer.texts_to_sequences([in_text])[0]
		# pad input
		sequence = pad_sequences([sequence], maxlen=max_length)
		# predict next word
		yhat = model.predict([photo,sequence], verbose=0)
		# convert probability to integer
		yhat = argmax(yhat)
		# map integer to word
		word = word_for_id(yhat, tokenizer)
		# stop if we cannot map the word
		if word is None:
			break
		# append as input for generating the next word
		in_text += ' ' + word
		# stop if we predict the end of the sequence
		if word == 'endseq':
			break
	return in_text

# map an integer to a word

def word_for_id(integer, tokenizer):

for word, index in tokenizer.word_index.items():

if index == integer:

return word

return None

# generate a description for an image

def generate_desc(model, tokenizer, photo, max_length):

# seed the generation process

in_text = 'startseq'

# iterate over the whole length of the sequence

for i in range(max_length):

# integer encode input sequence

sequence = tokenizer.texts_to_sequences([in_text])[0]

# pad input

sequence = pad_sequences([sequence], maxlen=max_length)

# predict next word

yhat = model.predict([photo,sequence], verbose=0)

# convert probability to integer

yhat = argmax(yhat)

# map integer to word

word = word_for_id(yhat, tokenizer)

# stop if we cannot map the word

if word is None:

break

# append as input for generating the next word

in_text += ' ' + word

# stop if we predict the end of the sequence

if word == 'endseq':

break

return in_text

We will generate predictions for all photos in the test dataset and in the train dataset.

The function below named evaluate_model() will evaluate a trained model against a given dataset of photo descriptions and photo features. The actual and predicted descriptions are collected and evaluated collectively using the corpus BLEU score that summarizes how close the generated text is to the expected text.

# evaluate the skill of the model
def evaluate_model(model, descriptions, photos, tokenizer, max_length):
	actual, predicted = list(), list()
	# step over the whole set
	for key, desc_list in descriptions.items():
		# generate description
		yhat = generate_desc(model, tokenizer, photos[key], max_length)
		# store actual and predicted
		references = [d.split() for d in desc_list]
		actual.append(references)
		predicted.append(yhat.split())
	# calculate BLEU score
	print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
	print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
	print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))
	print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))

# evaluate the skill of the model

def evaluate_model(model, descriptions, photos, tokenizer, max_length):

actual, predicted = list(), list()

# step over the whole set

for key, desc_list in descriptions.items():

# generate description

yhat = generate_desc(model, tokenizer, photos[key], max_length)

# store actual and predicted

references = [d.split() for d in desc_list]

actual.append(references)

predicted.append(yhat.split())

# calculate BLEU score

print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))

print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))

print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))

print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))

BLEU scores are used in text translation for evaluating translated text against one or more reference translations.

Here, we compare each generated description against all of the reference descriptions for the photograph. We then calculate BLEU scores for 1, 2, 3 and 4 cumulative n-grams.

You can learn more about the BLEU score here:

A Gentle Introduction to Calculating the BLEU Score for Text in Python

The NLTK Python library implements the BLEU score calculation in the corpus_bleu() function. A higher score close to 1.0 is better, a score closer to zero is worse.

We can put all of this together with the functions from the previous section for loading the data. We first need to load the training dataset in order to prepare a Tokenizer so that we can encode generated words as input sequences for the model. It is critical that we encode the generated words using exactly the same encoding scheme as was used when training the model.

We then use these functions for loading the test dataset.

The complete example is listed below.

from numpy import argmax
from pickle import load
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
from nltk.translate.bleu_score import corpus_bleu

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# load photo features
def load_photo_features(filename, dataset):
	# load all features
	all_features = load(open(filename, 'rb'))
	# filter features
	features = {k: all_features[k] for k in dataset}
	return features

# covert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
	all_desc = list()
	for key in descriptions.keys():
		[all_desc.append(d) for d in descriptions[key]]
	return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
	lines = to_lines(descriptions)
	tokenizer = Tokenizer()
	tokenizer.fit_on_texts(lines)
	return tokenizer

# calculate the length of the description with the most words
def max_length(descriptions):
	lines = to_lines(descriptions)
	return max(len(d.split()) for d in lines)

# map an integer to a word
def word_for_id(integer, tokenizer):
	for word, index in tokenizer.word_index.items():
		if index == integer:
			return word
	return None

# generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
	# seed the generation process
	in_text = 'startseq'
	# iterate over the whole length of the sequence
	for i in range(max_length):
		# integer encode input sequence
		sequence = tokenizer.texts_to_sequences([in_text])[0]
		# pad input
		sequence = pad_sequences([sequence], maxlen=max_length)
		# predict next word
		yhat = model.predict([photo,sequence], verbose=0)
		# convert probability to integer
		yhat = argmax(yhat)
		# map integer to word
		word = word_for_id(yhat, tokenizer)
		# stop if we cannot map the word
		if word is None:
			break
		# append as input for generating the next word
		in_text += ' ' + word
		# stop if we predict the end of the sequence
		if word == 'endseq':
			break
	return in_text

# evaluate the skill of the model
def evaluate_model(model, descriptions, photos, tokenizer, max_length):
	actual, predicted = list(), list()
	# step over the whole set
	for key, desc_list in descriptions.items():
		# generate description
		yhat = generate_desc(model, tokenizer, photos[key], max_length)
		# store actual and predicted
		references = [d.split() for d in desc_list]
		actual.append(references)
		predicted.append(yhat.split())
	# calculate BLEU score
	print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
	print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
	print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))
	print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))

# prepare tokenizer on train set

# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# determine the maximum sequence length
max_length = max_length(train_descriptions)
print('Description Length: %d' % max_length)

# prepare test set

# load test set
filename = 'Flickr8k_text/Flickr_8k.testImages.txt'
test = load_set(filename)
print('Dataset: %d' % len(test))
# descriptions
test_descriptions = load_clean_descriptions('descriptions.txt', test)
print('Descriptions: test=%d' % len(test_descriptions))
# photo features
test_features = load_photo_features('features.pkl', test)
print('Photos: test=%d' % len(test_features))

# load the model
filename = 'model-ep002-loss3.245-val_loss3.612.h5'
model = load_model(filename)
# evaluate model
evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

from numpy import argmax

from pickle import load

from keras.preprocessing.text import Tokenizer

from keras.preprocessing.sequence import pad_sequences

from keras.models import load_model

from nltk.translate.bleu_score import corpus_bleu

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

# load photo features

def load_photo_features(filename, dataset):

# load all features

all_features = load(open(filename, 'rb'))

# filter features

features = {k: all_features[k] for k in dataset}

return features

# covert a dictionary of clean descriptions to a list of descriptions

def to_lines(descriptions):

all_desc = list()

for key in descriptions.keys():

[all_desc.append(d) for d in descriptions[key]]

return all_desc

# fit a tokenizer given caption descriptions

def create_tokenizer(descriptions):

lines = to_lines(descriptions)

tokenizer = Tokenizer()

tokenizer.fit_on_texts(lines)

return tokenizer

# calculate the length of the description with the most words

def max_length(descriptions):

lines = to_lines(descriptions)

return max(len(d.split()) for d in lines)

# map an integer to a word

def word_for_id(integer, tokenizer):

for word, index in tokenizer.word_index.items():

if index == integer:

return word

return None

# generate a description for an image

def generate_desc(model, tokenizer, photo, max_length):

# seed the generation process

in_text = 'startseq'

# iterate over the whole length of the sequence

for i in range(max_length):

# integer encode input sequence

sequence = tokenizer.texts_to_sequences([in_text])[0]

# pad input

sequence = pad_sequences([sequence], maxlen=max_length)

# predict next word

yhat = model.predict([photo,sequence], verbose=0)

# convert probability to integer

yhat = argmax(yhat)

# map integer to word

word = word_for_id(yhat, tokenizer)

# stop if we cannot map the word

if word is None:

break

# append as input for generating the next word

in_text += ' ' + word

# stop if we predict the end of the sequence

if word == 'endseq':

break

return in_text

# evaluate the skill of the model

def evaluate_model(model, descriptions, photos, tokenizer, max_length):

actual, predicted = list(), list()

# step over the whole set

for key, desc_list in descriptions.items():

# generate description

yhat = generate_desc(model, tokenizer, photos[key], max_length)

# store actual and predicted

references = [d.split() for d in desc_list]

actual.append(references)

predicted.append(yhat.split())

# calculate BLEU score

print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))

print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))

print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))

print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))

# prepare tokenizer on train set

# load training dataset (6K)

filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'

train = load_set(filename)

print('Dataset: %d' % len(train))

# descriptions

train_descriptions = load_clean_descriptions('descriptions.txt', train)

print('Descriptions: train=%d' % len(train_descriptions))

# prepare tokenizer

tokenizer = create_tokenizer(train_descriptions)

vocab_size = len(tokenizer.word_index) + 1

print('Vocabulary Size: %d' % vocab_size)

# determine the maximum sequence length

max_length = max_length(train_descriptions)

print('Description Length: %d' % max_length)

# prepare test set

# load test set

filename = 'Flickr8k_text/Flickr_8k.testImages.txt'

test = load_set(filename)

print('Dataset: %d' % len(test))

# descriptions

test_descriptions = load_clean_descriptions('descriptions.txt', test)

print('Descriptions: test=%d' % len(test_descriptions))

# photo features

test_features = load_photo_features('features.pkl', test)

print('Photos: test=%d' % len(test_features))

# load the model

filename = 'model-ep002-loss3.245-val_loss3.612.h5'

model = load_model(filename)

# evaluate model

evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

Running the example prints the BLEU scores.

We can see that the scores fit within and close to the top of the expected range of a skillful model on the problem. The chosen model configuration is by no means optimized.

BLEU-1: 0.579114
BLEU-2: 0.344856
BLEU-3: 0.252154
BLEU-4: 0.131446

BLEU-1: 0.579114

BLEU-2: 0.344856

BLEU-3: 0.252154

BLEU-4: 0.131446

Generate New Captions

Now that we know how to develop and evaluate a caption generation model, how can we use it?

Almost everything we need to generate captions for entirely new photographs is in the model file.

We also need the Tokenizer for encoding generated words for the model while generating a sequence, and the maximum length of input sequences, used when we defined the model (e.g. 34).

We can hard code the maximum sequence length. With the encoding of text, we can create the tokenizer and save it to a file so that we can load it quickly whenever we need it without needing the entire Flickr8K dataset. An alternative would be to use our own vocabulary file and mapping to integers function during training.

We can create the Tokenizer as before and save it as a pickle file tokenizer.pkl. The complete example is listed below.

from keras.preprocessing.text import Tokenizer
from pickle import dump

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load a pre-defined list of photo identifiers
def load_set(filename):
	doc = load_doc(filename)
	dataset = list()
	# process line by line
	for line in doc.split('\n'):
		# skip empty lines
		if len(line) < 1:
			continue
		# get the image identifier
		identifier = line.split('.')[0]
		dataset.append(identifier)
	return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
	# load document
	doc = load_doc(filename)
	descriptions = dict()
	for line in doc.split('\n'):
		# split line by white space
		tokens = line.split()
		# split id from description
		image_id, image_desc = tokens[0], tokens[1:]
		# skip images not in the set
		if image_id in dataset:
			# create list
			if image_id not in descriptions:
				descriptions[image_id] = list()
			# wrap description in tokens
			desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
			# store
			descriptions[image_id].append(desc)
	return descriptions

# covert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
	all_desc = list()
	for key in descriptions.keys():
		[all_desc.append(d) for d in descriptions[key]]
	return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
	lines = to_lines(descriptions)
	tokenizer = Tokenizer()
	tokenizer.fit_on_texts(lines)
	return tokenizer

# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
# save the tokenizer
dump(tokenizer, open('tokenizer.pkl', 'wb'))

from keras.preprocessing.text import Tokenizer

from pickle import dump

# load doc into memory

def load_doc(filename):

# open the file as read only

file = open(filename, 'r')

# read all text

text = file.read()

# close the file

file.close()

return text

# load a pre-defined list of photo identifiers

def load_set(filename):

doc = load_doc(filename)

dataset = list()

# process line by line

for line in doc.split('\n'):

# skip empty lines

if len(line) < 1:

continue

# get the image identifier

identifier = line.split('.')[0]

dataset.append(identifier)

return set(dataset)

# load clean descriptions into memory

def load_clean_descriptions(filename, dataset):

# load document

doc = load_doc(filename)

descriptions = dict()

for line in doc.split('\n'):

# split line by white space

tokens = line.split()

# split id from description

image_id, image_desc = tokens[0], tokens[1:]

# skip images not in the set

if image_id in dataset:

# create list

if image_id not in descriptions:

descriptions[image_id] = list()

# wrap description in tokens

desc = 'startseq ' + ' '.join(image_desc) + ' endseq'

# store

descriptions[image_id].append(desc)

return descriptions

# covert a dictionary of clean descriptions to a list of descriptions

def to_lines(descriptions):

all_desc = list()

for key in descriptions.keys():

[all_desc.append(d) for d in descriptions[key]]

return all_desc

# fit a tokenizer given caption descriptions

def create_tokenizer(descriptions):

lines = to_lines(descriptions)

tokenizer = Tokenizer()

tokenizer.fit_on_texts(lines)

return tokenizer

# load training dataset (6K)

filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'

train = load_set(filename)

print('Dataset: %d' % len(train))

# descriptions

train_descriptions = load_clean_descriptions('descriptions.txt', train)

print('Descriptions: train=%d' % len(train_descriptions))

# prepare tokenizer

tokenizer = create_tokenizer(train_descriptions)

# save the tokenizer

dump(tokenizer, open('tokenizer.pkl', 'wb'))

We can now load the tokenizer whenever we need it without having to load the entire training dataset of annotations.

Now, let’s generate a description for a new photograph.

Below is a new photograph that I chose randomly on Flickr (available under a permissive license).

Photo of a dog at the beach.
Photo by bambe1964, some rights reserved.

We will generate a description for it using our model.

Download the photograph and save it to your local directory with the filename “example.jpg“.

First, we must load the Tokenizer from tokenizer.pkl and define the maximum length of the sequence to generate, needed for padding inputs.

# load the tokenizer
tokenizer = load(open('tokenizer.pkl', 'rb'))
# pre-define the max sequence length (from training)
max_length = 34

# load the tokenizer

tokenizer = load(open('tokenizer.pkl', 'rb'))

# pre-define the max sequence length (from training)

max_length = 34

Then we must load the model, as before.

# load the model
model = load_model('model-ep002-loss3.245-val_loss3.612.h5')

1 2	# load the model model = load_model('model-ep002-loss3.245-val_loss3.612.h5')

Next, we must load the photo we which to describe and extract the features.

We could do this by re-defining the model and adding the VGG-16 model to it, or we can use the VGG model to predict the features and use them as inputs to our existing model. We will do the latter and use a modified version of the extract_features() function used during data preparation, but adapted to work on a single photo.

# extract features from each photo in the directory
def extract_features(filename):
	# load the model
	model = VGG16()
	# re-structure the model
	model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
	# load the photo
	image = load_img(filename, target_size=(224, 224))
	# convert the image pixels to a numpy array
	image = img_to_array(image)
	# reshape data for the model
	image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
	# prepare the image for the VGG model
	image = preprocess_input(image)
	# get features
	feature = model.predict(image, verbose=0)
	return feature

# load and prepare the photograph
photo = extract_features('example.jpg')

# extract features from each photo in the directory

def extract_features(filename):

# load the model

model = VGG16()

# re-structure the model

model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

# load the photo

image = load_img(filename, target_size=(224, 224))

# convert the image pixels to a numpy array

image = img_to_array(image)

# reshape data for the model

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model

image = preprocess_input(image)

# get features

feature = model.predict(image, verbose=0)

return feature

# load and prepare the photograph

photo = extract_features('example.jpg')

We can then generate a description using the generate_desc() function defined when evaluating the model.

The complete example for generating a description for an entirely new standalone photograph is listed below.

from pickle import load
from numpy import argmax
from keras.preprocessing.sequence import pad_sequences
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.models import Model
from keras.models import load_model

# extract features from each photo in the directory
def extract_features(filename):
	# load the model
	model = VGG16()
	# re-structure the model
	model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
	# load the photo
	image = load_img(filename, target_size=(224, 224))
	# convert the image pixels to a numpy array
	image = img_to_array(image)
	# reshape data for the model
	image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
	# prepare the image for the VGG model
	image = preprocess_input(image)
	# get features
	feature = model.predict(image, verbose=0)
	return feature

# map an integer to a word
def word_for_id(integer, tokenizer):
	for word, index in tokenizer.word_index.items():
		if index == integer:
			return word
	return None

# generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
	# seed the generation process
	in_text = 'startseq'
	# iterate over the whole length of the sequence
	for i in range(max_length):
		# integer encode input sequence
		sequence = tokenizer.texts_to_sequences([in_text])[0]
		# pad input
		sequence = pad_sequences([sequence], maxlen=max_length)
		# predict next word
		yhat = model.predict([photo,sequence], verbose=0)
		# convert probability to integer
		yhat = argmax(yhat)
		# map integer to word
		word = word_for_id(yhat, tokenizer)
		# stop if we cannot map the word
		if word is None:
			break
		# append as input for generating the next word
		in_text += ' ' + word
		# stop if we predict the end of the sequence
		if word == 'endseq':
			break
	return in_text

# load the tokenizer
tokenizer = load(open('tokenizer.pkl', 'rb'))
# pre-define the max sequence length (from training)
max_length = 34
# load the model
model = load_model('model-ep002-loss3.245-val_loss3.612.h5')
# load and prepare the photograph
photo = extract_features('example.jpg')
# generate description
description = generate_desc(model, tokenizer, photo, max_length)
print(description)

from pickle import load

from numpy import argmax

from keras.preprocessing.sequence import pad_sequences

from keras.applications.vgg16 import VGG16

from keras.preprocessing.image import load_img

from keras.preprocessing.image import img_to_array

from keras.applications.vgg16 import preprocess_input

from keras.models import Model

from keras.models import load_model

# extract features from each photo in the directory

def extract_features(filename):

# load the model

model = VGG16()

# re-structure the model

model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

# load the photo

image = load_img(filename, target_size=(224, 224))

# convert the image pixels to a numpy array

image = img_to_array(image)

# reshape data for the model

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model

image = preprocess_input(image)

# get features

feature = model.predict(image, verbose=0)

return feature

# map an integer to a word

def word_for_id(integer, tokenizer):

for word, index in tokenizer.word_index.items():

if index == integer:

return word

return None

# generate a description for an image

def generate_desc(model, tokenizer, photo, max_length):

# seed the generation process

in_text = 'startseq'

# iterate over the whole length of the sequence

for i in range(max_length):

# integer encode input sequence

sequence = tokenizer.texts_to_sequences([in_text])[0]

# pad input

sequence = pad_sequences([sequence], maxlen=max_length)

# predict next word

yhat = model.predict([photo,sequence], verbose=0)

# convert probability to integer

yhat = argmax(yhat)

# map integer to word

word = word_for_id(yhat, tokenizer)

# stop if we cannot map the word

if word is None:

break

# append as input for generating the next word

in_text += ' ' + word

# stop if we predict the end of the sequence

if word == 'endseq':

break

return in_text

# load the tokenizer

tokenizer = load(open('tokenizer.pkl', 'rb'))

# pre-define the max sequence length (from training)

max_length = 34

# load the model

model = load_model('model-ep002-loss3.245-val_loss3.612.h5')

# load and prepare the photograph

photo = extract_features('example.jpg')

# generate description

description = generate_desc(model, tokenizer, photo, max_length)

print(description)

In this case, the description generated was as follows:

startseq dog is running across the beach endseq

1	startseq dog is running across the beach endseq

You could remove the start and end tokens and you would have the basis for a nice automatic photo captioning model.

It’s like living in the future guys!

It still completely blows my mind that we can do this. Wow.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Alternate Pre-Trained Photo Models. A small 16-layer VGG model was used for feature extraction. Consider exploring larger models that offer better performance on the ImageNet dataset, such as Inception.
Smaller Vocabulary. A larger vocabulary of nearly eight thousand words was used in the development of the model. Many of the words supported may be misspellings or only used once in the entire dataset. Refine the vocabulary and reduce the size, perhaps by half.
Pre-trained Word Vectors. The model learned the word vectors as part of fitting the model. Better performance may be achieved by using word vectors either pre-trained on the training dataset or trained on a much larger corpus of text, such as news articles or Wikipedia.
Tune Model. The configuration of the model was not tuned on the problem. Explore alternate configurations and see if you can achieve better performance.

Did you try any of these extensions? Share your results in the comments below.

Summary

In this tutorial, you discovered how to develop a photo captioning deep learning model from scratch.

Specifically, you learned:

How to prepare photo and text data ready for training a deep learning model.
How to design and train a deep learning caption generation model.
How to evaluate a train caption generation model and use it to caption entirely new photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

1,196 Responses to How to Develop a Deep Learning Photo Caption Generator from Scratch

Christian Beckmann November 28, 2017 at 3:21 am #

Hi Jason,

thanks for this great article about image caption!

My results after training were a bit worse (loss 3.566 – val_loss 3.859, then started to overfit) so i decided to try keras.applications.inception_v3.InceptionV3 for the base model. Currently it is still running and i am curious to see if it will do better.

Reply
- Jason Brownlee November 28, 2017 at 8:41 am #
  
  Let me know how you go Christian.
  
  Reply
  - zeeshan August 2, 2019 at 8:44 pm #
    
    hi jason m recieving this error can u please help me in this
    
    NameError: name ‘Flickr8k_Dataset’ is not defined
    
    Reply
    - Jason Brownlee August 3, 2019 at 8:02 am #
      
      You may have missed a line of code or the dataset is not in the same directory as the python file.
      
      Reply
      - Bhagyashree January 30, 2022 at 7:35 pm #
        
        Can you provide complete source code link without split code parts?
        please 🙂
      - James Carmichael January 31, 2022 at 10:52 am #
        
        Hello Bhagyashree…The tutorial contains full code listing that you may utilize.
  - mo December 16, 2020 at 7:54 pm #
    
    how to solve this , error happen
    
    ValueError
    
    6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
    7 # fit for one epoch
    —-> 8 model.fit_generator( generator,epochs=1, steps_per_epoch=steps, verbose=1)
    
    Reply
    - Jason Brownlee December 17, 2020 at 6:34 am #
      
      I don’t have enough context to comment, sorry.
      
      Perhaps these tips will help:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
      - sharath May 19, 2021 at 2:23 am #
        
        Hello Jason
        I,m facing a value error could u help
        
        ValueError Traceback (most recent call last)
        in ()
        6 image_input=image_input.reshape(2048,)
        7 gen=generate(desc_dict,photo,max_length_of_caption,vocab_size,image_input)
        —-> 8 model.fit(gen,epochs=1,steps_per_epoch=6000,verbose=1)
        9
        10
        
        5 frames
        in create_sequence(caption, max_length_of_caption, vocab_size, image_input)
        1 def create_sequence(caption,max_length_of_caption,vocab_size,image_input):
        —-> 2 input_sequence=[],image_sequence=[],output_sequence=[]
        3 for caption in captions:
        4 caption=caption.split(‘ ‘)
        5 caption=[wordtoindex[w] for w in caption if w in vocab]
        
        ValueError: not enough values to unpack (expected 2, got 0)
      - Jason Brownlee May 19, 2021 at 6:37 am #
        
        These tips may help:
        https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    - asd February 2, 2021 at 12:54 am #
      
      Hey, did a find a solution? I’m facing the same error.
      
      Reply
    - Mustafa Dar October 20, 2021 at 12:53 am #
      
      What accuracy are you getting in your NLP scores?
      
      Reply
  - Rajat December 26, 2020 at 4:10 am #
    
    Hello Jason can you help me with the frontend part I tried using the flask app but failed
    
    Reply
    - Jason Brownlee December 26, 2020 at 5:13 am #
      
      Perhaps this will help:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-deploy-my-python-file-as-an-application
      
      Reply
- basil June 21, 2018 at 12:03 am #
  
  Christian / Jason – instead would Batch normalization help us here. am facing the same issue, over fitting.
  
  BN should also speed up the training and should also give us more accurate results. any inputs ?
  
  Reply
  - Jason Brownlee June 21, 2018 at 6:18 am #
    
    The model usually does fit in 3-5 epochs.
    
    You can try batchnorm if you like. Not sure if it will help.
    
    Reply
    - basil June 23, 2018 at 4:34 am #
      
      yep, i agree… not required..thanks..
      
      am also trying inceptionV3, let you know the results..
      
      Reply
      - Jason Brownlee June 23, 2018 at 6:20 am #
        
        Great.
      - Ben June 24, 2018 at 8:14 am #
        
        Hey did anyone try the Inception model? What were the results?
      - abbas November 18, 2018 at 3:37 am #
        
        hey ben!!!Can you please share the code and results of the inception model?so that we can also try and know more about the inception model.Thanks in advance
- Shaurya Pratap Singh October 10, 2018 at 7:25 pm #
  
  can you plz send me the code at shauryaprataps261@gmail.com
  
  Reply
  - Asad March 24, 2019 at 7:36 am #
    
    did you find code ?
    
    Reply
- Janarddan Sarkar November 24, 2018 at 1:16 am #
  
  I am getting the same
  
  Reply
- vishal July 6, 2020 at 3:05 am #
  
  Hi,
  i have tried using the inception v3 but the bleu scores are even than that of vgg16 model.
  BLEU-1: 0.514655
  BLEU-2: 0.266434
  BLEU-3: 0.179374
  BLEU-4: 0.078146
  
  Reply
  - Jason Brownlee July 6, 2020 at 6:39 am #
    
    Nice work!
    
    Reply
  - Rohit Kushwaha April 15, 2021 at 1:32 pm #
    
    i also tried Inception i got BLEU-1 0.571
    
    Reply
    - Jason Brownlee April 16, 2021 at 5:28 am #
      
      Well done!
      
      Reply
  - afrid May 17, 2021 at 1:26 am #
    
    @vishal, can you share the inception v3 code ?
    
    Reply
- Karan Aggarwal June 13, 2021 at 3:55 am #
  
  Hello Christian Sir,
  
  To avoid overfit, you used keras.application.inceptionV3, m geeting some error in this line:
  
  print(‘Extracted Features: %d’ % len(features))
  
  —————————————————————————
  TypeError Traceback (most recent call last)
  in ()
  —-> 1 print(‘Extracted Features: %d’ % len(features))
  
  TypeError: object of type ‘NoneType’ has no len()
  
  Please help in resolving this
  
  Reply
- Nagaraj CL April 12, 2022 at 1:07 pm #
  
  HI Christian, Please can you share working Inception V3 code, I am not able to make InceptionV3 model working, I am getting following error.
  
  Incompatible shapes: [47,8,8,256] vs. [47,256]
  [[{{node gradient_tape/model_10/add_7/add/BroadcastGradientArgs}}]] [Op:__inference_train_function_1153371]
  
  Reply
Akash November 30, 2017 at 4:56 am #

Hi Jason,
Once again great Article.
I ran into some error while executing the code under “Complete example ” section.
The error I got was
ValueError: Error when checking target: expected dense_3 to have shape (None, 7579) but got array with shape (306404, 1)
Any idea how to fix this?
Thanks

Reply
- Jason Brownlee November 30, 2017 at 8:26 am #
  
  Hi Akash, nice catch.
  
  The fault appears to have been introduced in a recent version of Keras in the to_categorical() function. I can confirm the fault occurs with Keras 2.1.1.
  
  You can learn more about the fault here:
  https://github.com/fchollet/keras/issues/8519
  
  There are two options:
  
  1. Downgrade Keras to 2.0.8
  
  or
  
  2. Modify the code, change line 104 in the training code example from:
  
  out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
  
  1
  
  out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
  
  to
  
  out_seq = to_categorical([out_seq], num_classes=vocab_size)
  
  1
  
  out_seq = to_categorical([out_seq], num_classes=vocab_size)
  
  I hope that helps.
  
  Reply
  - Akash November 30, 2017 at 5:38 pm #
    
    Thanks Jason. It’s working now.
    Can you suggest the changes to be made to use Inception model and word embedding like word2vec.
    
    Reply
    - Jason Brownlee December 1, 2017 at 7:26 am #
      
      I’m glad to hear that.
      
      Yes, simply load the inception model and prepare the images.
      https://keras.io/applications/
      
      Reply
- Gaurav Anand August 3, 2018 at 4:02 pm #
  
  Hi Akash
  
  Could you please tell how did you git rid of this problem?
  
  I am facing
  
  ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (11, 7, 7, 512)
  
  and after changing input structure to inputs1 = Input(shape=(7, 7, 512,)) I am facing
  
  ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (11, 3857)
  
  I have tried with Keras 2.0.8 and latest 2.2.2 versions.
  Any help would be much appreciated.
  
  Thanks
  
  Reply
  - anesh August 7, 2018 at 4:34 pm #
    
    Did you used different input shape?.If you changed the input shape then you have to flatten it and add fully connected dense layer of 4096 neurons.
    
    Reply
    - Gaurav Anand August 14, 2018 at 2:59 pm #
      
      Should I avoid using “include_top = false” while feature extraction ?
      or keep it as true ?
      
      Reply
    - abbas November 18, 2018 at 3:45 am #
      
      Anesh how to fix this error?
      
      Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)
      
      Reply
      - Jason Brownlee November 18, 2018 at 6:48 am #
        
        Change the data to meet the model or change the model to meet the data.
Zoltan November 30, 2017 at 11:47 pm #

Hi Jason,

Big thumbs up, nicely written, really informative article. I especially like the step by step approach.

But when I tried to go through it, I got an error in load_poto_features saying that “name ‘load’ not defined”. Which is kinda odd.

Otherwise everything seems fine.

Reply
- Jason Brownlee December 1, 2017 at 7:35 am #
  
  Thanks.
  
  Perhaps double check you have the load function imported from pickle?
  
  Reply
Bikram Kachari December 1, 2017 at 4:59 pm #

Hi Jason

I am a regular follower of your tutorials. They are great. I got to learn a lot. Thank you so much. Please keep up the good work

Reply
- Jason Brownlee December 2, 2017 at 8:49 am #
  
  You’re welcome!
  
  Reply
maibam December 1, 2017 at 7:05 pm #

____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_2 (InputLayer) (None, 34) 0
____________________________________________________________________________________________________
input_1 (InputLayer) (None, 4096) 0
____________________________________________________________________________________________________
embedding_1 (Embedding) (None, 34, 256) 1940224 input_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 34, 256) 0 embedding_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
____________________________________________________________________________________________________
add_1 (Add) (None, 256) 0 dense_1[0][0]
lstm_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 256) 65792 add_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 7579) 1947803 dense_2[0][0]
====================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
_________________________

ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (306404, 7, 7, 512)

Getting error during mode.fit
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

Keras 2.0.8 with tensorflow
what is wrong ?

Reply
- Jason Brownlee December 2, 2017 at 8:51 am #
  
  Not sure, did you copy all of the code exactly?
  
  Is your numpy and tensorflow also up to date?
  
  Reply
  - Christian January 16, 2018 at 10:09 pm #
    
    This looks like he did change the network for feature extraction. When using include_top=False and wheigts=’imagenet” you get this type of data structure.
    
    Reply
    - Jason Brownlee January 17, 2018 at 9:58 am #
      
      Nice.
      
      Reply
- Kingson June 26, 2018 at 9:54 pm #
  
  @maibam did you find the solution?
  
  I am getting similar error –
  ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (17952, 7, 7, 512)
  
  Please help me out.
  Thanks!!
  
  Reply
  - Jason Brownlee June 27, 2018 at 8:18 am #
    
    Ensure your version of Keras is up to date. v2.1.6 or better.
    
    Reply
    - Kingson June 27, 2018 at 5:26 pm #
      
      __________________________________________________________________________________________________
      Layer (type) Output Shape Param # Connected to
      ==================================================================================================
      input_2 (InputLayer) (None, 27) 0
      __________________________________________________________________________________________________
      input_1 (InputLayer) (None, 4096) 0
      __________________________________________________________________________________________________
      embedding_1 (Embedding) (None, 27, 256) 1058048 input_2[0][0]
      __________________________________________________________________________________________________
      dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
      __________________________________________________________________________________________________
      dropout_2 (Dropout) (None, 27, 256) 0 embedding_1[0][0]
      __________________________________________________________________________________________________
      dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
      __________________________________________________________________________________________________
      lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
      __________________________________________________________________________________________________
      add_1 (Add) (None, 256) 0 dense_1[0][0]
      lstm_1[0][0]
      __________________________________________________________________________________________________
      dense_2 (Dense) (None, 256) 65792 add_1[0][0]
      __________________________________________________________________________________________________
      dense_3 (Dense) (None, 4133) 1062181 dense_2[0][0]
      ==================================================================================================
      Total params: 3,760,165
      Trainable params: 3,760,165
      Non-trainable params: 0
      __________________________________________________________________________________________________
      None
      Traceback (most recent call last):
      File “train2.py”, line 179, in
      model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
      
      ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (10931, 7, 7, 512)
      
      keras version is – 2.2.0
      
      Please help me out.
      
      Reply
      - Jason Brownlee June 28, 2018 at 6:13 am #
        
        Looks like the dimensions of your data do not match the expectations of the model.
        
        You can change the data or change the model.
      - anesh August 7, 2018 at 4:36 pm #
        
        If you changed the input shape by include_top=False then you have to flatten it and add two FC dense layer of 4096 neurons.
Vik December 2, 2017 at 7:16 pm #

Thank you for the article. It is great to see full pipeline.
Always following your articles with admiration

Reply
- Jason Brownlee December 3, 2017 at 5:24 am #
  
  Thanks!
  
  Reply
Gonzalo Gasca Meza December 4, 2017 at 10:42 am #

In the prepare data section, if using Python 2.7 there is no str.maketrans method.
To make this work just comment that line and in line 46 do this:
desc = [w.translate(None, string.punctuation) for w in desc]

Reply
- Jason Brownlee December 4, 2017 at 4:57 pm #
  
  Thanks Gonzalo!
  
  Reply
- Bani March 8, 2018 at 4:26 am #
  
  after using the function to_vocabulary()
  I am getting a vocabulary of size 24 which is too less though I have followed the code line by line.
  Can u help?
  
  Reply
  - Jason Brownlee March 8, 2018 at 6:36 am #
    
    Are you able to confirm that your Python is version 3.5+ and that you have the latest version of all libraries installed?
    
    Reply
Minel December 11, 2017 at 6:17 pm #

Hi Jason,
I am using your code step by step. There is a light mistake :
you wrote
# save descriptions
save_doc(descriptions, ‘descriptions.txt’)

in fact the right intruction is
# save descriptions
save_descriptions(descriptions, ‘descriptions.txt’)

as you wrote in the final example
best

Reply
- Jason Brownlee December 12, 2017 at 5:25 am #
  
  Thanks Minel, fixed.
  
  Reply
Minel December 11, 2017 at 6:34 pm #

Hi jason
Another small detail. I had to write
from pickle import load
to run the instruction
all_features = load(open(filename, ‘rb’))

Best

Reply
- Jason Brownlee December 12, 2017 at 5:27 am #
  
  Nice catch, fixed. Thanks!
  
  Reply
Minel December 11, 2017 at 9:32 pm #

Hi Jason,
I met some trouble running your code. I got a MemoryError on the instruction :
return array(X1), array(X2), array(y)

I am using a virtual machine with Linux (Debian), Python3, with 32Giga of memory.
Could you tell me what was the size of the memory on the computer you used to check your program ?

Best

Reply
- Jason Brownlee December 12, 2017 at 5:29 am #
  
  I expect 8GB of RAM would be enough.
  
  Perhaps try and use progressive loading instead, as described in this post:
  https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
  
  Reply
  - Shaurya Pratap Singh October 10, 2018 at 7:36 pm #
    
    thanks, Jason for a great tutorial!
    from line 96 to 104 in the main complete code.
    
    seq = tokenizer.texts_to_sequences([desc])[0]
    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
    seq = tokenizer.texts_to_sequences([desc])[0]
    
    i did not understand why did you do [0] in tokenizer.texts_to_sequences([desc])[0], and moreover why did you passed a 2d list?
    
    what does texts_to_sequences do??
    
    Reply
    - Jason Brownlee October 11, 2018 at 7:53 am #
      
      You can learn more about the function here:
      https://keras.io/preprocessing/text/#text_to_word_sequence
      
      It takes a 2D array, here we provide our 1d array as a 2d array and retrieve the result.
      
      You can learn more about Python array indexing here:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
      
      Reply
      - Hassaan October 24, 2018 at 12:29 am #
        
        Please slightly explain that. I am not getting why did you do [0] in tokenizer.texts_to_sequences([desc])[0] . I have also read array indexing. How it become 2D array from that? Please explain it
      - Jason Brownlee October 24, 2018 at 6:31 am #
        
        Because the function returns a 2D array and we only need the first dimension.
Minel December 12, 2017 at 11:34 pm #

Thank for the advice.In fact, I upgraded the VM (64Go, 16 cores) and it worked fine (using 45Go of memory)
Best

Reply
- Jason Brownlee December 13, 2017 at 5:35 am #
  
  Nice! Glad to hear it.
  
  Reply
  - Vineeth March 3, 2018 at 12:32 am #
    
    I get the same error even with 64GB VM :/ What to do
    
    Reply
    - Jason Brownlee March 3, 2018 at 8:13 am #
      
      I’m sorry to hear that, perhaps there is something else going on with your workstation?
      
      I can confirm the example works on workstations and on EC2 instances with and without GPUs.
      
      Reply
      - Vineeth March 3, 2018 at 10:06 pm #
        
        It’s throwing a Value error for input_1 after sometime. I tried everything i can but i am not able to understand. Can you paste the link of your project so i can compare ?
      - Jason Brownlee March 4, 2018 at 6:03 am #
        
        Are you able to confirm that your Python environment is up to date?
      - Vineeth March 3, 2018 at 10:26 pm #
        
        And sir, You said the pickle size must be about 127Mb but mine turns out to be above 700MB what did i do wrong ?
      - Jason Brownlee March 4, 2018 at 6:04 am #
        
        The size may be different on different platforms (macos/linux/windows).
Josh Ash December 17, 2017 at 9:56 pm #

Hi Jason – hello from Queensland 🙂
Your tutorials on applied ML in Python are the best on the net hands down, thanks for putting them together!

Reply
- Jason Brownlee December 18, 2017 at 5:22 am #
  
  Thanks Josh!
  
  Reply
Madhivarman December 18, 2017 at 7:12 pm #

hai Jason.. When i run the train.py script my lap freeze…I don’t know whether its training or not.Did anyone face this issue ?

Thanks..!

Reply
- Jason Brownlee December 19, 2017 at 5:16 am #
  
  Sorry to hear that.
  
  Perhaps try running it on AWS for a few dollars:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
  Reply
Muhammad Awais December 20, 2017 at 3:36 pm #

Thanks for such a great work. I found an error message when running a code
FileNotFoundError: [Errno 2] No such file or directory: ‘descriptions.txt’
Please help

Reply
- Jason Brownlee December 20, 2017 at 3:50 pm #
  
  Ensure you generate the descriptions file before running the prior model – check the tutorial steps again and ensure you execute each in turn.
  
  Reply
Daniel F December 21, 2017 at 4:31 am #

Hi Jason,

I’m getting a MemoryError when I try to prepare the training sequences:

Traceback (most recent call last):
File “C:\Users\Daniel\Desktop\project\deeplearningmodel.py”, line 154, in
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
File “C:\Users\Daniel\Desktop\project\deeplearningmodel.py”, line 104, in create_sequences
out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
File “C:\Program Files\Anaconda3\lib\site-packages\keras\utils\np_utils.py”, line 24, in to_categorical
categorical = np.zeros((n, num_classes))
MemoryError

any advice? I have 8GB of RAM.

Reply
- Jason Brownlee December 21, 2017 at 5:29 am #
  
  Perhaps try using progressive loading described in this post:
  https://machinelearningmastery.com/develop-a-caption-generation-model-in-keras/
  
  Reply
  - Daniel F December 22, 2017 at 8:26 am #
    
    Any chance you could show what that would look like functioning with this example? 🙂 I’m struggling a bit to bring a similar generator from the other example to this script.
    
    Reply
    - Jason Brownlee December 22, 2017 at 4:13 pm #
      
      Thanks for the suggestion, I’ll schedule time to update the example.
      
      Reply
      - Daniel F December 22, 2017 at 11:58 pm #
        
        Great! Thanks so much
        
        And thanks for the blog, it is really wonderful 🙂 For now I just cut down the training set a lot to work around the memory error and understand.the code better.
      - Jason Brownlee December 23, 2017 at 5:19 am #
        
        Thanks Daniel.
      - Vineeth February 13, 2018 at 7:29 pm #
        
        I’m having the same problem. Can you please show the example with progressive generator please ?
      - Jason Brownlee February 14, 2018 at 8:17 am #
        
        I provide an example here:
        https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
        
        Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).
zonetrooper32 December 28, 2017 at 3:12 am #

Hi Jason,

Thank you for this amazing article about image captioning.

Currently I am trying to re-implement the whole code, except that I am doing it in pure Tensorflow. I’m curious to see if my re-implementation is working as smooth as yours.

Also a shower thought, it might be better to get a better vector representations for words if using the pretrained word2vec embeddings, for example Glove 6B or GoogleNews. Learning embeddings from scratch with only 8k words might have some performance loss.

Again thank you for putting everything together, it will take quite some time to implement from scratch without your tutorial.

Reply
- Jason Brownlee December 28, 2017 at 5:26 am #
  
  Try it and see if it lifts model skill. Let me know how you go.
  
  Reply
Sasikanth January 8, 2018 at 5:04 pm #

Hello Jason,
Is there a R package to perform modeling of images?

regards
sasikanth

Reply
- Jason Brownlee January 9, 2018 at 5:24 am #
  
  I don’t know, sorry.
  
  Reply
Marco January 16, 2018 at 10:08 pm #

Hi Jason! Thanks for your amazing tutorial! I have a question. I don’t understand the meaning of the number 1 on this line (extract_features):
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

Can you explain me what reshape does and the meaning of the arguments?

Thanks in advance.

Reply
- Jason Brownlee January 17, 2018 at 9:58 am #
  
  Great question, see this post to learn more about numpy arrays:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
junhyung yu January 22, 2018 at 8:54 pm #

Hi Jason! thank you for your great code.
but i have one question.

How long does it take to execute under code?

# define the model
model = define_model(vocab_size, max_length)

This code does not run during the third day.

I think that “se3 = LSTM(256)(se2)” code in define_model function is causing the problem.

My computer configuration is like this.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz – 6 core
Ram 62G
GeForce GTX TITAN X – 2core

please help me~~

Reply
- Jason Brownlee January 23, 2018 at 7:55 am #
  
  Ouch, something is wrong.
  
  Perhaps try running on AWS?
  
  Perhaps try other models and test your rig/setup?
  
  Perhaps try fewer epochs or a smaller model to see if your setup can train the model at all?
  
  Reply
  - junhyung yu January 23, 2018 at 3:29 pm #
    
    1. No. i try running on my indicvdual linux server and using jupyter notebook
    
    2. No i am using only your code , no other model, no modify
    
    3.
    
    model.fit([X1train, X2train], ytrain, epochs=20, verbose=1, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
    
    This code has not yet been executed
    
    so I do not think epoch is a problem.
    
    Reply
    - Jason Brownlee January 24, 2018 at 9:50 am #
      
      Perhaps run from the command line as a background process without notebook?
      
      Perhaps check memory usage and cpu/gpu utilization?
      
      Reply
krishna January 23, 2018 at 10:41 pm #

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

hi sir… I am getting this error above when i run feature extract code.

Reply
- Jason Brownlee January 24, 2018 at 9:55 am #
  
  Sorry, I have not seen that error.
  
  Reply
- Hiroshi February 26, 2018 at 1:01 pm #
  
  Hi Krishna,
  
  I’m also getting this error time to time. Were you able to solve this issue?
  
  Reply
- anesh August 7, 2018 at 4:40 pm #
  
  You have to connect to the internet to download the vgg network.
  
  Reply
Sathiya_Chakra January 28, 2018 at 7:05 am #

Hi Jason!

Is it possible to run this neural network on a 8GB RAM laptop with 2GB Graphics card with Intel core i5 processor?

Reply
- Jason Brownlee January 28, 2018 at 8:28 am #
  
  Perhaps.
  
  You might need to adjust it to use progressive loading so that it does not try to hold the entire dataset in RAM.
  
  Reply
  - sandhya November 20, 2018 at 4:56 am #
    
    Hi jason
    
    Is it possible to run on cpu with progressive loading without any issues??
    
    Reply
    - Jason Brownlee November 20, 2018 at 6:38 am #
      
      Yes!
      
      Reply
Ajit Tiwari January 29, 2018 at 10:46 pm #

Hi Jason,
Can you provide a link for the tokenizer as well as the model file.
I Cannot train this model in my system but would like to see if I can use it to create an Android app

Reply
- Jason Brownlee January 30, 2018 at 9:51 am #
  
  Sorry, I cannot share the models.
  
  Reply
Soumya February 1, 2018 at 10:19 pm #

When I am running

tokenizer = Tokenizer()

I am getting error,

Traceback (most recent call last):
File “”, line 1, in
NameError: name ‘Tokenizer’ is not defined

How to solve this. Any idea please.

Reply
- Jason Brownlee February 2, 2018 at 8:19 am #
  
  Ensure you import the Tokenizer.
  
  Reply
Marco February 9, 2018 at 12:41 am #

Hi Jason, thanks for the tutorial! I want to ask you if you could explain (or send me some links), to better understand, how exactly the fitting works.

Example description: the girl is …

The LSTM network during fitting takes the beginning of the sequence of my description (startseq) and it produces a vector with all possible subsequent words. This vector is combined with the vector of the input image features and it is passed within an FF layer where we then take the most probable word (with softmax). it’s right?

At this point how does the fitting go on? Is the new sequence (e.g startseq – the) passed into the LSTM network, predicts all possible next words, etc.? Continuing this way up to endseq?

If the network incorrectly generates the next word, what happens? How are the weights arranged? The fitting continues by taking in input “startseq – wrong_word” or continues with the correct one (eg startseq – the)?

Thanks for your help
Marco

Reply
- Jason Brownlee February 9, 2018 at 9:10 am #
  
  This is not fitting, it is inference.
  
  Generating the “wrong” word might not matter, the network could correct.
  
  Also, we can sample the probability distribution of the generated sequence to pull out multiple possible descriptions:
  https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
  
  To learn more see this post and the referenced papers:
  https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
  
  Reply
Sumit Das February 13, 2018 at 6:10 pm #

Hi Jason great article on caption generator i think the best till now available online.. i am a newbee in ML(AI). i extracted the features and stored it to features.pkl file but getting an error on create sequence functions memory error and i can see you have suggested progressive loading i do not get that properly could you suggest my how to use the current code modified for progressive loading::

[‎2/‎13/‎2018 12:34 PM] Sanchawat, Hardik:
Using TensorFlow backend.
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
Traceback (most recent call last):
File “C:\Users\hardik.sanchawat\Documents\Scripts\flickr\test.py”, line 154, in
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
File “C:\Users\hardik.sanchawat\Documents\Scripts\flickr\test.py”, line 109, in create_sequences
return array(X1), array(X2), array(y)
MemoryError

My system configuration is :

OS: Windows 10
Processor: AMD A8 PRO-7150B R5, 10 Compute Cores 4C+6G 1.90 GHz
Memory(RAM): 16 GB (14.9GB Usable)
System type: 64-bit OS, x64-based processor

Reply
- Jason Brownlee February 14, 2018 at 8:17 am #
  
  I have an example of progressive loading here:
  https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
  
  Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).
  
  Reply
Kavya February 14, 2018 at 8:35 am #

Hi Jason,

I am trying to using plot _model . but I getting error

raise ImportError(‘Failed to import pydot. You must install pydot’

ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

I tried
conda install graphviz
conda install pydotplus

to install pydot.
my python version is3.x
eras vesion is 2.1.3

Could you please help me , to solve this problem

Reply
- Jason Brownlee February 14, 2018 at 2:40 pm #
  
  I’m sorry to hear that.
  
  Perhaps the installed libraries are not available in your current Python environment?
  
  Perhaps try posting the error to stackoverflow? I’m not an expert at debugging workstations.
  
  Reply
- Vineeth February 14, 2018 at 5:13 pm #
  
  If you are on windows go here and install this, https://graphviz.gitlab.io/_pages/Download/Download_windows.html 2.38 stable msi file.
  
  after that, add the graphviz’s bin onto your system PATH variables. Restart your computer and the path should be picked up.
  
  Then you won’t have that error again.
  
  Reply
  - Kavya February 17, 2018 at 2:36 pm #
    
    Thanks Vinneth,
    I am using Mac. I tried toes pydotplus, but still its giving same error.
    
    Reply
- Precious Angrish May 2, 2018 at 10:34 am #
  
  HI
  
  I am getting the same error, how did you fix it?
  
  Regards
  Precious Angrish
  
  Reply
- Sayan May 14, 2018 at 3:05 am #
  
  Hey Kavya i assume this will surely resolve your error , as it also worked for me as well, https://stackoverflow.com/questions/36869258/how-to-use-graphviz-with-anaconda-spyder.
  Thanks
  
  Reply
Vineeth February 14, 2018 at 9:02 pm #

I used Progressive Loading from https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/#comment-429470 This tutorial and updated the input layer to inputs1 = Input(shape=(224, 224, 3))

And i got the error
ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (13, 4485)

Then i updated to_categorical function as you mentioned and the error changed to this
ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (13, 1, 4485)

Been trying to figure out the exact input shapes of the model since 2 days please help 🙁

Reply
- Srinath Hanumantha Rao March 21, 2018 at 7:58 pm #
  
  Hey Vineeth!
  
  Were you able to solve this issue? I am stuck on this for a few days too.
  
  Reply
  - Jason Brownlee March 22, 2018 at 6:21 am #
    
    Are you able to confirm your Python and Keras versions?
    
    Reply
Alex February 21, 2018 at 12:30 am #

Hi Jason, why do you apply dropout to the input instead to applying it to the dense layer?

Reply
- Jason Brownlee February 21, 2018 at 6:40 am #
  
  I used a little experimentation to come up with the model.
  
  Try changing it up and see if you can lift skill or reduce training time or model complexity Alex. I’m eager to hear how you go.
  
  Reply
Sunny February 28, 2018 at 7:23 am #

Hi Jason,

I just wanted to know that when you are loading the training data, you are tokenizing the train descriptions. But when you are working with test data, you are not tokenizing the test descriptions, instead working with the previous tokens. Shouldn’t the test descriptions be tokenized too before passing to create_sequence for test ?

Reply
- Jason Brownlee March 1, 2018 at 6:02 am #
  
  The train and test data are tokenized.
  
  Reply
Hgarrison March 7, 2018 at 8:44 am #

Hi Jason,

This tutorial is of great help to us all, I think. I have a question: Does the model eventually learn to predict captions not present in the corpus? I mean, is it possible for the model to output sentences that are never seen before? In the example you give, the model predicted “startseq dog is running across the beach endseq”. Is this sentence found in the training corpus, or did the model make it up based on previous observations? And also, If it is possible for the model to combine sentences, how much training data do you think it needs to do that?

Reply
- Jason Brownlee March 7, 2018 at 3:04 pm #
  
  The model attempts to generalize beyond what it has seen during training.
  
  In fact, this is the goal with a machine learning model.
  
  Nevertheless, the model will be bounded by the types of text and images seen during training, just not the specific combinations.
  
  Reply
Giuseppe March 8, 2018 at 12:05 am #

Hi Jeson, I have a question. What exactly is the LSTM used for? During fitting it takes an input (eg startseq – girl) and outputs a vector of 256 elements that contain the most probable words after the prefix? Is it trained through backpropagation? The purpose of the fitting is to make sure that given a prefix / input the LSTM gives me back a vector that represents “better” the possible following words (which are then merge with the features, etc …)

Reply
- Jason Brownlee March 8, 2018 at 6:32 am #
  
  It is used for interpreting the text generated so far, needed to generate the next word.
  
  Reply
fatma March 16, 2018 at 8:16 pm #

Hi Jason,

for the line:

features = dict()

I got syntaxerror: invalid syntax

How can I fix this error?

Reply
- Jason Brownlee March 17, 2018 at 8:36 am #
  
  Perhaps double check that you have copied the code while maintaining white space?
  
  Perhaps confirm Python 3?
  
  Reply
fatma March 20, 2018 at 10:21 pm #

Hi Jason,

is the following line:

model = Model(inputs=model.inputs, outputs=model.layers[-1].output)

means we will save the features of fc2 layer of the vgg16 model?

Reply
- Jason Brownlee March 21, 2018 at 6:33 am #
  
  We are creating a new model without the last layer.
  
  Reply
  - fatma March 21, 2018 at 3:54 pm #
    
    the new model doesn’t contain any fully connected layer because I read that we can extract the features from the fc2 layers of the pre-trained model also
    
    Reply
    - fatma March 21, 2018 at 4:35 pm #
      
      when I run the line model.summary() I got the last layer is :
      
      block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
      
      but according to the VGG16 it should be
      
      fc2 (Dense) (None, 4096) 16781312 fc1[0][0]
      
      I don’t know where is the problem?
      
      Reply
      - Saurabh May 6, 2019 at 3:58 pm #
        
        That is because you must have specified include_top = False in VGG. This will not include the fully connected part of the network.
    - fatma March 23, 2018 at 9:27 pm #
      
      Hi Jason,
      
      how we can feed the saved features in the pickle file (features.pkl) to a linear regression model
      
      Reply
      - Jason Brownlee March 24, 2018 at 6:27 am #
        
        That would be a lot of input features! Sorry, I don’t have a worked example.

Akash March 21, 2018 at 7:04 am #

ValueError: Error when checking input: expected input_1 to have shape (None, 4096) but got array with shape (0, 1)

I am getting this error..can anyone help me understand and fix it?

Jason Brownlee March 21, 2018 at 3:03 pm #

Are you able to confirm that you have Python3 and all libs up to date?

Reply
- Akash March 21, 2018 at 9:16 pm #
  
  Yes all my libraries are upto date, have checked.
  I solved the problem i posted before….my problem was in the data generator.
  I am using progressive loading.After fixing the problem i checked my inputs using this code:
  
  generator = data_generator(descriptions, tokenizer, max_length)
  inputs, outputs = next(generator)
  print(inputs[0].shape)
  print(inputs[1].shape)
  print(outputs.shape)
  
  and it’s giving me an output like this:
  
  (13, 224, 224, 3)
  (13, 28)
  (13, 4485)
  
  but now it’s showing this error:
  ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (8, 224, 224, 3)
  
  do i have to change the model architecture for progressive loading??
  
  NOTE:for progressive loading have used this code:https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
  
  Reply
  - Steven March 22, 2018 at 9:57 pm #
    
    I am stock with the same issue. The example above runs me into memory problems even when I tried it using AWS EC2 g2.2xlarge instance or a laptop with 16 GB RAM. So I tried the progressive loading example you referred to frequently but I have the same trouble with the input of the model. I tried to use inputs[0] as inputs1 for the define_model function but that returned the error ‘Error when checking input: expected input_13 to have 5 dimensions, but got array with shape (13, 224, 224, 3)’. Do I have to reshape input[0], or is the problem in inputs2?
    
    Reply
    - Akash March 23, 2018 at 6:29 pm #
      
      I think the model architecture needs to be changed for the progressive loading example particularly the input shapes.

Harsha April 2, 2018 at 9:27 pm #

getting the same error for me
File “fittingmodel.py”, line 189, in
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1630, in fit
batch_size=batch_size)
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1476, in _standardize_user_data
exception_prefix=’input’)
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 123, in _standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (1,)

Jason Brownlee April 3, 2018 at 6:33 am #

What version of libs are you using?

Here’s what I’m running:

python: 3.6.5
scipy: 1.0.1
numpy: 1.14.2
matplotlib: 2.1.1
pandas: 0.22.0
statsmodels: 0.8.0
sklearn: 0.19.1
nltk: 3.2.5
gensim: 3.4.0
xgboost 0.6
tensorflow: 1.7.0
theano: 1.0.1
Using TensorFlow backend.
keras: 2.1.5

python: 3.6.5

scipy: 1.0.1

numpy: 1.14.2

matplotlib: 2.1.1

pandas: 0.22.0

statsmodels: 0.8.0

sklearn: 0.19.1

nltk: 3.2.5

gensim: 3.4.0

xgboost 0.6

tensorflow: 1.7.0

theano: 1.0.1

Using TensorFlow backend.

keras: 2.1.5

Tanisha March 31, 2018 at 5:50 pm #

Hi Jason,
Thanks for the article.

Due to lack of resources I tried running this in small amount of data.Everything worked fine but the generating new description part is giving this error.

C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-03-31 12:07:43.176707: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-03-31 12:07:43.574792: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce 820M major: 2 minor: 1 memoryClockRate(GHz): 1.25
pciBusID: 0000:08:00.0
totalMemory: 2.00GiB freeMemory: 1.65GiB
2018-03-31 12:07:43.584220: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1283] Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Traceback (most recent call last):
File “7_generate_discription.py”, line 72, in
description = generate_desc(model, tokenizer, photo, max_length)
File “7_generate_discription.py”, line 48, in generate_desc
yhat = model.predict([photo,sequence], verbose=0)
File “C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1817, in predict
check_batch_axis=False)
File “C:\Users\Tanisha\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 123, in _standardize_input_data
str(data_shape))
ValueError: Error when checking : expected input_2 to have shape (25,) but got array with shape (34,)

Any idea how can i fix this ?
Thanks.

Reply
- Jason Brownlee April 1, 2018 at 5:46 am #
  
  Are you able to confirm that your Keras version and TF are up to date?
  
  Did you copy all of the code as is?
  
  Reply
  - Tanisha April 5, 2018 at 11:52 am #
    
    Yeah those two are updated i just changed “max_length = 34” to “max_length = 25” in the code and now its working.
    
    Reply
    - Jason Brownlee April 5, 2018 at 3:13 pm #
      
      I’m glad to hear you worked it out.
      
      Reply
    - Saurabh May 6, 2019 at 4:02 pm #
      
      Changing max_length did not give any error to you?
      
      Reply
Harsha April 1, 2018 at 2:46 pm #

i am getting this error
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
File “fittingmodel.py”, line 109, in create_sequences
return array(X1), array(X2), array(y)
MemoryError

Reply
- Jason Brownlee April 2, 2018 at 5:19 am #
  
  Perhaps try running the example on a machine with more RAM, or update the example to use progressive loading described in this post:
  https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
  
  Reply
pramod choudhari April 1, 2018 at 4:07 pm #

what backend are you using??

Reply
- Jason Brownlee April 2, 2018 at 5:20 am #
  
  TensorFlow.
  
  Reply
anurag vats April 2, 2018 at 3:26 pm #

can some one give me this file “model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5”
my pc don’t have enough processing power .

Reply
- Jason Brownlee April 3, 2018 at 6:27 am #
  
  Perhaps you could train the model on an EC2 instance:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
  Reply
Harsha April 2, 2018 at 6:14 pm #

ile “fittingmodel.py”, line 189, in
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1522, in fit
batch_size=batch_size)
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 1378, in _standardize_user_data
exception_prefix=’input’)
File “C:\Users\pranyaram\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py”, line 144, in _standardize_input_data
str(array.shape))
ValueError: Error when checking input: expected input_1 to have shape (None, 4096) but got array with shape (0, 1)

Reply
- Jason Brownlee April 3, 2018 at 6:32 am #
  
  Are you able to confirm that you are using Python 3 and that your version of Keras is up to date?
  
  Reply
  - Harsha April 3, 2018 at 2:31 pm #
    
    which keras version should i use
    
    Reply
    - Jason Brownlee April 4, 2018 at 6:04 am #
      
      The most recent.
      
      Reply
      - Harsha April 4, 2018 at 1:42 pm #
        
        even still i am getting the same error once check the model training file how to reduce the training size to avoid memory error.
      - Jason Brownlee April 5, 2018 at 5:52 am #
        
        You can use progressive loading to reduce the memory requirements for the model.
        
        Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).
Lazuardi April 3, 2018 at 3:44 am #

Hello, Jason! Thank you for your tutorial.

I tried to use pre-trained model and copy-paste the code above to my Anaconda python 3.6 and Keras version of 2.1.5. First, it will run smoothly without any problem, and it begins to crawl on several image files. Unfortunately, after a while, I get this kind of error:

“OSError: cannot identify image file ‘Flicker8k_Dataset/find.py”

Any idea what is wrong? I am running it on my laptop with GPU NVIDIA GeForce 1050 Ti with Intel Core i7-7700HQ with Windows 10 OS.

Thank you in advance!

Reply
- Jason Brownlee April 3, 2018 at 6:40 am #
  
  Looks like something very strange is going on.
  
  I have not seen this error. Perhaps try running from the commandline, often notebooks and IDEs introduce new and crazy faults of their own.
  
  Reply
goutham April 4, 2018 at 1:48 pm #

Using TensorFlow backend.
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
Traceback (most recent call last):
File “model_fit.py”, line 154, in
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
File “model_fit.py”, line 109, in create_sequences
return array(X1), array(X2), array(y)
MemoryError

how to reduce the training size to avoid this error.

Reply
- Jason Brownlee April 5, 2018 at 5:52 am #
  
  You can use progressive loading to reduce the memory requirements for the model.
  
  Reply
- Belgaroui April 15, 2018 at 10:22 pm #
  
  I got the same error “OSError: cannot identify image file ‘Flicker8k_Dataset/desktop.ini'” did you fix it?
  
  Reply
  - Jason Brownlee April 16, 2018 at 6:10 am #
    
    Looks like you have a windows file called desktop.ini in the directory for some reason. Delete it.
    
    Reply
harsha April 4, 2018 at 5:58 pm #

Hi, can you provide me the weights file. My laptop is having 12GB RAM, NVIDIA GeForce 820M Graphics, all supported drivers. But Iam getting the memory error issue.

I have tried progressive loading also.. But it is not working.. It is not saving the weights file even after steps per epoch=70000 is completed even. I cant afford for the AWS.
So, I request you to give me the weights file.
Thanks in advance.

Reply
- Jason Brownlee April 5, 2018 at 5:53 am #
  
  Sorry, I cannot share the weights file.
  
  I will schedule time into updating the tutorial to add a progressive loading example.
  
  Update: I have updated the tutorial to include an example of training using progressive loading (a data generator).
  
  Reply
manish April 5, 2018 at 12:58 am #

Hi,
I got an error while generating the captions.

Here is the error:

Traceback (most recent call last):
File “generate_captions5.py”, line 64, in
tokenizer = load(open(‘descriptions.txt’, ‘rb’))
_pickle.UnpicklingError: could not find MARK

Reply
- Jason Brownlee April 5, 2018 at 6:09 am #
  
  I have not seen this error before, sorry. Perhaps try running the code again?
  
  Reply
harsha April 5, 2018 at 4:50 am #

startseq man in red shirt is standing on the street endseq

caption is generating but it is giving same caption for different images.

Reply
- Jason Brownlee April 5, 2018 at 6:15 am #
  
  Perhaps your model requires further training?
  
  Reply
- Mohankumar Balasubramaniyam May 3, 2019 at 12:42 am #
  
  Hi I am also facing the same issue. Can you tell what you did to overcome the problem @harsha
  
  Reply
  - Sayak Paul January 23, 2020 at 6:04 pm #
    
    Same issue I am facing as well.
    
    Reply
    - Roy June 12, 2020 at 4:56 am #
      
      Hey, have you figured out the problem?
      
      Reply
      - Rohan December 21, 2020 at 2:52 am #
        
        I am having the same issue as well. I first did it will the MS COCO dataset because it has many more images and captions, but when I ran into the issue, I followed the tutorial with the Flicker Dataset and I am running into the same issue again. Has anyone figured out the solution?
      - Jason Brownlee December 21, 2020 at 6:40 am #
        
        Are you able to confirm your tensorflow and keras versions?
      - Rohan December 22, 2020 at 1:56 am #
        
        My TensorFlow version is 2.3.1 and my Keras version is 2.4.3. However, I am using the keras built into tensorflow.
      - Jason Brownlee December 22, 2020 at 6:49 am #
        
        The versions look good.
        
        Perhaps these instructions will help you copy the code without error:
        https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
      - Rohan December 24, 2020 at 2:24 am #
        
        I had copied the code correctly, but I had been using the data generator because the COCO dataset has so much data. When I tried again with the Flicker dataset, I used the data generator as well, because I wasn’t sure if my 16 gigs of RAM would be enough to load all the data in at once. I am trying again, but without the data generator. I hope it works
      - Rohan December 24, 2020 at 3:50 am #
        
        It is not generating the exact same caption for each image, but it does place “a man in a red shirt is” at the beginning of each caption and the captions do not seem to be accurate.
      - Jason Brownlee December 24, 2020 at 5:36 am #
        
        Perhaps try training the model again?
        Perhaps select a different final model?
        Perhaps tune the learning parameters?
manish April 5, 2018 at 2:16 pm #

val-loss is improving up to 3 epoches only, there’s no any improvement in further epoches.

model-ep003-loss3.662-val_loss3.824.h5. This is the last epoche that has improved till now.

Reply
SAI April 8, 2018 at 12:49 am #

File “”, line 1, in
runfile(‘C:/Users/Owner/.spyder-py3/ML/4.py’, wdir=’C:/Users/Owner/.spyder-py3/ML’)

File “C:\Users\Owner\Anaconda_3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
execfile(filename, namespace)

File “C:\Users\Owner\Anaconda_3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
exec(compile(f.read(), filename, ‘exec’), namespace)

File “C:/Users/Owner/.spyder-py3/ML/4.py”, line 161, in
model = define_model(vocab_size, max_length)

File “C:/Users/Owner/.spyder-py3/ML/4.py”, line 129, in define_model
plot_model(model, to_file=’model.png’, show_shapes=True)

File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 135, in plot_model
dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)

File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 56, in model_to_dot
_check_pydot()

File “C:\Users\Owner\Anaconda_3\lib\site-packages\keras\utils\vis_utils.py”, line 31, in _check_pydot
raise ImportError(‘Failed to import pydot. You must install pydot’

ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

getting this even if i installed pydot and graphviz

Reply
- Jason Brownlee April 8, 2018 at 6:22 am #
  
  Perhaps restart your machine?
  
  Perhaps comment out the part where you visualize the model?
  
  Reply
- deep_ml April 9, 2018 at 3:23 am #
  
  getting same error!
  Tried using solution from stackoverflow, upgraded packages..but it ain’t working..
  
  Reply
  - Jason Brownlee April 9, 2018 at 6:12 am #
    
    No problem, just skip that part and proceed. Comment out the plotting of the model.
    
    Reply
deep_ml April 9, 2018 at 4:06 pm #

I have trained the data using progressive loading and I stopped after 4 iterations, with a loss of 3.4952.

I am unable to understand this part,
In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one we the lowest loss that you can then use in the next section.

Do you mean we have to load test set in the same way using progressive loading ?
Please help me understanding how to load the test set.

Reply
- Jason Brownlee April 10, 2018 at 6:15 am #
  
  I am suggesting that you may want to load the test data in the existing way and evaluate your model (next section).
  
  Reply
Jesia April 11, 2018 at 6:25 pm #

Error by runing “The complete code example is listed below.” in the Loading Data section:

Message Body:
Dataset: 6000
Descriptions: train=6000
Traceback (most recent call last):
File “task2.py”, line 64, in
train_features = load_photo_features(‘features.pkl’, train)
File “task2.py”, line 53, in load_photo_features
features = {k: all_features[k] for k in dataset}
File “task2.py”, line 53, in
features = {k: all_features[k] for k in dataset}
KeyError: ‘878758390_dd2cdc42f6’

Reply
- Jason Brownlee April 12, 2018 at 8:35 am #
  
  Perhaps confirm that you have the full dataset in place?
  
  Reply
  - Jesia April 24, 2018 at 11:22 pm #
    
    Yes, some images were missed.
    
    Thank you
    
    Reply
Belgaroui April 12, 2018 at 12:31 am #

Hello sir I’m learning from your articles that I find very informative and educational, I’ve been trying to compile this code :
# extract features from all images
directory = ‘Flicker8k_Dataset’
features = extract_features(directory)
print(‘Extracted Features: %d’ % len(features))
# save to file
dump(features, open(‘features.pkl’, ‘wb’))

but an error occurred and I don’t understand it can you help me fix it and thanks for all of you
here’s the mistake I made:
PermissionError Traceback (most recent call last)
in ()
1 # extract features from all images
2 directory = ‘Flicker8k_Dataset’
—-> 3 features = extract_features(directory)
4 print(‘Extracted Features: %d’ % len(features))
5 # save to file

in extract_features(directory)
13 # load an image from file
14 filename = directory + ‘/’ + name
—> 15 image = load_img(filename, target_size=(224, 224))
16 # convert the image pixels to a numpy array
17 image = img_to_array(image)

~\Anaconda3\envs\envir1\lib\site-packages\keras\preprocessing\image.py in load_img(path, grayscale, target_size, interpolation)
360 raise ImportError(‘Could not import PIL.Image. ‘
361 ‘The use of array_to_img requires PIL.’)
–> 362 img = pil_image.open(path)
363 if grayscale:
364 if img.mode != ‘L’:

~\Anaconda3\envs\envir1\lib\site-packages\PIL\Image.py in open(fp, mode)
2546
2547 if filename:
-> 2548 fp = builtins.open(filename, “rb”)
2549 exclusive_fp = True
2550

PermissionError: [Errno 13] Permission denied: ‘Flicker8k_Dataset/Flicker8k_Dataset’

Reply
- Jason Brownlee April 12, 2018 at 8:47 am #
  
  Looks like the dataset is missing or is not available on your workstation.
  
  Reply
Seaf April 13, 2018 at 1:33 am #

Hello sir, Thanks for your effort

I have trained the data using progressive loading and my machine restarted after 11 iterations,
how can i continue training from that checkpoint ?

Reply
- Jason Brownlee April 13, 2018 at 6:42 am #
  
  Load the last saved model, then continue training. As simple as that.
  
  I doubt more than a handful of epochs is required on this problem.
  
  Reply
  - Seaf April 13, 2018 at 12:44 pm #
    
    thank you !
    
    i have loaded the last model (‘model_11.h5’) that has 3.445 loss, now it continue training with 5.4461 loss, is that normal ?
    
    Reply
    - Jason Brownlee April 13, 2018 at 3:32 pm #
      
      Interesting, that is a little surprising. I wonder if there is a fault or if indeed the model loss has gotten worse.
      
      Some careful experiments may be required.
      
      Reply
Belgaroui April 13, 2018 at 3:07 am #

Thank you, I think so too….

I already downloaded Flicker8k_Datasets and extracted it in the same file where I work with jupyter notebook.

I consulted Google and Youtube to try to fix this error but in vain…

I don’t know but could you be so kind as to direct me and help me fix the problem.
Thank you very much for your efforts…

Reply
- Jason Brownlee April 13, 2018 at 6:43 am #
  
  What problem?
  
  Reply
  - Belgaroui April 14, 2018 at 12:46 am #
    
    Hi Jason,
    when I try to compile code related to the extracted features from all images I get this error that is “Permission denied” you told me earlier that Looks like the dataset is missing or is not available on my workstation I tried to fix the trick but in vain.
    Do you have any idea how I could do that?
    Do I need a user right or something like that?
    or maybe I need to reload the database?
    
    *the error :
    ~\Anaconda3\envs\envir1\lib\site-packages\PIL\Image.py in open(fp, mode)2546
    2547 if filename:
    -> 2548 fp = builtins.open(filename, “rb”)
    2549 exclusive_fp = True
    2550
    
    PermissionError: [Errno 13] Permission denied: ‘Flicker8k_Dataset/Flicker8k_Dataset’
    
    thanks a lot 🙂 🙂
    
    Reply
    - Jason Brownlee April 14, 2018 at 6:47 am #
      
      You appear to have a problem loading the data from your hard drive. Perhaps you stored the data in a location where you/your code does not have permission to read?
      
      Perhaps you are using a notebook or an IDE as another user?
      
      Try running from the command line and check file permissions.
      
      Reply
@nkish April 14, 2018 at 4:56 pm #

Thanks Jason. I really appreciate your knowledge and the way you express it to us through your articles, it’s amazing.

Reply
- Jason Brownlee April 15, 2018 at 6:24 am #
  
  Thanks, I’m glad the tutorials help!
  
  Reply
Abdallah April 14, 2018 at 7:15 pm #

Thank you very much mr.jason but I have some problems after download the pretrained model when make the model prediction
‘
—————————————————————————
FailedPreconditionError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330

~/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
–> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive

FailedPreconditionError: Attempting to use uninitialized value block1_conv2_5/kernel
[[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]

During handling of the above exception, another exception occurred:

FailedPreconditionError Traceback (most recent call last)
in ()
24 return features
25 directory = ‘../ProjectPattern/Flickr8k_Dataset/Flicker8k_Dataset’
—> 26 features =extract_feature(directory)
27 dump(features,open(“feature.pkl”,”wb”))

in extract_feature(directory)
17 img =preprocess_input(img)
18 #extract feature by make prediction use the pretrained model
—> 19 feature = model.predict(img,verbose=0)
20 #extract img_id
21 img_id = name.split(‘.’)[0]

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
1811 f = self.predict_function
1812 return self._predict_loop(
-> 1813 f, ins, batch_size=batch_size, verbose=verbose, steps=steps)
1814
1815 def train_on_batch(self, x, y, sample_weight=None, class_weight=None):

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in _predict_loop(self, f, ins, batch_size, verbose, steps)
1306 else:
1307 ins_batch = _slice_arrays(ins, batch_ids)
-> 1308 batch_outs = f(ins_batch)
1309 if not isinstance(batch_outs, list):
1310 batch_outs = [batch_outs]

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in __call__(self, inputs)
2551 session = get_session()
2552 updated = session.run(
-> 2553 fetches=fetches, feed_dict=feed_dict, **self.session_kwargs)
2554 return updated[:len(self.outputs)]
2555

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
–> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
1127 results = self._do_run(handle, final_targets, final_fetches,
-> 1128 feed_dict_tensor, options, run_metadata)
1129 else:
1130 results = []

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1342 if handle is None:
1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344 options, run_metadata)
1345 else:
1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1361 except KeyError:
1362 pass
-> 1363 raise type(e)(node_def, op, message)
1364
1365 def _extend_graph(self):

FailedPreconditionError: Attempting to use uninitialized value block1_conv2_5/kernel
[[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]

Caused by op ‘block1_conv2_5/kernel/read’, defined at:
File “/usr/lib/python3.6/runpy.py”, line 193, in _run_module_as_main
“__main__”, mod_spec)
File “/usr/lib/python3.6/runpy.py”, line 85, in _run_code
exec(code, run_globals)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel_launcher.py”, line 16, in
app.launch_new_instance()
File “/home/abdo96/.local/lib/python3.6/site-packages/traitlets/config/application.py”, line 658, in launch_instance
app.start()
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelapp.py”, line 478, in start
self.io_loop.start()
File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/ioloop.py”, line 177, in start
super(ZMQIOLoop, self).start()
File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/ioloop.py”, line 888, in start
handler_func(fd_obj, events)
File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/stack_context.py”, line 277, in null_wrapper
return fn(*args, **kwargs)
File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 440, in _handle_events
self._handle_recv()
File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 472, in _handle_recv
self._run_callback(callback, msg)
File “/home/abdo96/.local/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py”, line 414, in _run_callback
callback(*args, **kwargs)
File “/home/abdo96/.local/lib/python3.6/site-packages/tornado/stack_context.py”, line 277, in null_wrapper
return fn(*args, **kwargs)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 233, in dispatch_shell
handler(stream, idents, msg)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py”, line 399, in execute_request
user_expressions, allow_stdin)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/ipkernel.py”, line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File “/home/abdo96/.local/lib/python3.6/site-packages/ipykernel/zmqshell.py”, line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2850, in run_ast_nodes
if self.run_code(code, result):
File “/home/abdo96/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 26, in
features =extract_feature(directory)
File “”, line 2, in extract_feature
model = VGG19()
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/applications/vgg19.py”, line 117, in VGG19
x = Conv2D(64, (3, 3), activation=’relu’, padding=’same’, name=’block1_conv2′)(x)
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/engine/topology.py”, line 590, in __call__
self.build(input_shapes[0])
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/layers/convolutional.py”, line 138, in build
constraint=self.kernel_constraint)
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py”, line 91, in wrapper
return func(*args, **kwargs)
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/engine/topology.py”, line 414, in add_weight
constraint=constraint)
File “/home/abdo96/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py”, line 392, in variable
v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/variables.py”, line 229, in __init__
constraint=constraint)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/variables.py”, line 376, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name=”read”)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py”, line 127, in identity
return gen_array_ops.identity(input, name=name)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 2134, in identity
“Identity”, input=input, name=name)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper
op_def=op_def)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 3160, in create_op
op_def=op_def)
File “/home/abdo96/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value block1_conv2_5/kernel
[[Node: block1_conv2_5/kernel/read = Identity[T=DT_FLOAT, _class=[“loc:@block1_conv2_5/kernel”], _device=”/job:localhost/replica:0/task:0/device:CPU:0″](block1_conv2_5/kernel)]]
‘

Reply
- Jason Brownlee April 15, 2018 at 6:25 am #
  
  Wow. I have not seen this before, sorry.
  
  Perhaps try searching or posting on stackoverflow?
  
  Reply
  - Abdallah April 17, 2018 at 9:18 pm #
    
    so the problem solved by specifying which weights used not None(random initialization)
    but used pretraining on ‘imagenet’ and specify the include_top argument to be True
    
    Reply
    - Jason Brownlee April 18, 2018 at 8:04 am #
      
      Well done, I’m glad to hear that.
      
      Reply
Abdallah April 15, 2018 at 9:45 am #

When using Merged input in model the error below showed
Thanks in advance

in ()
29 plot_model(model,to_file=’model.png’,show_shapes=True,show_layer_names=True)
30 return model
—> 31 define_model(vocab_size,max_len)

in define_model(vocab_size, max_length)
26 model = Model(inputs=[input1,input2],outputs=output)
27
—> 28 model.compile(loss=’categorical_crossentropy’,optimizer=’Adam’)(mask)
29 plot_model(model,to_file=’model.png’,show_shapes=True,show_layer_names=True)
30 return model

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py in compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, weighted_metrics, target_tensors, **kwargs)
679
680 # Prepare output masks.
–> 681 masks = self.compute_mask(self.inputs, mask=None)
682 if masks is None:
683 masks = [None for _ in self.outputs]

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in compute_mask(self, inputs, mask)
785 return self._output_mask_cache[cache_key]
786 else:
–> 787 _, output_masks = self._run_internal_graph(inputs, masks)
788 return output_masks
789

~/.local/lib/python3.6/site-packages/tensorflow/python/layers/network.py in _run_internal_graph(self, inputs, masks)
896
897 # Apply activity regularizer if any:
–> 898 if layer.activity_regularizer is not None:
899 regularization_losses = [
900 layer.activity_regularizer(x) for x in computed_tensors

AttributeError: ‘InputLayer’ object has no attribute ‘activity_regularizer’

Reply
- Jason Brownlee April 16, 2018 at 6:01 am #
  
  What version of Keras are you using?
  
  Did you copy all of the code exactly?
  
  Reply
  - Abdallah April 16, 2018 at 7:07 pm #
    
    I used verison 2.1.5
    the another question No, I didn’t copy all the code exactly but I understand the idea and imitate it in some parts and in other parts are written in my own
    
    Reply
    - Jason Brownlee April 17, 2018 at 5:56 am #
      
      Sorry, I cannot help you debug your own modifications.
      
      Reply
      - Abdallah April 17, 2018 at 9:10 pm #
        
        I wrote this problem in the stack overflow but no one answer so I will try to fix this problem in my own Thank you for your answers
      - Jason Brownlee April 18, 2018 at 8:04 am #
        
        Hang in there.
prateek bansal April 21, 2018 at 4:03 pm #

Hi, jason brownlee thanks for this fatanstic article.
I am curios to know that how he while loop is getting stopped in progressive training data genertor function ?
Please explain this to me

def data_generator(descriptions, photos, tokenizer, max_length):

# loop for ever over images
while 1:
for key, desc_list in descriptions.items():
# retrieve the photo feature
photo = photos[key][0]
in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo)
yield [[in_img, in_seq], out_word]

Reply
- Jason Brownlee April 22, 2018 at 5:58 am #
  
  Note the yield.
  
  The number of epochs will decide how many times the yeild to the caller will be performed.
  
  Reply
  - Prateek Bansal April 24, 2018 at 6:03 am #
    
    Hi jason Brownlee,
    Can you please tell me what is the heighest BLEU Scores you got from this approach on standard dataset like flickr 8k, 30k etc.
    
    Please guide me how u you mananged to acheive this much Blue scores.
    What is the architecutre You used.
    
    Is there any result comparable to “show and tell model” ?
    
    Reply
    - Jason Brownlee April 24, 2018 at 6:38 am #
      
      All details of the model are in the post.
      
      You can learn more about the chosen architecture here:
      https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
      
      Reply
Jubaer Hossain April 22, 2018 at 12:31 am #

Sir,
Great article indeed! But I’m facing problems downloading the model. Every time I try to download the model with the code you provided, after sometimes the connection gets lost and shows this message: “ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host”

Can you give any alternate solution to this problem? I have tried several times but failed.

Reply
- Jason Brownlee April 22, 2018 at 6:01 am #
  
  I’m sorry to hear that, I have some ideas:
  
  – Perhaps you can review the code in Keras that downloads the model and download it manually?
  – Perhaps you can use an alternate internet connection to download the model?
  – Perhaps you can setup an EC2 instance and download the model there to work with?
  – Perhaps you can ask a friend or peer to download the model for you?
  
  Reply
Sailee April 24, 2018 at 4:15 pm #

Hello Sir,
Your article is very interesting and easy to understand.

For the above code I am getting a very accurate caption if I use the same image as you have shown in the figure. But if I use some other image I am getting some description but not a correct one. So could you please tell me what is the problem here?
Thanks in advance.

Reply
- Jason Brownlee April 25, 2018 at 6:18 am #
  
  Perhaps try a suite of images to see how the model performs on average?
  
  Reply
Jesia April 24, 2018 at 11:36 pm #

I have trained the data using progressive loading untill 19 iterations.
Caption for your provided test image is generated. However, for new one( image of rabbit and other animals) i got the caption “dog is running …”.
Is there a way to train the models more than 19 iterations to get a better result or how to solve this issue?

thank you

Reply
- Jason Brownlee April 25, 2018 at 6:31 am #
  
  The model may only need 2-5 training iterations.
  
  Reply
Kingson May 10, 2018 at 11:49 pm #

Hi Jason,

Can you please share me full github repository of image captioning?

Reply
- Jason Brownlee May 11, 2018 at 6:37 am #
  
  No need, all of the code is listed above.
  
  Reply
Sayan May 12, 2018 at 4:25 am #

Hey , Jason the post is really amazing , but can you help to load me this especially the first step (Keras) which will probably take 1hour in CPU , I wanna test that I’m in GPU , how shall I be able to get that , Keras (GPU) so as to save time tho.
Thanks Jason.

Reply
- Jason Brownlee May 12, 2018 at 6:51 am #
  
  There are some commands on this post to check if you are using the GPU:
  https://machinelearningmastery.com/command-line-recipes-deep-learning-amazon-web-services/
  
  Reply
  - Sayan May 12, 2018 at 2:24 pm #
    
    Thanks Jason , one more thing i wanna ask is that in the FILENAME i must put the complete path to working directory right?
    
    Reply
    - Jason Brownlee May 13, 2018 at 6:33 am #
      
      If the data is in the current working directory, then the path can be relative.
      
      Reply
Sayan May 13, 2018 at 10:33 pm #

Hey Jason wassup , can you please explain what is meant by these lines :-
filepath = ‘model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5′
checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1, save_best_only=True, mode=’min’)
1. The line in filepath especially this – epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f} ?

Reply
- Jason Brownlee May 14, 2018 at 6:34 am #
  
  It is the name of the file that will be saved with placeholders for specific values of the model at the time of saving.
  
  Reply
abbas khan May 20, 2018 at 4:59 pm #

hey jason!! I ran the code that returns a dictionary of image identifier to image features. but did nt work and gave the following error. Please Guide me how to fix this bug.

FileNotFoundError Traceback (most recent call last)
in ()
39 # extract features from all images
40 directory = ‘Flicker8k_Dataset’
—> 41 features = extract_features(directory)
42 print(‘Extracted Features: %d’ % len(features))
43 # save to file

in extract_features(directory)
18 # extract features from each photo
19 features = dict()
—> 20 for name in listdir(directory):
21 # load an image from file
22 filename = directory + ‘/’ + name

FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flicker8k_Dataset’

Reply
- Jason Brownlee May 21, 2018 at 6:27 am #
  
  It looks like you do not have the dataset in the same directory as the code.
  
  Reply
  - abbas June 22, 2018 at 4:17 pm #
    
    jason i have code and dataset in the same directory.I can access a test png image from the same directory but i am unable to access the dataset images..I don’t know whats wrong with it.Please help me solving the issue because i can also access the flick_text dataset.The only issue i have with images dataset.
    
    Reply
    - Jason Brownlee June 23, 2018 at 6:13 am #
      
      I have list of things to try here:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
      - abbas June 25, 2018 at 2:03 pm #
        
        1) I have installed the latest environment except tensorflow 1.5 becuase higher versions not working for me.
        2) I have dataset and code in the same directory
        3) I ran the code from command line but still found no luck.
        4) I have exactly copied the code.
        5) I searched the error on stackoverflow but never found any authentic solution yet.
      - Jason Brownlee June 25, 2018 at 2:40 pm #
        
        If you type “ls” is the “Flicker8k_Dataset” directory in the current directory beside the code file/s?
      - abbas July 4, 2018 at 2:18 pm #
        
        I replaced the relative path(as in the tutorial) with absolute full path and it worked for me
      - Jason Brownlee July 4, 2018 at 2:56 pm #
        
        Glad to hear it.
      - abbas July 4, 2018 at 2:39 pm #
        
        Now i am facing an error while running the code “# define the model
        model = define_model(vocab_size, max_length)” in the progressive training section.I have installed pydot and graphviz libraries but still come up with the following error.
        
        —————————————————————————
        FileNotFoundError Traceback (most recent call last)
        C:\anaconda3\lib\site-packages\pydot.py in create(self, prog, format)
        1877 shell=False,
        -> 1878 stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        1879 except OSError as e:
        
        C:\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
        708 errread, errwrite,
        –> 709 restore_signals, start_new_session)
        710 except:
        
        C:\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
        996 os.fspath(cwd) if cwd is not None else None,
        –> 997 startupinfo)
        998 finally:
        
        FileNotFoundError: [WinError 2] The system cannot find the file specified
        
        During handling of the above exception, another exception occurred:
        
        Exception Traceback (most recent call last)
        in ()
        1 # define the model
        —-> 2 model = define_model(vocab_size, max_length)
        
        in define_model(vocab_size, max_length)
        20 # summarize model
        21 model.summary()
        —> 22 plot_model(model, to_file=’model.png’, show_shapes=True)
        23 return model
        
        C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
        131 ‘LR’ creates a horizontal plot.
        132 “””
        –> 133 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
        134 _, extension = os.path.splitext(to_file)
        135 if not extension:
        
        C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
        53 from ..models import Sequential
        54
        —> 55 _check_pydot()
        56 dot = pydot.Dot()
        57 dot.set(‘rankdir’, rankdir)
        
        C:\anaconda3\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
        24 # Attempt to create an image of a blank graph
        25 # to check the pydot/graphviz installation.
        —> 26 pydot.Dot.create(pydot.Dot())
        27 except OSError:
        28 raise OSError(
        
        C:\anaconda3\lib\site-packages\pydot.py in create(self, prog, format)
        1881 raise Exception(
        1882 ‘”{prog}” not found in path.’.format(
        -> 1883 prog=prog))
        1884 else:
        1885 raise
        
        Exception: “dot.exe” not found in path.
      - Jason Brownlee July 4, 2018 at 2:57 pm #
        
        Try commenting out the call to plot_model().
      - abbas July 7, 2018 at 1:53 pm #
        
        thanks jason! my training is in progress.
      - Jason Brownlee July 8, 2018 at 6:15 am #
        
        Glad to hear it.
      - abbas July 7, 2018 at 6:17 pm #
        
        In model evaluation section when i come to run the code
        ” filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
        model = load_model(filename)”
        I come up with the error
        “OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)”
        
        i want to ask where is the file ‘model-ep002-loss3.245-val_loss3.612.h5’??and how to select the file??should i pick up the file with least loss value???
      - Jason Brownlee July 8, 2018 at 6:18 am #
        
        You must change the filename to the model that you saved while training.
      - abbas July 9, 2018 at 2:16 pm #
        
        Jason i trained the model upto 20 epochs.Now please explain which model i should use for prediction? and if i should select from 1-5 then why i am running it for 20 epochs?
      - Jason Brownlee July 10, 2018 at 6:40 am #
        
        The one with the lowest error on a validation set.
      - abbas July 24, 2018 at 2:12 pm #
        
        I just want to understand the the whole pipeline.The CNN-VGG16 extracts the the features of image to a fixed length 256 vector.The text is cleaned and preprocesed , the RNN-LSTM predicts the next words of the sequence.
        What is the strategy and intuition of the encoder/decoder?
        how these two modalities (image and text) are merged by FF?
      - Jason Brownlee July 24, 2018 at 2:31 pm #
        
        More on the model architecture here:
        https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
      - abbas July 31, 2018 at 3:26 pm #
        
        What alternative algorithms i can used for photo feature extraction or what extra modifications in the model is likely to perform better results?? or what extra building blocks needs to be added to the current tutorial for getting even refined results?
      - Jason Brownlee August 1, 2018 at 7:38 am #
        
        I have some suggestions here:
        https://machinelearningmastery.com/improve-deep-learning-performance/
      - abbas August 5, 2018 at 3:55 am #
        
        Dropout layer usually used to get rid of over-fitting.While Dense layer is usually used to change the dimensions.
        Why Dropout_1 and dropout_2 are not changing the dimensions while we set some of the connections to 0 ??What is the the intuition behind Dropout_1 and Dropout_2 Layer??Please suggest some links or explaination
      - Jason Brownlee August 5, 2018 at 5:38 am #
        
        Not get rid of, but reduce the likelihood of overfitting.
        
        You can learn more about the intuitions for dropout here:
        https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
      - abbas November 11, 2018 at 3:06 pm #
        
        Hi Jason!
        I implemented the above mentioned tutorial using VGG16 CNN architecture.Please let me know the code or tutorial that implements Inceptin model for image captioning.
      - Jason Brownlee November 12, 2018 at 5:35 am #
        
        You can change the example to use inception if you wish.
- Kanaan October 29, 2019 at 7:16 am #
  
  Dear abbas,
  kindly how did you solve your problem? I have the same problem :
  PermissionError : [Errno 13] Permission Denied: Flickr8k_Dataset/Flicker8k_Dataset’
  
  Reply
  - Jason Brownlee October 29, 2019 at 1:48 pm #
    
    Use the alternate download for the dataset listed in the tutorial.
    
    Reply
wasif May 24, 2018 at 10:56 pm #

Hi Jason Brownlee! Good tutorial. I doubt how model guarantee to generate semantically correct sentences. Please share your intuition or any available resource. For example, there is three word in vocabulary “is, dog, running”, so how could we guarantee model will generate a sentence with correct grammar structure like ‘dog is running’. Thank you

Reply
- Jason Brownlee May 25, 2018 at 9:27 am #
  
  Perhaps you can run the generated sentences through another process that corrects grammar.
  
  Reply
- abbas October 27, 2018 at 3:21 pm #
  
  Sir where can i find the implemented tutorial for extracting features from images using inception v3?
  
  Reply
  - Jason Brownlee October 28, 2018 at 6:07 am #
    
    You can remove the VGG and add the Inception model yourself.
    
    Reply
    - abbas November 18, 2018 at 3:28 am #
      
      I am trying to train my model using inception model.While training i come with the following error.How do i change the Shape of the input?
      
      CODE:
      # train the model, run epochs manually and save after each epoch
      epochs = 5
      steps = len(train_descriptions)
      for i in range(epochs):
      # create the data generator
      generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
      # fit for one epoch
      model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
      # save model
      model.save(‘inception-model_’ + str(i) + ‘.h5’)
      
      ERROR:
      Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)
      
      Reply
      - Jason Brownlee November 18, 2018 at 6:46 am #
        
        It looks like you model and data have differing shapes, perhaps change the model or change the data.
      - abbas November 19, 2018 at 2:51 pm #
        
        do you have any working example for changing data dimensions?
      - Jason Brownlee November 20, 2018 at 6:31 am #
        
        You can learn about the reshape() function here:
        https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
      - abbas November 20, 2018 at 7:04 pm #
        
        my input has dimension of 4096 while its giving error that its 2048.
        
        __________________________________________________________________________________________________
        Layer (type) Output Shape Param # Connected to
        ==================================================================================================
        input_4 (InputLayer) (None, 34) 0
        __________________________________________________________________________________________________
        input_3 (InputLayer) (None, 4096) 0
        __________________________________________________________________________________________________
        embedding_2 (Embedding) (None, 34, 256) 1940224 input_4[0][0]
        __________________________________________________________________________________________________
        dropout_3 (Dropout) (None, 4096) 0 input_3[0][0]
        __________________________________________________________________________________________________
        dropout_4 (Dropout) (None, 34, 256) 0 embedding_2[0][0]
        __________________________________________________________________________________________________
        dense_4 (Dense) (None, 256) 1048832 dropout_3[0][0]
        __________________________________________________________________________________________________
        lstm_2 (LSTM) (None, 256) 525312 dropout_4[0][0]
        __________________________________________________________________________________________________
        add_2 (Add) (None, 256) 0 dense_4[0][0]
        lstm_2[0][0]
        __________________________________________________________________________________________________
        dense_5 (Dense) (None, 256) 65792 add_2[0][0]
        __________________________________________________________________________________________________
        dense_6 (Dense) (None, 7579) 1947803 dense_5[0][0]
        ==================================================================================================
        Total params: 5,527,963
        Trainable params: 5,527,963
        Non-trainable params: 0
        
        ERROR:
        Error when checking input: expected input_3 to have shape (4096,) but got array with shape (2048,)
      - Jason Brownlee November 21, 2018 at 7:50 am #
        
        Looks like there is a mismatch between your data and the model.
      - abbas November 21, 2018 at 3:04 pm #
        
        so then how to make data and model inter harmony?
      - Jason Brownlee November 22, 2018 at 6:20 am #
        
        Sorry, I don’t understand, can you elaborate?
- abbas August 23, 2019 at 4:27 pm #
  
  jason my model is not loading even the print command is not giving me the output..
  the following block of code is not giving the output..where is the error?
  
  # load the model
  filename = ‘xraysmodel_8.h5’
  print(‘abbas’)
  model = load_model(filename)
  # evaluate model
  evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)
  
  Reply
  - Jason Brownlee August 24, 2019 at 7:44 am #
    
    Are you running from the command line? If so, it will report something.
    
    Reply
    - abbas August 25, 2019 at 4:54 pm #
      
      no i am using jupyter notebook
      
      Reply
      - Jason Brownlee August 26, 2019 at 6:10 am #
        
        I recommend not using a notebook. Instead, run from the command line, here’s how:
        https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
Andreas May 26, 2018 at 7:27 am #

Thanks for the great post.

I trained the model when I save the image from “http://media.einfachtierisch.de/thumbnail/600/0/media.einfachtierisch.de/images/2017/07/glueckliche-freigaenger-katze-Shutterstock-Olga-Visav_504063007.jpg” then I am still getting the text

startseq dog is running through the grass endseq

what do I make wrong?
The test image appears as intended.

Reply
- Jason Brownlee May 27, 2018 at 6:42 am #
  
  Perhaps a bug in your check?
  Perhaps try other images?
  Perhaps the model is overfit?
  
  Reply
  - Andi June 9, 2018 at 10:55 pm #
    
    Hi, I have tried other images and the same result. Even after only 2 or 3 iterations I mostly get the same text. As Paul described below.
    
    Reply
    - Jason Brownlee June 10, 2018 at 6:03 am #
      
      That is surprising.
      
      Reply
- abbas November 22, 2018 at 3:41 pm #
  
  How can i change my data or my model accordingly to match it??
  
  Reply
  - abbas March 3, 2019 at 3:34 am #
    
    jason! please let me know how can i change learning rate,momentum and number of neurons in this tutorial? can i do it or not?
    
    Reply
    - Jason Brownlee March 3, 2019 at 8:04 am #
      
      Learn how to change the learning rate here:
      https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
      
      Reply
      - abbas March 7, 2019 at 4:35 pm #
        
        can i change learning rate and optimizer for this tutorial. if yes then how
      - Jason Brownlee March 8, 2019 at 7:44 am #
        
        Yes, here are some examples:
        https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
Paul May 28, 2018 at 4:17 pm #

Hi Jason
It was a nice article.
I trained the model for 12 epochs in my gpu.
But the prediction was not so accurate.
Most of the times I got the prediction with “man in blue shirt is riding his bohemian on the street” . with the keywords in this sentence.

Help me out .

Reply
- Jason Brownlee May 29, 2018 at 6:23 am #
  
  It needs far fewer epochs, try early stopping against a validation set.
  
  Reply
- Andi June 9, 2018 at 10:56 pm #
  
  HI Paul, have you found a solution to this? I have a similiar issue.
  
  Reply
Ravi June 4, 2018 at 10:21 pm #

Hi jason,
While progressive loading, we will get 20 models. Which model is choosen for prediction?

Reply
- Jason Brownlee June 5, 2018 at 6:39 am #
  
  The one with the best skill on the hold out set, likely within epoch 1-5.
  
  Reply
Praharsha Singaraju June 5, 2018 at 5:12 pm #

Hi jason,

I got the following error when i ran the extract_features function.
can you please help me fix it?

field_value = self._fields.get(field)
TypeError: descriptor ‘_fields’ for ‘OpDef’ objects doesn’t apply to ‘OpDef’ object

Reply
- Jason Brownlee June 6, 2018 at 6:37 am #
  
  This might give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ananya June 14, 2018 at 2:26 pm #

Hello Jason! I just wanted to know why aren’t we validating the trained model in progressive loading…

Reply
- Jason Brownlee June 14, 2018 at 4:09 pm #
  
  You can, as I note in the tutorial. The progressive loading is just a small example to help those who don’t have enough RAM to run the main example.
  
  Reply
Ananya June 14, 2018 at 2:44 pm #

I meant to ask, ‘Why cant we simultaneously validate, as in the previous code wherein no progressive loading is used?”

Reply
Malik June 15, 2018 at 12:41 pm #

Finally someone who understands the importance of separating mathematics from ‘implementation’. The drawback most tutorials have is that they try to discuss both simultaneously and hence making things quite confusing. ‘Implementation’ requires a completely different approach from understanding the theory.

Another wonderful thing about this tutorial is that you actually go through the preprocessing steps. This is where I usually get stuck because most university and online courses and tutorials do not discuss them at all.

Reply
- Jason Brownlee June 15, 2018 at 2:50 pm #
  
  I’m glad the tutorial helps Malik.
  
  Reply
DIKSHA SINGLA June 16, 2018 at 5:59 pm #

Traceback (most recent call last):
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\pydot.py”, line 1861, in create
stderr=subprocess.PIPE, stdout=subprocess.PIPE)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\subprocess.py”, line 709, in __init__
restore_signals, start_new_session)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\subprocess.py”, line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 26, in _check_pydot
pydot.Dot.create(pydot.Dot())
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\pydot.py”, line 1867, in create
raise OSError(*args)
FileNotFoundError: [WinError 2] “dot.exe” not found in path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\hp\Desktop\iitp\caption_new\5.py”, line 163, in
model = define_model(vocab_size, max_length)
File “C:\Users\hp\Desktop\iitp\caption_new\5.py”, line 131, in define_model
plot_model(model, to_file=’model.png’, show_shapes=True)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 133, in plot_model
dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 55, in model_to_dot
_check_pydot()
File “C:\Users\hp\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\vis_utils.py”, line 29, in _check_pydot
‘pydot failed to call GraphViz.’
OSError: pydot failed to call GraphViz.Please install GraphViz (https://www.graphviz.org/) and ensure that its executables are in the $PATH.

Reply
- Jason Brownlee June 17, 2018 at 5:38 am #
  
  Looks like you need to install pygraphviz, or comment out the plotting of the model.
  
  Reply
shantanu singh June 19, 2018 at 3:31 pm #

—-> 1 description = generate_desc(model, tokenizer, photo, max_length)
2 print(description)

in generate_desc(model, tokenizer, photo, max_length)
10 sequence = tokenizer.texts_to_sequences([in_text])[0]
11 sequence = pad_sequences([sequence], maxlen=max_length)
—> 12 yhat = model.predict([photo,sequence], verbose=0)
13 yhat = argmax(yhat)
14 word = word_for_id(yhat, tokenizer)
AttributeError: ‘dict’ object has no attribute ‘ndim’

Reply
- Jason Brownlee June 20, 2018 at 6:21 am #
  
  Ensure that you copy all code for the example.
  
  Reply
- Devesh Pandey May 4, 2019 at 11:51 pm #
  
  @Shantanu Singh have you resolved your problem, cause I am facing the exact same problem
  
  Reply
  - Jason Brownlee May 5, 2019 at 6:30 am #
    
    Are all of your libraries up to date?
    
    Reply
vinay June 21, 2018 at 12:13 am #

When i am training, i am getting an vocab length of 8359. It is less than what you are getting.
Will it be a problem?

Reply
- Jason Brownlee June 21, 2018 at 6:18 am #
  
  Maybe.
  
  Reply
mun June 24, 2018 at 6:43 am #

Hello, i am stuck into this..’startseq’ and ‘endseq’ are not added in the Description.txt file but there is no error when i am running that module

Reply
- Jason Brownlee June 24, 2018 at 7:37 am #
  
  We add them in the load_clean_descriptions() function after loading the data.
  
  Reply
Ben June 24, 2018 at 8:22 am #

So,I followed exactly all the steps as shown above and after progressive loading,when the model is getting compiled,it keeps on running epoch 1/1 over and over again and keeps saving different .h5 files. So,I stopped the process after 5 iterations and got a loss of ~3.38 and when I am generating captions,it is not giving even close captions. What should I do to improve my results? Should I let the model to be trained for more iterations or will it cause over-fitting?

Reply
- Jason Brownlee June 25, 2018 at 6:16 am #
  
  The progressive loading example runs epochs manually, not the same epoch again and again.
  
  Perhaps test each saved model and use the one with the lowest loss to generate captions.
  
  Reply
Kingson June 28, 2018 at 4:11 am #

Hi Jason,
I am trying to create image caption for my own datasets. Like I have 4k images with single caption. I am able to run and create model for Flickr8K dataset.Its work properly. But when I use my dataset I am able to generate all required files except model. When I try to train the model it gives error –
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) (None, 27) 0
__________________________________________________________________________________________________
input_1 (InputLayer) (None, 4096) 0
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 27, 256) 1058048 input_2[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 27, 256) 0 embedding_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 256) 0 dense_1[0][0]
lstm_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 256) 65792 add_1[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 4133) 1062181 dense_2[0][0]
==================================================================================================
Total params: 3,760,165
Trainable params: 3,760,165
Non-trainable params: 0
__________________________________________________________________________________________________
None
Traceback (most recent call last):
File “train2.py”, line 179, in
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (10931, 7, 7, 512)

keras version is – 2.2.0

How I can solve this error?
Please help me out.

Reply
- Jason Brownlee June 28, 2018 at 6:26 am #
  
  Looks like the dimensions of your data do not match the expected dimensions of the model. You can change the data or the model.
  
  Reply
  - Kingson June 28, 2018 at 7:59 pm #
    
    Ok thanks Jason,
    I will try to change the model.
    
    Reply
    - Gaurav Anand August 3, 2018 at 3:26 pm #
      
      Hi Kingson
      
      Were you able to get rid of the above problem? Since I am also getting the same error while training the model.
      
      Thanks in advance
      
      Reply
- Aksha Jadhav September 15, 2020 at 12:18 am #
  
  Hey ….Did u solve this error?
  I m also stuck here…Please help
  
  Reply
Satendra Varma July 3, 2018 at 2:11 pm #

Hey Jason,

Great article. I just wanted to ask where to do you start developing code for such implementations. Do you refer papers and code from scratch or refer material that explains implementation code in detail and translate it to keras ?

Thanks,
Satendra

Reply
- Jason Brownlee July 4, 2018 at 8:18 am #
  
  Start by understanding the principle of the approach (from multiple papers), then implement it using whatever tools, e.g. keras.
  
  Reply
Peter Bonac July 8, 2018 at 2:53 am #

Hi Jason,

Great tutorial. I am wondering what the best way to limit the vocabulary size. As num_words does not influence tokenizer.fit_on_text, are these changes correct:

def create_tokenizer(descriptions):
lines = to_lines(descriptions)
tokenizer = Tokenizer(num_words = VOCAB_NUM_WORDS)
tokenizer.fit_on_texts(lines)
tokenizer.texts_to_sequences(lines)
return tokenizer

and

vocab_size = VOCAB_NUM_WORDS

Reply
- Jason Brownlee July 8, 2018 at 6:24 am #
  
  Create a list of the n most frequent words you want to work with from the dataset, save them to file, then use them to filter the dataset prior to modeling.
  
  I have many examples of this on the blog, for example:
  https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/
  
  Reply
  - Peter Bonac July 8, 2018 at 2:14 pm #
    
    Thank you!
    
    One more question. For Progressive Loading it seems that the batch size is 1 from the data_generator. If I would like to create a batch i run into the problem of size defining as the “create_sequences” output is variable in size from:
    
    in_img, in_seq, out_word = create_sequences(tokenizer, max_length, desc_list, photo)
    
    For example i can’t size define to something like:
    
    batch_features = np.zeros((batch_size, 17, 1280))
    batch_labels = np.zeros((batch_size, 17, 40000))
    
    How can I create a batch of “in_img, in_seq, out_word” (if each sequence will be a different length)? Is there an easy way to make a larger batch size? Again thank you for your help.
    
    Reply
    - Jason Brownlee July 9, 2018 at 6:32 am #
      
      Not sure I follow.
      
      In progressive loading, the generator will release a batch of data. You can change the code to make this as few or as many samples as you wish.
      
      Reply
      - Peter Bonac July 9, 2018 at 8:12 am #
        
        Sorry I didn’t explain well. From my understanding your “data_generator” releases data for 1 image into “model.fit_generator” at a time. I would like to change this to a batch of images.
        
        The problem I am having is if I try to use a code structure like below, I am not able to create the empty arrays (unless I pad each line to “max_length”, and make “batch_features = np.zeros((batch_size, max_length, NN_input_shape))”).
        
        #code structure
        def generator(features, labels, batch_size):
        # Create empty arrays to contain batch of features and labels#
        batch_features = np.zeros((batch_size, 64, 64, 3))
        batch_labels = np.zeros((batch_size,1))
        while True:
        for i in range(batch_size):
        # choose random index in features
        index= random.choice(len(features),1)
        batch_features[i] = some_processing(features[index])
        batch_labels[i] = labels[index]
        yield batch_features, batch_labels
        
        Also, is there somewhere I can donate money to your site?
      - Jason Brownlee July 10, 2018 at 6:37 am #
        
        Correct.
        
        Yes, you can build up data as Python lists then covert the lists to numpy arrays before you return them. It is a strategy I use all the time.
Fathi July 11, 2018 at 12:01 pm #

Here the part of my code where I have a problem :

size = 64
img1 = load_img(‘00598546-9.jpg’, target_size=(1, size, size))
imshow(img1)

X1 = (TimeDistributed(Conv2D(32, (3,3), activation=’relu’), input_shape=(None, size, size, 3)))(img1)

Error message:
ayer time_distributed_11 was called with an input that isn’t a symbolic tensor. Received type: . Full input: []. All inputs to the layer should be tensors.

I’m looking to find the output X1 by using (img1) as an input but I get this error message.

How can I use (img1) to find the output ?

Reply
- Jason Brownlee July 11, 2018 at 2:57 pm #
  
  Are you able to confirm that your libraries are up to date?
  
  Reply
  - Fathi July 14, 2018 at 7:01 am #
    
    How can I make sure it’s up to date ?
    
    Thank you.
    
    Reply
    - Jason Brownlee July 15, 2018 at 6:01 am #
      
      It depends on the method you used to install the libraries.
      
      Perhaps this post will help:
      https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
      
      Reply
Maqsood July 16, 2018 at 6:24 pm #

Hi Jason,

I am trying to use the model you presented above for recognizing handwritten documents. In literature the feature extraction stage for OCR produces another 2-D matrix for each image (a feature vector is found for each column vector in the image). How can I then convert this 2-D matrix into a 1-D vector?

Reply
- Jason Brownlee July 17, 2018 at 6:13 am #
  
  The vector output will be a the probability of an image belonging to each output class.
  
  Reply
Shantanu Patil July 19, 2018 at 11:38 pm #

After training it for two epoch it gives caption as “man in red shirt standing on Street” for every other image i put

Reply
- Jason Brownlee July 20, 2018 at 5:59 am #
  
  Sounds like it got stuck, try training it again?
  
  Reply
  - Shantanu Patil July 21, 2018 at 10:35 pm #
    
    After training again for 6 epoch and loss of 3.3 its showing captions for girls as boys and calling a bird as a dog, should I train it for 20 epoch?
    
    Reply
    - Jason Brownlee July 22, 2018 at 6:23 am #
      
      No, the model does not need very much training.
      
      Reply
      - Shantanu Patil July 25, 2018 at 11:52 pm #
        
        For new Images it is not working, can you send me a trained model? because I tried up to 10 epoch and it is not working
      - Jason Brownlee July 26, 2018 at 7:43 am #
        
        What do you mean it is not working?
- Saurabh May 6, 2019 at 4:31 pm #
  
  Same problem with me. I’ve trained the model several times now but it is giving same captions to all other images when i test it.
  
  Reply
Moha July 26, 2018 at 5:22 pm #

Is the sequence length the number of words in a sequence?

Reply
- Jason Brownlee July 27, 2018 at 5:48 am #
  
  Yes. Or rather, the maximum number of words that may appear in a sequence.
  
  Reply
Rishav July 27, 2018 at 8:45 pm #

Hi Jason,

Firstly I would like to thank you for sharing your knowledge and helping everyone. I am new to this field now. Could you please explain, why are removing all single letter words?

Yes, removing words having numbers and removing punctuation does make sense. Even removing single letter word also makes sense, but by removing “a”, wont it affect formation of new sentences?

Reply
- Jason Brownlee July 28, 2018 at 6:34 am #
  
  It does, but it makes the problem simpler to model with little effect on meaning.
  
  Reply
Gaurav Anand August 2, 2018 at 3:21 pm #

Hello Jason

I am facing the following error while training the model with progressive loading.
Could you please help to fix this?

ImportError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in swig_import_helper()
13 try:
—> 14 return importlib.import_module(mname)
15 except ImportError:

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py in import_module(name, package)
125 level += 1
–> 126 return _bootstrap._gcd_import(name[level:], package, level)
127

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _load_unlocked(spec)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in module_from_spec(spec)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap_external.py in create_module(self, spec)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py in ()
57
—> 58 from tensorflow.python.pywrap_tensorflow_internal import *
59 from tensorflow.python.pywrap_tensorflow_internal import __version__

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in ()
16 return importlib.import_module(‘_pywrap_tensorflow_internal’)
—> 17 _pywrap_tensorflow_internal = swig_import_helper()
18 del swig_import_helper

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py in swig_import_helper()
15 except ImportError:
—> 16 return importlib.import_module(‘_pywrap_tensorflow_internal’)
17 _pywrap_tensorflow_internal = swig_import_helper()

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py in import_module(name, package)
125 level += 1
–> 126 return _bootstrap._gcd_import(name[level:], package, level)
127

ModuleNotFoundError: No module named ‘_pywrap_tensorflow_internal’

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in ()
1 from numpy import array
2 from pickle import load
—-> 3 from keras.preprocessing.text import Tokenizer
4 from keras.preprocessing.sequence import pad_sequences
5 from keras.utils import to_categorical

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\__init__.py in ()
1 from __future__ import absolute_import
2
—-> 3 from . import utils
4 from . import activations
5 from . import applications

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\__init__.py in ()
4 from . import data_utils
5 from . import io_utils
—-> 6 from . import conv_utils
7
8 # Globally-importable utils.

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\conv_utils.py in ()
7 from six.moves import range
8 import numpy as np
—-> 9 from .. import backend as K
10
11

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\backend\__init__.py in ()
85 elif _BACKEND == ‘tensorflow’:
86 sys.stderr.write(‘Using TensorFlow backend.\n’)
—> 87 from .tensorflow_backend import *
88 else:
89 # Try and load external backend.

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py in ()
4
5 import tensorflow as tf
—-> 6 from tensorflow.python.framework import ops as tf_ops
7 from tensorflow.python.training import moving_averages
8 from tensorflow.python.ops import tensor_array_ops

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py in ()
47 import numpy as np
48
—> 49 from tensorflow.python import pywrap_tensorflow
50
51 # Protocol buffers

~\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py in ()
72 for some common reasons and solutions. Include the entire stack trace
73 above this error message when asking for help.””” % traceback.format_exc()
—> 74 raise ImportError(msg)
75
76 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

ImportError: Traceback (most recent call last):
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 14, in swig_import_helper
return importlib.import_module(mname)
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py”, line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File “”, line 994, in _gcd_import
File “”, line 971, in _find_and_load
File “”, line 955, in _find_and_load_unlocked
File “”, line 658, in _load_unlocked
File “”, line 571, in module_from_spec
File “”, line 922, in create_module
File “”, line 219, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py”, line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 17, in
_pywrap_tensorflow_internal = swig_import_helper()
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 16, in swig_import_helper
return importlib.import_module(‘_pywrap_tensorflow_internal’)
File “C:\Users\gaurav.anand\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\importlib\__init__.py”, line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named ‘_pywrap_tensorflow_internal’

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Reply
- Jason Brownlee August 3, 2018 at 5:58 am #
  
  Sorry to hear that.
  
  Perhaps post your error to stackoverflow?
  
  Reply
- abbas August 3, 2018 at 1:41 pm #
  
  downgrade your tensorflow to version 1.5, i hope it will work for you.
  
  Reply
  - Gaurav Anand August 3, 2018 at 3:19 pm #
    
    Yes, it has now worked somehow after creating new environment with latest tensorflow version. However, it is giving me another possibly known error. Please have a look once.
    
    Model:
    __________________________________________________________________________________________________
    Layer (type) Output Shape Param # Connected to
    ==================================================================================================
    input_4 (InputLayer) (None, 30) 0
    __________________________________________________________________________________________________
    input_3 (InputLayer) (None, 7, 7, 512) 0
    __________________________________________________________________________________________________
    embedding_2 (Embedding) (None, 30, 256) 987392 input_4[0][0]
    __________________________________________________________________________________________________
    dropout_3 (Dropout) (None, 7, 7, 512) 0 input_3[0][0]
    __________________________________________________________________________________________________
    dropout_4 (Dropout) (None, 30, 256) 0 embedding_2[0][0]
    __________________________________________________________________________________________________
    dense_4 (Dense) (None, 7, 7, 256) 131328 dropout_3[0][0]
    __________________________________________________________________________________________________
    lstm_2 (LSTM) (None, 256) 525312 dropout_4[0][0]
    __________________________________________________________________________________________________
    add_2 (Add) (None, 7, 7, 256) 0 dense_4[0][0]
    lstm_2[0][0]
    __________________________________________________________________________________________________
    dense_5 (Dense) (None, 7, 7, 256) 65792 add_2[0][0]
    __________________________________________________________________________________________________
    dense_6 (Dense) (None, 7, 7, 3857) 991249 dense_5[0][0]
    ==================================================================================================
    
    —————————————————————————
    ValueError Traceback (most recent call last)
    in ()
    178 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
    179 # fit for one epoch
    –> 180 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    181 # save model
    182 model.save(‘model_’ + str(i) + ‘.h5’)
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
    89 warnings.warn(‘Update your ' + object_name + 90 ' call to the Keras 2 API: ‘ + signature, stacklevel=2)
    —> 91 return func(*args, **kwargs)
    92 wrapper._original_function = func
    93 return wrapper
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    1413 use_multiprocessing=use_multiprocessing,
    1414 shuffle=shuffle,
    -> 1415 initial_epoch=initial_epoch)
    1416
    1417 @interfaces.legacy_generator_methods_support
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    211 outs = model.train_on_batch(x, y,
    212 sample_weight=sample_weight,
    –> 213 class_weight=class_weight)
    214
    215 outs = to_list(outs)
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
    1207 x, y,
    1208 sample_weight=sample_weight,
    -> 1209 class_weight=class_weight)
    1210 if self._uses_dynamic_learning_phase():
    1211 ins = x + y + sample_weights + [1.]
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    785 feed_output_shapes,
    786 check_batch_axis=False, # Don’t enforce the batch size.
    –> 787 exception_prefix=’target’)
    788
    789 # Generate sample-wise weight values given the sample_weight and
    
    c:\users\gaurav.anand\appdata\local\continuum\anaconda3\envs\tensorflow1.7\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    125 ‘: expected ‘ + names[i] + ‘ to have ‘ +
    126 str(len(shape)) + ‘ dimensions, but got array ‘
    –> 127 ‘with shape ‘ + str(data_shape))
    128 if not check_batch_axis:
    129 data_shape = data_shape[1:]
    
    ValueError: Error when checking target: expected dense_6 to have 4 dimensions, but got array with shape (11, 1, 3857)
    
    I have made an only change in line 114:
    inputs1 = Input(shape=(4096,)) >> inputs1 = Input(shape=(7, 7, 512,))
    Because, it was earlier giving error for Data structure mismatch for inputs1 but now it is giving same error in 3rd dense layer.
    
    As I read other comments, it is a common issue.
    Could you please share your opinion how to get rid of this ?
    
    Any external guide to data structure mismatch would be much appreciated.
    
    Reply
    - Jason Brownlee August 4, 2018 at 5:58 am #
      
      I have some suggestions here:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
      - Gaurav Anand August 6, 2018 at 6:45 pm #
        
        Thanks for FAQ page.
        
        I guess the problem is that I am defining model’s inputs1 with shape
        inputs1 = Input(shape=(4096,))
        but data_generator() method is generating in_img of shape (11,7,7,512).
        Probably there is a need to change format of features.pkl file’s data.
        
        Is it right to move in this direction?
Moha August 3, 2018 at 7:51 pm #

Some image captioning libraries (such as Im2txt) are able to provide a confidence score for their generated captions. This helps us when we have a caption that is wrong, so that we can at least tell whether or not by the confidence if the model was ‘unsure’ about the text it generated. How would we go about adding something like that to this?

Reply
- Jason Brownlee August 4, 2018 at 6:03 am #
  
  Good question. Perhaps contact the developers and ask their approach?
  
  Reply
Moha August 3, 2018 at 7:53 pm #

I have got to say. This is the best image captioning tutorial I have found online. Thank you for helping me understand it better.

Reply
- Jason Brownlee August 4, 2018 at 6:03 am #
  
  Thanks, I’m glad it helped.
  
  Reply
Mmed August 14, 2018 at 1:13 am #

Would it make sense to monitor the accuracy and validation accuracy for image captioning?

That is what I added to model.compile:

model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])

That gave for the first epoch an accuracy of 0.9463.
And a validation accuracy of 0.9903.

Doesn’t that seem too high though for the 1st epoch?

Reply
- Jason Brownlee August 14, 2018 at 6:22 am #
  
  No accuracy does not tell us much about the performance of the model. We must use a score like BLEU.
  
  Reply
  - Mmed August 15, 2018 at 1:16 am #
    
    Thank you for your response, Dr. Bronwlee. Okay, but does ‘accuracy’ mean anything? I mean Keras is doing some calculations to get these numbers, right? Even if it does not help us to learn about the model’s performance, does the accuracy metric represent anything?
    
    Reply
    - Jason Brownlee August 15, 2018 at 6:06 am #
      
      No. You can monitor loss though.
      
      Reply
Naveen Kumar August 19, 2018 at 3:24 am #

I am Unable to make Progressive Loading, After first epoch I am getting error like

5995/6000 [============================>.] – ETA: 2s – loss: 4.6600
5996/6000 [============================>.] – ETA: 2s – loss: 4.6597
5997/6000 [============================>.] – ETA: 1s – loss: 4.6597
5998/6000 [============================>.] – ETA: 1s – loss: 4.6595
5999/6000 [============================>.] – ETA: 0s – loss: 4.6595
6000/6000 [==============================] – 3329s 555ms/step – loss: 4.6598

Process finished with exit code -1073741819 (0xC0000005)

Reply
- Naveen Kumar August 19, 2018 at 3:25 am #
  
  I am running on Lenovo Laptop with 16 GB RAM., Intel i5, Pychram IDE
  
  Reply
  - Jason Brownlee August 19, 2018 at 6:30 am #
    
    First, confirm your environment is up to date:
    https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
    
    Then, I have some suggestions here:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
    Reply
    - Naveen Kumar August 19, 2018 at 1:37 pm #
      
      C:\Users\navee\Anaconda3\python.exe “D:/Image Caption/versions.py”
      Using TensorFlow backend.
      Anaconda Version3.6.6 |Anaconda 4.3.1 (64-bit)| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
      Keras Version2.1.5
      Tensorflow Version1.9.0
      Matplotlib Version2.2.2
      Numpy Version1.15.0
      
      Process finished with exit code 0
      
      Reply
      - Jason Brownlee August 20, 2018 at 6:32 am #
        
        Well done!
Emil Lundh August 20, 2018 at 5:28 pm #

5,5 million parameters… and 8000 examples? Clearly, the old rule doesn’t apply that the training data should be at least as many as the # parameters. Is there a way to think about this? Clearly, I shouldn’t use my intuition from a linear system of equations?

Reply
- Jason Brownlee August 21, 2018 at 6:12 am #
  
  Yes, the old ways of thinking do not apply.
  
  I have not seen a good conceptual model for thinking about highly over-specified models.
  
  Nevertheless, they are skillful and do generalize.
  
  Reply
Md. Zakir Hossain August 23, 2018 at 11:08 pm #

Hi Jason,

Many thanks for your kind help. When we use model.fit for training, we are using training data as well as validation data. But When we use mode.fit_generator (Progressive Loading), in that case why we are not using validation data?

Reply
- Jason Brownlee August 24, 2018 at 6:08 am #
  
  I added that progressive loading much later, as a simpler version for those that were having trouble. You can update it to use validation data if you wish.
  
  Reply
nehna August 28, 2018 at 7:13 pm #

hi Jason

Due to internet connectivity, my download when i run feature_extraction.py code.

later i tried to run the code again and it is not downloading and not showing error also.
without features.pkl file i cant proceed furthur.
is there any other way to make it download

Reply
- Jason Brownlee August 29, 2018 at 8:08 am #
  
  No, sorry. You require the dataset to work through the example.
  
  Reply
  - nehna September 1, 2018 at 12:52 pm #
    
    thank you jason
    
    i got the dataset
    
    but due to memory error , i am doing with progressive loading
    
    I am getting value error
    
    valueError: Error when checking input : expected input_1 to have 4 dimensions but got array with shape (28,4096)
    
    thank you Jason in advance
    
    Reply
    - Jason Brownlee September 2, 2018 at 5:27 am #
      
      Perhaps ensure that you have copied the data exactly and that your libraries are up to date?
      
      Reply
Michael September 8, 2018 at 4:23 am #

Hi Jason,
thank you for this super tutorial.

But I have a question :):

My generated caption is for the sample picture:

“startseq dog is running through the snow endseq”

and not

“startseq dog is running across the beach endseq”

My BLUE Score is also lower as in your tutorial.

BLEU-1: 0.553073
BLEU-2: 0.293371
BLEU-3: 0.200420
BLEU-4: 0.090321

Do have any idea why, or better how can I improve my result?

TIA
Michael

Reply
- Jason Brownlee September 8, 2018 at 6:16 am #
  
  Perhaps try fitting the model again?
  
  Reply
kalverk September 10, 2018 at 11:10 pm #

Hi!

Is the feature order directly tied to the caption? How much does the model rely on input’s order?

Imagine if the extracted features of an image are [‘dog’, ‘water’, ‘blue’, ‘sand’] and the caption is “dog at the beach”, now this is correct and expected caption.

Now the same image, but the features are [‘sand’, ‘water’, ‘dog’, ‘blue’], how different might the new caption be?

Can we achieve the same caption with differently ordered features vector?

Reply
- Jason Brownlee September 11, 2018 at 6:30 am #
  
  Yes, the order of the generated words is important for the design of this specific model.
  
  Reply
Jeff September 12, 2018 at 6:15 am #

Jason – given the model architecture, if I use my own data, with only 1 caption per image, would it impact the quality of the outcome?

Reply
- Jason Brownlee September 12, 2018 at 8:16 am #
  
  It will, perhaps some changes to the model configuration or training will be required. Experiment.
  
  Reply
nehna September 15, 2018 at 12:00 pm #

hiii Jason

your tutorial is super and its working fine

but

i have a doubt !!

in generating new caption , will it generate captions to only images in Flickr data set or general to normal images (downloaded in google)?

thank you very much Jason

Reply
- Jason Brownlee September 16, 2018 at 5:56 am #
  
  It will generate captions for any photo you provide.
  
  Remember, it is just an experiment, not an application.
  
  Reply
  - nehna September 24, 2018 at 3:42 pm #
    
    hii Jason
    
    yeah , just to test how it is generaing captions for images other than present in flickr dataset
    
    but it is giving appropriate captions only for images in dataset . For all the other images , generating some caption which no way related to image
    
    Reply
    - Jason Brownlee September 25, 2018 at 6:17 am #
      
      Perhaps your model has overfit. You could try adding some regularization.
      
      Reply
      - Priyam September 29, 2018 at 1:25 am #
        
        what should i do to add regularization
      - Jason Brownlee September 29, 2018 at 6:36 am #
        
        Try Dropout, weight noise, weight regularization, activation regularization, early stopping, etc.
      - Priyam October 1, 2018 at 3:56 am #
        
        On which part of the program should i apply the given techniques.
        And how to apply all of them?
      - Jason Brownlee October 1, 2018 at 6:30 am #
        
        I cannot know, I recommend testing a number of different approaches and discover what works.
        
        If this is a challenge, then I am currently writing a series of tutorials on this exact topic (e.g. how to improve model performance).
      - Priyam October 1, 2018 at 4:09 am #
        
        I am new to deep learning implementation.I dont know where to insert the required techniques inside the code written by you.? Can you suggest a tutorial or some sourse for it
Mmed September 18, 2018 at 5:31 am #

Hello Jason,

Why is there ” + 1 ” every time you find the vocabulary size from the tokenizer?

Reply
- Jason Brownlee September 18, 2018 at 6:25 am #
  
  To start numbering of tokenized words at “1” rather than “0”. We need room for the “0” value for “unknown word”.
  
  Reply
Mmed September 18, 2018 at 7:16 am #

When would we encounter an unknown word if the vocabulary consists of all the words in the training data?

Reply
- Jason Brownlee September 18, 2018 at 2:15 pm #
  
  There may be words in the test set not in the training set.
  
  There may be works in new data not in the training set.
  
  Does that help?
  
  Reply
  - Mmed September 20, 2018 at 4:05 am #
    
    I am still a bit confused to be honest.
    
    1. The training vocab and the testing vocab are different, that I can see. Why would a trained model ever encounter a word only in the test set? Wouldn’t test set captions (and hence the words in the test) only be used when calculating BLEU scores?
    
    2. Could this ‘unknown word’ token ever be generated by the model?
    
    3. When adding new data with new words to the training data, why would you stick with the older vocabulary and not ‘evaluate’ the newer one?
    
    Reply
    - Jason Brownlee September 20, 2018 at 8:08 am #
      
      Yes, it is an artefact of evaluating the model.
      
      In the future, you would finalize the model by training it on all available data and use it to generate captions.
      
      The model may still generate unknown word tokens if it gets confused.
      
      Reply
RD September 19, 2018 at 2:32 am #

Jason,

Thank you for this clear and thorough tutorial. I have two questions to make sure I understand things correctly:

1. By removing single letters from the descriptions, the generated descriptions will never include/generate descriptions with ‘a’ or ‘I’. Is that correct?

2. Because load_clean_descriptions filters by testing/training the tokenizer may be missing words that are in the test set but not the training set. Is this correct? And for fitting I understand you want to keep test/training data separate, but for the vocabulary ideally you would include the entire vocabulary from both the test and training set. Is this correct?

3. If I understand correctly one could use an even larger vocabulary (would be a bigger model) but in principle there is no reason not to include a larger vocabulary?

Reply
- Jason Brownlee September 19, 2018 at 6:26 am #
  
  Yes, you can add them back if you like. It just makes the vocab larger/model slower to train.
  
  Yes, train defines the vocab. Ideally you want your model to have all the words that may be seen.
  
  Yes, there is good reason to use a larger vocab, it will be more expressive, but I was trying to keep the example fast/simple.
  
  Reply
Vikas September 27, 2018 at 8:54 pm #

Hello
I wanted to know what should be the target validation loss.
Right now, I am getting the best validation loss of 3.86 after 5 epochs. However, you have a lower validation as well as training loss than mine just after two epochs.
Is my model trained enough or should I train again?

Reply
- Jason Brownlee September 28, 2018 at 6:13 am #
  
  Lower is better.
  
  Reply
Priyam September 29, 2018 at 1:00 am #

I want to test the model with more images.
Can you tell me the source from which i can get images which will run efficiently using the code.Itried some randon images from google but they were unsatisfactory.Also tell me steps to add new image along with model to train it?

Reply
- Jason Brownlee September 29, 2018 at 6:35 am #
  
  I expect there are other image captioning datasets you can use.
  
  Sorry, I cannot point you to them off the cuff.
  
  Reply
Omnia October 3, 2018 at 3:25 am #

Hi Jason

I really like this post, it helped a lot

your post is better than my daily DL learning class

I have a question

I ran the code until fitting the model, actually till this line

“Train on 306404 samples, validate on 50903 samples
Epoch 1/20”

until now it took 20 mins but nothing has appeared

Does it take so much of time to run each epoch?

Or am I doing something wrong?

I really hope this code work properly with me so I can optimize it and see different results

Thanks

Reply
- Jason Brownlee October 3, 2018 at 6:22 am #
  
  It can take a while.
  
  Reply
omnia October 3, 2018 at 6:17 am #

Another question is that if feature.pkl file has been created the first time I ran the code and I have it in my directory

do I have to run these commands if I ran the code another time

# extract features from all images
directory = ‘Flicker8k_Dataset’
features = extract_features(directory)
print(‘Extracted Features: %d’ % len(features))
# save to file
dump(features, open(‘features.pkl’, ‘wb’))

Thanks again

Reply
- Jason Brownlee October 3, 2018 at 6:23 am #
  
  Once you have created the features, you don’t need to create them again.
  
  Reply
Omnia October 3, 2018 at 7:20 am #

Thanks a lot

now it’s working properly

will let you know what I will get at the end

Thanks again, really appreciate your hard work, so happy that I understand your code very well

Reply
- Jason Brownlee October 3, 2018 at 4:13 pm #
  
  Well done!
  
  Reply
Xuan October 3, 2018 at 8:35 pm #

Hey Jason, thanks for the tutorial. The model trained fine for me but when I tried to generate caption for a single new image I encountered the following error:

ValueError: Error when checking input: expected input_8 to have shape (110,) but got array with shape (34,)

Reply
- Jason Brownlee October 4, 2018 at 6:15 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Diana October 8, 2018 at 6:25 am #

Hi Jason! Thanks a lot for this!

I tried your model with ResNet50 and got model-ep005-loss3.417-val_loss3.767.h5, so yours works a little bit better even when it comes to BLEU.

I’m gonna try to reduce the vocabulary size and see what happens.

Reply
- Jason Brownlee October 8, 2018 at 9:29 am #
  
  Nice work!
  
  Reply
  - Diana October 8, 2018 at 11:47 am #
    
    I cannot clearly see how to ‘correct’ misspelling. I have already gone through the vocabulary and there are about 1000 misspelled words.
    
    Any thoughts on how I could go over that?
    
    Reply
    - Jason Brownlee October 9, 2018 at 8:32 am #
      
      Perhaps remove or correct all captions with misspellings?
      
      Reply
Oliver October 10, 2018 at 9:30 pm #

Getting the following error when calling train_features = load_photo_features(‘features.pkl’, train)

The error occurs when trying to run all_features = load(open(filename, ‘rb’))

UnpicklingError: pickle data was truncated

Has anybody a solution to this?

Reply
- Jason Brownlee October 11, 2018 at 7:55 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Jerome MASSOT October 11, 2018 at 5:54 pm #

Hi Jason,
I come back tonight with the question regarding the VGG16.layers.pop() method which seems not to work with Keras 2.2.2…
Before and after pop() and reshaping the model, the light one has exactly the same architecture as the original one…
Features extracted has dim = 1000 which cause me trouble with my Input = (4096,)…
If I change the input to (1000,) the performance is low…
Thanks for the help
Best regards
Jerome

Reply
- Jason Brownlee October 12, 2018 at 6:35 am #
  
  Thanks, I’ll investigate.
  
  Reply
- Werner June 11, 2020 at 3:18 pm #
  
  I found the same. 1,000 dims.
  
  Reply
Prasanna Kumar Behera October 15, 2018 at 12:35 am #

Hi Jason,

My question is, are we retraining all the parameters of the VGG16 models in this example?

If yes, why should we train since we are using already trained model?
If no, then what part of the above coding is doing since we have not set layer.trainable = False for any layer?
Please let me when we should train all the layers or when we should not when using a pretrained model like VGG16?

Reply
- Jason Brownlee October 15, 2018 at 7:28 am #
  
  No, we are not re-training the vgg, we are using the vgg to output features that are fed into the captioning model.
  
  Reply
Vidyush Bakshi October 15, 2018 at 11:56 pm #

My BLEU scores with progressive loading
BLEU-1: 0.547871
BLEU-2: 0.293608
BLEU-3: 0.196752
BLEU-4: 0.086692

Reply
- Jason Brownlee October 16, 2018 at 6:37 am #
  
  Nice work.
  
  Reply
Hassaan October 23, 2018 at 5:26 pm #

Hy Jason. I am new to ML and you are the source which rise my interest in ML. I am following your above tutorial. I am confuse to get some concept, where you are applying tokenization to the text. You mentioned that ” The model will be provided one word and the photo and generate the next word. After that it recursively run to generate new sentence”.I am just confuse here that what are you doing here? What is the purpose of doing that? Please explain in detail that point or suggest me a source to get help from somewhere else.
Second question is that when we will done with that, the model will generate captions, which are in the data-set (I mean to say exact some captions will be suggested for new unseen images or it can be new captions based on image )..plz explain it in details..

Reply
- Jason Brownlee October 24, 2018 at 6:26 am #
  
  Perhaps it will be helpful to learn more about this type of model:
  https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
  
  Yes, the model can be used to generate captions for new unseen photos, we do this at the end of the tutorial.
  
  Reply
Ahmed October 24, 2018 at 9:20 pm #

valueError: Error when checking input: expected input_2 to have shape (40,) but got array with shape (34,)

I am getting that error, I am unable to figure it out. Can you please help me to get that?

Reply
- Jason Brownlee October 25, 2018 at 7:54 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ahmed October 25, 2018 at 8:25 pm #

I am just confused about the max_length method.

def max_length(descriptions): lines = to_lines(descriptions) return max(len(d.split()) for d in lines)

1
2
3

def max_length(descriptions):
lines = to_lines(descriptions)
return max(len(d.split()) for d in lines)

What is the purpose of that method. Why we are trying to find that. Please slightly explain ti

Reply
- Jason Brownlee October 26, 2018 at 5:35 am #
  
  To find the number of words in the longest description.
  
  We need this so we can pad all other descriptions to that length (in terms of numbers of words).
  
  Reply
Omnia October 26, 2018 at 1:52 am #

hi Jason

my BLEU scores are like this

BLEU-1: 0.528302
BLEU-2: 0.277568
BLEU-3: 0.227300
BLEU-4: 0.117189

and here is my validation loss

Train on 306404 samples, validate on 50903 samples
Epoch 1/20
306404/306404 [==============================] – 9983s 33ms/step – loss: 4.5003 – val_loss: 4.0387

Epoch 00001: val_loss improved from inf to 4.03874, saving model to model-ep001-loss4.500-val_loss4.039.h5
Epoch 2/20
306404/306404 [==============================] – 9512s 31ms/step – loss: 3.8575 – val_loss: 3.8717

Epoch 00002: val_loss improved from 4.03874 to 3.87171, saving model to model-ep002-loss3.857-val_loss3.872.h5
Epoch 3/20
306404/306404 [==============================] – 7866s 26ms/step – loss: 3.6712 – val_loss: 3.8360

Epoch 00003: val_loss improved from 3.87171 to 3.83603, saving model to model-ep003-loss3.671-val_loss3.836.h5
Epoch 4/20
306404/306404 [==============================] – 10109s 33ms/step – loss: 3.5803 – val_loss: 3.8296

Epoch 00004: val_loss improved from 3.83603 to 3.82960, saving model to model-ep004-loss3.580-val_loss3.830.h5
Epoch 5/20
306404/306404 [==============================] – 5384s 18ms/step – loss: 3.5246 – val_loss: 3.8364

Though, when I try to generate a description for a random image from the intern the model seems not working properly

it gives me the same sentence for different kind of images

any suggestion?

Reply
- Jason Brownlee October 26, 2018 at 5:38 am #
  
  It suggests the model may be overfit, perhaps try re-fitting the model or using a model it over fewer epochs or using some regularization.
  
  Reply
  - Omnia October 26, 2018 at 10:19 am #
    
    I see
    
    Thanks
    
    I will try and post my experiment
    
    Reply
- PhyuPhyuKhaing July 5, 2020 at 9:11 pm #
  
  Hi Omnia,
  
  I am interested in your model’s result. May I know how to change the model.
  
  Reply
Saifullah October 29, 2018 at 3:25 am #

Hi Jason,

Thanks for such nice work.
I want to know how I print actual caption for the test image. If I am using a new image from the test set.

Reply
- Jason Brownlee October 29, 2018 at 6:00 am #
  
  I show how to print a caption for a new image in the tutorial.
  
  Reply
Omnia October 30, 2018 at 2:21 pm #

Hi Jason,

in fitting the model, I’m not sure if my thought of input and output is correct

here is the command
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, validation_data=([X1test, X2test], ytest)

I understand that X1train contains the photo which should be the feature of the photo as integers, correct me if I’m wrong

X2train is sequence text which is the ground truth captions corresponding to the photo

I didn’t understand what is ytrain

would you please explain it briefly

another question is that how does the output penalize if it generates a wrong caption?

Thanks

Reply
- Jason Brownlee October 31, 2018 at 6:21 am #
  
  Correct.
  
  ytrain is the next word to be predicted by the model for each sample.
  
  Reply
Mmed November 1, 2018 at 8:53 am #

Dear Dr. Brownlee.

The create_sequences() function that returns for us input-output pairs of training data makes teacher forcing possible in this example, right?

Reply
- Jason Brownlee November 1, 2018 at 2:28 pm #
  
  I guess so, or more accurately, the way we use the sequences during training.
  
  Reply
Omnia November 2, 2018 at 10:35 am #

Hi Jason

Thanks a lot for your advice

I’m using Pycharm

I tried different types of regularization until I picked the best one, also different optimizers

I got pretty good bleu scores and predictions, the model was predicting everything in details for flicker image

and good enough for some images from the internet

Though for images from the internet, the model couldn’t clearly recognize cat from dog face, I’m still working on that

These are my blue score

BLEU-1: 0.601031
BLEU-2: 0.380297
BLEU-3: 0.279632
BLEU-4: 0.151589

Thanks again

Reply
- Jason Brownlee November 2, 2018 at 2:49 pm #
  
  Well done!
  
  Reply
- abbas November 18, 2018 at 3:51 am #
  
  Omnia! Please can you share the code using inception model?IF yes then let me know..also i would like to check your results
  
  Reply
- Ajay January 1, 2019 at 11:43 pm #
  
  Hi Omnia ! Can you tell me which regularization technique you used and helped impove the BLEU score.
  
  Reply
- Saurabh May 6, 2019 at 4:47 pm #
  
  Hi Omnia, can you share the approach you used for regularization at saurabh18@somaiya.edu?
  
  Thanks!
  
  Reply
Vishwa Dadhania November 14, 2018 at 11:02 pm #

Hi Jason,
Thank you for an amazing tutorial. I learnt many things here. esp. progressive loading. So here I have one query as you explain in “progressive loading” section:
“Finally, we can use the fit_generator() function on the model to train the model with this data generator.

In this simple example we will discard the loading of the development dataset and model checkpointing and simply save the model after each training epoch. You can then go back and load/evaluate each saved model after training to find the one we the lowest loss that you can then use in the next section.”

I have already got all 20 models from 20 epochs by training the “training” dataset. Now how do I check which model is best using the development set?? Because we have not included development set in the fit_generator(). So how to choose the best model from 20 saved models ? Should I apply evaluate() function on development set for each model?? It would be great if you could give me some idea/hint further on this!! Thanks.

Reply
- Jason Brownlee November 15, 2018 at 5:31 am #
  
  Good question.
  
  Evaluate each of the saved mode on a validation dataset and use the one with the best performance. Probably around epoch 3-4.
  
  Reply
  - Ajay January 1, 2019 at 11:04 pm #
    
    HI Jason, Can you tell me what do you mean by 3-4 epochs? I hope that evaluating the model will just go though all the images once and generate descriptions for them and then calculate the BLEU score from that. So, what 3-4 epochs are you speaking about?
    
    Reply
    - Jason Brownlee January 2, 2019 at 6:36 am #
      
      I meant that the best performing model was found after the completion of 3 or 4 epochs.
      
      Reply
Omnia November 16, 2018 at 4:31 am #

Hi Jason

If I want to generate descriptions for the test images, how do I pass the photo features

(which we already have extracted) to the generate_desc model?

As in the following command, we are passing a single extracted feature for a given image

photo = extract_features(cat.jpg)

description = generate_desc(model, tokenizer, photo, max_length)

I want to generate a description for the test image using test features without using the

function extract feature again, could you please suggest any way to do it?

Thanks

Reply
- Jason Brownlee November 16, 2018 at 6:18 am #
  
  The example at the end of the tutorial shows you how to generate a description for one photo.
  
  Reply
Omnia November 16, 2018 at 8:51 am #

Correct

That’s what I meant to say

in the example, it shows how to generate for one photo but with using the extract feature function,

If we already extracted features for the test images

why do we need to use extract_features again to generate a description

can’t we use our saved features in the test?

Reply
- Jason Brownlee November 16, 2018 at 1:57 pm #
  
  Yes, if you have already extracted the feature, then you can pass the extracted feature directly to generate_desc().
  
  Reply
Sapar November 18, 2018 at 10:12 am #

Hello,
This is a very good work.
What Machine learning techniques do you use in this work?

Thank you.

Reply
- Jason Brownlee November 19, 2018 at 6:41 am #
  
  LSTMs.
  
  Reply
Chen Mei November 26, 2018 at 3:51 pm #

Anyone received this problem during test phase?

OSError: Unable to open file (unable to open file: name = ‘model-ep001-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

Reply
- Jason Brownlee November 27, 2018 at 6:31 am #
  
  You must change the code to load the file that you saved.
  
  Reply
Sunny December 7, 2018 at 5:57 pm #

Hi Mr.Jason,

I am a computer science and engineering student. Me and team mates are doing the same project. Reply me,
1.Can we develop this using MATLab?
2.Can we use the same code to our project for reference purpose using python?
3.In how many months we can complete it?
4.Suggest me what to use either python or matlab?

Reply
- Jason Brownlee December 8, 2018 at 7:00 am #
  
  Sorry, I don’t have examples in matlab, I can’t give you good advice.
  
  Reply
- Mohankumar Balasubramaniyam May 3, 2019 at 12:40 am #
  
  Hi I am also facing the same issue. Can you tell what you did to overcome the problem @harsha
  
  Reply
Caner December 20, 2018 at 12:57 am #

Hi Jason. Thank you for this tutorial. I want to develop text-to-image model. Does it work if I change input and output elements? or What would you suggest ?

Reply
- Jason Brownlee December 20, 2018 at 6:28 am #
  
  I don’t have a tutorial on text to image at this stage, I hope to cover it in the future – then I can give you good advice.
  
  Reply
Ajay December 26, 2018 at 11:56 pm #

Hi Jason, Why have you taken the maximum size of the sentence to be 34, when the maximum length of a sentence is 33?

Reply
Ajay December 28, 2018 at 10:28 pm #

Is there any reason for selection of this particular RNN architecture? Is it giving any benefit?

Reply
- Jason Brownlee December 29, 2018 at 5:51 am #
  
  Yes, I cover this architecture more here:
  https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
  
  Reply
Ajay December 28, 2018 at 11:10 pm #

Hello Jason, Can you explain what is the role of mask_zero inside the embedding layer?

Reply
- Jason Brownlee December 29, 2018 at 5:51 am #
  
  We zero pad inputs to the same length, the zero mask ignores those inputs. E.g. it is an efficiency.
  
  Reply
  - Ajay December 30, 2018 at 7:41 pm #
    
    Can you elaborate on your answer, I didn’t get anything.
    
    Reply
  - Ajay January 1, 2019 at 11:17 pm #
    
    Hi Jason! I’m waiting for you to elaborate on zero mask. Didn’t get anything from your comment.
    
    Reply

Ajay December 29, 2018 at 6:42 pm #

Hi Jason, Inside the data_generator() function why have you used the while 1: loop?

Can you email all the previous answers that I’ve asked?

Jason Brownlee December 30, 2018 at 5:38 am #

Because it is a generator that will yield each loop when called.

You can learn more about python generators here:
https://wiki.python.org/moin/Generators

Ajay December 30, 2018 at 7:47 pm #

I learned that on the repetitive calling of the generator function, the execution starts where it previously left off.

So, in

def data_generator(tokenizer,train_descs,train_features,maxlen):
    while 1:
        for ids, descs in train_descs.items():
            feature = train_features[ids][0]
            feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)
            yield[[feature_vector,inseq],outseq]
generator = data_generator(tokenizer,train_descs,train_features,maxlen)

def data_generator(tokenizer,train_descs,train_features,maxlen):

while 1:

for ids, descs in train_descs.items():

feature = train_features[ids][0]

feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)

yield[[feature_vector,inseq],outseq]

generator = data_generator(tokenizer,train_descs,train_features,maxlen)

Shouldn’t this be :

def data_generator(tokenizer,train_descs,train_features,maxlen):
    while 1:
    for ids, descs in train_descs.items():
        feature = train_features[ids][0]
        feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)
        yield[[feature_vector,inseq],outseq]
    generator = data_generator(tokenizer,train_descs,train_features,maxlen)

def data_generator(tokenizer,train_descs,train_features,maxlen):

while 1:

for ids, descs in train_descs.items():

feature = train_features[ids][0]

feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)

yield[[feature_vector,inseq],outseq]

generator = data_generator(tokenizer,train_descs,train_features,maxlen)

Where am I getting wrong?

Ajay December 30, 2018 at 8:58 pm #

Shouldn’t this be :

def data_generator(tokenizer,train_descs,train_features,maxlen):

for ids, descs in train_descs.items():
feature = train_features[ids][0]
feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)
yield[[feature_vector,inseq],outseq]
generator = data_generator(tokenizer,train_descs,train_features,maxlen)

Where am I getting wrong?

Reply
Jason Brownlee December 31, 2018 at 6:09 am #

It looks like you are calling the generator from within the data_generator function.

Reply
- Ajay January 1, 2019 at 5:05 pm #
  
  Sorry the last line in the second code snippet is outside the function.
  
  def data_generator(tokenizer,train_descs,train_features,maxlen):
  for ids, descs in train_descs.items():
  feature = train_features[ids][0]
  feature_vector, inseq, outseq = create_sequence(tokenizer,descs,feature,maxlen)
  yield[[feature_vector,inseq],outseq]
  
  generator = data_generator(tokenizer,train_descs,train_features,maxlen)
  
  As, we know that data_generator is yielding one example at a time and each time the function is called, the function execution starts where it previously left off. So, since the “for ids, descs in train_descs.items():” loop is still not complete in the mid-way, it should loop and yield more sequences until it ends.
  
  So, my quesiton is if the loop can continue till all the “train_descs.items()” are encountered, then why do we need the “while 1:” loop there?
  
  I want to know where am I going wrong, kindly let me know.
- Jason Brownlee January 2, 2019 at 6:34 am #
  
  Good question. To loop over the entire dataset as many times as we need (e.g. number of epochs is exhausted).

Ajay December 29, 2018 at 8:14 pm #

On running this:

# test the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
inputs, outputs = next(generator)
print(inputs[0].shape)
print(inputs[1].shape)
print(outputs.shape)

I’m getting :

(5, 4096)
(47, 33)
(7266,)

whereas your output is
(47, 4096)
(47, 34)
(47, 7579)

Am I getting wrong somewhere?

Also, can you explain these dimensions?

Reply
- Jason Brownlee December 30, 2018 at 5:39 am #
  
  Perhaps ensure that you copied all of the code and that your Keras and Tensorflow are up to date.
  
  Reply
  - Ajay December 30, 2018 at 7:37 pm #
    
    Can you explain what is 47 in the dimension? I mean, data_generator is outputting one example at a time then instead of 47, shouldn’t it be 1? Can you explain me the dimension?
    
    Reply
    - Jason Brownlee December 31, 2018 at 6:07 am #
      
      I believe I explain this in the post:
      
      Running this sanity check will show what one batch worth of sequences looks like, in this case 47 samples to train on for the first photo.
      
      Reply
  - Ajay December 30, 2018 at 7:40 pm #
    
    I am coding the stuff myself and rectified something and now the output is :
    
    (47, 4096)
    (47, 33)
    (7266,)
    
    Even now, print(outputs.shape) is giving me (7266,).
    
    Stll, I want to ask that if data_generator is outputting 1 example at a time then why is 47 the first dimension?
    
    Reply
    - Ajay December 30, 2018 at 7:51 pm #
      
      Aah!! Finally, I got it right. Thanks. It was a small glitch.
      
      Reply
      - Jason Brownlee December 31, 2018 at 6:09 am #
        
        Well done!
    - Jason Brownlee December 31, 2018 at 6:08 am #
      
      Perhaps confirm that you are using Keras 2.2.4 or better, the output should have 47 samples worth of output as well.
      
      Reply
Ajay December 29, 2018 at 8:56 pm #

Hi Jason, You have set steps_per_epoch=len(descriptions) and passed it into model.fit_generator(). As far as I’ve read, steps_per_epoch signify the total no. of batches before a epoch to finish. See this :

https://stackoverflow.com/questions/48604149/keras-fit-generator-and-steps-per-epoch

Reply
- Ajay December 29, 2018 at 9:13 pm #
  
  and also this :
  https://datascience.stackexchange.com/questions/29719/how-to-set-batch-size-steps-per-epoch-and-validation-steps
  
  Reply
- Jason Brownlee December 30, 2018 at 5:39 am #
  
  Correct.
  
  Reply
Ajay December 30, 2018 at 7:34 pm #

I want to clarify how data_generator is feeding the data to fit_generator. I mean, is it giving it one training example at a time or some batch of training examples at a time.

Reply
- Jason Brownlee December 31, 2018 at 6:06 am #
  
  It releases one batch of samples per loop.
  
  Reply
  - Ajay January 1, 2019 at 5:15 pm #
    
    epochs = 20
    steps = len(train_descriptions)
    for i in range(epochs):
    # create the data generator
    generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
    # fit for one epoch
    model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    # save model
    model.save(‘model_’ + str(i) + ‘.h5’)
    
    steps_per_epoch represents the no. of batches that will be trained in one epoch.
    As you have said before that data_generator is feeding a batch of examples to fit_generator so that should mean that that in one batch, let’s say x examples are being sent for training process. This should mean that no. of batches for 1 epoch training should be (total no. of training examples)/(1 batch size). On running below snippet, total no. of training examples comes out to be 6000.
    
    print(len(train_descriptions))
    6000
    
    So,steps_per_epoch should be 6000/(1 batch size), but in your code steps_per_epoch = len(train_descriptions)
    
    Why have you set it so large?
    Are you forcing fit_generator to train over one example at a time even though data_generator is generating a batch of training example at a time?
    
    Reply
    - Jason Brownlee January 2, 2019 at 6:35 am #
      
      Feel free to change it.
      
      Reply
Ajay December 30, 2018 at 9:21 pm #

Hi Jason, does max_length = 34 or any other bigger value have any effect on model performing well?

Reply
- Jason Brownlee December 31, 2018 at 6:10 am #
  
  Try it and see.
  
  Reply
Ajay December 31, 2018 at 4:47 pm #

Hi Jason, I’m running

evaluate_model(mapping,tokenizer,maxlen,model,features)

and getting this error.

—————————————————————————
KeyError Traceback (most recent call last)
in
—-> 1 evaluate_model(mapping,tokenizer,maxlen,model,features)

in evaluate_model(mapping, tokenizer, maxlen, model, feature_vector)
33 for ids,descs in mapping.items():#1
34 count += 1
—> 35 pred_caption = generate_desc(feature_vector[ids],tokenizer,model,maxlen)#caption string returned
36 for desc in descs:#2
37 reference.append(desc.split())

KeyError: ‘2258277193_586949ec62’

When I search for this image in my pc, I found that its id and its descriptions are present in the

Flickr8k.lemma.token.txt

cleanedcaptions.txt

However, the image is not present in Flicker8k_Dataset.

Why isn’t there any image Flicker8k_Dataset.

Reply
- Ajay December 31, 2018 at 10:53 pm #
  
  I redownloaded the dataset and searched the above-mentioned image in it and guess what, It was NOT present in that too !!
  
  Reply
  - Ajay January 1, 2019 at 4:02 am #
    
    Also, when I loaded features from features.pkl and then ran
    print(features[‘2258277193_586949ec62’]) then it gave me
    
    KeyError Traceback (most recent call last)
    in
    —-> 1 features[‘2258277193_586949ec62’]
    
    KeyError: ‘2258277193_586949ec62’
    
    From this, It seems like 2258277193_586949ec62.jpg was never present in the dataset.
    But, its description is present in Flickr8k.lemma.token.txt.
    
    Can you share the dataset?
    
    Reply
    - Jason Brownlee January 1, 2019 at 6:28 am #
      
      Perhaps ignore that token then?
      
      Reply
      - geeta gupta March 13, 2022 at 4:07 pm #
        
        I am also getting KeyError: ‘2258277193_586949ec62’
        
        how to resolve this. and how to get this image?
      - James Carmichael March 14, 2022 at 12:04 pm #
        
        Hello Geeta…Please specify which code listing you are working with so that we can better assist you.
- Jason Brownlee January 1, 2019 at 6:13 am #
  
  Perhaps you skipped a step, are you able to confirm that you have all of the steps/code?
  
  Are you able to confirm that your Python and libraries are up to date?
  
  Reply
  - Ajay January 10, 2019 at 12:31 am #
    
    I resolved the problem by putting an appropriate image that relates well to its description in Flickr8k.lemma.token.txt. The above image was really missing from the image directory. I reckon that anyone must have faced the same problem as mine.
    
    Reply
Ajay December 31, 2018 at 11:55 pm #

Can you check this on your system? Also check that it is present in Flickr8k.lemma.token.txt.

Reply
- Jason Brownlee January 1, 2019 at 6:16 am #
  
  The example in the blog post works perfectly for me and tens of thousands of readers, I suspect there is something going on with your local version.
  
  Reply
Ajay January 2, 2019 at 12:52 am #

Hi Jason!! Can you have a look at my model image

https://drive.google.com/open?id=1anAmPPIi0pfoe_3ISQ2AzaSuibI1KRqo

I cannot understand that there is an “input_3” and an “input_2” layer. Is there any problem if there is no “input_1” present in it?

Reply
- Jason Brownlee January 2, 2019 at 6:37 am #
  
  Sorry, I don’t have the capacity to debug your model or model diagrams.
  
  Reply
  - Ajay January 10, 2019 at 12:27 am #
    
    Just see and tell if missing of input_1 signify anything bad?
    
    Reply
Kashish January 4, 2019 at 6:23 pm #

ValueError: Error when checking input: expected input_3 to have shape (34,) but got array with shape (30,)
Sir,while evaluating the model,I’m getting such an error how should I get rid of it?I exactly typed the code and I’m also using the same dataset as mentioned above.

Reply
- Jason Brownlee January 5, 2019 at 6:51 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ajay January 10, 2019 at 12:33 am #

Hi Jason !! Can you tell me about other regularization techniques to improve the model?

Can you suggest adding anything to improve accuracy?

Reply
- Jason Brownlee January 10, 2019 at 7:51 am #
  
  Yes, I list regularization methods here:
  https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
  
  Reply
  - Ajay January 12, 2019 at 9:19 pm #
    
    Are there any other regularisation techniques specific to the caption generation task that can be helpful?
    
    Reply
    - Jason Brownlee January 13, 2019 at 5:40 am #
      
      Not really.
      
      Reply
Ajay January 10, 2019 at 12:40 am #

I tried my model using my MacBook Air Webcam and it gave pretty bad results and the captions that it generated were from the training dataset.

Where am I going? I am ready to try all the possibilities to improve my model. What can I do?

Reply
Ajay January 17, 2019 at 11:22 pm #

I am a student. My model is taking an hour or more to train. After training, I’m not getting the desired results. So, I don’t want to retrain it and sit back and see. There could be a high change that it may not work well again. I think AWS requires some bucks for this.

I’m using MacBook Air.

8 GB RAM
i5 5th gen processor

Is there any “free” source to train the model which will take lesser time.

Reply
- Jason Brownlee January 18, 2019 at 5:38 am #
  
  Yes, fit a smaller model on less data as a prototype, then scale up once you find a good config.
  
  Reply
Al Krinker January 23, 2019 at 3:30 am #

Hi Jason,

Like many mentioned, it is a very comprehensive tutorial on caption generation. Progressive loading is a big plus.

Do you plan to cover or have some ideas on the topic of using image and description to search for similar images? Example of what I am trying to do: I already implemented CNN CBIR model that extracts features clusters them and when new image comes in, its features are extracted and nearest neighbors are given as similar images suggestions. This works fine, but I would like to enhance it by adding image description in the mix, so that when I give a picture of the steering wheel and specify “car part”, I will be given list of images of the steering wheels, and not all circular options like bike wheels for example.

I thought to use lucene to help with image description search first and then use CNN to find similar images, but not sure if it is the best approach to take as lucene search might throw out images that are relevant, but not well described.

Reply
- Jason Brownlee January 23, 2019 at 8:51 am #
  
  Very cool idea. I have not tried this but I believe it would be straight-forward to implement.
  
  Let me know how you go.
  
  Reply
  - Al Krinker January 24, 2019 at 6:46 am #
    
    I dont think that trying to come up with a model that would combine the text along with image features will be straight forward or would perform well as oppose to having elasticsearch in the mix where I can take advantage of text search that elastic provides out of the box, but elastic falls short of image search (tried LIRE before and the results were really bad compared to ConvNet approach)
    
    if you have any other ideas or suggestions, I am all ears 🙂
    
    Reply
cleansky February 6, 2019 at 4:56 am #

Thanks for the nice tutorial.

It is interesting to see that even though the LSTM and CNN have no connection, the decoder may produce proper caption words.

How does the LSTM choose words without any information about the image? What is the implicit mechanism in this architecture?

Any comment is welcome.

Reply
- Jason Brownlee February 6, 2019 at 8:01 am #
  
  It has the extracted features from the image as input. They are abstract, but it finds meaning in them.
  
  Reply
Sanjay February 11, 2019 at 2:08 am #

Hey Jason!
Amazing tutorial. Great Learning experience from start till end.

I wanted to ask you what do you mean exactly in the last section of the article under ‘Extensions’ section by,
‘Pre-trained Word Vectors. The model learned the word vectors as part of fitting the model. Better performance may be achieved by using word vectors either pre-trained on the training dataset or trained on a much larger corpus of text, such as news articles or Wikipedia.’

Thank You!

Reply
- Jason Brownlee February 11, 2019 at 8:01 am #
  
  You can use word vectors learned on another dataset, more here:
  https://machinelearningmastery.com/develop-word-embeddings-python-gensim/
  
  Reply
boumelha adaam February 14, 2019 at 1:58 am #

hey Jason , thank you for this amazing article .

i wanted to ask you about an issue i ve faced in the generate_desc function , i am getting in the model.predict line this error :

ValueError: Error when checking input: expected input_2 to have shape (74,) but got array with shape (34,).

any solutions please!

thank you !!

Reply
- Jason Brownlee February 14, 2019 at 8:49 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
O Lokesh February 17, 2019 at 12:53 am #

sir i couldn’t download the datasets after filling the form .please let me know if there is another way

Reply
- Jason Brownlee February 17, 2019 at 6:33 am #
  
  You should be sent an email with the link after completing the form, I believe.
  
  Reply
Karan February 18, 2019 at 12:38 am #

sir, i am also unable to download neither the dataset nor the text files. I am getting a 404 error.

Reply
- Jason Brownlee February 18, 2019 at 6:31 am #
  
  You must fill out this form in order to download the dataset:
  https://forms.illinois.edu/sec/1713398
  
  Reply
sonali verma February 19, 2019 at 7:13 pm #

respected sir,
I am not able to download the datasheet from the link that is provided by flickr 8k.
It is showing
The requested URL /HockenmaierGroup/Framing_Image_Description/Flickr8k_Dataset.zip was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Reply
- Jason Brownlee February 20, 2019 at 7:57 am #
  
  It looks like they have taken the site down, it says: “Proper NLP home page coming soon.”
  
  I will prepare a workaround ASAP.
  
  UPDATE:
  
  I have added direct download links to the post.
  
  Reply
Vedic Mishra February 20, 2019 at 9:27 pm #

In the function, create_sequences(), the pad_sequences was generating a list so big that the 17 GB Kaggle RAM was crashing. So I tried appending directly to numpy arrays instead of creating it as a list first. However, now it is taking infinite time to execute. Is there any alternative to this function or any way to increase the rate. Please help
P.S Thanks for uploading the dataset, I spent days searching for it on the internet.

Reply
- Jason Brownlee February 21, 2019 at 7:58 am #
  
  Perhaps try the progressive loading section?
  
  Reply
Mohammad Anas February 22, 2019 at 2:06 am #

i used progressive loading and after execution there were 20 models one for each epoch.
but further sections are using one single file for model . but i have 20 models.how to proceed?

Reply
- Jason Brownlee February 22, 2019 at 6:22 am #
  
  Choose the model with the lowest validation error, you might need to evaluate each.
  
  If that is a pain, use any model, e.g. from epoch 4.
  
  Reply
  - Mohammad Anas February 22, 2019 at 7:58 am #
    
    Thank you very much.
    
    Reply
Mohammad Anas February 22, 2019 at 7:55 am #

i have developed a deep learning model with a .csv file as training data.
file contains a column with text data and during execution of the code
could not convert string to float: ‘Moong(Green Gram)’
this error is being displayed.
what should i do?

Reply
- Jason Brownlee February 22, 2019 at 2:44 pm #
  
  I’m not sure what the cause might be, sorry. Perhaps try debugging the data loading/transforming part of your code?
  
  Reply
Abhishek Verma February 22, 2019 at 4:59 pm #

Hi Jason, I am unable to have access to the Flickr 8k dataset after filling the form. The link shows this:

Please help me with it. Thank you!

Not Found
The requested URL /HockenmaierGroup/Framing_Image_Description/Flickr8k_Dataset.zip was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Reply
- Jason Brownlee February 23, 2019 at 6:28 am #
  
  Yes, they have recently removed it.
  
  I have added direct download links above in the dataset section.
  
  Reply
Akhil February 22, 2019 at 7:31 pm #

Jason…..we tried to complete the model generation using progressive loading…total 19 epochs ……And now we are getting outputs but the accuracy is very worse..is there any suggestions to improve the accuracy…..pls help

Reply
- Jason Brownlee February 23, 2019 at 6:30 am #
  
  Please don’t use accuracy, instead use BLEU scores – perhaps re-read the post!
  
  Reply
James February 23, 2019 at 8:30 pm #

How to train this model using mscoco dataset?

Reply
- Jason Brownlee February 24, 2019 at 9:06 am #
  
  Sorry, I don’t have an example of training with MSCOCO. Thanks for the suggestion.
  
  Reply
Aron February 24, 2019 at 11:30 pm #

Great article!
Comprehensive, well-written and well-explained. I used the progressive loading approach and ran the scripts in Google Colab. Everything worked fine (got some errors along the process every now and then, but managed to solve them all). I am currently extracting features using VGG16, VGG19, ResNet50 and Inception and hopefully will make a comparison between them. Thanks for this great post!

Reply
- Aron February 25, 2019 at 12:18 am #
  
  I wanted to get your opinion on this. Since I used progressive loading, I do not have a measure for the loss function on the validation dataset, so I took the models and evaluated the BLEU scores directly. However, it’s not straightforward to decide which models gives the best performance and when exactly the model starts to overfit.
  
  I calculated the mean squared error between some “ideal” BLEU scores taken from Marc Tanti et al. (BLEU-1 = 0.6, BLEU-2 = 0.413, BLEU-3 = 0.273, BLEU-4 = 0.178) and the BLEU scores I’ve obtained for my models. The best performing one was actually model_0.h, which was the model calculated after the first epoch. However, I don’t know if the mean squared error is actually very indicative or relevant in this case, but I didn’t know what else to use. From the limited amount of research I’ve done online, I tend to believe that BLEU-4 is a bit more important than the rest of the BLEU scores, but I am not sure. Do you have any suggestions?
  
  Thank you for your time!
  
  Reply
  - Jason Brownlee February 25, 2019 at 6:46 am #
    
    Perhaps look at the loss or the learning curve of loss across all saved models?
    
    Reply
- Jason Brownlee February 25, 2019 at 6:44 am #
  
  Thanks.
  
  Well done! Let me know what works well/best.
  
  Reply
Abhishek Verma February 25, 2019 at 7:46 pm #

Why did you increment vocab_size by 1 ??

Reply
- Jason Brownlee February 26, 2019 at 6:16 am #
  
  To start words at index 1 and make room for 0 == unknown word.
  
  Reply
Hassan February 26, 2019 at 10:35 pm #

Hy Jason!
Thanks for great article.
I tried to run the model through progressive loading. My code is running perfectly. But my model generates generates just 3,4 type of captions for every image. It seems model is being trained on just 3,4 captions. I follow exact your code.
Any suggestion to improve my results.

(PS: I am testing on the same images, on which model has been trained . . .but still result is worse)

Reply
- Jason Brownlee February 27, 2019 at 7:29 am #
  
  Sorry to hear that, some ideas:
  
  Perhaps try re-fitting the model?
  Perhaps try using a different final model?
  Perhaps there was a typo in your code or you skipped a line?
  
  Reply
Hassan February 28, 2019 at 9:53 pm #

Hy Jason !
I am not getting that why you reshaped image in the 4 dimensions .

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

What is the purpose of reshaping the input images in that dimension. Please help me out . . .

Reply
- Jason Brownlee March 1, 2019 at 6:18 am #
  
  The model expects an array of samples as input, e.g. 1 sample, and each image has rows, cols and channels.
  
  Reply
erebus March 1, 2019 at 3:10 pm #

Hi Jason, how can I continue your code with beam search algorithm? Because I want to show all the captions per image. Thanks!

Reply
- Jason Brownlee March 2, 2019 at 9:28 am #
  
  This might help:
  https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
  
  Reply
  - erebus March 2, 2019 at 6:51 pm #
    
    Where should I put the beam search algorithm in your code? Can you explain this to me because I want it to try. Thanks!
    
    Reply
    - Jason Brownlee March 3, 2019 at 8:00 am #
      
      Sorry, I don’t have a worked example.
      
      Reply
Rijoan March 7, 2019 at 5:27 am #

I have problems in this section

The complete updated example with progressive loading (use of the data generator) for training the caption generation model is listed below.

my output is shown below :

Requirement already satisfied: pydot in c:\users\rijoanrabbi\anaconda3\lib\site-packages (1.4.1)
Requirement already satisfied: pyparsing>=2.1.4 in c:\users\rijoanrabbi\anaconda3\lib\site-packages (from pydot) (2.2.0)
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_9 (InputLayer) (None, 34) 0
__________________________________________________________________________________________________
input_8 (InputLayer) (None, 4096) 0
__________________________________________________________________________________________________
embedding_3 (Embedding) (None, 34, 256) 1940224 input_9[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) (None, 4096) 0 input_8[0][0]
__________________________________________________________________________________________________
dropout_6 (Dropout) (None, 34, 256) 0 embedding_3[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 256) 1048832 dropout_5[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM) (None, 256) 525312 dropout_6[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 256) 0 dense_7[0][0]
lstm_3[0][0]
__________________________________________________________________________________________________
dense_8 (Dense) (None, 256) 65792 add_3[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 7579) 1947803 dense_8[0][0]
==================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
__________________________________________________________________________________________________
—————————————————————————
ImportError Traceback (most recent call last)
in ()
162
163 # define the model
–> 164 model = define_model(vocab_size, max_length)
165 # train the model, run epochs manually and save after each epoch
166 epochs = 20

in define_model(vocab_size, max_length)
130 # summarize model
131 model.summary()
–> 132 plot_model(model, to_file=’model.png’, show_shapes=True)
133 return model
134

~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
130 ‘LR’ creates a horizontal plot.
131 “””
–> 132 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
133 _, extension = os.path.splitext(to_file)
134 if not extension:

~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
53 from ..models import Sequential
54
—> 55 _check_pydot()
56 dot = pydot.Dot()
57 dot.set(‘rankdir’, rankdir)

~\Anaconda3\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
18 if pydot is None:
19 raise ImportError(
—> 20 ‘Failed to import pydot. ‘
21 ‘Please install pydot. ‘
22 ‘For example with pip install pydot.’)

ImportError: Failed to import pydot. Please install pydot. For example with pip install pydot.

Reply
- Jason Brownlee March 7, 2019 at 6:59 am #
  
  You can comment out the plot_model() call if you like.
  
  Reply
- Akshat Jadhav September 15, 2020 at 12:21 am #
  
  Hii…..How did u solve this error?
  I m also stuck here….I need ur help
  
  Reply
Md.Rijoan March 8, 2019 at 3:39 pm #

how many epoches takes for this training ? it took’s 6 hours per epoces ,now i am concerning how much time it will be taken ?

my laptop configuration
cpu:2.2GHz
Ram: 4Gb
graphics: 2Gb

another thanks for your previous reply 🙂

Reply
- Jason Brownlee March 9, 2019 at 6:21 am #
  
  Typically good results (low loss) can be seen in the first few epochs.
  
  Reply
  - Aman September 11, 2019 at 10:09 am #
    
    jason plz tell How many epochs takes for this training plzz tell us…Beacuse Evary Epoch take 3hour….5 Epoch enough or Not…plz tell me
    ..
    
    Reply
    - Jason Brownlee September 11, 2019 at 2:28 pm #
      
      Typically just 2-3 epochs is sufficient.
      
      Reply
RT March 8, 2019 at 7:20 pm #

Hi Jason
Awesome tutorial
Can you please guide me on how to call fit_generator like the same way we call model.fit(….)

filepath = ‘model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5′

checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1,save_best_only=True, mode=’min’)

model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

i.e. along with callbacks , save best only and include a tensorboard callback to it too!

It’d be of great help.

Thank you!

Reply
- Jason Brownlee March 9, 2019 at 6:23 am #
  
  You can fall fit_generator() in an identical way to calling fit().
  
  What problem are you having exactly?
  
  Reply
  - RT March 9, 2019 at 6:12 pm #
    
    In Including a tensorboard callback
    
    Reply
    - RT March 10, 2019 at 2:16 am #
      
      generator_train=data_generator(train_descriptions,train_features,tokenizer,max_len)
      
      generator_test=data_generator(test_descriptions,test_features,tokenizer,max_len)
      
      generator_validtn=data_generator(validtn_descriptions,validtn_features,tokenizer,max_len)
      
      model.fit_generator(generator_train,steps_per_epoch=32,epochs=20,verbose=2,callbacks=[checkpoint],validation_data=generator_test,validation_steps=32)
      
      ———————————————————————————————————————————————-
      
      ValueError Traceback (most recent call last)
      in ()
      14 #model.fit_generator(generator_train,steps_per_epoch=64,epochs=20,verbose=2,validation_data=next(generator_validtn),validation_steps=64,callbacks=[checkpoint])#tf.keras.callbacks.TensorBoard()
      15
      —> 16 model.fit_generator(generator_train,steps_per_epoch=32,epochs=20,verbose=2,callbacks=[checkpoint],validation_data=generator_test,validation_steps=32)
      
      /usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
      89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
      90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
      —> 91 return func(*args, **kwargs)
      92 wrapper._original_function = func
      93 return wrapper
      
      /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
      1416 use_multiprocessing=use_multiprocessing,
      1417 shuffle=shuffle,
      -> 1418 initial_epoch=initial_epoch)
      1419
      1420 @interfaces.legacy_generator_methods_support
      
      /usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
      215 outs = model.train_on_batch(x, y,
      216 sample_weight=sample_weight,
      –> 217 class_weight=class_weight)
      218
      219 outs = to_list(outs)
      
      /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
      1209 x, y,
      1210 sample_weight=sample_weight,
      -> 1211 class_weight=class_weight)
      1212 if self._uses_dynamic_learning_phase():
      1213 ins = x + y + sample_weights + [1.]
      
      /usr/local/lib/python3.6/dist-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
      749 feed_input_shapes,
      750 check_batch_axis=False, # Don’t enforce the batch size.
      –> 751 exception_prefix=’input’)
      752
      753 if y is not None:
      
      /usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
      136 ‘: expected ‘ + names[i] + ‘ to have shape ‘ +
      137 str(shape) + ‘ but got array with shape ‘ +
      –> 138 str(data_shape))
      139 return data
      140
      
      ValueError: Error when checking input: expected input_7 to have shape (4096,) but got array with shape (1536,)
      
      Reply
    - Jason Brownlee March 10, 2019 at 8:16 am #
      
      I dob’t believe callbacks make sense or can be used the same way when running epochs manually as we do in the progressive loading section.
      
      Reply
      - RT March 10, 2019 at 11:20 pm #
        
        Yeah, But I need a tensorboard callback for this, So how should I proceed with it?
      - Jason Brownlee March 11, 2019 at 6:51 am #
        
        No, sorry.
Md.Rijoan March 10, 2019 at 8:55 am #

Thanks all.

Finally i can run this in my laptop within 48 hours running in my laptop to train 6 epoches only.

i have some issues with library function and parameter name or rename problem with direction setup.

Reply
- Jason Brownlee March 11, 2019 at 6:42 am #
  
  Well done!
  
  Reply
martino March 16, 2019 at 12:53 am #

Hi Jason thanks for the excellent write up.

My workstation has 4 Titan XP GPUs and 128 GB RAM (Ubuntu 14.04) and stalls when training, right after showing “Epoch 1/20”.

If I run the version with progressive loading, it does proceed with training, but is too slow to be practical.

Please let me know if you have any suggestions!

Reply
- Jason Brownlee March 16, 2019 at 7:55 am #
  
  Wow, that is an impressive machine!
  
  I recommend running from the command line, not from a notebook or IDE, more advice here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
sadam March 21, 2019 at 8:13 pm #

Hy Jason!
Thanks for great article.
I am traying to change learning rate to 1e-5 so I change the code like this:

# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss=’categorical_crossentropy’, optimizer=’opt’)
# summarize model
model.summary()
return model
but I received this error

ValueError: Unknown optimizer: opt
so how can I change it .

———————–
another thing:

I tried to extract the features by using resnet50 so I only change a little in the code like this :

from keras.applications.resnet50 import ResNet50
from keras.applications.resnet50 import preprocess_input, decode_predictions
and
# load the model
model = ResNet50()

but it is not working , can you help me to change it

Reply
- Jason Brownlee March 22, 2019 at 8:26 am #
  
  Yes, don’t quite it like ‘opt’, just specify the variable name: opt
  
  Why is resnet not working?
  
  Reply
sadam March 22, 2019 at 10:17 pm #

Hy Jason!
Thanks it is working.

but when I am trying to train the model with resnet50 I received this error:

ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (2048,)

is every model has different shape and what shape I should write for resnet50 and inceptionv3

another thing:

to extract the features by using inceptionv3 should i only change a little in the code like this :

from keras.applications.inception_v3 import InceptionV3

and
# load the model
model = InceptionV3()

and thank you so much for your help

Reply
- Jason Brownlee March 23, 2019 at 9:28 am #
  
  You can change the same of your input to match the model or change the model to match the shape of your data.
  
  For example, you can specify the input shape for the pre-trained model and use average pooling on the output layer.
  
  Reply
  - abbas July 27, 2019 at 5:48 pm #
    
    jason i am facing the same error while trying inceptionv3
    ValueError: Error when checking input: expected input_1 to have shape (4096,) but got array with shape (2048,)
    
    please let me know how to specify the input shape for my inceptionv3?
    please write code if possible.
    
    Reply
    - Jason Brownlee July 28, 2019 at 6:41 am #
      
      Perhaps start with vgg. Once you have it working, you can try adapting it to use another model.
      
      Reply
Saddam March 23, 2019 at 11:53 pm #

Hy jason!
Really thank you I changed only the shape and it is working but I found that vgg16 is better than inceptionv3 , is it true?

Reply
- Jason Brownlee March 24, 2019 at 7:06 am #
  
  It can, it depends on the specifics of the application.
  
  Reply
- Edward October 22, 2020 at 10:29 pm #
  
  Saddam can you please share your code or github link for inceptionv3 and resnet50? link: edwardsharma1311@gmail.com
  
  Reply
Saddam March 24, 2019 at 7:53 pm #

Thank you so much my dear teacher

Reply
- Jason Brownlee March 25, 2019 at 6:43 am #
  
  You’re welcome.
  
  Reply
Habeb March 26, 2019 at 2:53 am #

Hi Jason , thank you For this article. I want to ask you what is the best learning rate and regularization to this code Because I tried to use different learning rate but the results was not ok

Reply
- Jason Brownlee March 26, 2019 at 8:11 am #
  
  I don’t know the best hyperparameters for the model, perhaps you can experiment and discover what works well/best, more here:
  https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
  
  Reply
Anjali March 28, 2019 at 6:04 pm #

Hi Jason. Extremely Thank You for this fascinating article. Now I’ve chosen this for my Masters Project. Sir can you give me the Data Flow Diagram for the same upto 2 levels. Actually i’ve done but still want to clarify. So can you please try it.

Reply
- Jason Brownlee March 29, 2019 at 8:27 am #
  
  Sorry, I cannot prepare a data flow diagram for you. You have everything you need to create one yourself.
  
  Reply
- Harshit Mittal March 5, 2020 at 4:27 pm #
  
  heyy Can u plz send your data flow diagram. I too just want to clarify
  
  Reply
Habeb March 28, 2019 at 6:11 pm #

Hi Jason , when I am traying to decrease the size of Dense from 256 to 128 I found this error

ValueError: Operands could not be broadcast together with shapes (128,) (256,).

what is the reason for this error

Reply
- Jason Brownlee March 29, 2019 at 8:27 am #
  
  I’m not sure off the cuff, you can debug the error to discover it’s cause, here are some suggestions:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Habeb March 28, 2019 at 7:09 pm #

another question :
I run your code and in epoch 3 I received this result

BLEU-1: 0.566754
BLEU-2: 0.310778
BLEU-3: 0.210816
BLEU-4: 0.095132

why I didn’t received like your result specially blue-4 .

Reply
- Jason Brownlee March 29, 2019 at 8:29 am #
  
  This is due to the stochastic nature of the learning algorithm, more here:
  https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
  
  Reply
  - Habeb March 29, 2019 at 8:10 pm #
    
    i read (https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code) and i understand that :
    
    this problem because i am using pc , but using the GPU to train your models, the backend may be configured to use a sophisticated stack of GPU libraries so if i train my model using gpu maybe i will not face like this problem
    
    and thank you for helping
    
    Reply
    - Habeb March 29, 2019 at 8:22 pm #
      
      BLEU-1: 0.566754
      BLEU-2: 0.310778
      BLEU-3: 0.210816
      BLEU-4: 0.095132
      
      i am focusing about BLEU-4 why its 0.095132 at least it should be 0.9 no 0.09
      
      Reply
      - Habeb March 29, 2019 at 10:04 pm #
        
        and your result for BLEU-4: 0.131446 and my result BLEU-4: 0.095132
    - Jason Brownlee March 30, 2019 at 6:26 am #
      
      The main reason is the stochastic learning algorithm of neural networks.
      
      Reply
Habeb March 28, 2019 at 8:28 pm #

another question :

to add one lstm more I only add

se3 = LSTM(256, return_sequences=True)(se2)

before
se4 = LSTM(256)(se3)

is this true or no? and thank you for your help

Reply
- Jason Brownlee March 29, 2019 at 8:31 am #
  
  It looks correct.
  
  Reply
Yash Dwivedi March 30, 2019 at 7:34 pm #

Hey, amazing work
When i train my network iam getting this error:-

if len(set(self.inputs)) != len(self.inputs):

TypeError: unhashable type: ‘numpy.ndarray’

What is the problem?

Reply
- Jason Brownlee March 31, 2019 at 9:28 am #
  
  That is very odd, I have not seen that before, sorry.
  
  Reply
Audi April 1, 2019 at 5:21 am #

Hey Jason,
Can I train the following model on Flickr30K with 16GB of RAM?
Thanks

Reply
- Jason Brownlee April 1, 2019 at 7:52 am #
  
  Yes, but you may have to use progressive loading.
  
  Reply
Anjali April 6, 2019 at 3:21 am #

hii Jason, I’m using progressive loading. And the last sequence of code is as follows…..

Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
2019-04-05 16:17:05.604626: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-05 16:17:05.824794: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1796570000 Hz
2019-04-05 16:17:05.834826: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6e06d60 executing computations on platform Host. Devices:
2019-04-05 16:17:05.834922: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
6000/6000 [==============================] – 5512s 919ms/step – loss: 4.7276
Epoch 1/1
6000/6000 [==============================] – 5896s 983ms/step – loss: 3.9618
Epoch 1/1
6000/6000 [==============================] – 5610s 935ms/step – loss: 3.7152
Epoch 1/1
2800/6000 [============>. . . . . . . . . . . . . . . . . ] – ETA- 52:12 loss:3.58

And it is still loading. So my question is when do i want to close the terminal as it is taking hours to load?

Reply
- Jason Brownlee April 6, 2019 at 6:51 am #
  
  It is not loading, the code is running and the model is being fit.
  
  You can probably kill it after 5-10 epochs.
  
  Reply
Anjali April 6, 2019 at 1:34 pm #

Thank you…! Now I’m facing another issue. While running the evaluation code, I got some error.

Using TensorFlow backend.
Traceback (most recent call last):
File “./evaluate.py”, line 7, in
from nltk.translate.bleu_score import corpus_bleu
ImportError: No module named nltk.translate.bleu_score

Reply
- Anjali April 6, 2019 at 2:06 pm #
  
  Yes, Ive fixed it myself. And the BLEU Scores i got is:
  
  BLEU-1: 0.555265
  BLEU-2: 0.311960
  BLEU-3: 0.217599
  BLEU-4: 0.103576
  
  Reply
  - Jason Brownlee April 7, 2019 at 5:25 am #
    
    Well done.
    
    Reply
- Jason Brownlee April 7, 2019 at 5:25 am #
  
  It looks like you might need to install the nltk library.
  
  Reply
Anjali P Kaimal April 7, 2019 at 4:03 pm #

Yes. Thank You Jason. But now when I try to get caption for another image, the first caption is displaying. I’ve tried several images but the caption is not changing. Why is it so?

Reply
- Jason Brownlee April 8, 2019 at 5:54 am #
  
  Perhaps your model is overfit?
  
  You could try fitting the model again, or using a model from an earlier step in the training process?
  
  Reply
Anjali April 8, 2019 at 1:02 am #

Yeah.. Now it is working. But the caption has no accuracy. Jason, what about trying MS-COCO dataset instead of flickr? As the number of images n their captions are very high, Is there any chance for getting accurate results?

Reply
- Jason Brownlee April 8, 2019 at 5:55 am #
  
  I suspect your model is overfit.
  
  You can explore another dataset, let me know how you go.
  
  Reply
samjava April 8, 2019 at 6:24 am #

Ever since started studying books authored by you, my zeal for machine learning grew up fast

Reply
- Jason Brownlee April 8, 2019 at 1:53 pm #
  
  Thanks, I’m happy to hear that!
  
  Reply
Alka April 8, 2019 at 9:39 pm #

Hi Jason,
I am using InceptionV3 model here…everything is going great but when i am testing train model with an image i got an error…………………….

File “\Anaconda3\lib\site-packages\keras\engine\training_utils.py”, line 138, in standardize_input_data
str(data_shape))

ValueError: Error when checking input: expected input_12 to have shape (2048,) but got array with shape (1000,)

Reply
- Jason Brownlee April 9, 2019 at 6:25 am #
  
  Looks like a mismatch between data and model. You may have to debug it:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
CJ April 9, 2019 at 12:50 pm #

Hi Jason.

This is such a great blog! So glad to see you still actively responding.

I’m a noob to machine learning, so forgive my simplistic understanding. I have a question about how to alter the “style” of the captions (style transfer). For example, how would I go about changing the existing captions to match the linguistic style of a child, or of different people (ie, Donald Trump, Snoop Dog, etc.), using text files with speech samples of various people?

Based on your article here, you stated that I could use Pre-trained Word Vectors, or use my own vocabulary file and mapping to integers function during training..?? Can you explain that a little more and perhaps point me in the right direction for applying these methods?

Any recommendation would be appreciated. Thanks in advance.

Reply
- Jason Brownlee April 9, 2019 at 2:42 pm #
  
  Good question, you might need to first translate the text examples in the training dataset, then use that as the training dataset for fitting the caption model.
  
  Reply
Alka April 9, 2019 at 7:54 pm #

Hi Jason,

i solved my previous problem but now i got stuck with the result

startseq man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man man

every time i generate the caption .

Reply
- Jason Brownlee April 10, 2019 at 6:10 am #
  
  Perhaps your model is overfit or underfit?
  
  Perhaps try fitting it again?
  
  Reply
- HomaK February 28, 2022 at 5:18 am #
  
  Hi Alka
  I also have same problem. Could you resolve it?
  
  Reply
Anjali April 11, 2019 at 2:13 pm #

Hi.. Jason. Now I’m going to change the dataset by Flickr30K dataset. Inorder to check any change in accuracy. So I’ve a doubt, here flickr 8K provides 6K train images. So is there any chance of increasing accuracy as increasing the number of train images?

Reply
- Jason Brownlee April 11, 2019 at 2:22 pm #
  
  I think it is likely, yes.
  
  Reply
Shayak April 12, 2019 at 12:54 pm #

Hi Jason,
Thank you so much for your great work, which is really helpful to understand image captioning work from scratch.I have prepared own dataset containing Nepalese socio-cultural images (total 400 images with 3 captions per image).
1. It works fine on training set but generates jumbled sentences on test set.Why this happen and how to generate relevent and grammatically correct captions?
2. What is the role of accuracy? I think accuracy refers to the VGG16’s accuracy, so is it relevent to calculate here?
3. After 40 epoch, there is no significant decreases on loss. So is there any way to reduce loss?
4. I have changed droupout from 0.5 to other values but no significant changes in result happen.So, is it necessary to put droupout on small dataset also?

Best regards
Shayak
shayakraj@ioe.edu.np

Reply
- Jason Brownlee April 12, 2019 at 2:47 pm #
  
  Perhaps the model requires further training or tuning, these tutorials may help:
  https://machinelearningmastery.com/start-here/#better
  
  Accuracy is useless, use BLEU or similar scores.
  
  Monitor validation loss compared to train loss for the right time to stop training, see this:
  https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
  
  Dropout helps stop overfitting, monitor learning curves to see if you are overfitting.
  
  Reply
Anjali April 13, 2019 at 1:37 am #

Hii Jason, I’m using 3000 development images and 25381 train images as I’m using flickr30K dataset. So how many epochs we need?

Reply
- Jason Brownlee April 13, 2019 at 6:33 am #
  
  It is an intractable question.
  
  Train until the model achieves a good fit.
  
  Reply
Anjali April 13, 2019 at 11:33 pm #

While extracting the features of images, i got an error.

terminate called after throwing an instance of ‘std::bad_alloc’
what(): std::bad_alloc
Aborted (core dumped)

i’ve waited one whole day for the execution, but it end up like this. What will be the reason?

Reply
- Jason Brownlee April 14, 2019 at 5:48 am #
  
  Sorry to hear that.
  
  It sounds like a hardware fault.
  
  Perhaps try searching/posting about it on stackoverflow?
  
  Reply
Xu Zhang April 17, 2019 at 5:01 am #

Such a great post! I learned a lot from it.

When you train your model with progressive loading, what is the reason that you used a for loop and train your model with one epoch?

for i in range(epochs):
# create the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
# fit for one epoch
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
# save model
model.save(‘model_’ + str(i) + ‘.h5’)

Many thanks.

Reply
- Jason Brownlee April 17, 2019 at 7:05 am #
  
  In this tutorial, because I want to save my model manually each epoch.
  
  Reply
Donny April 19, 2019 at 5:23 am #

Hi Jason,

This is an excellent tutorial and really thankful for it. May I check with you, why did you use 256 for your dense_1 and lstm_1 layers? Are there any considerations how one might choose this number?

Reply
- Jason Brownlee April 19, 2019 at 6:20 am #
  
  Trial and error. There is no reliable way to select the number of nodes other than experimentation:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
  Reply
madi April 19, 2019 at 4:45 pm #

Hi Jason
can you plz tell me what basically this error is ?
mportError Traceback (most recent call last)
ImportError: numpy.core.multiarray failed to import

The above exception was the direct cause of the following exception:

SystemError Traceback (most recent call last)
~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

SystemError: returned a result with an error set

—————————————————————————
ImportError Traceback (most recent call last)
ImportError: numpy.core._multiarray_umath failed to import

—————————————————————————
ImportError Traceback (most recent call last)
ImportError: numpy.core.umath failed to import

Reply
- Jason Brownlee April 20, 2019 at 7:32 am #
  
  Sorry, I have not seen this error before.
  
  It looks like numpy might not be installed correctly.
  
  Perhaps try this tutorial:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Reply
habib April 20, 2019 at 3:05 am #

hi jason
how can i use beam search with this code

Reply
- Jason Brownlee April 20, 2019 at 7:42 am #
  
  This might help:
  https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
  
  Reply
  - habib April 23, 2019 at 1:47 am #
    
    thank you teacher jason :
    
    i read this https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
    
    and the data is :
    # define a sequence of 10 words over a vocab of 5 words
    data = [[0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1]]
    
    but i dont know where is the data that i can use it in our code in this page
    
    Reply
    - Jason Brownlee April 23, 2019 at 7:56 am #
      
      Sorry, I cannot implement this for you.
      
      Reply
Arelis April 20, 2019 at 10:03 pm #

Hello Jason. I have a problem with the file of features
features = {k: all_features[k] for k in dataset}
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Could you help, please

Thank you

Reply
- Jason Brownlee April 21, 2019 at 8:21 am #
  
  I’m sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Anjali April 22, 2019 at 12:25 pm #

While executing the code for generating new caption I got an error.

File “./ur_caption.py”, line 73, in
description = generate_desc(model, tokenizer, photo, max_length)
File “./ur_caption.py”, line 49, in generate_desc
yhat = model.predict([photo,sequence], verbose=0)
File “/usr/local/lib/python2.7/dist-packages/keras/engine/training.py”, line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File “/usr/local/lib/python2.7/dist-packages/keras/engine/training.py”, line 751, in _standardize_user_data
exception_prefix=’input’)
File “/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.py”, line 138, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected input_2 to have shape (74,) but got array with shape (34,)

How to resolve this?

Reply
- Jason Brownlee April 22, 2019 at 2:27 pm #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Saurabh May 6, 2019 at 5:19 pm #
  
  Hi Anjali,
  I am also trying to implement photo caption generation as my project. I have already implemented it on Flickr8k and now I am planning to implement it on Flickr30k. For any queries you can contact me on saurabh18@somaiya.edu
  
  Cheers!
  
  Reply
habib April 26, 2019 at 10:07 pm #

hi Jason :
for vgg16 we put model = Model(inputs=model.inputs, outputs=model.layers[-1].output)
for inception v2 is the same of I should change outputs=model.layers[-2]

Reply
- Jason Brownlee April 27, 2019 at 6:30 am #
  
  It might be different considering the architecture of the model. Perhaps use the API to create average pooling layer on the output and add output layers to it?
  
  Reply
sadam nagi April 29, 2019 at 9:02 pm #

hi Jason :
I want to ask you about why I got like this result:

with model-ep001-loss4.514-val_loss4.070.h5 : I got this result

BLEU-1: 0.568186
BLEU-2: 0.308237
BLEU-3: 0.208821
BLEU-4: 0.094350

but with model-ep002-loss3.878-val_loss3.897.h5 : I got this result

BLEU-1: 0.431618
BLEU-2: 0.224081
BLEU-3: 0.148920
BLEU-4: 0.059321

Reply
- Jason Brownlee April 30, 2019 at 6:55 am #
  
  Good question, I answer it here:
  https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
  
  Reply
sadam nagi April 30, 2019 at 8:10 pm #

thank you so much

Reply
- Jason Brownlee May 1, 2019 at 7:03 am #
  
  You’re welcome.
  
  Reply
sadam nagi April 30, 2019 at 8:43 pm #

hi jaso :
for example with progressive how can i change batch size equal to 16 or any number
and thank you for your help

Reply
madi April 30, 2019 at 9:14 pm #

Hi Jason
while progressive loading i am having this error
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 3857
Description Length: 30
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_7 (InputLayer) (None, 30) 0
__________________________________________________________________________________________________
input_6 (InputLayer) (None, 4096) 0
__________________________________________________________________________________________________
embedding_3 (Embedding) (None, 30, 256) 987392 input_7[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) (None, 4096) 0 input_6[0][0]
__________________________________________________________________________________________________
dropout_6 (Dropout) (None, 30, 256) 0 embedding_3[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 256) 1048832 dropout_5[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM) (None, 256) 525312 dropout_6[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 256) 0 dense_7[0][0]
lstm_3[0][0]
__________________________________________________________________________________________________
dense_8 (Dense) (None, 256) 65792 add_3[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 3857) 991249 dense_8[0][0]
==================================================================================================
Total params: 3,618,577
Trainable params: 3,618,577
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:From C:\Users\Dell\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
—————————————————————————
ValueError Traceback (most recent call last)
in
168 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
169 # fit for one epoch
–> 170 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
171 # save model
172 model.save(‘model_’ + str(i) + ‘.h5’)

~\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
—> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support

~\Anaconda3\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
215 outs = model.train_on_batch(x, y,
216 sample_weight=sample_weight,
–> 217 class_weight=class_weight)
218
219 outs = to_list(outs)

~\Anaconda3\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1209 x, y,
1210 sample_weight=sample_weight,
-> 1211 class_weight=class_weight)
1212 if self._uses_dynamic_learning_phase():
1213 ins = x + y + sample_weights + [1.]

~\Anaconda3\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
749 feed_input_shapes,
750 check_batch_axis=False, # Don’t enforce the batch size.
–> 751 exception_prefix=’input’)
752
753 if y is not None:

~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
126 ‘: expected ‘ + names[i] + ‘ to have ‘ +
127 str(len(shape)) + ‘ dimensions, but got array ‘
–> 128 ‘with shape ‘ + str(data_shape))
129 if not check_batch_axis:
130 data_shape = data_shape[1:]

ValueError: Error when checking input: expected input_6 to have 2 dimensions, but got array with shape (15, 7, 7, 512)

Reply
- Jason Brownlee May 1, 2019 at 7:03 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - madi May 1, 2019 at 11:22 pm #
    
    but how can i solve it? any hint ?
    
    Reply
habib May 3, 2019 at 4:47 am #

hi jason :

I am a little confused, after training I received 20 models(for 20 epochs ) should I evaluate all of them and take the heighest accuracy and if like this it need along time.
and thank you for your help

Reply
- Jason Brownlee May 3, 2019 at 6:23 am #
  
  Evaluate all and choose one with lowest loss.
  
  Reply
habib May 3, 2019 at 7:38 am #

hi jason :
thank you for your answering

means i should choose the heighst blue score because we are using BLUE

Reply
- Jason Brownlee May 3, 2019 at 2:41 pm #
  
  Yes, a larger BLEU score is better.
  
  Reply
- Aman Ullah June 5, 2020 at 5:49 am #
  
  A model having high blue score blue is 0.57.also loss 4.6699.
  and another model having blue score 0.43 and loss 3.088
  What would we choose???
  
  Reply
  - Jason Brownlee June 5, 2020 at 8:25 am #
    
    This will help you understand the score:
    https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
    
    Reply
ThiLee May 12, 2019 at 2:52 am #

Hi Jason,
Thank you very much for your amazing article
I have a problem:
tokenizer = load(open(‘tokenizer.pkl’, ‘rb’))
FileNotFoundError: [Errno 2] No such file or directory: ‘tokenizer.pkl’
I don’t know how the file is organized.
Could you help me, please!

Reply
- Jason Brownlee May 12, 2019 at 6:45 am #
  
  I think you might have missed some lines of code.
  
  Reply
Sravan Malla May 14, 2019 at 8:12 am #

Can we use batch_size with progressive loading?

like instead of steps_per_epoch = len(train_descriptions)
can we give it as len(train_descriptions)/batch_size=32

How do we ensure generator is yielding o/p in batches?

currently if my understanding is correct, in our case the generator is yielding single output so we had steps_per_epoch = len(train_descriptions). am i right?

Reply
- Jason Brownlee May 14, 2019 at 2:26 pm #
  
  Yes, you have complete control over the data generator.
  
  You can load and yield any number of sample you wish.
  
  Reply
  - Sravan Malla May 14, 2019 at 3:03 pm #
    
    So in this case, where our data generator is yielding single sample everytime, we can’t use
    steps_per_epoch = len(train_descriptions)/batch_size=32 ? am I right?
    
    Reply
    - Jason Brownlee May 15, 2019 at 8:09 am #
      
      The steps per epoch should be the total samples divided by the batch size, perhaps as you have listed.
      
      Reply
      - Sravan Malla May 15, 2019 at 3:07 pm #
        
        Taking this as reference I built data generator for Neural machine translation to load larger data and one-hot encode the targets and train the model without facing Memory Error, just an extension to your tutorial at https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/.
        
        There the Data Generator I built is yielding one output i.e. single trainX, trainY for every yield
        
        So while fit_generator, I just gave a try giving “steps_per_epoch = total_samples/32 (batch_size)”, then I tried to evaluate model saved after 1 epoch and results seems to be surprising almost same and not making any sense for any input we give, may be beacuse not all the records are passed through the model, becasue the generator I coded is yielding one record at a time and steps I am asking is lesser i.e. diving by batch size.
        
        So changed it back to steps_per_epoch = total samples, then the model I saved after 1 epoch is giving some sensible outputs may not be very accurate as its just 1 epoch but taking some good amount of time to train.
      - Jason Brownlee May 16, 2019 at 6:23 am #
        
        Well done!
- Abhi February 18, 2020 at 4:25 am #
  
  Sir, I’m getting memory error while running updated code too..What can I do sir ? My jupyter notebook is also not responding when running that code..please give me a solution sir…
  
  Reply
  - Jason Brownlee February 18, 2020 at 6:24 am #
    
    Run from the command line, not a notebook.
    
    Use progressive loading.
    
    Reply
Boubacar May 29, 2019 at 8:26 am #

hello Jason!
I use the progressive loading and I fund

BLEU-1: 0.536255
BLEU-2: 0.289525
BLEU-3: 0.201866
BLEU-4: 0.096334
can you explain me what I can do for ameliorate this.

secondly when I wnnt generate a new caption whith a new photograph for example a Photo of a dog at the beach, I fund this description “startseq two black dogs are playing in the water endseq”, you see it’s not a good description.

Thanks!

Reply
- Jason Brownlee May 29, 2019 at 8:59 am #
  
  I give general suggestions for improving deep learning performance here:
  https://machinelearningmastery.com/start-here/#better
  
  It is a good description if you consider the reflection of the dog. Perhaps try another photo?
  
  Reply
Saurabh Shinde June 3, 2019 at 1:52 am #

Hello Jason,
I am doing this experiment with Flickr30k dataset. When training the same decoder architecture and evaluating on test data, the model performance is decreasing when compared to Flickr8k. The BLEU scores for F30k are worse than F8k. What should be done to solve this problem?

Maybe,
1) Limit the vocabulary? (F8k had around 8000 words and F30k has 18000 words)
2) Add another LSTM layer? (but doubles the training time)
3) Increase/Decrease word vector dimension?
4) Change no. of units in LSTM?
5) Add more Dense layers?

What should be done to achieve similar performance of F8k on F30k?

Reply
- Jason Brownlee June 3, 2019 at 6:43 am #
  
  Perhaps try each approach and evaluate the effect on model skill.
  
  Reply
  - Saurabh Shinde June 5, 2019 at 2:13 am #
    
    Also, I have a doubt. In create_sequences function, we are not passing the vocab_size, but it is used in the to_categorical method. How is it one-hot encoding it if vocab_size variable is not passed in the function?
    
    Reply
    - Jason Brownlee June 5, 2019 at 8:49 am #
      
      Looks like a bug, create_sequences uses the vocab size.
      
      I will schedule time to update the code.
      
      Thanks!
      
      Update: Fixed.
      
      Reply
    - JN June 10, 2019 at 11:05 pm #
      
      Hi Saurabh,
      
      We might not need to pass the vocab_size as argument for the create_sequences function because the vocab_size is a global variable so it can be used inside the create_sequences function. Is that right, Jason Brownlee?
      
      Reply
      - Jason Brownlee June 11, 2019 at 7:53 am #
        
        Yes, but that was not the intent. I like to pass things around.
      - Saurabh Shinde June 23, 2019 at 12:46 am #
        
        Yes, I thought of that too, because of vocab_size being a global variable. Anyways, thanks!
CharlesYuan June 6, 2019 at 2:27 pm #

i try to download the dataset but it failed two times when download about 873 Mb

Reply
- Jason Brownlee June 7, 2019 at 7:45 am #
  
  I’m sorry to hear that.
  
  Perhaps try downloading from a different computer, at a different time of day, or via a different internet connection?
  
  Reply
JN June 10, 2019 at 11:00 pm #

Hi Jason,
Thank you for your great tutorial. I’ve been trying to understand your code line by line. However, I can’t figure out this line
vocab_size = len(tokenizer.word_index) + 1
I thought the vocab_size should be the same as the tokenizer.word_index. Why did you add an extra 1?

Reply
- Jason Brownlee June 11, 2019 at 7:52 am #
  
  Good question.
  
  We add 1 for the integer “0” used for “unknown”, e.g. words not in our vocab.
  
  Therefore, integers assigned to words in our vocab start at 1, not 0.
  
  Reply
CharlesYuan June 11, 2019 at 9:22 pm #

FInally i successfully download the dataset file. looks like the sample of your code looks wrong,

# extract features from all images
directory = ‘Flicker8k_Dataset’

should be ‘Flickr8k not Flicker8k

# extract features from all images
directory = ‘Flickr8k_Dataset’

Reply
- Jason Brownlee June 12, 2019 at 8:03 am #
  
  Thanks, fixed!
  
  Reply
CharlesYuan June 12, 2019 at 11:42 am #

Just let you know,

I tried the example.jpg and got “two dogs are running in the snow” 🙂

Reply
- CharlesYuan June 12, 2019 at 11:53 am #
  
  i try two more images, one picture is a bird and the other is the “55470226_52ff517151”, but the return description is always “man in red shirt is standing on the street”
  
  Reply
  - Jason Brownlee June 12, 2019 at 2:24 pm #
    
    Perhaps try a different fit of the model?
    
    Reply
    - CharlesYuan June 13, 2019 at 11:59 am #
      
      Do you mean try different pictures?
      
      if so, I had tried about ten different pictures, man, dog or child, the model return
      
      9 time “man in red shirt is standing on the street” and 1 time ” two dogs are running in the grass’.
      
      Reply
      - Jason Brownlee June 13, 2019 at 2:34 pm #
        
        I’m suggesting perhaps try refitting the model/try different checkpoints of saved model weights.
        
        Your chosen model may have overfit.
- Jason Brownlee June 12, 2019 at 2:24 pm #
  
  Ha, nice!
  
  The image has a rejection that might make it a complex example.
  
  Reply
Ajay Dabas June 16, 2019 at 4:50 pm #

Hi Jason, thanks for this awesome tutorial. I’d like to share my work which is highly inspired by this tutorial.

Github repo: https://github.com/dabasajay/Image-Caption-Generator

I’ve tried InceptionV3 and VGG16 as Encoder and two types of RNN as Decoder making a combination of 4 image captioning models and compared results. I also implemented BEAM search algorithm and compared results with simple argmax.

Please have a look, thank you.

Ajay Dabas
Github: https://github.com/dabasajay

Reply
- Jason Brownlee June 17, 2019 at 8:17 am #
  
  Nice work.
  
  Reply
saddam June 17, 2019 at 3:07 am #

hello jaso :
I want to ask you why for image caption we are using different matrix (bleu1, bleu2, bleu3 , bleu4) why 4 blue scores not one and what are the different betwwen them .

Reply
- Jason Brownlee June 17, 2019 at 8:25 am #
  
  Good question, I explain more here:
  https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
  
  Reply
Ritik June 20, 2019 at 2:39 pm #

Hi Jason

I am trying to replace the RNN model used for the language model by a CNN language model.

I have understood the conepts but i am not able to figure out how to code it.

If you don’t mind , please help me with the above

Reply
- Jason Brownlee June 21, 2019 at 6:32 am #
  
  Perhaps this will help as a starting point:
  https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
  
  Reply
saddam June 22, 2019 at 3:19 am #

hi jason :

i used vgg16 ,resnet50 and inception v3 in my model and the results like this :

vgg16 :

Bleu_1: 0.661
Bleu_2: 0.486
Bleu_3: 0.350
Bleu_4: 0.252

resnet50 :

Bleu_1: 0.682
Bleu_2: 0.511
Bleu_3: 0.375
Bleu_4: 0.274

inception v3 :

Bleu_1: 0.646
Bleu_2: 0.470
Bleu_3: 0.339
Bleu_4: 0.248

when i test images using those three models i found that :
resnet50 generate a good sentence better than vgg16 ,
but the problem that inception v3 generate a good sentence better than vgg16 and resnet50
, so why bleu score for inception v3 is lowest than vgg16 and resnet50 , really i am so confused

Reply
- Jason Brownlee June 22, 2019 at 6:46 am #
  
  Nice work!
  
  Perhaps the BLEU score is not capturing what you noticed in the generated sentence structure? It is a very simple score.
  
  Reply
- abbas July 27, 2019 at 5:09 pm #
  
  Please saddam share your code with me .I also want to try the inception model.thanks in advance.
  abbaskhan857@yahoo.com
  
  Reply
- Phyu Phyu Khaing October 24, 2019 at 8:09 pm #
  
  Dear Saddam,
  
  How do you create the model to get that results.
  I don’t get as that results.
  Can you please tell me how do you create the network to achieve those scores you obtained?
  If you are okay, may I know.
  My email address is phyukhaing7@gmail.com
  
  Best Regards,
  
  Reply
  - Jason Brownlee October 25, 2019 at 6:39 am #
    
    I ran the code as described in the tutorial.
    
    Are you able to confirm your libraries are up to date?
    
    Reply
    - Phyu Phyu Khaing October 25, 2019 at 8:01 pm #
      
      Yes, my library is up to date.
      
      I don’t get Saddam’s result.
      I would like to get the best result.
      How should change the model?
      
      Reply
- Phelan December 22, 2019 at 12:33 am #
  
  Hi saddam, Jason:
  For VGG16, I got poor result (specical BLUE-4). Not high as your result
  BLEU-1: 0.487551
  BLEU-2: 0.259738
  BLEU-3: 0.179878
  BLEU-4: 0.085398
  
  Is there any modification compared to this article?
  
  Reply
  - Jason Brownlee December 22, 2019 at 6:15 am #
    
    Perhaps try fitting the model a few times and compare results?
    
    Reply
Saddam June 22, 2019 at 1:05 pm #

I want to compare the results of those three models and write a discussion about it so I trained those three models until 32 epochs and I received those results , so for example depends on bleu score can I say that resnet50 gave me the best results

Reply
- Jason Brownlee June 23, 2019 at 5:29 am #
  
  It is a good idea to average the results of a neural network over multiple runs, if possible.
  
  Reply
  - Saddam June 23, 2019 at 2:31 pm #
    
    Thank you so much
    
    Reply
Saurabh Shinde June 23, 2019 at 12:45 am #

Hi Saddam,
Can you please tell me what parameters did you use in your network to achieve those scores you obtained?

You can email me at this address: saurabh18@somaiya.edu

Reply
- Jason Brownlee June 23, 2019 at 5:37 am #
  
  All of the parameters are listed in the code directly.
  
  What parameters are you having problems with exactly?
  
  Reply
  - Saurabh Shinde June 23, 2019 at 6:50 pm #
    
    Hi Jason,
    
    I just want to know if I change the dataset from 8K to 30K, then should I change the sequence model architecture as well?, because I tried training with same architecture on 30k and it was overfitting and the for every new image it is given same caption.
    
    Reply
    - Saurabh Shinde June 23, 2019 at 6:58 pm #
      
      Also, in the Show and Tell paper by A. Karpathy, LSTM units used is 512 and as mentioned in this paper (https://arxiv.org/pdf/1805.09137.pdf) embedding size and vocab_size used is 512 as well.
      
      512 vocab_size seems less or maybe having less vocabulary gives better results sometimes?
      
      I am confused.
      
      Reply
      - Jason Brownlee June 24, 2019 at 6:23 am #
        
        The embedding size is not the vocab size, more details on embeddings here:
        https://machinelearningmastery.com/what-are-word-embeddings/
    - Jason Brownlee June 24, 2019 at 6:22 am #
      
      More data may help if you don’t want to tune the model.
      
      Reply
      - Saurabh Shinde July 5, 2019 at 10:05 pm #
        
        Hi Jason,
        
        Sorry for the late reply. I tried Ajay’s code and added support for xception,mobilenet and resent50 models.
        
        Now it is giving proper captions.
        
        Also, I wanted to know if we can give a different max_length value other than what we found in the dataset?
        
        For example, in Flickr8k, max_length of a caption is 34. What if I want to set it to some lower number. How can I do it?
        
        As mentioned in this (https://arxiv.org/pdf/1805.09137.pdf):
        
        “Following are a few key hyperparameters that we retained across various models. These could be helpful for attempting to reproduce our results.”
        
        RNN Size: 512
        Batch size: 16
        Learning Rate: 4e-4
        Learning Rate Decay: 50% every 50000 iterations
        RNN Sequence max_length: 16
        Dropout in RNN: 50%
        Gradient clip: 0.1%
      - Jason Brownlee July 6, 2019 at 8:38 am #
        
        Yes, you can change the length, I would encourage you to explore changing many aspects of the model configuration.
      - Saurabh Shinde July 6, 2019 at 3:56 pm #
        
        For example, in Flickr8k, max_length found is 34. But if we set it to 16, wouldn’t it throw an error saying “expected input_shape is (32, ) but got (16, )” or something like that.
        
        How to solve this problem?
        
        In Decoder Model,
        Input_layer_1 = Input_Shape((max_length, ))
      - Jason Brownlee July 7, 2019 at 7:48 am #
        
        You must change the expectation of the model.
      - Saurabh Shinde July 7, 2019 at 3:47 pm #
        
        So, if i change max_len to 16, what should be done in order to handle the captions whose length is greater than 16?
        
        Should i clip each caption to length 16? But, won’t it result in loss of information?
      - Jason Brownlee July 8, 2019 at 8:39 am #
        
        You can truncate them:
        https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
      - Saurabh Shinde July 10, 2019 at 2:45 pm #
        
        Thanks Jason!
        
        I’ll try using this and test whether the results are improving or not.
Ketan Dhakate June 23, 2019 at 3:45 pm #

i am getting same caption for different test images.

Reply
- Jason Brownlee June 24, 2019 at 6:21 am #
  
  The model may have overfit, perhaps try fitting it again or choosing a different set of weights/final model?
  
  Reply
Ankit Rathi June 29, 2019 at 5:17 pm #

Thanks you Sir for this great tutorial. I implemented the Image captioning in Google Colab. Can you please upload a tutorial for attention mechanism used in Image captioning with code.

Reply
- Jason Brownlee June 30, 2019 at 9:34 am #
  
  Thanks for the suggestion, perhaps in the future.
  
  Reply
Nivethan nivan July 4, 2019 at 7:41 pm #

Dear Jason,

It’s a very good tutorial. Loved it, And made it working. I have two qustions.

1. How can we make the network Predict the exact captions which we used for training?

Example:

Original: a dog is playing with the ball
predicted should be: a dog is playing with the ball
(not some random/ something close to the original)

2. How to stop the longer prediction?

Example: (Currently this is what happening)

Original: a dog is playing with the ball
Predicted: a dog is playing with the ball ball ball ball ball ball ball ball ball ball ball ball

A clear explanation would be helpful.

Thanks a lot

Reply
- Jason Brownlee July 5, 2019 at 7:56 am #
  
  We cannot predict the exact captions used in training.
  
  Your error suggests that model may need to be re-fit.
  
  Reply
  - Nivethan nivan July 5, 2019 at 10:15 pm #
    
    Can you give any suggestions on how to do it? (Predicting exact captions)
    
    I tried several times retraining the model but I had no luck, Do you think longer training would help?
    
    Reply
    - Jason Brownlee July 6, 2019 at 8:38 am #
      
      All models have error, none will predict a training dataset perfectly.
      
      Perhaps this will help:
      https://machinelearningmastery.com/faq/single-faq/why-cant-i-get-100-accuracy-or-zero-error-with-my-model
      
      Reply
Ajay July 13, 2019 at 9:25 pm #

Since LSTM cells produce sequential information, and the sequence in the above model is “words in the caption”.

You have used LSTM layer before the features from CNN are used to generate the caption.

We know that it is the decoder network which is generating the words of the caption but I’m unable to understand the role of LSTM.

Which of these: LSTM or the decoder ?
is generating the sequence.

If decoder is generating the sequence then what is the role of the LSTM layer?

Reply
- Jason Brownlee July 14, 2019 at 8:09 am #
  
  Good question.
  
  This might help to better understand the architecture:
  https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
  
  Reply
Ajay July 13, 2019 at 11:51 pm #

Hi Jason !

I read about your merge model that you described in some other post and you used it here.

But, it is not-understandable that you used LSTM (not making use of the image features) and the CNN separately.

I mean, your lstm layer is just using word sequences( vectors) to generate the next word without using the image features.

Image features are only added when you combine the LSTM and CNN vectors. Before combining, LSTM is not aware of the image because it is not using the image features. How is the LSTM generating the right captions when you are not even using the image features during LSTM network?

Reply
- Jason Brownlee July 14, 2019 at 8:13 am #
  
  “why” questions might not be tractable at this time. We don’t have good theories on why many of these models work so well. But they do – so we use them.
  
  Same with drugs issued by doctors. No idea why they work, but they do – so we use them.
  
  Reply
Ajay July 14, 2019 at 12:04 am #

Here is what I’ve understood from your model:

1. The network is assuming that the caption for every image will use at max 34 words because the longest caption is of 34 words.

2. The embedding layer is taking a word index and outputting a 256 long vector.

3. After the dropout, the LSTM layer is used “consisting of 34 LSTM cells”.

4. Each LSTM cell is producing a new word which is a 256 long vector.

Correct me on the above points if I’m wrong.

1. I’m not able to understand how the output dimension of LSTM is (None,256).
2. How is the LSTM using the image information because CNN output is not being fed to the LSTM layer?
3. What is the output of dense_3 layer.
4. From which layer are we extracting the word sequences?

Reply
- Ajay July 14, 2019 at 12:20 am #
  
  Got the answer of the above 3rd question that the output of dense_3 is the next word.
  
  1. But, does output of LSTM hold some significance or does it just compute some 256 long vector which will later be used(combined) with a CNN?
  
  2. Kindly explain how the 34 LSTM cells are taking input. I reckon you are giving the input sequence of words to these 34 LSTMs(if the length of the input sequence is less than 34 then these are padded). Each word is a 256-dimensional vector.
  
  Correct me if I’m wrong in the 2nd part.
  
  Reply
- Jason Brownlee July 14, 2019 at 8:15 am #
  
  Seems reasonable.
  
  Perhaps study this post and the associated papers:
  https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/
  
  Reply
  - Ajay July 16, 2019 at 6:44 pm #
    
    I’ve read about those architectures but still had some doubts and I’ve asked them on the post above.
    
    Kindly answer them.
    
    Reply
Nasif Mahbub July 22, 2019 at 7:06 am #

This is a wonderful tutorial. Thank you Jason Brownlee. Would love to see a tutorial on attention mechanism applied on an image caption generator (preferably this one).

Reply
- Jason Brownlee July 22, 2019 at 8:28 am #
  
  Great suggestion, thanks.
  
  Reply
Nasif Mahbub July 23, 2019 at 5:23 am #

I’ve encountered a problem regarding feature extraction. In both the training phase and test phase when using the VGG16 feature extractor model, it is necessary to download VGG16 weight model which is 500+ MB. So, I followed the link they used to download the weight model and downloaded it.

Then instead of using “VGG16()” I used:
load_model(“/folder/vgg16_weights_tf_dim_ordering_tf_kernels.h5”).

But then it shows the error:
“raise ValueError(‘Cannot create group in read only mode.’)
ValueError: Cannot create group in read only mode.”

But for the same image it works perfectly when I use “VGG16()” and they download it from the link below:

https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5

Downloading the model each time is not actually practical. Is there any workaround?

Reply
- Jason Brownlee July 23, 2019 at 8:16 am #
  
  I believe you can specify the path to the model via the API, meaning you can download it once and reuse it each time.
  
  E.g. the “weights” argument when loading the model.
  https://keras.io/applications/
  
  Reply
Ankit Rathi July 25, 2019 at 1:59 am #

Hi Jason,

I tried to apply the above code and generated caption with BLEU-1 score of 0.52. But when I observed the predicted captions, same sentence were repeating for multiple Images. For example, if a dog appears in the Image then the model generate a caption : “dog is running through the grass” for multiple image. How to make a model more accurate so that its should not repeat the same caption. ?

Thanks,

Ankit

Reply
- Jason Brownlee July 25, 2019 at 7:55 am #
  
  Perhaps try fitting the model again and selecting a final model with lower loss on the holdout dataset?
  
  Reply
  - Ankit Rathi July 26, 2019 at 1:38 am #
    
    Ok, I will try. Thank you for your quick response.
    
    Reply
    - Jason Brownlee July 26, 2019 at 8:28 am #
      
      You’re welcome.
      
      Reply
Aman August 7, 2019 at 1:41 am #

Hello Jason. I have a problem..I triyed many time but i can not fixed problem.plz tell me the solution….

Total params: 134,260,544
Trainable params: 134,260,544
Non-trainable params: 0
_________________________________________________________________
None
—————————————————————————
FileNotFoundError Traceback (most recent call last)
in
39 # extract features from all images
40 directory = ‘Flickr8k_Dataset’
—> 41 features = extract_features(directory)
42 print(‘Extracted Features: %d’ % len(features))
43 # save to file

in extract_features(directory)
18 # extract features from each photo
19 features = dict()
—> 20 for name in listdir(directory):
21 # load an image from file
22 filename = directory + ‘\\Users\Admin’ + name

FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flickr8k_Dataset’

Reply
- Jason Brownlee August 7, 2019 at 8:02 am #
  
  It suggests the data file is not in the same location as the script. Ensure the python code and data folder are in the same directory and run from the command line:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  
  Reply
- Shyam Yadav August 18, 2019 at 5:23 am #
  
  What could be the real worle applications to this project?
  
  Reply
  - Jason Brownlee August 18, 2019 at 6:51 am #
    
    Captaining photos is a real world application.
    
    Reply
Aman ullah August 22, 2019 at 1:12 am #

whats wrong plz guide.

filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
model = load_model(filename)
# evaluate model
evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

OSError Traceback (most recent call last)
in
1 filename = ‘model-ep002-loss3.245-val_loss3.612.h5’
—-> 2 model = load_model(filename)
3 # evaluate model
4 evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

D:\anconda3\lib\site-packages\keras\engine\saving.py in load_model(filepath, custom_objects, compile)
415 model = None
416 opened_new_file = not isinstance(filepath, h5py.Group)
–> 417 f = h5dict(filepath, ‘r’)
418 try:
419 model = _deserialize_model(f, custom_objects, compile)

D:\anconda3\lib\site-packages\keras\utils\io_utils.py in __init__(self, path, mode)
184 self._is_file = False
185 elif isinstance(path, str):
–> 186 self.data = h5py.File(path, mode=mode)
187 self._is_file = True
188 elif isinstance(path, dict):

D:\anconda3\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
392 fid = make_fid(name, mode, userblock_size,
393 fapl, fcpl=make_fcpl(track_order=track_order),
–> 394 swmr=swmr)
395
396 if swmr_support:

D:\anconda3\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
168 if swmr and swmr_support:
169 flags |= h5f.ACC_SWMR_READ
–> 170 fid = h5f.open(name, flags, fapl=fapl)
171 elif mode == ‘r+’:
172 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss3.245-val_loss3.612.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

Reply
- Jason Brownlee August 22, 2019 at 6:30 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Aman August 22, 2019 at 1:44 am #

filename = ‘model-ep002-loss4.6485-val_loss3.1147’
model = load_model(filename)
this code generate a error

OSError: Unable to open file (unable to open file: name = ‘model-ep002-loss4.6485-val_loss3.1147’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

whats wrong????
plz tell us..

Reply
- Jason Brownlee August 22, 2019 at 6:30 am #
  
  Ensure the file exists in the same directory as your .py file.
  
  Reply
Aman August 22, 2019 at 1:45 am #

thanks jason great tutorial.

Reply
- Jason Brownlee August 22, 2019 at 6:30 am #
  
  You’re very welcome.
  
  Reply
Aman August 22, 2019 at 6:31 pm #

whats wrong

File “D:\anconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File “”, line 3, in
from keras.preprocessing.text import Tokenizer

File “D:\anconda3\lib\site-packages\keras\__init__.py”, line 3, in
from . import utils

File “D:\anconda3\lib\site-packages\keras\utils\__init__.py”, line 5, in
from . import io_utils

File “D:\anconda3\lib\site-packages\keras\utils\io_utils.py”, line 13, in
import h5py

File “D:\anconda3\lib\site-packages\h5py\__init__.py”, line 49, in
from ._hl.files import (

File “D:\anconda3\lib\site-packages\h5py\_hl\files.py”, line 13
swmr_support = True
^
IndentationError: unexpected indent

Reply
- Jason Brownlee August 23, 2019 at 6:23 am #
  
  Looks like you have not preserved the indenting when you copied the code.
  
  Reply
Prashant Verma September 4, 2019 at 5:56 pm #

Hi Jason,
I am getting error while executing tokenizer

Traceback (most recent call last):
File “tokenizer.py”, line 65, in
tokenizer = load(open(‘tokenizer.pkl’, ‘rb’))
_pickle.UnpicklingError: invalid load key, ‘f’.

Reply
- Jason Brownlee September 5, 2019 at 6:49 am #
  
  Sorry to hear that, I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Aman September 12, 2019 at 12:17 am #

Jason Brownlee great tutorial…
After 4 Epoch val did not chnage..

Epoch 00004: val_loss improved from 3.85954 to 3.84653, saving model to model-ep004-loss3.609-val_loss3.847.h5
Epoch 5/20
– 8447s – loss: 3.5627 – val_loss: 3.8547

Epoch 00005: val_loss did not improve from 3.84653
Epoch 6/20
– 8873s – loss: 3.5319 – val_loss: 3.8623

Epoch 00006: val_loss did not improve from 3.84653
Epoch 7/20
– 8550s – loss: 3.5109 – val_loss: 3.8818

Epoch 00007: val_loss did not improve from 3.84653
Epoch 8/20
– 11736s – loss: 3.5004 – val_loss: 3.8942

Epoch 00008: val_loss did not improve from 3.84653
Epoch 9/20
– 8654s – loss: 3.4960 – val_loss: 3.9139

Epoch 00009: val_loss did not improve from 3.84653
Epoch 10/20

Reply
- Aman September 12, 2019 at 12:37 am #
  
  jason after 4 epoch value did not change.if i can stop this program right now….because value did not change.10 epoch enough to train this model.
  
  Reply
  - Jason Brownlee September 12, 2019 at 5:19 am #
    
    I believe so.
    
    Reply
- Jason Brownlee September 12, 2019 at 5:17 am #
  
  Perhaps choose the model with the lowest validation loss.
  
  Reply
Aman September 16, 2019 at 1:35 am #

Dataset: 6000
Descriptions: train=6000
Vocabulary Size: 7579
Description Length: 34
Dataset: 1000
Descriptions: test=1000
Photos: test=1000
BLEU-1: 0.535757
BLEU-2: 0.282579
BLEU-3: 0.192619
BLEU-4: 0.089498

But when I observed the predicted captions, same sentence were repeating for multiple Images. For example, if a dog appears in the Image then the model generate a caption : “dog is running through the grass” for multiple image.

Reply
- Jason Brownlee September 16, 2019 at 6:38 am #
  
  Perhaps try fitting the model again, or use a different checkpoint?
  
  Reply
  - Aman September 16, 2019 at 8:18 pm #
    
    Thanks Janson Brownlee I will try..
    Great tutorial.thanks for quick responce…
    
    Reply
    - Jason Brownlee September 17, 2019 at 6:27 am #
      
      Thanks.
      
      Reply
    - Videl June 13, 2020 at 5:51 am #
      
      Hi Aman. has your problem solved? If yes, how so?
      I am facing the same issue. The same caption is being generated for the pictures.
      Regards.
      
      Reply
Narayan Iyer September 20, 2019 at 11:26 am #

Thanks for the great tutorial! I have a doubt regarding the LSTM layer. Do we have as many memory units as the number of dimensions in the embedding matrix, and the dimension of input size to each unit is the size of the vocabulary? And is the output of each of the 256 unit 1×1, such that all outputs taken together is a 256 dimensional vector? If so, why do we need an LSTM? Can we not simply use 256 FC layers with each layer taking one dimension of all caption words generated?

Reply
- Jason Brownlee September 20, 2019 at 1:39 pm #
  
  No, the number of nodes in a layer is unrelated to the number of inputs to the layer.
  
  LSTM is needed to process the sequence of inputs (words generated so far).
  
  Reply
  - Narayan Iyer September 22, 2019 at 7:24 am #
    
    So the embeddings for the prefix generated so far is fed all at once only to the first LSTM unit? Which means the input size for the first LSTM is |vocab-size| x |Dimensionality of embedding space|.
    
    Reply
    - Jason Brownlee September 22, 2019 at 9:38 am #
      
      The LSTM receives one word at a time, yes a sequence of n words where each word is a m sized vector.
      
      Reply
      - Narayan Iyer October 12, 2019 at 2:57 pm #
        
        And the caption prefix is used as input only to the first LSTM cell?
      - Jason Brownlee October 13, 2019 at 8:25 am #
        
        Yes.
Vivek October 4, 2019 at 10:59 pm #

Thanks for the great tutorial!

Reply
- Jason Brownlee October 6, 2019 at 8:11 am #
  
  You’re welcome.
  
  Reply
Phyu Phyu Khaing October 22, 2019 at 3:30 pm #

Thanks for your great tutorial.

Now, I am learning about image captioning.
When I train the model, I got less accuracy for more epochs.
That is right?
If you have any attention example, let me know.

Best Regards

Reply
- Jason Brownlee October 23, 2019 at 6:31 am #
  
  Yes, you may need to use early stopping.
  
  Reply
  - Phyu Phyu Khaing October 23, 2019 at 2:00 pm #
    
    Thanks a lot, Sir,
    
    Reply
  - Phyu Phyu Khaing October 23, 2019 at 2:05 pm #
    
    When I train the model, I got four-loss files. When I test with that loss files, I got the following results.
    
    For model-ep001-loss4.544-val_loss4.103.h5 file, the results are:
    
    BLEU-1: 0.523776
    BLEU-2: 0.283197
    BLEU-3: 0.197006
    BLEU-4: 0.094024
    
    For model-ep002-loss3.907-val_loss3.933.h5 file, the results are:
    
    BLEU-1: 0.457893
    BLEU-2: 0.246748
    BLEU-3: 0.172982
    BLEU-4: 0.080477
    
    For model-ep003-loss3.721-val_loss3.874.h5 file, the results are:
    
    BLEU-1: 0.550797
    BLEU-2: 0.302675
    BLEU-3: 0.206961
    BLEU-4: 0.094808
    
    For model-ep005-loss3.577-val_loss3.874.h5 file, the results are:
    
    BLEU-1: 0.475467
    BLEU-2: 0.245151
    BLEU-3: 0.164806
    BLEU-4: 0.072125
    
    I don’t get your results.
    My results are less.
    What can be my faults?
    
    I also have some doubts.
    My previous understanding is that the less the loss, the better the performance.
    But I don’t get like that.
    Please explain to me.
    
    Best Regards,
    
    Reply
    - miaoyi October 23, 2019 at 2:39 pm #
      
      I got the same model as above ‘model-ep005-loss3.577-val_loss3.874.h5 file’
      My results are less too.
      
      Reply
    - Jason Brownlee October 24, 2019 at 5:33 am #
      
      Perhaps try fitting the model a few times?
      
      Loss is a good guide, but in this case we are selecting a model based on BLEU on the hold out dataset.
      
      Reply
      - miaoyi October 24, 2019 at 1:42 pm #
        
        thanks a lot
Kanaan October 29, 2019 at 6:06 am #

Dear Dr.Jason,
I have an error , I hope you can help me to fix it :

PermissionError : [Errno 13] Permission Denied: Flickr8k_Dataset/Flicker8k_Dataset’

Reply
- Jason Brownlee October 29, 2019 at 1:45 pm #
  
  Yes, I provide new links for the dataset in the tutorial. See the section titled “UPDATE (Feb/2019)”
  
  Reply
  - Kanaan October 29, 2019 at 7:58 pm #
    
    I already used the alternate download link, both Flickr8k_Dataset and Flickr8k_text already downloaded but still showing the same error :
    PermissionError : [Errno 13] Permission Denied: Flickr8k_Dataset/Flicker8k_Dataset’
    
    and features.pkl file can not be created !
    
    Reply
    - Jason Brownlee October 30, 2019 at 6:03 am #
      
      That is a strange error, I though it was your web browser.
      
      What is reporting that error exactly? Python?
      If Python could not find the file, it would say “not found”, not “permission denied”.
      
      Reply
Kanaan October 30, 2019 at 4:24 am #

Dear Dr.Jason,

I have fixed my error, I hope you fix it the given code :
in the line no. 40 directory = ‘Flickr8k_Dataset’ should be modified to directory = ‘Flickr8k_Dataset/Flicker8k_Dataset’

Reply
- Jason Brownlee October 30, 2019 at 6:06 am #
  
  Thanks, but that sounds specific to the way you have unzipped the dataset and placed it in your code directory.
  
  Reply
Kanaan October 30, 2019 at 7:52 am #

yes of course, sometime busy minds do such a mistake !

Reply
Kanaan October 30, 2019 at 8:13 am #

Dear Jason,
once agian I faced this error:

====================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
_________________________________________________________________________
None
Traceback (most recent call last):
File “test4.py”, line 181, in
model = define_model(vocab_size, max_length)
File “test4.py”, line 138, in define_model
plot_model(model, to_file=’model.png’, show_shapes=True)
File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 240, in plot_model
expand_nested, dpi)
File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 79, in model_to_dot
_check_pydot()
File “C:\Users\Excellence\Anaconda3\lib\site-packages\keras\utils\vis_utils.py”, line 22, in _check_pydot
‘Failed to import pydot. ‘
ImportError: Failed to import pydot. Please install pydot. For example with pip install pydot.

I can install pydot but where I have to put it?

Reply
- Jason Brownlee October 30, 2019 at 1:56 pm #
  
  You can comment out the call to “plot_model()”
  
  Reply
Kanaan October 30, 2019 at 9:31 pm #

Dear Jason,
for time saving, instead of ‘epochs = 20’ if i make it ‘epochs = 5’ what will be the problem?

Reply
- Jason Brownlee October 31, 2019 at 5:28 am #
  
  No problem. Perhaps test it?
  
  Reply
  - Kanaan October 31, 2019 at 5:33 pm #
    
    I don’t know why it is breaking in epoch 4 and it is not continuing :
    
    Total params: 5,527,963
    Trainable params: 5,527,963
    Non-trainable params: 0
    __________________________________________________________________________________________________
    C:\Users\Excellence\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
    “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
    Epoch 1/1
    6000/6000 [==============================] – 1953s 325ms/step – loss: 4.6870
    Epoch 1/1
    6000/6000 [==============================] – 1902s 317ms/step – loss: 3.8931
    Epoch 1/1
    6000/6000 [==============================] – 2424s 404ms/step – loss: 3.6361
    Epoch 1/1
    4834/6000 [=======================>……] – ETA: 10:46 – loss: 3.4892
    (base) C:\Users\Excellence>
    
    Reply
    - Jason Brownlee November 1, 2019 at 5:26 am #
      
      Perhaps you are running out of memory?
      
      Try running on EC2 with more RAM?
      Try fitting on less data?
      
      Reply
ishritam October 31, 2019 at 5:12 am #

Hi Jason,
Yet another great explanation. I am about to start this project. I would love to see you implement attention mechanism for this.
In case if I have missed your blog on applying attention mechanism in Image Captioning, please share me at E-mail: ishritam.ml@gmail.com

Thank you:)

Reply
- Jason Brownlee October 31, 2019 at 5:37 am #
  
  Thanks for the suggestion.
  
  Reply
Hany November 1, 2019 at 2:27 am #

Hi Jason,

Thanks for another great tutorial.
I tried your code and get one error, which is:

ValueError: Error when checking input: expected input_10 to have shape (4096,) but got array with shape (1000,) #and sometimes I get input_1

So I changed inputs1 shape in define_model() to 1000 instead of 4096, then the code worked fine.

Here is the losses values I reached model-ep004-loss3.402-val_loss3.741
and here are my Blue scores:
BLEU-1: 0.528189
BLEU-2: 0.286299
BLEU-3: 0.198691
BLEU-4: 0.093021

which is obviously lower than yours.

The problem is now the captioning is very poor, here are some examples:

– “young girl in blue and blue and blue shirt is jumping into pool”, there is neither a girl nor a pool in the image. By the way, this image is from the dataset.
– “man in red shirt is jumping into the water”, this was an image for just a beach with no people.
– “dog is running through the grass”, this was the dog example photo.

Would you please advise on how to increase the accuracy of the generated captions? Thank you

P.S.: I tried refitting the model but got the same loss value for the same number of epochs.

Reply
- Jason Brownlee November 1, 2019 at 5:41 am #
  
  Perhaps you are using a different version or you changed the code example?
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Arpit Jain November 14, 2019 at 7:00 am #

Hi Jason,

Thanks for the blog. Its wonderful.

I was following it and got stuck at the fit_generator() part.

——————————————————————————————-

# train the model, run epochs manually and save after each epoch
epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
# create the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
# fit for one epoch
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
# save model
model.save(‘model_’ + str(i) + ‘.h5’)

———————————————————————————————————

This part of the code throws me the following error.

could not broadcast input array from shape (48,4096) into shape (48)

The shape (48,4096) here is of image_input.

I do not understand why is it behaving like this.

Please help me.

Reply
- Jason Brownlee November 14, 2019 at 8:08 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Arpit Jain November 16, 2019 at 6:05 pm #
    
    Thanks for the response.
    
    I tried the code on Google Colab with the usual approach and not the progressive loading, and it runs perfectly.
    
    Reply
    - Jason Brownlee November 17, 2019 at 7:13 am #
      
      Well done!
      
      Reply
    - Jay Trivedi November 21, 2020 at 6:48 am #
      
      I tried with the regular approach but It doesn’t work due to full ram consumption. Runtime automatically restarts after RAM is full
      
      Reply
      - Jason Brownlee November 21, 2020 at 7:38 am #
        
        Try the progressive loading example above.
- abdo helmy February 7, 2020 at 8:12 pm #
  
  i know it’s a bit late ,but i had the same problem and was using tensorflow.keras
  when i used the keras library it worked fine
  so i’m pretty sure the problem is in one of the transformation works differently on your keras version than the one used it the tutorial
  
  Reply
  - Jason Brownlee February 8, 2020 at 7:07 am #
    
    Great tip!
    
    Yes, use standalone Keras, not tf.keras.
    
    Reply
- Kevin Jivani June 27, 2020 at 4:23 pm #
  
  First of all Jason, this blog has the best demonstration of image caption generation I have got on the internet. And after reading your other posts this blog has been the first stop for anything I want to learn in deep learning or machine learning. I want to heartily thank you for such amazing work which is of great help for undergraduate students like me.
  
  I know it is very late but it is for others who may face this problem in future.
  
  If you want to use tf.keras with progressive loading here is the trick that worked for me:
  
  in function data_generator(): change
  
  yield [[in_img, in_seq], out_word]
  
  to
  
  yield [in_img, in_seq], out_word
  
  Reply
  - Jason Brownlee June 28, 2020 at 5:42 am #
    
    Thanks for your kind words and for the sharing this tip Kevin!
    
    Reply
Jay November 23, 2019 at 6:39 am #

I was trying get my hands dirty on the code. My extracted features are in the form [1, 2560, 8] to use this in the first encoder what should I do?
When I put inputs1 = Input(shape=(2560, 8, )), I get an error
ValueError: Error when checking target: expected dense_28 to have 3 dimensions, but got array with shape (47, 7579)

Reply
- Jason Brownlee November 23, 2019 at 6:58 am #
  
  Sorry to hear that you’re having trouble.
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Natalie Jones November 26, 2019 at 10:31 am #

Great tutorial. I am having trouble when loading the photo features.

features = {k: all_features[k] for k in dataset}

gives me this error:
KeyError: ‘2657663775_bc98bf67ac’

Do you have any suggestions as to why this is? Thanks.

Reply
- Jason Brownlee November 26, 2019 at 1:30 pm #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Jamiul Haque February 13, 2020 at 3:27 am #
  
  Natalie Jones have you been able to find the error?
  
  Reply
Maya December 9, 2019 at 8:26 am #

Thank you so much for this amazing tutorial.. I’m facing a problem when I’m trying to run this line:

model = define_model(vocab_size, max_length)

it gives me the following error:

TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not ‘function’.

Reply
- Jason Brownlee December 9, 2019 at 1:42 pm #
  
  You’re welcome!
  
  Sorry to hear that, perhaps some of the suggestions here will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Harshit Parikh December 13, 2019 at 8:24 pm #

Hello Sir, I am getting the accuracy of around 35% even after training the model for 20 epochs and there isn’t any sudden change after training another 5 epochs. Do you recommend any changes or maybe what might be the error? The BLEU scores you mentioned below are coming almost the same what you got.

Please let me know if any changes are recommended from your side. Thank you.

Reply
- Jason Brownlee December 14, 2019 at 6:16 am #
  
  Accuracy is a bad measure on this dataset. Focus on the BLEU scores.
  
  Reply
Shyam Yadav December 14, 2019 at 5:07 am #

Hello Sir,

Thank you for posting this code and explaining it such easily. I had one doubt that I had trained this model for 20 epochs, and I just got an accuracy of around 35%. After that, I trained the model for another 5 epochs, but there weren’t any considerable changes in the accuracy of the model.

What changes do you recommend in the model?

Thank you

Reply
- Jason Brownlee December 14, 2019 at 6:26 am #
  
  Do not use accuracy for photo classification, instead use BELU.
  
  Reply
  - Shyam Yadav December 15, 2019 at 4:29 am #
    
    How would using BLEU instead of accuracy differ and what are its advantages? What is a good BLEU score?
    
    Reply
    - Jason Brownlee December 15, 2019 at 6:10 am #
      
      Accuracy is invalid for evaluating image captioning because you are predicting multiple words, not class labels.
      
      More on BLEU here:
      https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
      
      Reply
      - Shyam Yadav December 15, 2019 at 6:15 am #
        
        What about other evaluation metrics such as CIDER or METEOR?
      - Jason Brownlee December 16, 2019 at 6:02 am #
        
        Sorry, I don’t have tutorials on those metrics.
Shyam Yadav December 16, 2019 at 4:37 pm #

And if we wantto improve thr bleu score what changes do you recommend?

Reply
- Jason Brownlee December 17, 2019 at 6:29 am #
  
  Some of the suggestions here will help:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
  - Shyam Yadav December 17, 2019 at 6:13 pm #
    
    There isn’t anything in here to help me improve the BLEU-4 score in this code.
    
    Reply
    - Jason Brownlee December 18, 2019 at 6:01 am #
      
      I disagree, there are tutorials on diagnosing learning dynamics with learning curves, then tutorials on fixing each issue, such as regularization for overfitting and ensembles for better prediction.
      
      Reply
      - Shyam Yadav December 19, 2019 at 5:00 am #
        
        But what about just improving the BLEU score explicitly?
      - Jason Brownlee December 19, 2019 at 6:34 am #
        
        BLEU is improved by improving the model.
Rohit Halder December 18, 2019 at 5:46 pm #

Sir, I was following your steps, the test results seemed to be horrible. Predominantly, it is generating a particular type of sentence, whenever a human is detected. Or a different sentence whenever a dog is detected. Which model should be used instead of VGG16 to improve the performance?
And I played around with the structure of the model, and no of epochs, still there were no noticeable changes. What are your suggestions?
Thank you.

Reply
- Jason Brownlee December 19, 2019 at 6:23 am #
  
  Perhaps try training the model again?
  Perhaps use a different final model?
  Perhaps try adding regularization to reduce overfitting?
  
  Reply
Shyam Yadav December 19, 2019 at 8:02 pm #

What do you suggest doing to improve the model?

Reply
- Jason Brownlee December 20, 2019 at 6:43 am #
  
  Diagnose the learning curve and try the suggestions here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Phelan December 21, 2019 at 2:23 pm #

Hi Jason,

Thank for your great explanation.
I really love your articles.

I have a question:
total Vocabulary Size: 8,763 in token file
Why we not use this vocabulary for training instead of Vocabulary on train dataset (Vocabulary Size: 7,579) ?

IMO, It will cover more words which not appear on training dataset

Reply
- Jason Brownlee December 22, 2019 at 6:07 am #
  
  Thanks!
  
  It is good practice to use the vocab in the training set only and pretend we don’t have access to the test set until after the model is prepared.
  
  The reason for this is to develop a robust and independent estimate of the model performance when making predictions on new data – unseen during training.
  
  Reply
  - Phelan December 23, 2019 at 12:39 am #
    
    I see your point, thank you
    
    Reply
Shyam Yadav January 4, 2020 at 2:28 am #

Tune Model. The configuration of the model was not tuned on the problem. Explore alternate configurations and see if you can achieve better performance.

How do we tune the model?

Reply
- Jason Brownlee January 4, 2020 at 8:37 am #
  
  Change something, like the learning rate, run an experiment and summarize the performance. If it is better keep it. Repeat with other configration.
  
  Reply
Mehul Kohli January 5, 2020 at 4:08 am #

Hi Jason!

Nice article, and very well explained. I just had a doubt regarding the keras tokenizer used. In the code, the tokenizer is fit on the train_descriptions, and when the create_sequences is called for the test data, the same tokenizer which was fit on the train data is used. I had a doubt that how does the tokenizer work with the data which it was not fitted on? I’m kinda new to this, so I’m not really sure about that. Could you please briefly explain this?

Thanks

Reply
- Jason Brownlee January 5, 2020 at 7:09 am #
  
  Thanks.
  
  The tokenizer is fit on the training data and is then used to prepare data as input to the model. If it is used on test data that has words not seen during training, those words are marked as 0 (unknown).
  
  Reply
  - Mehul Kohli January 5, 2020 at 4:28 pm #
    
    Oh I get it now! Thanks.
    
    And one more query: why weren’t the training sequences normalized here? Like what if we just normalize by dividing it by the vocab size? I’m training the model at the moment without normalizing them. Just curious what would happen if we used normalized sequence data. I’m guessing it should atleast result in faster convergence of the model, if not significantly improve results.
    
    Reply
    - Jason Brownlee January 6, 2020 at 7:09 am #
      
      If by normalize you mean scale to the range 0-1, then this is not needed. Words are encoded as integer values, then mapped to word vectors.
      
      Reply
      - Mehul Kohli January 6, 2020 at 5:35 pm #
        
        Okay, thank you so much.
Samriddhi January 5, 2020 at 12:45 pm #

Hi Jason.
I cannot find the features.pkl file. Getting the error No such file or directory: ‘features.pkl’. Please guide me.

Reply
- Jason Brownlee January 6, 2020 at 7:06 am #
  
  You must create and save that file as an earlier step in the tutorial.
  
  Reply

Samriddhi January 5, 2020 at 1:20 pm #

Hi Jason
What does this part of the code do? Is it one hot encoding being done here-

for desc in desc_list:
			# encode the sequence
			seq = tokenizer.texts_to_sequences([desc])[0]
			# split one sequence into multiple X,y pairs
			for i in range(1, len(seq)):
				# split into input and output pair
				in_seq, out_seq = seq[:i], seq[i]
				# pad input sequence
				in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
				# encode output sequence
				out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

for desc in desc_list:

# encode the sequence

seq = tokenizer.texts_to_sequences([desc])[0]

# split one sequence into multiple X,y pairs

for i in range(1, len(seq)):

# split into input and output pair

in_seq, out_seq = seq[:i], seq[i]

# pad input sequence

in_seq = pad_sequences([in_seq], maxlen=max_length)[0]

# encode output sequence

out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]

Jason Brownlee January 6, 2020 at 7:08 am #

Encodes words as integers, then defines the input and output sequences for a given sample.

Reply

Sayak Paul January 23, 2020 at 6:03 pm #

Has anyone got the same captions no matter what the supplied image is? My BLEU score is not that bad:

BLEU-1: 0.555407
BLEU-2: 0.284509
BLEU-3: 0.181447
BLEU-4: 0.076735

For feature extraction I used the following method:

# extract features from each photo in the directory
def extract_features(filename):
	feature_extractor = VGG16(weights="imagenet", include_top=False,
		input_tensor=Input(shape=(224, 224, 3)))
	headModel = feature_extractor.output
	headModel = GlobalAveragePooling2D()(headModel)
	headModel = Dense(4096, activation="relu")(headModel)
	headModel = Dense(4096, activation="relu")(headModel)
	# place the head FC model on top of the base model (this will become
	# the actual model we will train)
	model = Model(inputs=feature_extractor.input, outputs=headModel)
	# extract features from each photo

	image = load_img(filename, target_size=(224, 224))
	# convert the image pixels to a numpy array
	image = img_to_array(image)
	# reshape data for the model
	image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
	# prepare the image for the VGG model
	image = preprocess_input(image)
	# get features
	feature = model.predict(image, verbose=0)
	# store feature
	return feature

# extract features from each photo in the directory

def extract_features(filename):

feature_extractor = VGG16(weights="imagenet", include_top=False,

input_tensor=Input(shape=(224, 224, 3)))

headModel = feature_extractor.output

headModel = GlobalAveragePooling2D()(headModel)

headModel = Dense(4096, activation="relu")(headModel)

# place the head FC model on top of the base model (this will become

# the actual model we will train)

model = Model(inputs=feature_extractor.input, outputs=headModel)

# extract features from each photo

image = load_img(filename, target_size=(224, 224))

# convert the image pixels to a numpy array

image = img_to_array(image)

# reshape data for the model

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# prepare the image for the VGG model

image = preprocess_input(image)

# get features

feature = model.predict(image, verbose=0)

# store feature

return feature

Jason Brownlee January 24, 2020 at 7:44 am #

Perhaps try re-fitting your model?

Reply
Narendhiran April 15, 2020 at 11:22 pm #

I’m also stuck with same problem.

Reply

Phelan January 30, 2020 at 5:49 pm #

Hi Jason,

I tried that model with MDI dataset.
But result very bad.
I checked and have found some points:
– vocabulary size: huge more than 21000 => more difficult.
– many name such as character name in movies or place, … such as Harry Potter, …
– there are many frame same with 1 caption (now I use single frame for 1 caption and ignore other which same caption)

Do you have any suggestion for this situation?

Many thanks!

Reply
- Jason Brownlee January 31, 2020 at 7:40 am #
  
  I’m not familiar with that dataset.
  
  Try a range models.
  Try aggressively reducing the vocab.
  Try tuning the model.
  Try transfer learning.
  …
  
  Reply
Vinayak Nayak February 19, 2020 at 11:44 pm #

Hi Jason!

Thanks for this elaborate post. It is really informative and details every small step which I found quite useful in building an Image Caption Generator in PyTorch.

I used an architecture similar to yours and also used 200 dimensional Glove Vectors as word embeddings. However, after training the model, I can observe that loss is going down yet the caption predicted for every image is the same.

I ran my project in google colab and here’s the notebook link for the same:

https://colab.research.google.com/drive/1ZFxoJhWU5NfQVipOPiB3qxyyzil9TCY0

I would be highly grateful to you if you could look at it once and let me know where I’m going wrong.

I have subscribed to your mailing list and I love the tips you’ve given in the pdf for ml_performance_improvement_cheatsheet.
Thanks for your help!:)

Reply
- Jason Brownlee February 20, 2020 at 6:14 am #
  
  This is a common question hat I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Vinayak Nayak February 20, 2020 at 12:37 pm #

Thanks Jason.

Reply
- Jason Brownlee February 21, 2020 at 8:16 am #
  
  You’re welcome.
  
  Reply
Anan February 20, 2020 at 1:22 pm #

The model is showing bad results even after 100 epochs.Any suggestions?

Reply
- Jason Brownlee February 21, 2020 at 8:16 am #
  
  Perhaps re-run and stop after just a few epochs?
  
  Reply
  - Anan February 21, 2020 at 5:18 pm #
    
    Tried that too but the loss rate gets increased after some epochs , the loss rate isn’t going below 3.
    
    Reply
Sakib Hossain February 26, 2020 at 8:36 pm #

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
While downloading the VGG model

Reply
- Jason Brownlee February 27, 2020 at 5:44 am #
  
  Sounds like you are having internet issues.
  
  Perhaps try again?
  Perhaps try from a different internet connection?
  Perhaps try from a diffrent computer?
  
  Reply
Ali Raza February 28, 2020 at 12:20 am #

Hi. I am using above code for medical image captioning. but unfortunately i got the same caption for all images i give to the model. loss value on my dataset is 0,75 so i don’t think so its overfittting issue or my model would not be trained very well. can you please help me out in this matter.?

Thanks

Reply
- Jason Brownlee February 28, 2020 at 6:13 am #
  
  Perhaps the model was overfit?
  
  Perhaps use early stopping?
  Perhaps use a different architecture?
  Perhaps explore some of the ideas here?
  https://machinelearningmastery.com/start-here/#better
  
  Reply
- kumar February 17, 2022 at 9:43 pm #
  
  i am also facing this issues please help
  
  Reply
Shyam Yadav February 28, 2020 at 3:27 am #

What are the real world applications of this device?

Reply
- Jason Brownlee February 28, 2020 at 6:17 am #
  
  Describe the contents of photos.
  
  Reply
Sakib Hossain March 9, 2020 at 3:26 pm #

Traceback (most recent call last):
File “E:/MS Final/features.pkl”, line 41, in
features = extract_features(directory)
File “E:/MS Final/features.pkl”, line 23, in extract_features
image = load_img(filename, target_size=(224, 224))
File “C:\Users\hso\AppData\Local\Programs\Python\Python36\lib\site-packages\keras_preprocessing\image\utils.py”, line 110, in load_img
img = pil_image.open(path)
File “C:\Users\hso\AppData\Local\Programs\Python\Python36\lib\site-packages\PIL\Image.py”, line 2809, in open
fp = builtins.open(filename, “rb”)
PermissionError: [Errno 13] Permission denied: ‘Flickr8k_Dataset/Flicker8k_Dataset’

Please help what is the problem ?

Reply
- Jason Brownlee March 10, 2020 at 5:36 am #
  
  Looks like you don’t have permission to open the files.
  
  Perhaps change the permission?
  
  Reply
Sakib Hossain March 10, 2020 at 5:56 pm #

============= RESTART: E:/Project Study/Fitted Model – Final.py =============
Using TensorFlow backend.
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
Traceback (most recent call last):
File “E:/Project Study/Fitted Model – Final.py”, line 154, in
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)
File “E:/Project Study/Fitted Model – Final.py”, line 109, in create_sequences
return array(X1), array(X2), array(y)
MemoryError: Unable to allocate 8.65 GiB for an array with shape (306404, 7579) and data type float32
>>>

Can you please help ?

Reply
- Jason Brownlee March 11, 2020 at 5:21 am #
  
  Perhaps try the progressive loading version listed above.
  
  Reply
Dharmi March 15, 2020 at 2:45 am #

During data preprocessing,i am getting an error while using the directory variable and filename. What has to be done over there?

Reply
- Jason Brownlee March 15, 2020 at 6:20 am #
  
  Ensure you have the code and data in the same directory and run code from the command line.
  
  Reply
Jerry March 17, 2020 at 2:28 am #

Hi Jason,
Thank you so much for your super super super helpful post. I appreciate your effort!

I got my best training result is ep005-loss3.515-val_loss3.829.

I tried to use the trained model to generate a caption for a picture, in which a guy with a naked upper body is doing a chest push in the gym. The generated caption is “startseq two children are playing in the snow endseq”.

Do you have any idea how to improve the model?Thank you.

Reply
- Jason Brownlee March 17, 2020 at 8:20 am #
  
  You’re welcome!
  
  Nice work.
  
  Perhaps try using a different final model?
  Perhaps try training again?
  Perhaps try tuning the model?
  
  Ideas here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
  - Jerry March 19, 2020 at 11:48 am #
    
    Hi Jason,
    
    Thank you for your feedback!
    
    May I ask one more question? In my mind, usually, what we need for developing and evaluating a model is a training dataset and a testing dataset. So what is the purpose of the “Flickr_8k.devImages.txt” dataset?
    
    And I find that the size of the “Flickr_8k.devImages.txt” is equal to the size of “Flickr_8k.testImages.txt”. Is this quantitative equivalent a coincidence or a necessity?
    
    Thank you!
    
    Reply
    - Jason Brownlee March 19, 2020 at 1:48 pm #
      
      I don’t recall off hand. It could be used for validation:
      https://machinelearningmastery.com/difference-test-validation-datasets/
      
      Reply
Nafees Dipta March 18, 2020 at 11:34 am #

Hello Jason,
Thank for your post. What if each image has multiple descriptions in csv file like flicker30k dataset?

Reply
- Jason Brownlee March 18, 2020 at 1:09 pm #
  
  You’re welcome.
  
  You can train the model using each description for the same image input.
  
  Reply
- Jerry March 19, 2020 at 11:58 am #
  
  Hi Nafees,
  
  Can you manage to train the model using the Flickr30k dataset? I tried it and got a “killed: 9” output, which means the training is too massive for the memory of my laptop. If you are able to come over that, could you please share how you did it?
  
  Thanks!
  
  Reply
  - Jason Brownlee March 19, 2020 at 1:49 pm #
    
    Perhaps try the progressive loading version?
    
    Reply
Ali March 28, 2020 at 6:36 pm #

Dear Jason
I am running a visual question answering task. It is seems similar to the image captioning problem. Take as input : image features (which I have saved in a h5py file) and question tokens (which I have pickled) and outputs are the answers (the whole answer is considered a target , so 3129 answers –one word or more – and 3129 labels in my case)
I am using the Keras sequence utility to create the generator.
I am getting a dimension error in the output layer when the model tries to start training.
I have copied my getitem function in the generator and also a sample of my model.
Would you please have a look my code and help figure out the problem?
Best wishes

Epoch 1/1
Traceback (most recent call last):
File “”, line 32, in
validation_data=valid_generator)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\legacy\interfaces.py”, line 91, in wrapper
return func(*args, **kwargs)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 1732, in fit_generator
initial_epoch=initial_epoch)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training_generator.py”, line 220, in fit_generator
reset_metrics=False)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 1508, in train_on_batch
class_weight=class_weight)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training.py”, line 621, in _standardize_user_data
exception_prefix=’target’)
File “C:\python\envs\tf2-keras\lib\site-packages\keras\engine\training_utils.py”, line 145, in standardize_input_data
str(data_shape))

ValueError: Error when checking target: expected output to have shape (3129,) but got array with shape (1,)

# this is the getitem function
The __getitem__ of my generator look like this:
def __getitem__(self, index):
‘Generate one batch of data’

imfeatures = np.empty((self.batch_size,2048))
question_tokens = np.empty((self.batch_size,14))
answers = np.empty((self.batch_size,3129))

# Generate indexes of the batch
indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
# self.T.append(indexes)
list_IDs_temp = [self.list_IDs[k] for k in indexes]

# Generate data
for i,k in enumerate(list_IDs_temp):
temp =self.Features[‘image_features’][k]
imfeatures[i,]=temp[0,:]
question_tokens[i,]=self.Questions[indexes[i]]
answers=self.Answer[indexes[i]]

return [imfeatures,question_tokens],answers

#And this is what my model looks like:

ImInput = Input(shape=(2048,),name=’image_input’)
QInput = Input(shape=(14,),name=’question’)

# some dense layers and dropouts

#Then the layers are merged

M =Multiply()[ImInput,QInput]
#Some dense layers and dropouts

output=Dense(3129,activation=’softmax’,name=’output’)(M)

model = Model([ImInput,X],output)
model.compile(optimizer=’RMSprop’,loss=’categorical_crossentropy’,metrics = [‘accuracy’])

model.fit_generator(train_generator,
epochs=1,
verbose =1,
validation_data=valid_generator)

Reply
- Jason Brownlee March 29, 2020 at 5:51 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Manju Patil March 28, 2020 at 9:35 pm #

In Train With Progressive Loading
I get the following error. Please Help

Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
Model: “model_4”
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_8 (InputLayer) (None, 34) 0
__________________________________________________________________________________________________
input_7 (InputLayer) (None, 4096) 0
__________________________________________________________________________________________________
embedding_4 (Embedding) (None, 34, 256) 1940224 input_8[0][0]
__________________________________________________________________________________________________
dropout_7 (Dropout) (None, 4096) 0 input_7[0][0]
__________________________________________________________________________________________________
dropout_8 (Dropout) (None, 34, 256) 0 embedding_4[0][0]
__________________________________________________________________________________________________
dense_10 (Dense) (None, 256) 1048832 dropout_7[0][0]
__________________________________________________________________________________________________
lstm_4 (LSTM) (None, 256) 525312 dropout_8[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 256) 0 dense_10[0][0]
lstm_4[0][0]
__________________________________________________________________________________________________
dense_11 (Dense) (None, 256) 65792 add_4[0][0]
__________________________________________________________________________________________________
dense_12 (Dense) (None, 7579) 1947803 dense_11[0][0]
==================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
__________________________________________________________________________________________________
“dot” with args [‘-Tps’, ‘C:\\Users\\MANJUP~1\\AppData\\Local\\Temp\\tmp1f9neqi1’] returned code: 1

stdout, stderr:
b”
b”‘F:\\New’ is not recognized as an internal or external command,\r\noperable program or batch file.\r\n”

—————————————————————————
AssertionError Traceback (most recent call last)
in
160
161 # define the model
–> 162 model = define_model(vocab_size, max_length)
163 # train the model, run epochs manually and save after each epoch
164 epochs = 20

in define_model(vocab_size, max_length)
128 # summarize model
129 model.summary()
–> 130 plot_model(model, to_file=’model.png’, show_shapes=True)
131 return model
132

F:\New folder\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir, expand_nested, dpi)
238 “””
239 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir,
–> 240 expand_nested, dpi)
241 _, extension = os.path.splitext(to_file)
242 if not extension:

F:\New folder\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir, expand_nested, dpi, subgraph)
77 from ..models import Sequential
78
—> 79 _check_pydot()
80 if subgraph:
81 dot = pydot.Cluster(style=’dashed’, graph_name=model.name)

F:\New folder\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
26 # Attempt to create an image of a blank graph
27 # to check the pydot/graphviz installation.
—> 28 pydot.Dot.create(pydot.Dot())
29 except OSError:
30 raise OSError(

F:\New folder\lib\site-packages\pydot.py in create(self, prog, format, encoding)
1943 print(message)
1944
-> 1945 assert process.returncode == 0, process.returncode
1946
1947 return stdout_data

AssertionError: 1

Reply
- Jason Brownlee March 29, 2020 at 5:53 am #
  
  Sorry to hear that, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ramita Shrestha March 29, 2020 at 2:12 pm #

Hello Jason,
If we need to find accuracy from this then what to do? Is there any code ?

Reply
- Jason Brownlee March 30, 2020 at 5:31 am #
  
  Accuracy is a bad metric for this problem, we use BLEU instead:
  https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
  
  Reply
  - Ramita Shrestha March 30, 2020 at 4:57 pm #
    
    Thank you!
    
    Reply
    - Jason Brownlee March 31, 2020 at 7:56 am #
      
      You’re welcome.
      
      Reply
Hesam March 31, 2020 at 2:41 am #

This article is awesome Jason!
here is what i got “dog is running through the water”

Reply
- Jason Brownlee March 31, 2020 at 8:15 am #
  
  Well done!
  
  Reply
Jamie April 6, 2020 at 8:55 am #

Hi Jason,

I’m wondering why the tokenizer is fit on the training, validation, and test data separately. I understand that we don’t want sequences from one set to leak into another, but wouldn’t we want the integer representation of each word to be the same across all sets, and then get the sequences from that representation?

If the model is learning relationships between integers in sequences, and the integers have totally different meaning (and thus expected placement in sequences) between train and test, then wouldn’t that be bad?

Would your answer change if word2vec was used since the location of the word in the vector space has meaning?

Thank you for the great tutorial.

best,
Jamie

Reply
- Jason Brownlee April 6, 2020 at 9:21 am #
  
  We create a single tokenizer from the training dataset.
  
  Reply
  - Jamie April 8, 2020 at 5:31 pm #
    
    Sorry, I misunderstood. I thought it would be a problem if the dev and test sets have longer sequences or different words, but I guess that’s a part of it!
    
    When I try to train the model I get the error: “NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.”
    
    This is in TensorFlow 2.1 so I’m importing from tensorflow.keras and using fit instead of fit_generator. I tried directly copying your complete progressive loading example (and making those two changes) and still have the error.
    
    Thank you for your help.
    
    Reply
    - Jason Brownlee April 9, 2020 at 7:58 am #
      
      I recommend not using tf.keras and instead use standalone Keras 2.3 running on top of tensorflow 2.1:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-keras-and-tf-keras
      
      Reply
      - Mohit joshi April 23, 2020 at 11:17 pm #
        
        Same problem I’ m facing with tf.keras , any solution..?
      - Jason Brownlee April 24, 2020 at 5:44 am #
        
        Use standalone keras, not tf.keras.
coder April 10, 2020 at 9:34 pm #

# extract features from each photo in the directory
def extract_features(directory):
# load the model
model = VGG16()
# re-structure the model
model.layers.pop()
model = Model(inputs=model.inputs, outputs=model.layers[-1].output)
# summarize
print(model.summary())
# extract features from each photo
features = dict()
for name in listdir(directory):
# load an image from file
filename = directory + ‘/’ + name
image = load_img(filename, target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# get features
feature = model.predict(image, verbose=0)
# get image id
image_id = name.split(‘.’)[0]
# store feature
features[image_id] = feature
print(‘>%s’ % name)
return features

# extract features from all images
directory = ‘C:\Flicker8k_Dataset’
features = extract_features(directory)
print(‘Extracted Features: %d’ % len(features))CA
# save to file
dump(features, open(‘features.pkl’, ‘wb’))

”’ When i run this code after loading all the images it’s showing this error”’

in extract_features(directory)
13 # load an image from file
14 filename = directory + ‘/’ + name
—> 15 image = load_img(filename, target_size=(224, 224))
16 # convert the image pixels to a numpy array
17 image = img_to_array(image)

~\.conda\envs\tensorflow\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
108 raise ImportError(‘Could not import PIL.Image. ‘
109 ‘The use of load_img requires PIL.’)
–> 110 img = pil_image.open(path)
111 if color_mode == ‘grayscale’:
112 if img.mode != ‘L’:

~\.conda\envs\tensorflow\lib\site-packages\PIL\Image.py in open(fp, mode)
2894 warnings.warn(message)
2895 raise UnidentifiedImageError(
-> 2896 “cannot identify image file %r” % (filename if filename else fp)
2897 )
2898

UnidentifiedImageError: cannot identify image file ‘C:\\Flicker8k_Dataset/Flicker8k_Dataset – Shortcut.lnk’

”’Can you please help me”’

Reply
- Jason Brownlee April 11, 2020 at 6:19 am #
  
  See this:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Mohit joshi April 24, 2020 at 10:59 am #
  
  can you also provide solution with tf.keras … it will help me to know more .. thanks Jason
  
  Reply
Anand Menon April 20, 2020 at 7:39 am #

Jason, Minor typo. The text for this line, “Running this example first loads the 6,000 photo identifiers in the test dataset”, should say “train dataset” instead of test dataset.

Reply
- Jason Brownlee April 20, 2020 at 7:53 am #
  
  Thanks, fixed!
  
  Reply
Mohit joshi April 23, 2020 at 11:07 pm #

This error is coming , help me if you can 🙂

NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
12 # save model
13 model.save(“models/model_” + str(i) + ‘.h5’)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
322 ‘in a future version’ if date is None else (‘after %s’ % date),
323 instructions)
–> 324 return func(*args, **kwargs)
325 return tf_decorator.make_decorator(
326 func, new_func, ‘deprecated’,

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1304 use_multiprocessing=use_multiprocessing,
1305 shuffle=shuffle,
-> 1306 initial_epoch=initial_epoch)
1307
1308 @deprecation.deprecated(

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
817 max_queue_size=max_queue_size,
818 workers=workers,
–> 819 use_multiprocessing=use_multiprocessing)
820
821 def evaluate(self,

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
233 max_queue_size=max_queue_size,
234 workers=workers,
–> 235 use_multiprocessing=use_multiprocessing)
236
237 total_samples = _get_total_number_of_samples(training_data_adapter)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_training_inputs(model, x, y, batch_size, epochs, sample_weights, class_weights, steps_per_epoch, validation_split, validation_data, validation_steps, shuffle, distribution_strategy, max_queue_size, workers, use_multiprocessing)
591 max_queue_size=max_queue_size,
592 workers=workers,
–> 593 use_multiprocessing=use_multiprocessing)
594 val_adapter = None
595 if validation_data:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_inputs(model, mode, x, y, batch_size, epochs, sample_weights, class_weights, shuffle, steps, distribution_strategy, max_queue_size, workers, use_multiprocessing)
704 max_queue_size=max_queue_size,
705 workers=workers,
–> 706 use_multiprocessing=use_multiprocessing)
707
708 return adapter

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, workers, use_multiprocessing, max_queue_size, **kwargs)
765
766 if standardize_function is not None:
–> 767 dataset = standardize_function(dataset)
768
769 if kwargs.get(“shuffle”, False) and self.get_size() is not None:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in standardize_function(dataset)
682 return x, y
683 return x, y, sample_weights
–> 684 return dataset.map(map_fn, num_parallel_calls=dataset_ops.AUTOTUNE)
685
686 if mode == ModeKeys.PREDICT:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in map(self, map_func, num_parallel_calls)
1589 else:
1590 return ParallelMapDataset(
-> 1591 self, map_func, num_parallel_calls, preserve_cardinality=True)
1592
1593 def flat_map(self, map_func):

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in __init__(self, input_dataset, map_func, num_parallel_calls, use_inter_op_parallelism, preserve_cardinality, use_legacy_function)
3924 self._transformation_name(),
3925 dataset=input_dataset,
-> 3926 use_legacy_function=use_legacy_function)
3927 self._num_parallel_calls = ops.convert_to_tensor(
3928 num_parallel_calls, dtype=dtypes.int32, name=”num_parallel_calls”)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
3145 with tracking.resource_tracker_scope(resource_tracker):
3146 # TODO(b/141462134): Switch to using garbage collection.
-> 3147 self._function = wrapper_fn._get_concrete_function_internal()
3148
3149 if add_to_graph:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _get_concrete_function_internal(self, *args, **kwargs)
2393 “””Bypasses error checking when getting a graph function.”””
2394 graph_function = self._get_concrete_function_internal_garbage_collected(
-> 2395 *args, **kwargs)
2396 # We’re returning this concrete function to someone, and they may keep a
2397 # reference to the FuncGraph without keeping a reference to the

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2387 args, kwargs = None, None
2388 with self._lock:
-> 2389 graph_function, _, _ = self._maybe_define_function(args, kwargs)
2390 return graph_function
2391

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _maybe_define_function(self, args, kwargs)
2701
2702 self._function_cache.missed.add(call_context_key)
-> 2703 graph_function = self._create_graph_function(args, kwargs)
2704 self._function_cache.primary[cache_key] = graph_function
2705 return graph_function, args, kwargs

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2591 arg_names=arg_names,
2592 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2593 capture_by_value=self._capture_by_value),
2594 self._function_attributes,
2595 # Tell the ConcreteFunction to clean up its graph once it goes out of

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
976 converted_func)
977
–> 978 func_outputs = python_func(*func_args, **func_kwargs)
979
980 # invariant: func_outputs contains only Tensors, CompositeTensors,

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in wrapper_fn(*args)
3138 attributes=defun_kwargs)
3139 def wrapper_fn(*args): # pylint: disable=missing-docstring
-> 3140 ret = _wrapper_helper(*args)
3141 ret = structure.to_tensor_list(self._output_structure, ret)
3142 return [ops.convert_to_tensor(t) for t in ret]

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py in _wrapper_helper(*args)
3080 nested_args = (nested_args,)
3081
-> 3082 ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
3083 # If func returns a list of tensors, nest.flatten() and
3084 # ops.convert_to_tensor() would conspire to attempt to stack

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\autograph\impl\api.py in wrapper(*args, **kwargs)
235 except Exception as e: # pylint:disable=broad-except
236 if hasattr(e, ‘ag_error_metadata’):
–> 237 raise e.ag_error_metadata.to_exception(e)
238 else:
239 raise

NotImplementedError: in converted code:

C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py:677 map_fn
batch_size=None)
C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py:2410 _standardize_tensors
exception_prefix=’input’)
C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py:513 standardize_input_data
data = [np.asarray(d) for d in data]
C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py:513
data = [np.asarray(d) for d in data]
C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\numpy\core\_asarray.py:85 asarray
return array(a, dtype, copy=False, order=order)
C:\Users\Mohit\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:728 __array__
” array.”.format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

Reply
- Jason Brownlee April 24, 2020 at 5:43 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Rishal April 27, 2020 at 8:09 am #

Hi Jason,

Thank you for this article. Amazing implementation and explanation of the entire process.

I had a question regarding improving the BLEU score for this task:

You said we should try out using different Pre-Trained Image models like InceptionV3 and also try to add regularization.

Where exactly according to your expertise should these changes be implemented? Should we make changes in extract_features() or in define_model() or both to get better results?

Also, is there a faster way to know if our new model will do better as these models take a couple of hours to run?

Thank you

Reply
- Jason Brownlee April 27, 2020 at 12:12 pm #
  
  It would be a change to the model itself, e.g. loading a saved model from another task and adapting it for use on this task.
  
  This will give you ideas:
  https://machinelearningmastery.com/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks/
  
  And this:
  https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
  
  Reply
  - Rishal Shah April 27, 2020 at 3:07 pm #
    
    So do you mean that we use transfer learning and add a model (like VGG) in the define_model() method even after doing feature extraction using VGG?
    
    Reply
    - Jason Brownlee April 28, 2020 at 6:41 am #
      
      No, I was thinking you would transfer an NLP model from another task.
      
      You can do whatever you like though.
      
      Reply
Mohit joshi May 1, 2020 at 8:57 pm #

hey Jason, just wanna ask why you implement this by VGG16 when there is more accurate and efficient pre-trained models are available like Inception V3 ResNet etc.. just wanna know your reasons behind choosing VGG16 model….

Reply
- Jason Brownlee May 2, 2020 at 5:43 am #
  
  It is a simple model that is easy to understand and works well in may cases.
  
  Reply
Krish May 4, 2020 at 6:31 pm #

Hey Jason, Thanks for creating such a wonderful article

With standalone Keras the code runs fine

As I am now using tensorflow 2.1 and while trying to run the show, facing the below error:

NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

I have gone the below link which you have posted but it’s of no help

https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

Please have a look on it and in case required I can pass the code to you

Reply
- Jason Brownlee May 5, 2020 at 6:21 am #
  
  You can run the example using Keras 2.3 on top of TensorFlow 2.1 directly. No need to change the code.
  
  Reply
Krish May 6, 2020 at 12:00 am #

Hey Jason,

I tried using Keras(2.3) from TensorFlow(2.1) and facing the below issue:

NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array.

Reply
- Jason Brownlee May 6, 2020 at 6:27 am #
  
  Sorry to hear that, I can confirm the code works with these versions, I suspect this will help you:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Pranshi Garg May 14, 2020 at 11:03 pm #

PermissionError Traceback (most recent call last)
in
1 directory =”D:\Flickr8k_Dataset”
—-> 2 features = extract_features(directory)
3 print(‘Extracted Features: %d’ % len(features))
4 # save to file
5 dump(features, open(r’features.pkl’, ‘rb’))

in extract_features(directory)
13 # load an image from file
14 filename = directory + ‘/’ + name
—> 15 image = load_img(filename, target_size=(224, 224))
16 # convert the image pixels to a numpy array
17 image = img_to_array(image)

~\anaconda3\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
108 raise ImportError(‘Could not import PIL.Image. ‘
109 ‘The use of load_img requires PIL.’)
–> 110 img = pil_image.open(path)
111 if color_mode == ‘grayscale’:
112 if img.mode != ‘L’:

~\anaconda3\lib\site-packages\PIL\Image.py in open(fp, mode)
2807
2808 if filename:
-> 2809 fp = builtins.open(filename, “rb”)
2810 exclusive_fp = True
2811

PermissionError: [Errno 13] Permission denied: ‘D:\\Flickr8k_Dataset/Flicker8k_Dataset’

Can you please help me?

Reply
- Jason Brownlee May 15, 2020 at 6:02 am #
  
  Looks like you don’t have permission to access your own files on your own workstation!
  
  Reply
  - Pranshi Garg May 16, 2020 at 8:16 am #
    
    how to change that?
    
    Reply
    - Jason Brownlee May 16, 2020 at 10:13 am #
      
      It will be specific to your workstation.
      
      If you are not the admin of your workstation, perhaps contact the admin.
      
      Or, perhaps try downloading the data set again and save it in a different location on your workstation.
      
      Reply
Ali May 19, 2020 at 3:19 am #

sir i used your code to develop desktop app.now i want to create android app.is it possible.if possible how?

Reply
- Jason Brownlee May 19, 2020 at 6:10 am #
  
  I don’t know about creating android apps, I teach machine learning.
  
  Reply
Karan Aryan May 19, 2020 at 8:23 pm #

Hi Jason, I am using Tensorflow 2.0 , and while running the code , I am receiving this error
I used both the method but I am receiving error . Can you please help me out.

1) with normal model.fit() method
Epoch 1/20
—————————————————————————
UnboundLocalError Traceback (most recent call last)
in ()
4 checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1, save_best_only=True, mode=’min’)
5 # fit model
—-> 6 model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, batch_size = 3, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
857 logs = tmp_logs # No error, now safe to assign to logs.
858 callbacks.on_train_batch_end(step, logs)
–> 859 epoch_logs = copy.copy(logs)
860
861 # Run validation.

UnboundLocalError: local variable ‘logs’ referenced before assignment

2) and while running the code of Progressive Loading method. I am receiving this error

WARNING:tensorflow:Model was constructed with shape (None, 2048) for input Tensor(“input_11:0”, shape=(None, 2048), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
—————————————————————————
ValueError Traceback (most recent call last)
in ()
1 for i in range(epochs):
2 generator = data_generator(train_descriptions, train_features, wordtoix, max_length, number_pics_per_batch)
—-> 3 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
4 model.save(‘./model_weights/model_’ + str(i) + ‘.h5’)

12 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, “ag_error_metadata”):
–> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise

ValueError: in user code:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function *
outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:541 train_step **
self.trainable_variables)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1807 _minimize
trainable_variables))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
([v.name for _, v in grads_and_vars],))

ValueError: No gradients provided for any variable: [‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

I’ll list out the main value error here
a) Normal model.fit method()
– UnboundLocalError: local variable ‘logs’ referenced before assignment

b) Progressive Loading method
– ValueError: No gradients provided for any variable: [‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

Reply
- Jason Brownlee May 20, 2020 at 6:24 am #
  
  I’m sorry to hear that, I have not seen this error. Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Werner June 10, 2020 at 12:09 pm #
  
  I got the same error. I also went through the checklist and have imported all packages, copied code correctly, etc. It only happens with the low_RAM version. With the original it didn’t, but I ran out of RAM.
  
  Reply
  - Rishab July 17, 2020 at 3:30 am #
    
    Hey,i also got the same error,i want to train the model on my local machine but i get the same error,i am using tensorflow 2.2.0 and keras version 2.3.0,were you able to solve it?
    ==================================================================
    Also, i tried running it on colab,it seems to work fine there, colab uses tensorflow 2.2.0 and keras 2.3.1 version,i am scared to install keras again on my local machine so that it doesnt screw up the system…if you found the solution to this error please do let me know.
    
    Reply
    - Saicharan August 6, 2020 at 4:57 pm #
      
      I am also getting the same error. I am using google colab to run the code. i got this error in google colab. Did u find the solution for the error?
      
      Reply
      - Jason Brownlee August 7, 2020 at 6:23 am #
        
        Try running on your own workstation or on an AWS EC2 instance.
      - Aayush Jain August 24, 2020 at 11:02 pm #
        
        For anyone who is getting this error on google colab, I have a temporary fix for it. Simply downgrade the version of keras and tensorflow. Use pip for this.
        Run the following code:
        
        pip uninstall keras
        pip install keras == 2.3.1
        pip uninstall tensorflow
        pip install tensorflow == 2.2
        
        After running the above codes in different cells, simply restart your runtime and your error will be solved.
      - Jason Brownlee August 25, 2020 at 6:41 am #
        
        Thanks for sharing!
- Abhinav Kumar June 24, 2020 at 12:59 am #
  
  I am also getting same error.
  
  Reply
  - Jason Brownlee June 24, 2020 at 6:34 am #
    
    Sorry to hear that, this may help:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
    Reply
Harshit Bargali May 31, 2020 at 2:50 am #

I have used your code only in training and evaluating. But somehow I got this error while evaluating, and I am not sure why it happened.

C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
“Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
Traceback (most recent call last):

File “”, line 1, in
runfile(‘C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py’, wdir=’C:/Users/Harshit/Desktop/Flickr8k_dataset’)

File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 827, in runfile
execfile(filename, namespace)

File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 110, in execfile
exec(compile(f.read(), filename, ‘exec’), namespace)

File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 164, in
evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)

File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 119, in evaluate_model
yhat = generate_desc(model, tokenizer, photos[key], max_length)

File “C:/Users/Harshit/Desktop/Flickr8k_dataset/Evaluate model.py”, line 98, in generate_desc
yhat = model.predict([photo,sequence], verbose=0)

File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training.py”, line 1441, in predict
x, _, _ = self._standardize_user_data(x)

File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training.py”, line 579, in _standardize_user_data
exception_prefix=’input’)

File “C:\Users\Harshit\Anaconda3\envs\ImageProcessing\lib\site-packages\keras\engine\training_utils.py”, line 145, in standardize_input_data
str(data_shape))

ValueError: Error when checking input: expected input_6 to have shape (28,) but got array with shape (34,)

Reply
- Jason Brownlee May 31, 2020 at 6:30 am #
  
  I’m sorry to hear that, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Aniket Gupta June 9, 2020 at 12:44 am #

I have trained the model for 20 epochs after which the accuracy is around 0.5 and validation set accuracy is 0.3.
How will this be able to generate good captions?

Reply
- Jason Brownlee June 9, 2020 at 6:05 am #
  
  You can ignore accuracy, it is a poor metric for this task. We instead use BLEU.
  
  Reply
  - Aniket Gupta June 9, 2020 at 9:12 pm #
    
    Will we get better predictions if we use GloVe vectors as they cover a wider corpus and account for relations between words?
    
    Reply
    - Jason Brownlee June 10, 2020 at 6:13 am #
      
      It depends on the specifics of the model and the dataset. Try it and see.
      
      Reply
LIN June 9, 2020 at 9:13 pm #

hi, Jason
I’m a bit confused. one picture in the data set corresponds to five sentences, which means that the label of each picture will choose one from these five sentences during the training?

Reply
- Jason Brownlee June 10, 2020 at 6:14 am #
  
  We train it on one description or on each description. Give the model lots of ways to “think” about an image which may or may not be useful.
  
  Reply
  - LIN June 10, 2020 at 12:58 pm #
    
    got it, thanks
    
    Reply
Videl June 12, 2020 at 5:22 am #

Hi, Jason.
That’s a great article and it helped me a lot.
I just want to tell you that model.layers.pop() does not remove the layer from the model.
The features that I extracted are 1000 dimensional, not 4096.
(I do not know how it is working for you. Maybe it worked fine with older Keras.)

https://github.com/tensorflow/tensorflow/issues/22479

Regards.

Reply
- Jason Brownlee June 12, 2020 at 6:19 am #
  
  The tutorial uses the Keras API directly, it looks like you are trying to change it to use tf.keras.
  
  Reply
Phuc June 12, 2020 at 2:14 pm #

Hi Jason,
I am working a project of image captioning. I have read some papers of image captioning. However, I don’t which evalution they used. And I am confuse between corpus_bleu and sentence_bleu. Image captioning’s output is a sentence for an image. So I think we should calculate sentence_bleu for each image, then calculate the average, but I saw you use corpus_bleu.
Can you tell me which evaluation is suitable? Which was used in the papers?

Reply
- Jason Brownlee June 13, 2020 at 5:47 am #
  
  Good question.
  
  If you are interested in the metric used in a specific paper, perhaps contact the authors directly.
  
  To learn the difference between the different BLEU scores, see this tutorial:
  https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
  
  Reply
Videl June 13, 2020 at 4:29 am #

Hi again Jason,

So I have trained the model on my own dataset (Bengali language). I got the best val loss as 2.99.
Now the problem is, when I run the generate_desc function I get the following error:

ValueError: Data cardinality is ambiguous:
x sizes: 4096, 33
Please provide data which shares the same first dimension.

There was no such issue during training at all. What could possibly cause this?

Regards.

Reply
- Videl June 13, 2020 at 5:35 am #
  
  Sorry the error is gone. But the same caption is being generated for all the images.
  
  Reply
  - Jason Brownlee June 13, 2020 at 6:11 am #
    
    Perhaps try tuning your model to the dataset.
    
    Reply
- Jason Brownlee June 13, 2020 at 6:11 am #
  
  ell done!
  
  Perhaps you will need to adapt the code, I cannot diagnose the error off the cuff.
  
  Reply
- Gaurav September 20, 2021 at 4:18 am #
  
  Hey,
  
  How did you fix the data cardinality issue? I have it too
  
  ValueError: Data cardinality is ambiguous:
  x sizes: 2048, 1
  Make sure all arrays contain the same number of samples.
  
  Reply
  - Moda February 25, 2022 at 8:55 am #
    
    I reshaped the features vector, in my case the features len for every image is 4096, the feature vector which is passed to the “generate_desc” is of shape (4096, ) , which is ambiguous for the model bcs it does not seem that it is a one sample/photo features vector.
    
    The fix:
    yhat = model.predict([np.reshape(photo, (1, 4096)),sequence], verbose=0)
    
    Reply
Adam July 2, 2020 at 3:06 pm #

Hey Jason, thank you for an amazing tutorial.

I have one question though,

I tried using the above given VGG16 model, the highest BLEU score I reached is 0.35
And using Xception, I got 0.37

How can I increase this?

Reply
- Jason Brownlee July 2, 2020 at 5:41 pm #
  
  Nice work.
  
  Good question, some of the suggestions here may help:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Dorsa July 5, 2020 at 6:47 am #

Hi , this error is coming for me. can anyone help me plz ?

FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘Flickr8k_Dataset’

Reply
- Jason Brownlee July 5, 2020 at 7:08 am #
  
  Ensure you download the dataset and run the code from the same directory as the unzipped dataset, e.g. from the command line.
  
  Reply
Jay July 7, 2020 at 4:41 am #

Hey Jason, thanks for sharing such an awesome tutorial. I enjoyed reading it now soon will try to implement it.
But just out of curiosity is there any tutorial which can generate images from captions i.e.(Text-to-Image synthesis)

Reply
- Jason Brownlee July 7, 2020 at 6:45 am #
  
  Thanks.
  
  I hope to write a tutorial on that topic in the future. Probably a GAN would be used.
  
  Reply
  - Jay July 16, 2020 at 9:53 pm #
    
    Hey, so I am interested in using this source code in my own project. So how can I cite this?
    
    Reply
    - Jason Brownlee July 17, 2020 at 6:16 am #
      
      Good question, this will help:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-reference-or-cite-a-book-or-blog-post
      
      Reply
mohit July 28, 2020 at 9:39 pm #

hey jason i m getting error
—————————————————————————
KeyError Traceback (most recent call last)
in ()
142 print(‘Descriptions: train=%d’ % len(train_descriptions))
143 # photo features
–> 144 train_features = load_photo_features(‘features.pkl’, train)
145 print(‘Photos: train=%d’ % len(train_features))
146 # prepare tokenizer

1 frames
in (.0)
64 all_features = load(open(filename, ‘rb’))
65 # filter features
—> 66 features = {k: all_features[k] for k in dataset}
67 return features
68

KeyError: ‘2855417531_521bf47b50’

Reply
- Jason Brownlee July 29, 2020 at 5:51 am #
  
  Sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Manan Kumawat July 30, 2020 at 2:49 am #

Hi jason,

AWS ec2 instance don’t have sufficient memory to extract and install CUDA for tensorflow.

Reply
- Jason Brownlee July 30, 2020 at 6:27 am #
  
  No need, if you use an existing instance, no need to install cuda as it is already installed and ready to use:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
  Reply
- Fatma July 31, 2020 at 7:57 pm #
  
  Hello Jason
  
  I’m preparing photo data and getting this error can you resolve this.
  
  tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:RandomUniform]
  
  Reply
  - Jason Brownlee August 1, 2020 at 6:09 am #
    
    The error suggests you have run out of memory.
    
    Perhaps try running on a machine with more memory, like an EC2 instance.
    
    Reply
    - Manan Kumawat August 1, 2020 at 12:48 pm #
      
      Yeah, I’m running it on EC2 instance 1gb ram free instance.
      
      Reply
      - Jason Brownlee August 1, 2020 at 1:29 pm #
        
        You may need more system memory, e.g. try the instance described here:
        https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
Anubhav August 1, 2020 at 3:24 am #

Hi Jason,

Thanks for another great article. You have done it again. I’d not be far off saying that you are one of my inspirations on my journey to Machine Learning. I have followed your articles for quite a while, ranging from small queries to entire topics like this one.

I was curious about how this model performs on Flickr30K dataset. Unfortunately using same hyper-parameters led to the model output just 1 sentence overall. The sentence being ” man in blue shirt is sitting on the ground with his arms crossed end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end end of the ground end”. (The max length for this dataset was 74).

I have used the progressive loading using generators as instructed by your tutorial. Could you point out any possible reasons why this would be happening?

PS: I haven’t run the model on Flickr8k to confirm that my implementation is correct or not.

Best!

Reply
- Jason Brownlee August 1, 2020 at 6:15 am #
  
  Thanks.
  
  Perhaps the model is overfit, try tuning the learning hyperparameters, or perhaps even a larger model.
  
  Reply
siddhant kandge August 5, 2020 at 1:56 am #

I have copy pasted same code but with inception model. It giving me a mismatch error.
Can you please slove my issue

Reply
- Jason Brownlee August 5, 2020 at 6:17 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Ranganadh August 6, 2020 at 7:58 pm #

I am getting this error please help me to resolve
ValueError: could not broadcast input array from shape (47,1000) into shape (47)

Reply
- Jason Brownlee August 7, 2020 at 6:25 am #
  
  Sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ranganadh August 7, 2020 at 8:48 pm #

I am getting this one please help

ValueError: No gradients provided for any variable: [’embedding_8/embeddings:0′, ‘dense_24/kernel:0’, ‘dense_24/bias:0’, ‘lstm_8/lstm_cell_8/kernel:0’, ‘lstm_8/lstm_cell_8/recurrent_kernel:0’, ‘lstm_8/lstm_cell_8/bias:0’, ‘dense_25/kernel:0’, ‘dense_25/bias:0’, ‘dense_26/kernel:0’, ‘dense_26/bias:0’].

Reply
- Jason Brownlee August 8, 2020 at 6:00 am #
  
  I’m sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Ranganadh August 8, 2020 at 2:23 pm #
    
    I have tried that on colab but displays that error
    
    Reply
    - Jason Brownlee August 9, 2020 at 5:32 am #
      
      I recommend not using colab.
      
      Reply
      - Ranganadh August 9, 2020 at 3:05 pm #
        
        First I tried on Jupiter but I got error then tried on colab other than that?
Akash K August 18, 2020 at 3:11 am #

I’m getting a strange error.

While training X1train, X2train and ytrain using “create_sequences” function, the following error is popping up.

Traceback (most recent call last):

File “”, line 1, in
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)

File “”, line 21, in create_sequences
return array(X1), array(X2), array(y)

TypeError: array() argument 1 must be a unicode character, not list

Could you please help me with this?

Reply
- Jason Brownlee August 18, 2020 at 6:07 am #
  
  Sorry to hear that, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Akash K August 18, 2020 at 6:22 am #
    
    Are you able to confirm that your libraries are up to date (e.g, check version numbers)? – Yes, they’re up to date.
    
    Are you able to confirm that you copied all of the code exactly (preserving white space)? – Definitely
    
    Are you able to confirm that you saved any required data files in the same folder as the code? – Yes
    
    Have you tried running the code from the command line, not a notebook or an IDE? – Yes sir.
    
    Have you tried searching for a similar error in the comments or on StackOverflow? – No specific solution found on StackOverflow or Github either.
    
    Please Help!
    
    Reply
    - Jason Brownlee August 18, 2020 at 1:25 pm #
      
      I found the issue and updated the tutorial.
      
      Reply
      - Akash K August 18, 2020 at 2:01 pm #
        
        Hi Jason,
        
        Just curious about what the issue was. Do you mind sharing the details?
        
        BTW, thank you so much for such a beautiful piece of code.
      - Jason Brownlee August 19, 2020 at 5:54 am #
        
        The preparation of the VGG model required modification due to an API change. E.g. the pop() function on the layers no longer did anything.
Ram August 19, 2020 at 2:10 am #

When defining model with code

model = define_model(vocab_size, max_length)

Got error

Traceback (most recent call last):

File “”, line 1, in
model = define_model(vocab_size, max_length)

File “”, line 3, in define_model
inputs1 = input(shape=[4096,])

TypeError: raw_input() got an unexpected keyword argument ‘shape’

Help with this please.

Reply
- Jason Brownlee August 19, 2020 at 6:05 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Anisa August 25, 2020 at 4:44 pm #

I am facing Name Error for the below code while fitting the model. Please help me resolve this.
# define the captioning model
def define_model(vocab_size, max_length):

# define checkpoint callback
filepath = ‘model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5′
checkpoint = ModelCheckpoint(filepath, monitor=’val_loss’, verbose=1, save_best_only=True, mode=’min’)

# fit model
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

NameError Traceback (most recent call last)
in
1 # fit model
—-> 2 model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

NameError: name ‘model’ is not defined

Please help me fix this .

Reply
- Jason Brownlee August 26, 2020 at 6:45 am #
  
  I’m sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Anisa August 26, 2020 at 6:02 pm #
    
    Thanks Jason, I tried that.
    I’m sorry to say, but stuck with this now:
    
    MemoryError Traceback (most recent call last)
    in
    152 print(‘Description Length: %d’ % max_length)
    153 # prepare sequences
    –> 154 X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)
    155
    156 # dev dataset
    
    in create_sequences(tokenizer, max_length, descriptions, photos, vocab_size)
    107 X2.append(in_seq)
    108 y.append(out_seq)
    –> 109 return array(X1), array(X2), array(y)
    110
    111 # define the captioning model
    
    MemoryError: Unable to allocate 4.68 GiB for an array with shape (306404, 4096) and data type float32
    
    How can this be solved?
    
    Reply
    - Jason Brownlee August 27, 2020 at 6:12 am #
      
      It looks like you ran out of memory.
      
      Try running on a machine with more memory.
      Try running on AWS EC2.
      Try using a smaller model or less data.
      
      Reply
JG August 29, 2020 at 6:35 pm #

Hi Jason:

Awesome tutorial! It’s an incredible achievement of Computer Vision and NLP techniques!
Thank you Jason it’s very inspirational!

I share my comments and results.

1) Image caption HORIZON:
– For when a tutorial that gives voice to the predicted captions of the images?
– I guess these techniques can also be applied as alternative to multi-label image classification, isn’t it?
– Also It could be useful to explicit the image pattern recognition obtained during the “solo” computer vision training, by labelling the image through captioning them, do you agree?

2) MINOR COMMENTS:
– There are only 8,091 Images in image dataset folder but we got 8092 from text Image description: the missing image is: “2258277193_586949ec62.jpg”
– I realise that you do not apply “stopwords” in cleaning text process. Why?
– your manual data-generator does not admit to setup parallel process (workers/threads arguments) while fitting keras model (model.fit_generator)

3) I EXPERIMENT with YOUR BASELINE MODEL
– I reduced the vocabulary length by filtering-out words that repeat a minimum number of times (e.g. 5 times). I got worse results using 10 times repetition but, a faster code.
– I apply Glove pre-trained words library within Embedding layer training. The results are better.
– I tried to apply Conv1D besides MaxPool1D layers following the embedding layers. But I get worse results. Probably because I can not use mask_zero=True, because Conv1D does not support it.
– I realised there are other alternatives to word coding “texts_to_sequences” such as “texts_to_matrix”, or “one_hot” but as you do I do not apply either.
– I have 16 GB RAM memory on my Mac but I can run the whole dataset training but, surprisingly the code was quicker using your data generator 11minutes/epoch vs 15minutes/epoch. I guess because it works around RAM limit.
– I also applied validation data on training generation dataset. So I can apply directly the best_model from callbacks list (for progressive training).
– In addition to VGG16 Model I applied “Inceptionv3” app. The results are better than VGG16.
– I complete the code, including the output Inceptionv3 prediction within my code. So I can plot the images examples with X-axis = image caption prediction andY-axis = the Image Inceptionv3 Class prediction. So you can get a more complete vision of the results obtained using Computer vision alone plus Image Caption.
– I select a lower learning rate for the epochs training, to get a lower losses.
– Anyway, the dog image example I think it is a very simple case because, when I try other images examples, sometimes the image captions are very funny and you get a completely crazy caption !

– I believe the most singular contribution of my code experiment was replacing the “add” layer (were the two models merge) vs “Multiply” or “Subtract” layers. I got the best results with “Subtract” layer (subtracting the outputs of the NLP model from Image Features model). I do not know why the model performs better with subtract layer.
– I got for image example: “black and white dog is running on the beach” besides, the “Border_Collie” from Inceptionv3 Image prediction-. With very similars BLEU scores to you.
BLEU-1 = 0.570
BLEU-2 = 0.343
BLEU-3 = 0.133
BLEU-4 = 0.133

Jason your Tutorial collection is astonish and also because you are a great teacher. Thank you.

Reply
- Jason Brownlee August 30, 2020 at 6:34 am #
  
  Thanks JG.
  
  Images in a database can be described, then humans can search the database for images that match their free form requirements.
  
  Stop words increase the complexity of the problem and don’t add a lot of semantic meaning. You can add them back if you like.
  
  Very cool experiments, thank you so much for sharing!
  
  Reply
Dominique August 31, 2020 at 3:43 pm #

Dear Jason,

I have just finished the reading of your book « Deep Learning for Natural Language Processing». It’s excellent, contains tons of information and I thank you for this journey in NLP.

I have posted my review on your book here: http://questioneurope.blogspot.com/2020/08/deep-learning-for-natural-language.html

Dominique

Reply
- Jason Brownlee September 1, 2020 at 6:25 am #
  
  Thanks, well done!
  
  Reply
sami September 3, 2020 at 1:05 am #

hey jason,
I am having a problem,
TypeError Traceback (most recent call last)
in ()
168 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
169 # fit for one epoch
–> 170 Model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
171 # save model
172 model.save(‘model_’ + str(i) + ‘.h5’)

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn(‘Update your ' + object_name + ' call to the ‘ +
90 ‘Keras 2 API: ‘ + signature, stacklevel=2)
—> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

TypeError: fit_generator() missing 1 required positional argument: ‘generator’

can you suggest a solution please

Reply
- Jason Brownlee September 3, 2020 at 6:09 am #
  
  Sorry to hear that, the tips here may help:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Ali September 8, 2020 at 9:42 pm #

Hi. I have a question how can we add attention machenisim to this code or on this model.??

Thanks

Reply
- Jason Brownlee September 9, 2020 at 6:48 am #
  
  Not off hand.
  
  Reply
Prashant Aryal September 9, 2020 at 1:25 am #

Hello Jason,
Would you please tell me how to do this thing on Googlecolab? I am having hard time with “extract_features” function in preparation section.

Reply
- Jason Brownlee September 9, 2020 at 6:52 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/do-code-examples-run-on-google-colab
  
  Reply
Drishti September 12, 2020 at 2:41 am #

Hey I’m having this problem. Can you help me?

Traceback (most recent call last):
File “load.py”, line 72, in
train_features = load_features(‘features.pkl’,train)
File “load.py”, line 33, in load_features
features = {k: all_features[k] for k in dataset}
File “load.py”, line 33, in
features = {k: all_features[k] for k in dataset}
KeyError: ‘3356642567_f1d92cb81b.jpg’

Reply
- Jason Brownlee September 12, 2020 at 6:19 am #
  
  Sorry to hear that, the tips here may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Reshma Jindal September 13, 2020 at 5:53 am #

epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
# create the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
# fit for one epoch
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
# save model
model.save(‘model_’ + str(i) + ‘.h5’)

In the above code, there are two epoch initialzation , epochs=20 and epochs=1 (in fit_generator), can you describe what both of them actually mean with help of a rough example that can distinguish the significance of both ?

Reply
- Jason Brownlee September 13, 2020 at 6:14 am #
  
  There’s no contradiction. We are enumerating the epochs manually.
  
  E.g. the first is our manual outer loop, the second is the inner call to the Keras API to run one epoch.
  
  Reply
  - pankaj October 8, 2020 at 7:02 pm #
    
    while running i got
    model_1.h5
    model_2.h5 like that till model_19.h
    
    but non of file made like model-ep002-loss3.245-val_loss3.612.h5. where I can find this file.
    
    Reply
    - Jason Brownlee October 9, 2020 at 6:41 am #
      
      Your file names will be different. Use the files you have.
      
      Reply
Drishti September 14, 2020 at 7:57 pm #

Hello once again
When I trained the model under progressive loading then in the first epoch only I’m getting NaN loss. Why would that be so?

Reply
- Jason Brownlee September 15, 2020 at 5:21 am #
  
  Perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Drishti September 15, 2020 at 6:25 pm #
    
    I have looked for all of them but then also the loss is coming about to be Nan. Any other suggestions about why this would be happening?
    
    Reply
    - Jason Brownlee September 16, 2020 at 6:21 am #
      
      Only that library versions are not up to date or step was skipped in data preparation or modeling from the tutorial.
      
      Reply
Todd September 21, 2020 at 3:11 am #

Hello, first thanks for this great article!i’ve learnt alot from your code.

i tried it with my laptop but somehow i have a error when fitting the model(something like failed to call GraphViz),, i have it installed but the error is not gone…

just wondering if i can download the final model somewhere so i can try it with my own pics?

Thanks.

Reply
- Jason Brownlee September 21, 2020 at 8:13 am #
  
  Yes, you can install pydot and pygraphviz, or comment out the call to the plot_model – which is not needed to complete the tutorial.
  
  Reply
  - todd September 21, 2020 at 5:37 pm #
    
    Thanks Jason, that worked!
    
    Reply
    - Jason Brownlee September 22, 2020 at 6:43 am #
      
      Well done!
      
      Reply
  - Todd September 22, 2020 at 3:58 am #
    
    Also, i’m thinking if it is possible to have algorithms like YOLO that will capture objects in a pic, and then create captions from these object tags?
    like we detect one man , a dog ,grass, we may infer the caption as ‘ a man playing with a dog in a park’?
    
    Reply
    - Jason Brownlee September 22, 2020 at 6:54 am #
      
      Yes, you can use two models in that way.
      
      Reply
      - Maverick December 2, 2023 at 6:33 am #
        
        Hey, Can you please me to execute the code for image captioning using object detection model YOLO.
      - James Carmichael December 2, 2023 at 11:33 am #
        
        Hi Maverick…The following resource may be of interest to you:
        
        https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
      - Maverick December 3, 2023 at 5:49 am #
        
        Hey James..I have implemented object detection model but I am not able to incorporate image captioning model with YOLO to generate caption. Can you please help me with this.
Ashutosh Garg September 30, 2020 at 8:14 am #

Hi, thanks for the article.

I followed it and trained the model but at the time of inferencing and generating new captions, I get the error

CUDNN_STATUS_BAD_PARAM.

I am running it on google colab with gpu enabled, in cpu, it might take centuries to run.

Please help of possible.

Reply
- Jason Brownlee September 30, 2020 at 2:14 pm #
  
  Sorry to hear that, I can’t help you with google colab:
  https://machinelearningmastery.com/faq/single-faq/do-code-examples-run-on-google-colab
  
  Reply
- Richard October 7, 2020 at 9:46 pm #
  
  Hey Ashutosh, how did you get started with google colab?
  I am trying to replicate the same code in colab but not getting even the pickle file of features extracted.
  
  Reply
pankaj October 8, 2020 at 7:08 pm #

Hi Jason,
Thank you for wonder full explanation.
I struct in some where
# fit for one epoch
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
this line giving error below

File “F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py”, line 973, in wrapper
raise e.ag_error_metadata.to_exception(e)

ValueError: in user code:

F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:806 train_function *
return step_function(self, iterator)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:796 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1211 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2585 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2945 _call_for_each_replica
return fn(*args, **kwargs)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:789 run_step **
outputs = model.train_step(data)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:757 train_step
self.trainable_variables)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:2737 _minimize
trainable_variables))
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:562 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
F:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:1271 _filter_grads
([v.name for _, v in grads_and_vars],))

ValueError: No gradients provided for any variable: [’embedding_5/embeddings:0′, ‘dense_15/kernel:0’, ‘dense_15/bias:0’, ‘lstm_5/lstm_cell_5/kernel:0’, ‘lstm_5/lstm_cell_5/recurrent_kernel:0’, ‘lstm_5/lstm_cell_5/bias:0’, ‘dense_16/kernel:0’, ‘dense_16/bias:0’, ‘dense_17/kernel:0’, ‘dense_17/bias:0’].

Reply
Priyanka Digambar Pawar October 9, 2020 at 8:25 pm #

I am facing this error:

ValueError: No model found in config file.
in ()

179 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)

180 # fit for one epoch

–> 181 model.fit_generator(generator, epochs=1, steps_per_epoch=len(train_descriptions), verbose=1)
182 # save model
183 model.save(‘model_’ + str(i) + ‘.h5’)

12 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
971 except Exception as e: # pylint:disable=broad-except
972 if hasattr(e, “ag_error_metadata”):
–> 973 raise e.ag_error_metadata.to_exception(e)
974 else:
975 raise

How should i solve this?

Reply
- Jason Brownlee October 10, 2020 at 7:05 am #
  
  Sorry to hear that, perhaps some of these suggestions will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Manish October 10, 2020 at 11:59 am #

Hello jason,

I’ve been trying to compile the code for extracting the features from the image. I am getting the following error:

PermissionError Traceback (most recent call last)
in
30 # extract features from all images
31 directory = ‘Flickr8k_Dataset’
—> 32 features = extract_features(directory)
33 print(‘Extracted Features: %d’ % len(features))
34 # save to file

in extract_features(directory)
12 # load an image from file
13 filename = directory + ‘/’ + name
—> 14 image = load_img(filename, target_size=(224, 224))
15 # convert the image pixels to a numpy array
16 image = img_to_array(image)

~\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\preprocessing\image.py in load_img(path, grayscale, color_mode, target_size, interpolation)
299 “””
300 return image.load_img(path, grayscale=grayscale, color_mode=color_mode,
–> 301 target_size=target_size, interpolation=interpolation)
302
303

~\AppData\Local\Programs\Python\Python37\lib\site-packages\keras_preprocessing\image\utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
111 raise ImportError(‘Could not import PIL.Image. ‘
112 ‘The use of load_img requires PIL.’)
–> 113 with open(path, ‘rb’) as f:
114 img = pil_image.open(io.BytesIO(f.read()))
115 if color_mode == ‘grayscale’:

PermissionError: [Errno 13] Permission denied: ‘Flickr8k_Dataset/Flicker8k_Dataset’

I have tried changing the permission for the folder by giving it full access. But the error seems to persist. I ran the next part of the code which extracts the descriptions of the images and it ran without any errors. I’m working with jupyter notbeook in Visual Studio code.

Thank you!

Reply
- Jason Brownlee October 10, 2020 at 1:54 pm #
  
  Looks like you do not have permission on your workstation to access the dataset.
  
  Maybe talk to your admin or check the help documentation for your operating system.
  
  Reply
  - Manish October 11, 2020 at 11:22 am #
    
    I run this code on my personal machine. So, I don’t know what you meant by admin. Do you have any suggestions for softwares where I can run this code?
    
    Reply
    - Jason Brownlee October 12, 2020 at 6:37 am #
      
      The error suggests you do not have permission to access files on your machine, it suggests you were using a work machine controlled by someone else.
      
      If you have control over your machine, give yourself permission to access the files, or place the files in a location where you can access them with permissions.
      
      Sorry, I don’t know a thing about windows permission administration, I have not used the operating system.
      
      Reply
      - Manish October 12, 2020 at 3:30 pm #
        
        Thank you, Jason!
        
        I had one small question. Could you let me know which IDE you have used to run this code? Do you have any suggestions where you think this code will run better?
      - Jason Brownlee October 13, 2020 at 6:31 am #
        
        Yes, I use sublime text editor and run examples from the command line:
        https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
- Manish October 15, 2020 at 4:25 am #
  
  Hello Jason,
  
  Im getting the following error:
  
  UnidentifiedImageError: cannot identify image file
  
  Any idea how can I solve this?
  
  Reply
  - Jason Brownlee October 15, 2020 at 6:19 am #
    
    I have not seen this error before, sorry.
    
    Reply
  - K.Apoorva January 13, 2021 at 8:03 pm #
    
    same error, any idea how to resolve?
    
    Reply
Md Shihab Uddin October 11, 2020 at 6:05 am #

Hello, I get this error.how can I solve it?

ValueError: Input 0 of layer dense_18 is incompatible with the layer: expected axis -1 of input shape to have value 4096 but received input with shape [1, 1000]

Reply
- Jason Brownlee October 11, 2020 at 7:01 am #
  
  These tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Shavy Ghai December 6, 2020 at 4:43 am #
  
  @Md Shihab Uddin, I am facing the same error. Did you solve it?
  @Jason Brownlee, Could you help me to solve this error?
  
  Reply
  - Jason Brownlee December 6, 2020 at 7:10 am #
    
    The code works as is, see this:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
    Reply
    - Shavy Ghai December 8, 2020 at 11:55 pm #
      
      @Jason Brownlee, I am still facing the same issue. Please help me to solve this issue.
      
      Reply
Sam Ogbonnaya October 16, 2020 at 1:44 am #

Hi,

You noted the following in your article regarding evaluating the models:

“Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.”

I can understand how the final model during training i.e. weights, loss etc may vary as you’ve stated. However, when I run the same test image through the trained model, I obtain a different prediction every time – is this due to the same reason as above? or some other reason – shouldn’t the final output always be the same for the same image?

Would appreciate your explanation.

I’m using the evaluation procedure from here: https://www.tensorflow.org/tutorials/text/image_captioning#caption

Thanks!

Reply
- Jason Brownlee October 16, 2020 at 5:56 am #
  
  A trained model should make the same prediction each time. If it does not, check your code – perhaps you are accidentally training or there is a bug.
  
  Reply
  - Sam Ogbonnaya October 17, 2020 at 2:16 am #
    
    Thanks. I’ve spotted the issue.
    
    The evaluation procedure from the TensorFlow tutorial uses the line below to convert the probability to an integer. As it’s random, it always generates random predictions. By changing the evaluation method to argmax instead, I get consistent predictions
    
    from tensorflow:
    predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()
    
    Update:
    predicted_id = tf.expand_dims(tf.argmax(predictions, -1), 0).numpy()[0][0]
    
    Reply
    - Jason Brownlee October 17, 2020 at 6:11 am #
      
      Happy to hear you solved your problems.
      
      Reply
Manish October 21, 2020 at 4:52 am #

Hello Jason,

I’ve trained the model with progressive loading for 3 epochs. When I used a new image to generate captions, it gave me the accurate caption. But everytime I use an image that has beach in it, I get the same caption, “man in red shirt is standing on the beach”, even if there no man in the image. I tried re fitting the model. But the issue is the same. Do you have any suggestions on how to improve the accuracy?

Reply
- Jason Brownlee October 21, 2020 at 6:46 am #
  
  Perhaps try retraining the model.
  Perhaps try a smaller learning rate.
  
  Reply
  - Manish October 23, 2020 at 4:15 am #
    
    Do you mean use a smaller number of epochs?
    
    Reply
    - Jason Brownlee October 23, 2020 at 6:16 am #
      
      No. But perhaps try that too.
      
      Reply
      - Manish October 24, 2020 at 8:07 am #
        
        How can I change the learning rate?
      - Jason Brownlee October 24, 2020 at 12:41 pm #
        
        If you are new to tuning the learning rate, I recommend starting here:
        https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
Moksh Grover October 27, 2020 at 5:39 am #

model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))

Its showing that my GPU is performing tasks but by epoch is stuck at 1st one and it isn’t showing any progress.
I have an RTX 2070 Max-Q GPU and an i7 Processor
can anyone help me out?

Reply
- Jason Brownlee October 27, 2020 at 6:50 am #
  
  Perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Mahavirbha October 29, 2020 at 1:08 am #

i’m getting this error after completing 1 epochs, while running Progressive loading code example.
ValueError: Failed to find data adapter that can handle input: ,
what should I do? please help me

Reply
- Mahavirbha October 29, 2020 at 1:09 am #
  
  ValueError: Failed to find data adapter that can handle input: , this error
  
  Reply
- Jason Brownlee October 29, 2020 at 8:04 am #
  
  Sorry to hear that, this may give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Rohit Kushwaha November 10, 2020 at 7:49 pm #

hello sir, while running the below code after some execution i am getting the error:

# extract features from each photo in the directory
def extract_features(directory):
# load the model
model = VGG16()
# re-structure the model
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
# summarize
#print(model.summary())

# extract features from each photo
features = dict()
for name in listdir(directory):
# load an image from file
filename = directory + ‘/’ + name
image = load_img(filename, target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# get features
feature = model.predict(image, verbose=0)
# get image id
image_id = name.split(‘.’)[0]
# store feature
features[image_id] = feature
print(‘>%s’ % name)
return features

# extract features from all images
directory = ‘/content/drive/My Drive/Flickr_Data/Images/’
features = extract_features(directory)
print(‘Extracted Features: %d’ % len(features))
# save to file
dump(features, open(‘features.pkl’, ‘wb’))

This is the error comes after some execution with my image file. I could not able to figure out. please help me!!!

UnidentifiedImageError Traceback (most recent call last)
in ()
32 # extract features from all images
33 directory = ‘/content/drive/My Drive/Flickr_Data/Images/’
—> 34 features = extract_features(directory)
35 print(‘Extracted Features: %d’ % len(features))
36 # save to file

3 frames
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
2860 warnings.warn(message)
2861 raise UnidentifiedImageError(
-> 2862 “cannot identify image file %r” % (filename if filename else fp)
2863 )
2864

UnidentifiedImageError: cannot identify image file

Reply
- Jason Brownlee November 11, 2020 at 6:44 am #
  
  I’m sorry to hear that, I have some suggestions here that may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- vi December 6, 2021 at 6:40 pm #
  
  hi, have you solved this problem? i got this error too
  
  Reply
Jay Trivedi November 21, 2020 at 10:24 am #

I was unable to train the model normal way even using AWS m5.2xlarge (32gig). So I tried the generator variant but it shows the error when I’m fitting it

WARNING:tensorflow:From project1.py:339: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use the Model. fit, which supports generators.
Traceback (most recent call last):
File “project1.py”, line 339, in
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py”, line 324, in new_func
return func(*args, **kwargs)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 1829, in fit_generator
initial_epoch=initial_epoch)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 108, in _method_wrapper
return method(self, *args, **kwargs)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py”, line 1098, in fit
tmp_logs = train_function(iterator)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 780, in __call__
result = self._call(*args, **kwds)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 823, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 697, in _initialize
*args, **kwds))
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 2855, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py”, line 3075, in _create_graph_function
capture_by_value=self._capture_by_value),
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py”, line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py”, line 600, in wrapped_fn
return weak_wrapped_fn().__wrapped__(*args, **kwds)
File “/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py”, line 973, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:806 train_function *
return step_function(self, iterator)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:796 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
return fn(*args, **kwargs)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:789 run_step **
outputs = model.train_step(data)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:757 train_step
self.trainable_variables)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:2737 _minimize
trainable_variables))
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:562 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1271 _filter_grads
([v.name for _, v in grads_and_vars],))

ValueError: No gradients provided for any variable: [’embedding/embeddings:0′, ‘dense/kernel:0’, ‘dense/bias:0’, ‘lstm/lstm_cell/kernel:0’, ‘lstm/lstm_cell/recurrent_kernel:0’, ‘lstm/lstm_cell/bias:0’, ‘dense_1/kernel:0’, ‘dense_1/bias:0’, ‘dense_2/kernel:0’, ‘dense_2/bias:0’].

Please give me a solution…….. I have to meet deadlines

Reply
- Jason Brownlee November 21, 2020 at 1:04 pm #
  
  Sorry to hear that you are having trouble, I can confirm the code works as described. Here are some suggestions:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Venkata naresh suddula December 7, 2020 at 6:45 am #
    
    you can moddify the code of data generators return from
    
    this yield [[in_img, in_seq], out_word)
    
    to
    
    yield ([in_img, in_seq], out_word)
    
    Reply
- Jay Trivedi November 27, 2020 at 10:06 am #
  
  How can I get flickr30k dataset? I searched a lot but couldn’t find it. The link they provide after filling up the form is broken and says forbidden. Can anyone have the zip file that I can get directly
  
  Reply
  - Jason Brownlee November 27, 2020 at 1:08 pm #
    
    Sorry, I only have a copy of the Flickr8k Dataset
    
    Reply
    - Jay Trivedi November 27, 2020 at 4:23 pm #
      
      I found one on GitHub but I have to create text files by converting them from CSV
      
      Reply
  - vishal venkat December 15, 2020 at 4:10 am #
    
    hai dont worry it is vvv simple
    
    this is the link for the dataset
    
    https://www.kaggle.com/ming666/flicker8k-dataset
    
    go through that link and download the 2gb sized data file …
    
    Reply
    - Jason Brownlee December 15, 2020 at 6:29 am #
      
      The dataset download is linked directly in the tutorial.
      
      Reply
Andrew November 29, 2020 at 7:38 am #

Hi Jason. This tutorial is fantastic thank you. You also have the patience of a saint responding to all of these. Keep up the great work!

I’ve adapted this tutorial to run on some images that are pretty difficult to classify anyway (they are similar).

I’m getting an output but it is the same words for every image. Does this mean the model is overfit or underfit? What are the most likely parameters I should look at changing?

Thanks.

Reply
- Jason Brownlee November 29, 2020 at 8:17 am #
  
  Thanks!
  
  It does sound like the model is overfit.
  
  Perhaps try fitting the model again and compare results.
  Perhaps try early stopping.
  Perhaps try more capacity.
  Perhaps try some of the ideas here:
  https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
  
  Reply
  - Andrew November 29, 2020 at 10:18 pm #
    
    Thanks Jason. I’ll give it a go.
    
    Reply
    - Jason Brownlee November 30, 2020 at 6:36 am #
      
      Let me know how you go.
      
      Reply
Venkata naresh suddula December 7, 2020 at 6:36 am #

hai jason i got this error when i running the progresive loading on my lap which has 8gb ram

ValueError: No gradients provided for any variable: [’embedding_1/embeddings:0′, ‘dense_3/kernel:0’, ‘dense_3/bias:0’, ‘lstm_1/lstm_cell_1/kernel:0’, ‘lstm_1/lstm_cell_1/recurrent_kernel:0’, ‘lstm_1/lstm_cell_1/bias:0’, ‘dense_4/kernel:0’, ‘dense_4/bias:0’, ‘dense_5/kernel:0’, ‘dense_5/bias:0’].

can u please suggest methe any solution

Reply
- Jason Brownlee December 7, 2020 at 7:38 am #
  
  Perhaps try the “progressive loading” section of the tutorial.
  
  Reply
Venkata naresh suddula December 7, 2020 at 6:46 am #

tahnk u vvv much for u jason for suchh an awsome project

Reply
- Jason Brownlee December 7, 2020 at 7:38 am #
  
  You’re welcome.
  
  Reply
Waqas December 20, 2020 at 9:23 pm #

Is this project independent of tensorflow version?

Reply
- Jason Brownlee December 21, 2020 at 6:39 am #
  
  Yes, it works with many different version, although I recommend using the latest version.
  
  Reply
Waqas December 21, 2020 at 12:59 am #

Please Help
ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

Reply
- Waqas December 21, 2020 at 1:01 am #
  
  This occured when i was using progressive loading (less RAM)
  
  Reply
- Jason Brownlee December 21, 2020 at 6:39 am #
  
  Perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ahsan December 21, 2020 at 8:55 pm #

Is there a latest (optimized) version of this program, because it has low accuracy. I need help how to fine tune the model, how to reduce vocabulary, and how to use inception instead of vgg16? some snippets would be really helpful, specially the code snippet to use inception instead of vgg16. Thanks any help would be appreciated. This is a great post thanks

Reply
- Jason Brownlee December 22, 2020 at 6:43 am #
  
  Thanks for the suggestion, I look into preparing an updated version.
  
  Reply
Md. Ajwad Akil December 22, 2020 at 9:20 pm #

Hello, I followed every single step but I am getting this error when I fit the model:

ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

I don’t understand the reason, as inputs and outputs to the model , this was supplied:
model = Model(inputs=[inputs1, inputs2], outputs=outputs)

Then What is the problem here?
I am running on google colab here.

Reply
- Jason Brownlee December 23, 2020 at 5:33 am #
  
  Perhaps try running on your workstation instead of colab? or on AWS EC2?
  
  Reply
Md. Ajwad Akil December 22, 2020 at 9:23 pm #

Sorry, the error did not come for some reason, here it is again:

ValueError: Layer model expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

Reply
- Sahil December 23, 2020 at 7:29 am #
  
  Yes i faced same error. You faced it because you may be using latest tensorflow (v.24). Whereas this code only works with tensorflow 1.x.
  To overcome this problem i used Tensorflow 1.13.1 and keras 2.2.4.
  We really need a latest version which works on tensorflow 2
  
  Reply
  - Jason Brownlee December 23, 2020 at 8:27 am #
    
    This is incorrect.
    
    All code examples have been updated and tested on TensorFlow 2.
    
    I have updated the progressive loading example and changed
    
    yield [[in_img, in_seq], out_word]
    
    1
    
    yield [[in_img, in_seq], out_word]
    
    To
    
    yield [in_img, in_seq], out_word
    
    1
    
    yield [in_img, in_seq], out_word
    
    I have also re-run all examples this morning with the latest version of Keras and Tensorflow without incident.
    
    Please check your library versions using the script in the above tutorial, and ensure you have copied the code correctly.
    
    Reply
AndyTown December 23, 2020 at 6:29 pm #

hey Jason, I followed your tutorial to train a model using progressive loading on the Flickr 30k dataset. I’m using several VMs to train the model using 10 epochs and batch size of 3.

Train time ETA is 48 hours. Even if I decrease number of epochs or change batch size, ETA stays the same! Any explanation or tips to decrease training time?

Thank you in advance for any help you can offer!

Reply
- AndyTown December 23, 2020 at 6:48 pm #
  
  It’s training very slowly and not very well at that. After about 30 hours the current loss is 5.66. I’ve even converted the features dataframe into a Delta Lake table to be read in through the generator, so I’m not quite sure what else I could do to make it better.
  
  Reply
  - Jason Brownlee December 24, 2020 at 5:26 am #
    
    Try an AWS EC2 instance with lots of RAM and then load the entire training dataset into RAM and fit the model. It will be dramatically faster.
    
    Reply
- Jason Brownlee December 24, 2020 at 5:25 am #
  
  Yes, these tips may help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-speed-up-the-training-of-my-model
  
  Reply
Montaser December 29, 2020 at 1:57 pm #

Hello, I am having some trouble with the model.fit_generator method.

# train the model, run epochs manually and save after each epoch
epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
# create the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
# fit for one epoch
model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
# save model
model.save(‘/content/drive/Shareddrives/AITrust/model_’ + str(i) + ‘.h5’)

while running this cell, the following output is generated

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
warnings.warn(‘Model.fit_generator is deprecated and ‘
—————————————————————————
TypeError Traceback (most recent call last)
in ()
6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length)
7 # fit for one epoch
—-> 8 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
9 # save model
10 model.save(‘/content/drive/Shareddrives/AITrust/model_’ + str(i) + ‘.h5′)

9 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
312 if dtype is None:
313 dtype = array(fill_value).dtype
–> 314 a = empty(shape, dtype, order)
315 multiarray.copyto(a, fill_value, casting=’unsafe’)
316 return a

TypeError: ‘function’ object cannot be interpreted as an integer

I am using tensorflow 2.4.0 and keras 2.4.3. I have also changed
out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
to
out_seq = to_categorical([out_seq], num_classes=vocab_size)

Reply
- Jason Brownlee December 30, 2020 at 6:33 am #
  
  Sorry to hear that.
  
  Perhaps trying copying the complete code example from the end of the section?
  
  Reply
Arun Kumar December 31, 2020 at 9:24 pm #

i have given data like csv consist of descriptions,imgpath, ,which tokenizer i use

Reply
- Jason Brownlee January 1, 2021 at 5:26 am #
  
  Perhaps start wit the tokenizer used above as a starting point.
  
  Reply
Ayush Gupta January 11, 2021 at 1:46 am #

InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Matrix size-incompatible: In[0]: [47,1000], In[1]: [4096,256]
[[node model_2/dense_6/MatMul (defined at :170) ]]
[[gradient_tape/model_2/embedding_2/embedding_lookup/Reshape/_34]]
sanity is not maintained in code, how to resove this?

Reply
- Jason Brownlee January 11, 2021 at 6:19 am #
  
  Sorry to hear that, perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- pakshal February 24, 2021 at 5:50 pm #
  
  Hi Ayush, can you solve this error is yes then please provide the solution.
  
  Reply
harsha February 1, 2021 at 6:00 am #

Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7507

—————————————————————————

TypeError Traceback (most recent call last)

in ()
30 print(‘Vocabulary Size: %d’ % vocab_size)
31 # determine the maximum sequence length
—> 32 max_length = max_length(train_descriptions)
33 print(‘Description Length: %d’ % max_length)
34 # prepare sequences

TypeError: ‘int’ object is not callable

why I am getting this error even though I have copy-pasted the code. How to resolve this issue?

Reply
- Jason Brownlee February 1, 2021 at 6:29 am #
  
  Sorry to hear that, perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
pakshal February 23, 2021 at 7:32 pm #

# define the model
model = define_model(vocab_size, max_length)
# train the model, run epochs manually and save after each epoch
epochs = 20
steps = len(train_descriptions)
for i in range(epochs):
# create the data generator
generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
# fit for one epoch
model.fit(generator,epochs=20,verbose=1,steps_per_epoch=steps)
# save model
model.save(‘model_’ + str(i) + ‘.h5’)

Model: “model_4”
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_10 (InputLayer) [(None, 34)] 0
__________________________________________________________________________________________________
input_9 (InputLayer) [(None, 4096)] 0
__________________________________________________________________________________________________
embedding_4 (Embedding) (None, 34, 256) 1940224 input_10[0][0]
__________________________________________________________________________________________________
dropout_8 (Dropout) (None, 4096) 0 input_9[0][0]
__________________________________________________________________________________________________
dropout_9 (Dropout) (None, 34, 256) 0 embedding_4[0][0]
__________________________________________________________________________________________________
dense_12 (Dense) (None, 256) 1048832 dropout_8[0][0]
__________________________________________________________________________________________________
lstm_4 (LSTM) (None, 256) 525312 dropout_9[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 256) 0 dense_12[0][0]
lstm_4[0][0]
__________________________________________________________________________________________________
dense_13 (Dense) (None, 256) 65792 add_4[0][0]
__________________________________________________________________________________________________
dense_14 (Dense) (None, 7579) 1947803 dense_13[0][0]
==================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/20
—————————————————————————
InvalidArgumentError Traceback (most recent call last)
in ()
9 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
10 # fit for one epoch
—> 11 model.fit(generator,epochs=20,verbose=1,steps_per_epoch=steps)
12 # save model
13 model.save(‘model_’ + str(i) + ‘.h5’)

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
—> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:

InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Can not squeeze dim[2], expected a dimension of 1, got 7579
[[node categorical_crossentropy/remove_squeezable_dimensions/Squeeze (defined at :11) ]]
[[gradient_tape/model_4/embedding_4/embedding_lookup/Reshape/_34]]
(1) Invalid argument: Can not squeeze dim[2], expected a dimension of 1, got 7579
[[node categorical_crossentropy/remove_squeezable_dimensions/Squeeze (defined at :11) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_36733]

Function call stack:
train_function -> train_function

sir please help

Reply
- Jason Brownlee February 24, 2021 at 5:29 am #
  
  Perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - pakshal February 24, 2021 at 5:46 pm #
    
    Sir all libraries are seems to update becuase I use google colab.Sir please help me i stuck on this since 15 days
    
    Reply
    - Jason Brownlee February 25, 2021 at 5:26 am #
      
      The above tutorial works on the latest version of libraries.
      
      I don’t know about colab, sorry.
      
      Perhaps try running on your own machine or on AWS EC2 where you can control the environment.
      
      Reply
      - pakshal February 26, 2021 at 6:13 am #
        
        No sir you use “model.fit_generator” but as per the latest update have model.fit. That’s why most of student facing error.
pakshal February 24, 2021 at 6:02 pm #

InvalidArgumentError Traceback (most recent call last)
in ()
9 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
10 # fit for one epoch
—> 11 model.fit(generator,epochs=1,verbose=1,steps_per_epoch=steps)
12 # save model
13 model.save(‘model_’ + str(i) + ‘.h5’)

Reply
pakshal February 24, 2021 at 6:31 pm #

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
warnings.warn(‘Model.fit_generator is deprecated and ‘
—————————————————————————
ValueError Traceback (most recent call last)
in ()
6 generator = data_generator(train_descriptions, train_features, tokenizer, max_length, vocab_size)
7 # fit for one epoch
—-> 8 model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
9 # save model
10 model.save(‘model_’ + str(i) + ‘.h5’)

Reply
MBA ASPIRANT February 27, 2021 at 6:51 pm #

Thank you so much for u Jason for such an awsome project

Reply
- Jason Brownlee February 28, 2021 at 4:33 am #
  
  You’re welcome!
  
  Reply
MA Kabir Arif March 4, 2021 at 5:40 pm #

Sir, thanks for your explanation. Can I use this embedding (cnn-rnn) for custom Stack GAN training?

Reply
- Jason Brownlee March 5, 2021 at 5:31 am #
  
  Perhaps try it and see?
  
  Reply
  - MA Kabir Arif March 11, 2021 at 2:23 am #
    
    Okay sir. I will try it soon & let you know the results.
    
    Reply
Shivaprasad Satla March 9, 2021 at 8:45 pm #

I am getting memory error? how to overcome this one

Reply
- Jason Brownlee March 10, 2021 at 4:40 am #
  
  Perhaps try the progressive loading section.
  
  Reply
Ryo March 12, 2021 at 2:12 pm #

Could you tell me what is “FF” short for in the graph?

Reply
- Jason Brownlee March 13, 2021 at 5:24 am #
  
  FF == Feed-forward, e.g. dense layers used to interpret input and make a prediction.
  
  Reply
PAKSHAL SHETH March 14, 2021 at 5:51 pm #

Can we run the code in 8 GB ram without using progressive loading code?

Reply
- Jason Brownlee March 15, 2021 at 5:55 am #
  
  Probably not.
  
  Reply
Bellev March 18, 2021 at 6:37 am #

Hello, I think the line:
max_length = max_length(train_descriptions)

found in several places in the text contains an error – how can the variable and the function have the same name?

That apart, this is a brilliant tutorial, thank you!

Reply
- Jason Brownlee March 19, 2021 at 6:09 am #
  
  Yes, that does look like a bad idea. It can work fine if the function is called before the variable is defined – as it is in this case.
  
  Reply
Jens Br April 5, 2021 at 12:52 am #

Hi,

when I try out code, I get follow error:

File “/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/plaidml/keras/backend.py”, line 1529, in rnn
raise NotImplementedError(‘rnn is not implemented with mask support’)
NotImplementedError: rnn is not implemented with mask support

Can you tell me where the problem is?

many thanks

Reply
- Jason Brownlee April 5, 2021 at 6:12 am #
  
  Sorry to hear that, these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Rohit Kushwaha April 15, 2021 at 1:36 pm #

Hello Sir, i want to plot the graph between training and testing or dev loss but with model.fit_generator i could not able to do so. i have to go with model.fit_generator because while using model.fit i am getting RAM issues so anyhow i have to go with model.fit_generator,
Please tell me how to implement the graph between training and testing.

Thanking youin advance

Reply
- Jason Brownlee April 16, 2021 at 5:29 am #
  
  You may have to write some custom code, I’m not sure off the cuff sorry.
  
  Reply
srz April 30, 2021 at 9:18 am #

Hey Jason,
Thank you for all the posts. They’re heavily informative.
How can I use checkpoints whilst using progressive loading?
I mean the model.fit is inside a loop
so I’m confused.
Thanks again

Reply
- Jason Brownlee May 1, 2021 at 6:00 am #
  
  You’re welcome.
  
  Perhaps experiment with using a callback to save checkpoints?
  
  Reply
srz May 4, 2021 at 4:10 pm #

okay, thanks a lot. worked it out.
Just a thing more, I tried to use a batch size of 64. The accuracy increased but the bleu scores dropped in comparison to when using the batch size of 6000.
Could you please guide me to the possible reason?

Reply
- Jason Brownlee May 5, 2021 at 6:07 am #
  
  I would expect accuracy is not an appropriate measure to use on this dataset, ignore it.
  
  Reply
  - srz May 6, 2021 at 7:33 am #
    
    yes, I read that in your post. Should I not be using batch training then? Since the higher the batch size the lower my bleu score goes. Just wanted to know why is this happening. couldnt get an answer anywhere.
    
    Reply
    - Jason Brownlee May 7, 2021 at 6:23 am #
      
      Perhaps try different configurations and compare the result.
      
      Reply
SK May 13, 2021 at 2:51 pm #

Hi,

Iam trying to run the code in “Prepare Photo Data” section and when I run I get the following error that says model.fit() require model.compile(), however, there is no model compilation in this code fragment.

Traceback (most recent call last): File "extract_image_features.py", line 39, in features = compute_features(IMAGE_DIR) File "extract_image_features.py", line 31, in compute_features feature = model.fit(img, verbose=0) File "/Users/saratk/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1032, in fit self._assert_compile_was_called() File "/Users/saratk/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2592, in _assert_compile_was_called raise RuntimeError('You must compile your model before ' RuntimeError: You must compile your model before training/testing. Usemodel.compile(optimizer, loss).

I have

tensorflow: 2.4.1
keras: 2.4.0

Thanks,
SK

Reply
- Jason Brownlee May 14, 2021 at 6:20 am #
  
  Sorry to hear that, these tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Sam May 15, 2021 at 3:17 am #
    
    Sir I’m getting this error can’t resolve it.
    
    ValueError: Layer model_4 expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]
    
    Reply
    - Jason Brownlee May 15, 2021 at 6:34 am #
      
      This might help:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
Ankit Nigam May 17, 2021 at 12:03 am #

Hi Jason,

I am running above example and from section – Photo and Caption Dataset, getting the below error

NotFoundError Traceback (most recent call last)
in
1 #extract features from all images
2 directory=’Flicker8k_Dataset’
—-> 3 features=extract_features(directory)
4
5 print (“Extracted features : “, len(features))

in extract_features(directory)
34
35 #get features
—> 36 feature=model.predict(image, verbose=0)
37
38 #get image id

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
1627 for step in data_handler.steps():
1628 callbacks.on_predict_batch_begin(step)
-> 1629 tmp_batch_outputs = self.predict_function(iterator)
1630 if data_handler.should_sync:
1631 context.async_wait()

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
–> 828 result = self._call(*args, **kwds)
829 compiler = “xla” if self._experimental_compile else “nonXla”
830 new_tracing_count = self.experimental_get_tracing_count()

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
893 # If we did not create any variables the trace we have is good enough.
894 return self._concrete_stateful_fn._call_flat(
–> 895 filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
896
897 def fn_with_cond(inner_args, inner_kwds, inner_filtered_flat_args):

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1917 # No tape is watching; skip to running the function.
1918 return self._build_call_outputs(self._inference_function.call(
-> 1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
1921 args,

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
558 inputs=args,
559 attrs=attrs,
–> 560 ctx=ctx)
561 else:
562 outputs = execute.execute_with_cancellation(

~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
—> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:

NotFoundError: No algorithm worked!
[[node model_1/block1_conv1/Relu (defined at :36) ]] [Op:__inference_predict_function_1219]

Function call stack:
predict_function

Could you please give some pointers to resolve the same

Thanks,
Ankit

Reply
- Jason Brownlee May 17, 2021 at 5:38 am #
  
  Sorry to hear that, perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ankit Nigam May 17, 2021 at 12:45 am #

Hi,

Please ignore the above issue.

It is resolved now.

Thanks,
Ankit

Reply
- Jason Brownlee May 17, 2021 at 5:39 am #
  
  I’m happy to hear that.
  
  Reply
Sam May 17, 2021 at 6:04 am #

hi Jason, here is the link to my code, see the last cell
I have done the same as mentioned in the blog, can you plz help, error coming in the last step only

https://colab.research.google.com/drive/10A4y0t_QO9VWW18D3mNRlD8qpXU4bYNw?usp=sharing

the error coming is : ValueError: Layer model_3 expects 2 input(s), but it received 3 input tensors. Inputs received: [, , ]

Reply
- Jason Brownlee May 18, 2021 at 6:07 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Adiya May 24, 2021 at 12:42 pm #

Hi Jason Sir I was hoping you could help me, I am using progressive overloading method to get the best model I get the following error stack

File "Basic_model.py", line 357, in model = define_model(vocab_size, max_length) File "Basic_model.py", line 310, in define_model se3 = LSTM(256)(se2) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__ return super(RNN, self).__call__(inputs, **kwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__ input_list) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call inputs, input_masks, args, kwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call return self._infer_output_signature(inputs, args, kwargs, input_masks) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 863, in _infer_output_signature outputs = call_fn(inputs, *args, **kwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent_v2.py", line 1157, in call inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 859, in _process_inputs initial_state = self.get_initial_state(inputs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 643, in get_initial_state inputs=None, batch_size=batch_size, dtype=dtype) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2507, in get_initial_state self, inputs, batch_size, dtype)) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2987, in _generate_zero_filled_state_for_cell return _generate_zero_filled_state(batch_size, cell.state_size, dtype) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 3003, in _generate_zero_filled_state return nest.map_structure(create_zeros, state_size) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in map_structure structure[0], [func(*x) for x in entries], File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in structure[0], [func(*x) for x in entries], File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 3000, in create_zeros return array_ops.zeros(init_state_size, dtype=dtype) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper return target(*args, **kwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2819, in wrapped tensor = fun(*args, **kwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2868, in zeros output = _constant_if_small(zero, shape, dtype, name) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 2804, in _constant_if_small if np.prod(shape) < 1000: File "", line 6, in prod File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3031, in prod keepdims=keepdims, initial=initial, where=where) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction return ufunc.reduce(obj, axis, dtype, out, **passkwargs) File "/home/aditya/miniconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 855, in __array__ " a NumPy call, which is not supported".format(self.name)) NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

Reply
- Jason Brownlee May 25, 2021 at 6:05 am #
  
  Sorry to hear that, perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Prabir May 25, 2021 at 1:44 am #

Hi Jason,

I have followed the steps you mentioned in your blog. After some trials, I am able to run the entire project.
But the issue I am facing is that for whatever image I provide, my model always says either ‘A man in red t-shirt standing on street’ or ‘Two dogs are playing on ground’.

I am unable to understand which section is creating this problem. I am very new in this area. Can you please suggest in which areas might create this problem?

I can share my entire code, if you want to check.

Reply
- Jason Brownlee May 25, 2021 at 6:10 am #
  
  You may need to re-fit the model or tune the model hyperparameters.
  
  Reply
Azaz Ur Rehman Butt June 3, 2021 at 7:55 am #

Hi Jason!

First of all, thanks for this fruitful tutorial, it has helped me a lot and I’ve learnt a lot from this. I want to make an extension by changing the CNN model from VGG16 to InceptionV3. Will it be a better choice? The output of Inception V3 is (None, 2048). Please help me in this regard.

Also, I want to use BERT Transformers Model for word embeddings. Will it be a better choice?

Reply
- Jason Brownlee June 4, 2021 at 6:44 am #
  
  You’re welcome.
  
  It may, perhaps test it and discover it these changes result in better performance.
  
  Reply
Mukesh June 22, 2021 at 8:58 pm #

Can you please upload the model that you trained and used in the example? My system is not really powerful and the progressive loading method is giving out really bad models.So I would really appreciate if you could share your model ‘model-ep002-loss3.245-val_loss3.612.h5’.

Thank You!

Reply
- Jason Brownlee June 23, 2021 at 5:37 am #
  
  Sorry, I cannot.
  
  Reply
Zulfikri Mirza July 2, 2021 at 7:01 pm #

Hi Jason nice article and thank you for giving us something to learn !
a few question tho
1. can this used to video for the captioning?
2. how much photos that it need to take for the sequence to make the caption?

thank you so much
bless you !

Reply
- Jason Brownlee July 3, 2021 at 6:10 am #
  
  Perhaps you can apply the method to frames of a video.
  
  You may need to experiment to discover how much data is required and the best model for your specific dataset.
  
  Reply
Zulfikri Mirza July 3, 2021 at 5:50 pm #

currently got 23k frames data that i got from around 200 videos, i did apply it to it and it works for making model train and evaluating, didnt try the model to the video yet since i didnt see any description for how much frames could it take to make a caption in your article

any suggestion or clue which or where should i look from your code in your article to modified it if my data are video and make the system process the caption after looking from a few frames ?

Once again Thanks a lot !

Reply
- Jason Brownlee July 4, 2021 at 6:01 am #
  
  The example expects images, so perhaps you can provide video frames, or a subset of video frames to the model for prediction.
  
  Reply
  - Zulfikri Mirza July 4, 2021 at 5:37 pm #
    
    i did use this example with my video frames data with my own token etc, but i havent tried the model to video yet since i didnt see how much images(or frames because i input the data with my frames/images from my video) sequence it takes to predict and i am a little bit confused at that part.
    
    but for trying to make the model with my own data(frames from my video), i already did and it works fine,
    
    Reply
    - Jason Brownlee July 5, 2021 at 5:07 am #
      
      Great!
      
      Reply
sam July 15, 2021 at 2:35 am #

thanks a lot for your tutorial. it helped me a lot but excuse me when i print (actual and predicted sentences ) i got the same sentence every time i got the actual from test descriptions . i need to get the new captions that used the model h5 . Thanks a lot

Reply
- Jason Brownlee July 15, 2021 at 5:33 am #
  
  Perhaps re-fitting the model or using a different saved model as the final model?
  
  Reply
  - sam July 29, 2021 at 11:49 am #
    
    Thanks a lot for replying .. yes the model needs more training but I’m using coco dataset and used the same model with number of epochs 100 and learning rate 3e-4 and delete the first dropout layer and got this result
    1 loss: 4.5090 – accuracy: 0.2570 – val_loss: 3.8487 – val_accuracy: 0.3179
    2 loss: 3.7380 – accuracy: 0.3306 – val_loss: 3.6578 – val_accuracy: 0.3398
    3 loss: 3.5954 – accuracy: 0.3449 – val_loss: 3.6016 – val_accuracy: 0.3481
    4 loss: 3.5336 – accuracy: 0.3519 – val_loss: 3.5808 – val_accuracy: 0.3527
    5 loss: 3.4969 – accuracy: 0.3563 – val_loss: 3.5751 – val_accuracy: 0.3555
    6 loss: 3.4720 – accuracy: 0.3595 – val_loss: 3.5733 – val_accuracy: 0.3574
    7 loss: 3.4528 – accuracy: 0.3619 – val_loss: 3.5809 – val_accuracy: 0.3585
    8 loss: 3.4388 – accuracy: 0.3638 – val_loss: 3.5888 – val_accuracy: 0.3595
    9 loss: 3.4283 – accuracy: 0.3655 – val_loss: 3.5965 – val_accuracy: 0.3606
    10 loss: 3.4207 – accuracy: 0.3666 – val_loss: 3.6070 – val_accuracy: 0.3612
    11 loss: 3.4155 – accuracy: 0.3675 – val_loss: 3.6225 – val_accuracy: 0.3615
    12 loss: 3.4112 – accuracy: 0.3688 – val_loss: 3.6362 – val_accuracy: 0.3622
    13 loss: 3.4085 – accuracy: 0.3696 – val_loss: 3.6467 – val_accuracy: 0.3623
    14 loss: 3.4024 – accuracy: 0.3697 – val_loss: 3.6510 – val_accuracy: 0.3617
    15 loss: 3.3850 – accuracy: 0.3702 – val_loss: 3.6485 – val_accuracy: 0.3622
    16 loss: 3.3729 – accuracy: 0.3709 – val_loss: 3.6563 – val_accuracy: 0.3622
    
    Should I wait or the model needs to change ?
    
    Reply
    - Jason Brownlee July 30, 2021 at 6:26 am #
      
      We cannot know what to change to best configure a model, this is a whole area of study, see this:
      https://machinelearningmastery.com/start-here/#better
      
      Reply
Mr B August 1, 2021 at 8:12 pm #

Hi Just wondering if anyone came across this error

TypeError: Dimension value must be integer or None or have an __index__ method, got value ” with type ”

Reply
- Jason Brownlee August 2, 2021 at 4:53 am #
  
  Sorry, I have not seen this before, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Peter Sun September 1, 2021 at 7:05 pm #

Hi, Jason, I followed through the tutorial from beginning to end. So my best model was the one with val-loss 3.882, and test BLEU-1 score was 0.5461!
I have learned that BLEU-1 score of 0.5 is a state-of-the-art performance, but when I try generating captions with new images of people and animal from the internet, the phrase “man in black shirt is sitting on the sidewalk” keeps coming up for random images. Does this mean that for these images the model does not recognize them at all?

Reply
- Jason Brownlee September 2, 2021 at 5:07 am #
  
  Perhaps the model has overfit, you could try using a different model saved during training or try re-training the model.
  
  Reply
Peter Sun September 3, 2021 at 12:36 pm #

All right! Thanks for the reply! I’ll try using different options!

Reply
- Jason Brownlee September 4, 2021 at 5:15 am #
  
  You’re welcome.
  
  Reply
Naqqash Dilshad September 9, 2021 at 1:55 pm #

Hi Dr. Brownlee

Would you please guide us on how to perform image captioning for a custom dataset? The problem is assigning unique identifiers to the labels of each image. Any kind of help will be appreciated…. thank you

Reply
- Adrian Tam September 11, 2021 at 6:05 am #
  
  Yes, that’s a boring part but you must spend time to do this tagging before you can do anything else.
  
  Reply
Karlo September 18, 2021 at 5:38 am #

While running the code in Google Colab I my runtime stops working because of this message: “Your session crashed after using all available RAM” . does anyone know what might be the reason for it or how to fix it?

Reply
- Adrian Tam September 19, 2021 at 6:22 am #
  
  You exhausted the memory. You either need to use a paid version of Colab, or use another way to run your code.
  
  Reply
Meena Vinaykumar September 19, 2021 at 4:29 pm #

Hi

I have given the flickr dataset without the train, test and val datasets separately. So how do I do the training. Please help

Reply
- Adrian Tam September 20, 2021 at 2:32 pm #
  
  Usually the dataset does not do the split. You have to split the train/test/validation split by yourself. See, for example, sklearn for a utility function: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
  
  Reply
Mandar September 23, 2021 at 4:23 am #

Hi Adrian,
Am getting an error ‘TypeError: ‘int’ object is not callable’ for the line :

# determine the maximum sequence length
max_length = max_length(train_descriptions)

What might be causing this.

Reply
- Adrian Tam September 23, 2021 at 5:35 am #
  
  max_length is a variable or max_length is a function? You are reusing the same name for two purposes.
  
  Reply
AdeN September 25, 2021 at 5:13 pm #

Thank you for the great tutorial. It’s beneficial for me. 🙂
I tried using another model, progressive loading, and added a data validation set using the generator.

Reply
João Gondim September 27, 2021 at 2:29 am #

Hi! Thank you so much for this post!

First, I’m using this code to make some tests on a translated Flickr8k dataset, I intend to publish my findings later, how can I cite you and your website? Standard latex citing the site would be ok?

Second, as told on the article, val_loss reaches minimum values on the very first epochs, but comparing BLEU score, the latest epoch showed better numbers, why do you think this might be happening?

Reply
- Adrian Tam September 27, 2021 at 10:44 am #
  
  (1) Please see: https://machinelearningmastery.com/faq/single-faq/how-do-i-reference-or-cite-a-book-or-blog-post
  (2) These are different metrics. Do you think this can help? https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
  
  Reply
Muhammad Kamran October 29, 2021 at 5:33 am #

Hey Jason, is there any work done on image captioning using conventional machine learning. I am working on report generation for medical images. can you suggest me some literature that i should i review for my thesis?

Reply
- Adrian Tam October 30, 2021 at 12:30 pm #
  
  How do you think about this paper? http://proceedings.mlr.press/v37/xuc15.pdf
  
  Reply
Saurabh Sarkar November 4, 2021 at 4:08 pm #

Hello Jason,

Thanks for this article. when I am trying to pretrain my inception v3 model using the existing 8k flicker dataset that I have. I am getting error:

image_model = og_tf.keras.applications.InceptionV3(include_top=False,weights=’imagenet’)

new_input = image_model.input
hidden_layer = image_model.layers[-1].output

image_features_extract_model = og_tf.keras.Model(new_input,hidden_layer)

for img,path in img_data:
fv = image_features_extract_model(img)

Below is the ERROR:
—————————————————————————
NotFoundError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_81952/3499706949.py in
—-> 1 for img,path in img_data:
2 fv = image_features_extract_model(img)

~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in __next__(self)
759 def __next__(self):
760 try:
–> 761 return self._next_internal()
762 except errors.OutOfRangeError:
763 raise StopIteration

~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py in _next_internal(self)
745 self._iterator_resource,
746 output_types=self._flat_output_types,
–> 747 output_shapes=self._flat_output_shapes)
748
749 try:

~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py in iterator_get_next(iterator, output_types, output_shapes, name)
2725 return _result
2726 except _core._NotOkStatusException as e:
-> 2727 _ops.raise_from_not_ok_status(e, name)
2728 except _core._FallbackException:
2729 pass

~\miniconda3\envs\tensor\lib\site-packages\tensorflow\python\framework\ops.py in raise_from_not_ok_status(e, name)
6895 message = e.message + (” name: ” + name if name is not None else “”)
6896 # pylint: disable=protected-access
-> 6897 six.raise_from(core._status_to_exception(e.code, message), None)
6898 # pylint: enable=protected-access
6899

~\miniconda3\envs\tensor\lib\site-packages\six.py in raise_from(value, from_value)

NotFoundError: NewRandomAccessFile failed to Create/Open: \Images\1000268201_693b08cb0e.jpg : The system cannot find the path specified.
; No such process
[[{{node ReadFile}}]] [Op:IteratorGetNext]

Any idea why the system is not able to access the path.

Reply
- Adrian Tam November 7, 2021 at 7:39 am #
  
  No idea. Are you messing up the path separators “\” with “/” ?
  
  Reply
Shivam Patel November 7, 2021 at 11:29 pm #

Hello sir, I am getting same caption for all nre images that is “startseq man in blue shirt is standing on the street endseq”.

what is the problem and how can I fix it.

Reply
- Adrian Tam November 14, 2021 at 11:58 am #
  
  Was there a problem on training? I believe the model is degenerated, but not sure what caused it.
  
  Reply
HomaK February 28, 2022 at 2:00 am #

Hi Jason
Thank you for your perfect article
I successfully train and evaluate model but I get this result:
startseq rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed rushed

how can I fix it?

Reply
- James Carmichael February 28, 2022 at 11:58 am #
  
  Hi Homak…Thanks for asking.
  
  I’m eager to help, but I just don’t have the capacity to debug code for you.
  
  I am happy to make some suggestions:
  
  Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
  Consider cutting the problem back to just one or a few simple examples.
  Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
  Consider posting your question and code to StackOverflow.
  
  Reply
HomaK February 28, 2022 at 4:28 pm #

Thanks a lot dear Jason for your answering.
I will follow your recommendations and post the results.
Your website is like a book which I learn many thing from even comments.
best wishes

Reply
Vahid March 29, 2022 at 1:51 am #

Is it possible for you to share the h5 file?

Reply
- James Carmichael March 29, 2022 at 9:58 am #
  
  Hi Vahid…We do not share h5 files, however you may feel free to create one from the source code we provide.
  
  Reply
generator energy May 11, 2022 at 3:33 am #

Such a great information. This is really very helpful for bloggers

Reply
- James Carmichael May 13, 2022 at 12:41 am #
  
  Thank you for the feedback!
  
  Reply
Didem Damka May 16, 2022 at 7:07 am #

Hi James. Thank you for this beautiful tutorial.

I am trying to use the same code with Flickr30K dataset. And also I am computing Bleu and Cider score. It works fine with Flickr8K. I split the the Flickr30K dataset to 29000 train, 1000 validation and 1000 test images. and trained the model. But the model generates the same two sentence for every image in the list. Why does this happen. How can I fix it. Also the Bleu scores are higher the Flickr8K, but the Cider score is too low. I tried to reduce the vocabulary size like in the ” What is the Role of Recurrent Neural Networks (RNNs) in an Image
Caption Generator?” paper. But it doesn’t work. Thank you so much.

Reply
- James Carmichael May 16, 2022 at 8:50 am #
  
  Hi Didem…I have never encountered this issue. The following may help by providing another approach:
  
  https://www.youtube.com/watch?v=fUSTbGrL1tc
  
  Reply
  - Didem Damka May 16, 2022 at 8:32 pm #
    
    OK. I will try this approach. Thank you so much.
    
    Reply
    - Nani May 18, 2022 at 6:54 pm #
      
      Hey how can we identify colors of the object present in the image through captions
      
      Reply
JayC July 12, 2022 at 10:24 am #

Thank you for this excellent example of image captioning!

I am currently working on a project where I need to caption images of playing cards. Importantly, the model needs to capture the ORDER of the playing cards (from left to right).

If I train a CNN LSTM, as in your example, and the captions are correctly formatted (left-right), will this model capture such as spatial relationship?

I.e. is an image captioning model the correct approach for this task?

Reply
- James Carmichael July 13, 2022 at 7:49 am #
  
  Hi JayC…You are very welcome! Explain further what you mean by “capture such as spatial relationship”.
  
  Reply
  - JayC July 14, 2022 at 11:31 am #
    
    Sorry, my question was not very clear.
    
    I train a model, like you described above, on images of playing cards. Each image is captioned, describing the cards in strictly left-to-right order. E.g.
    
    5 of Diamonds — 3 of Clubs — Ace of Hearts
    
    I use the trained model to make a prediction on a new image of 3 playing cards. Will the predicted caption have the correct left-to-right ordering?
    
    I.e. can the image captioning approach you describe learn (in this case left-to-right) spatial relationships?
    
    Reply
    - James Carmichael July 15, 2022 at 8:32 am #
      
      Hi JayC…The answer is yes. I would recommend that you proceed with the model for your application and let us know your findings.
      
      Reply
Xuan December 11, 2022 at 8:02 am #

Hi Jason, this is a very well-written tutorial on caption generation, thank you! All procedures, including data preparation, model architecture, training and evaluation are thoroughly explained in detail using simple terms. For me, it has been a good refreshment for the encoder-decoder architecture. I also appreciate the provided code, from which I learn a lot, especially the code for recursively generating output text.

Now looking back from the end of 2022, I’m curious whether the following could increase the performance of the caption generator:
• Plug in the photo feature extractor and let it be fine-tuned along with training the decoder
• Use a transformer for the decoder

Reply
- James Carmichael December 11, 2022 at 9:31 am #
  
  Great feedback Xuan! We greatly appreciate your support!
  
  Reply
Mazen February 16, 2023 at 3:08 am #

Hi Jason, Hi all
Thanks Jason for this very helpful tutorial!

Could anyone please send a link for downloading their trained model and tokenizer? Then we can try directly the last part (Generate New Captions) without training and saving..

Reply
- Mazen February 28, 2023 at 8:41 pm #
  
  So, I searched for pre-trained models and found and tried this one, which looks very impressive (not only) for captioning: https://github.com/salesforce/LAVIS
  
  Just wanted to share this with all here, as I always benefit from this very helpful website.
  
  Reply
tounes February 20, 2023 at 9:51 am #

hello, can you please tell me if these codes are updated to the latest evolutions in image captioning field , or there are other recouse that are up to date 2023?
please reply to me
thank you.

Reply
- James Carmichael February 20, 2023 at 10:00 am #
  
  Hi tounes…Our content is up to date with stable library levels. Are you having any particular issues with executing the code that we can assist you with?
  
  Reply

Navigation

How to Develop a Deep Learning Photo Caption Generator from Scratch

Develop a Deep Learning Model to Automatically
Describe Photographs in Python with Keras, Step-by-Step.

Tutorial Overview

Python Environment

Need help with Deep Learning for Text Data?

Photo and Caption Dataset

Prepare Photo Data

Prepare Text Data

Develop Deep Learning Model

Loading Data

Defining the Model

Fitting the Model

Complete Example

Train With Progressive Loading

Evaluate Model

Generate New Captions

Extensions

Further Reading

Caption Generation Papers

Flickr8K Dataset

API

Summary

Develop Deep Learning models for Text Data Today!

Develop Your Own Text models in Minutes

Finally Bring Deep Learning to your Natural Language Processing Projects

More On This Topic

1,196 Responses to How to Develop a Deep Learning Photo Caption Generator from Scratch

Leave a Reply Click here to cancel reply.

Navigation

Develop a Deep Learning Model to Automatically Describe Photographs in Python with Keras, Step-by-Step.

Tutorial Overview

Python Environment

Need help with Deep Learning for Text Data?

Photo and Caption Dataset

Prepare Photo Data

Prepare Text Data

Develop Deep Learning Model

Loading Data

Defining the Model

Fitting the Model

Complete Example

Train With Progressive Loading

Evaluate Model

Generate New Captions

Extensions

Further Reading

Caption Generation Papers

Flickr8K Dataset

API

Summary

Develop Deep Learning models for Text Data Today!

Develop Your Own Text models in Minutes

Finally Bring Deep Learning to your Natural Language Processing Projects

More On This Topic

1,196 Responses to How to Develop a Deep Learning Photo Caption Generator from Scratch

Leave a Reply Click here to cancel reply.

Develop a Deep Learning Model to Automatically
Describe Photographs in Python with Keras, Step-by-Step.