How to Train an Object Detection Model with Keras

By Jason Brownlee on September 2, 2020 in Deep Learning for Computer Vision

Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type of objects were detected.

The Mask Region-based Convolutional Neural Network, or Mask R-CNN, model is one of the state-of-the-art approaches for object recognition tasks. The Matterport Mask R-CNN project provides a library that allows you to develop and train Mask R-CNN Keras models for your own object detection tasks. The library can be tricky for beginners and requires careful preparation of the dataset, but it allows fast training via transfer learning from top-performing models trained on challenging object detection tasks, such as MS COCO.

In this tutorial, you will discover how to develop a Mask R-CNN model for kangaroo object detection in photographs.

After completing this tutorial, you will know:

How to prepare an object detection dataset ready for modeling with an R-CNN.
How to use transfer learning to train an object detection model on a new dataset.
How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.


Let’s get started.

How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)
Photo by Ronnie Robertson, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

How to Install Mask R-CNN for Keras
How to Prepare a Dataset for Object Detection
How to Train a Mask R-CNN Model for Kangaroo Detection
How to Evaluate a Mask R-CNN Model
How to Detect Kangaroos in New Photos

Note: This tutorial requires TensorFlow version 1.15.3 and Keras 2.2.4. It does not work with TensorFlow 2.0+ or Keras 2.2.5+ because a third-party library has not been updated at the time of writing.

You can install these specific versions of the libraries as follows:

sudo pip install --no-deps tensorflow==1.15.3
sudo pip install --no-deps keras==2.2.4


How to Install Mask R-CNN for Keras

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given image.

It is a challenging problem that involves building upon methods for object recognition (e.g. where are they?), object localization (e.g. what is their extent?), and object classification (e.g. what are they?).

The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick, et al. There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. Mask R-CNN, introduced in the 2017 paper titled “Mask R-CNN”, is the most recent variation of the family of models and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image but also specifies a mask for each object, indicating exactly which pixels in the image belong to it.

Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model. Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.

The best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN Project developed by Matterport. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.

The first step is to install the library.

At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.

Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions buried in the library’s readme file.


Step 1. Clone the Mask R-CNN GitHub Repository

This is as simple as running the following command from your command line:

git clone https://github.com/matterport/Mask_RCNN.git


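This will create a new local directory named Mask_RCNN. After the installation in Step 2, it will look as follows (the build, dist, and mask_rcnn.egg-info directories are created by the installation script):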

Mask_RCNN
├── assets
├── build
│   ├── bdist.macosx-10.13-x86_64
│   └── lib
│       └── mrcnn
├── dist
├── images
├── mask_rcnn.egg-info
├── mrcnn
└── samples
    ├── balloon
    ├── coco
    ├── nucleus
    └── shapes


Step 2. Install the Mask R-CNN Library

The library is installed by running the setup script included in the repository.

Change directory into the Mask_RCNN directory and run the installation script.

From the command line, type the following:

cd Mask_RCNN
python setup.py install


On Linux or macOS, you may need to install the software with sudo permissions; for example, you may see an error such as:

error: can't create or remove files in install directory


In that case, install the software with sudo:

sudo python setup.py install


If you are using a Python virtual environment, such as one of the conda environments on an EC2 Deep Learning AMI instance (recommended for this tutorial), you can install Mask_RCNN into that environment by running the setup script with the environment's own Python interpreter:

sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install


The library will then install, and you will see many installation messages ending with the following:

...
Finished processing dependencies for mask-rcnn==2.1


This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.

Step 3. Confirm the Library Was Installed

It is always a good idea to confirm that the library was installed correctly.

You can do this by querying the library with the pip command; for example:

pip show mask-rcnn


You should see output informing you of the version and installation location; for example:

Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: waleed.abdulla@gmail.com
License: MIT
Location: ...
Requires:
Required-by:
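If you prefer to check from Python, the sketch below reports the installed versions of the packages this tutorial pins. It assumes Python 3.8+ for importlib.metadata (the tutorial's own environment may be older, in which case the pip command above does the same job), and the expected versions are simply the ones stated in this tutorial, not values read from the libraries.

```python
# Sketch: compare installed package versions against this tutorial's pins.
# Assumes Python 3.8+ (importlib.metadata). check_versions() is a hypothetical
# helper; the expected versions come from the notes in this tutorial.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "tensorflow": "1.15.3",
    "keras": "2.2.4",
    "mask-rcnn": "2.1",
}

def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, pinned in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            # the distribution is not visible to this interpreter
            installed = None
        report[package] = (installed, installed == pinned)
    return report

if __name__ == "__main__":
    for package, (installed, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: installed={installed} expected={EXPECTED[package]} [{status}]")
```

Run it with the same interpreter you used for setup.py install; a None result usually means the package went into a different environment.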


We are now ready to use the library.

How to Prepare a Dataset for Object Detection

Next, we need a dataset to model.

In this tutorial, we will use the kangaroo dataset, made available by Huynh Ngoc Anh (experiencor). The dataset is composed of about 160 photographs that contain kangaroos, and XML annotation files that provide bounding boxes for the kangaroos in each photograph.

The Mask R-CNN is designed to learn to predict both bounding boxes for objects and masks for those detected objects, but the kangaroo dataset does not provide masks. As such, we will use the dataset to learn a kangaroo object detection task, ignoring masks and the image segmentation capabilities of the model.

There are a few steps required in order to prepare this dataset for modeling and we will work through each in turn in this section, including downloading the dataset, parsing the annotations file, developing a KangarooDataset object that can be used by the Mask_RCNN library, then testing the dataset object to confirm that we are loading images and annotations correctly.

Install Dataset

The first step is to download the dataset into your current working directory.

This can be achieved by cloning the GitHub repository directly, as follows:

git clone https://github.com/experiencor/kangaroo.git


This will create a new directory called “kangaroo” with a subdirectory called ‘images/’ that contains all of the JPEG photos of kangaroos and a subdirectory called ‘annots/’ that contains all of the XML files that describe the locations of kangaroos in each photo.

kangaroo
├── annots
└── images


Looking in each subdirectory, you can see that the photos and annotation files use a consistent naming convention, with filenames using a 5-digit zero-padded numbering system; for example:

images/00001.jpg
images/00002.jpg
images/00003.jpg
...
annots/00001.xml
annots/00002.xml
annots/00003.xml
...


This makes it easy to match each photograph with its annotation file.

We can also see that the numbering is not contiguous and some photos are missing; e.g. there is no ‘00007’ JPG or XML file.

This means that we should load the list of files actually present in each directory rather than generate filenames from a numbering sequence.
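As a quick check of the directory layout, the sketch below pairs images with annotations by file stem (e.g. ‘00001’) instead of by a generated sequence, and reports any numbers missing from the range. It assumes the kangaroo/images/*.jpg and kangaroo/annots/*.xml layout described above; pair_files() is a hypothetical helper, not part of the Mask R-CNN library.

```python
# Sketch: pair images and annotations by stem and report numbering gaps.
# Assumes <dataset_dir>/images/*.jpg and <dataset_dir>/annots/*.xml.
from pathlib import Path

def pair_files(dataset_dir):
    """Return (pairs, unmatched, gaps) for a dataset directory.

    pairs: (image_path, annotation_path) tuples sharing a stem, sorted by stem
    unmatched: sorted stems that have an image or an annotation, but not both
    gaps: numbers missing between the smallest and largest paired stem
    """
    root = Path(dataset_dir)
    images = {p.stem: p for p in (root / "images").glob("*.jpg")}
    annots = {p.stem: p for p in (root / "annots").glob("*.xml")}
    common = sorted(images.keys() & annots.keys())
    pairs = [(images[s], annots[s]) for s in common]
    unmatched = sorted(images.keys() ^ annots.keys())
    numbers = sorted(int(s) for s in common if s.isdigit())
    gaps = []
    if numbers:
        present = set(numbers)
        gaps = [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]
    return pairs, unmatched, gaps
```

Calling pair_files('kangaroo') from the directory where you cloned the dataset should list 7 among the gaps, since there is no ‘00007’ file.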

Parse Annotation File

The next step is to figure out how to load the annotation files.

First, open the first annotation file (annots/00001.xml) and take a look; you should see:

<annotation>
	<folder>Kangaroo</folder>
	<filename>00001.jpg</filename>
	<path>...</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>450</width>
		<height>319</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>233</xmin>
			<ymin>89</ymin>
			<xmax>386</xmax>
			<ymax>262</ymax>
		</bndbox>
	</object>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>134</xmin>
			<ymin>105</ymin>
			<xmax>341</xmax>
			<ymax>253</ymax>
		</bndbox>
	</object>
</annotation>


We can see that the annotation file contains a “size” element that describes the shape of the photograph, and one or more “object” elements that describe the bounding boxes for the kangaroo objects in the photograph.

The size and the bounding boxes are the minimum information that we require from each annotation file. We could write some careful XML parsing code to process these annotation files, and that would be a good idea for a production system. Instead, we will shortcut development and use XPath queries to directly extract the data that we need from each file, e.g. a //size query to extract the size element and a //object or a //bndbox query to extract the bounding box elements.

Python provides the ElementTree API, which can be used to load and parse an XML file, and we can use its find() and findall() methods to perform XPath queries on a loaded document. ElementTree supports only a limited subset of XPath, but it is enough for these queries.
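To try these queries without the dataset on disk, the sketch below parses an inline copy of the first annotation file (trimmed to the elements used here) with ElementTree.fromstring() and runs the //object and //size queries described above. The trimming is an assumption for brevity; the real file contains more elements.

```python
# Sketch: ElementTree's XPath subset on a trimmed, inline copy of
# kangaroo/annots/00001.xml, so it runs without the dataset.
from xml.etree import ElementTree

ANNOTATION = """
<annotation>
  <size><width>450</width><height>319</height><depth>3</depth></size>
  <object>
    <name>kangaroo</name>
    <bndbox><xmin>233</xmin><ymin>89</ymin><xmax>386</xmax><ymax>262</ymax></bndbox>
  </object>
  <object>
    <name>kangaroo</name>
    <bndbox><xmin>134</xmin><ymin>105</ymin><xmax>341</xmax><ymax>253</ymax></bndbox>
  </object>
</annotation>
"""

root = ElementTree.fromstring(ANNOTATION.strip())

# './/object' matches <object> elements at any depth below the root
objects = root.findall('.//object')
# find() returns the first element matching the path (or None)
width = int(root.find('.//size/width').text)
height = int(root.find('.//size/height').text)
# findtext() returns the text of the first matching child
labels = [obj.findtext('name') for obj in objects]

print(len(objects), labels, width, height)  # 2 ['kangaroo', 'kangaroo'] 450 319
```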

First, the annotation file must be loaded and parsed as an ElementTree object.

from xml.etree import ElementTree

# load and parse the file
tree = ElementTree.parse(filename)

Once loaded, we can retrieve the root element of the document from which we can perform our XPath queries.

# get the root of the document
root = tree.getroot()


We can use the findall() function with a query for ‘.//bndbox’ to find all ‘bndbox’ elements, then enumerate each to extract the x and y, min and max values that define each bounding box.

The element text is a string, so each value must be parsed to an integer.

# extract each bounding box
for box in root.findall('.//bndbox'):
	xmin = int(box.find('xmin').text)
	ymin = int(box.find('ymin').text)
	xmax = int(box.find('xmax').text)
	ymax = int(box.find('ymax').text)
	coors = [xmin, ymin, xmax, ymax]


We can then collect the definition of each bounding box into a list.

The image dimensions may also be helpful and can be queried directly.

# extract image dimensions
width = int(root.find('.//size/width').text)
height = int(root.find('.//size/height').text)


We can tie all of this together into a function that will take the annotation filename as an argument, extract the bounding box and image dimension details, and return them for use.

The extract_boxes() function below implements this behavior.

# function to extract bounding boxes from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height


We can test out this function on our annotation files, for example, on the first annotation file in the directory.

The complete example is listed below.

# example of extracting bounding boxes from an annotation file
from xml.etree import ElementTree

# function to extract bounding boxes from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

# extract details from annotation file
boxes, w, h = extract_boxes('kangaroo/annots/00001.xml')
# summarize extracted details
print(boxes, w, h)


Running the example returns a list that contains the details of each bounding box in the annotation file, as well as two integers for the width and height of the photograph.

[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319
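Before building on extract_boxes(), it can help to sanity check the boxes it returns. The sketch below flags empty, inverted, or out-of-bounds boxes and annotations with no boxes at all. validate_boxes() is a hypothetical helper; it is a generic check and is not meant to reproduce whatever problem affects the image excluded later in this tutorial.

```python
# Sketch: flag malformed boxes returned by extract_boxes().
# Boxes are [xmin, ymin, xmax, ymax] in pixels; validate_boxes() is hypothetical.
def validate_boxes(boxes, width, height):
    """Return a list of (index, reason) tuples for suspicious boxes."""
    problems = []
    if not boxes:
        problems.append((None, 'no boxes in annotation'))
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if xmin >= xmax or ymin >= ymax:
            problems.append((i, 'empty or inverted box'))
        elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append((i, 'box extends outside the image'))
    return problems

# the values extracted from kangaroo/annots/00001.xml pass the checks
print(validate_boxes([[233, 89, 386, 262], [134, 105, 341, 253]], 450, 319))  # []
```

Looping this over every file in kangaroo/annots is a cheap way to find annotations worth inspecting by hand.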


Now that we know how to load the annotation file, we can look at using this functionality to develop a Dataset object.

Develop KangarooDataset Object

The mask-rcnn library requires that train, validation, and test datasets be managed by a mrcnn.utils.Dataset object.

This means that we must define a new class that extends the mrcnn.utils.Dataset class, defines a function to load the dataset (any name will do, such as load_dataset()), and overrides two functions: load_mask() for loading the masks for an image, and image_reference() for returning an image reference (path or URL).

from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		...  # implemented below

	# load the masks for an image
	def load_mask(self, image_id):
		...  # implemented below

	# load an image reference
	def image_reference(self, image_id):
		...  # implemented below


To use a Dataset object, we instantiate it, call the custom load function, and finally call the built-in prepare() function.

For example, we will create a new class called KangarooDataset that will be used as follows:

# prepare the dataset
train_set = KangarooDataset()
train_set.load_dataset(...)
train_set.prepare()


The custom load function, e.g. load_dataset(), is responsible for defining both the classes and the images in the dataset.

Classes are defined by calling the built-in add_class() function and specifying the ‘source’ (the name of the dataset), the ‘class_id’ or integer for the class (e.g. 1 for the first class, as 0 is reserved for the background class), and the ‘class_name’ (e.g. ‘kangaroo’).

# define one class
self.add_class("dataset", 1, "kangaroo")


Images are defined by calling the built-in add_image() function and specifying the ‘source’ (the name of the dataset), a unique ‘image_id’ (e.g. the filename without the file extension, like ‘00001’), and the path from which the image can be loaded (e.g. ‘kangaroo/images/00001.jpg’).

This will define an “image info” dictionary for the image that can be retrieved later via its index, i.e. the order in which the image was added to the dataset. You can also specify other arguments that will be added to the image info dictionary, such as an ‘annotation’ to define the annotation path.

# add to dataset
self.add_image('dataset', image_id='00001', path='kangaroo/images/00001.jpg', annotation='kangaroo/annots/00001.xml')
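To see how these registrations turn into integer indices without installing the library, here is a toy stand-in. ToyDataset is hypothetical and is NOT mrcnn.utils.Dataset; it only models the bookkeeping described above: index 0 reserved for the background class, one info dictionary appended per add_image() call (extra keyword arguments such as annotation included), and the position in that list acting as the integer id.

```python
# Toy stand-in (NOT mrcnn.utils.Dataset): models only the registration
# bookkeeping described in the text, so it runs without the library.
class ToyDataset:
    def __init__(self):
        # index 0 is reserved for the background class
        self.class_info = [{'source': '', 'id': 0, 'name': 'BG'}]
        self.image_info = []

    def add_class(self, source, class_id, class_name):
        self.class_info.append({'source': source, 'id': class_id, 'name': class_name})

    def add_image(self, source, image_id, path, **kwargs):
        # extra keyword arguments (e.g. annotation=...) are stored in the info dict
        info = {'source': source, 'id': image_id, 'path': path}
        info.update(kwargs)
        self.image_info.append(info)

    @property
    def class_names(self):
        return [c['name'] for c in self.class_info]

    @property
    def image_ids(self):
        # integer indices, in the order the images were added
        return list(range(len(self.image_info)))


ds = ToyDataset()
ds.add_class('dataset', 1, 'kangaroo')
ds.add_image('dataset', image_id='00002', path='kangaroo/images/00002.jpg',
             annotation='kangaroo/annots/00002.xml')
ds.add_image('dataset', image_id='00001', path='kangaroo/images/00001.jpg',
             annotation='kangaroo/annots/00001.xml')

# integer index 0 refers to the first image added ('00002'), not to '00001'
print(ds.image_ids, ds.image_info[0]['id'], ds.image_info[0]['annotation'])
# prints: [0, 1] 00002 kangaroo/annots/00002.xml
print(ds.class_names.index('kangaroo'))  # 1
```

The practical point is that the integer id depends on the order in which images are added, and listdir() does not promise any particular order, so the integer id should not be assumed to match the number in the filename.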


For example, we can implement a load_dataset() function that takes the path to the dataset directory and loads all images in the dataset.

Note that testing revealed an issue with image number ‘00090’, so we will exclude it from the dataset.

# load the dataset definitions
def load_dataset(self, dataset_dir):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)


We can go one step further and add one more argument to the function to define whether the Dataset instance is for training or test/validation. We have about 160 photos, so we can use about 20% of them, the last 32 photos, as a test or validation dataset and the first 131, or about 80%, as the training dataset.

This division can be made using the integer in the filename: all photos numbered below 150 are used for training, and photos numbered 150 and above are used for testing. The updated load_dataset() with support for train and test datasets is provided below.

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		# skip images numbered 150 or above if we are building the train set
		if is_train and int(image_id) >= 150:
			continue
		# skip all images before 150 if we are building the test/val set
		if not is_train and int(image_id) < 150:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
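The same train/test filter can be pulled out into a standalone function so the split can be checked without the mrcnn library. split_ids() is a hypothetical helper; the threshold of 150 and the excluded id ‘00090’ are the values used in this tutorial, and the ids in the usage example are illustrative.

```python
# Sketch: the train/test filter from load_dataset() as a pure function.
# split_ids() is hypothetical; threshold and exclusions mirror the tutorial.
def split_ids(image_ids, threshold=150, excluded=('00090',)):
    """Split zero-padded id strings into (train, test) lists."""
    train, test = [], []
    for image_id in sorted(image_ids):
        if image_id in excluded:
            continue
        if int(image_id) < threshold:
            train.append(image_id)
        else:
            test.append(image_id)
    return train, test

ids = ['00001', '00002', '00090', '00149', '00150', '00183']
train, test = split_ids(ids)
print(train)  # ['00001', '00002', '00149']
print(test)   # ['00150', '00183']
```

Feeding it the file stems from kangaroo/images (e.g. filename[:-4] for each name returned by listdir()) gives the same partition as load_dataset() with is_train=True and is_train=False.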


Next, we need to define the load_mask() function for loading the masks for a given ‘image_id’.

In this case, the ‘image_id’ is the integer index for an image in the dataset, assigned based on the order in which the image was added via a call to add_image() when loading the dataset. The function must return an array of one or more masks for the photo associated with the image_id, and the class of each mask.

We don’t have masks, but we do have bounding boxes. We can load the bounding boxes for a given photo and return each one as a filled rectangular mask. The library will then infer bounding boxes from our “masks”, and because each mask is a filled rectangle, the inferred boxes will match the original ones.
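The round trip below illustrates why filled rectangles give back the original boxes. boxes_to_masks() and masks_to_boxes() are hypothetical helpers written with NumPy for this illustration; they are not functions from the mrcnn library, which does its own mask-to-box conversion internally (in the Matterport code this appears to be utils.extract_bboxes(), which uses a (y1, x1, y2, x2) order; check your version).

```python
# Sketch: box -> mask -> box round trip with NumPy.
# Boxes are [xmin, ymin, xmax, ymax]; the max values are treated as exclusive
# ends, matching the slicing used in load_mask().
import numpy as np

def boxes_to_masks(boxes, width, height):
    """Fill one channel per box with ones."""
    masks = np.zeros((height, width, len(boxes)), dtype=np.uint8)
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # rows are y coordinates, columns are x coordinates
        masks[ymin:ymax, xmin:xmax, i] = 1
    return masks

def masks_to_boxes(masks):
    """Recover [xmin, ymin, xmax, ymax] per channel from its nonzero pixels.

    Assumes every channel contains at least one nonzero pixel.
    """
    boxes = []
    for i in range(masks.shape[-1]):
        rows = np.where(masks[:, :, i].any(axis=1))[0]
        cols = np.where(masks[:, :, i].any(axis=0))[0]
        # +1 turns the last filled row/column back into an exclusive end
        boxes.append([int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1])
    return boxes

boxes = [[233, 89, 386, 262], [134, 105, 341, 253]]  # from annots/00001.xml
masks = boxes_to_masks(boxes, 450, 319)
print(masks.shape)                     # (319, 450, 2)
print(masks_to_boxes(masks) == boxes)  # True
```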

First, we must load the annotation file for the image_id. This involves retrieving the ‘image info’ dict for the image_id, then retrieving the annotation path that we stored for the image via our prior call to add_image(). We can then pass the path to the extract_boxes() function developed in the previous section to get the list of bounding boxes and the dimensions of the image.

# get details of image
info = self.image_info[image_id]
# define box file location
path = info['annotation']
# load XML
boxes, w, h = self.extract_boxes(path)


We can now define a mask for each bounding box, and an associated class.

A mask is a two-dimensional array with the same dimensions as the photograph, with zero values where the object is absent and one values where the object is present.

We can achieve this by creating a NumPy array with all zero values for the known size of the image and one channel for each bounding box.

from numpy import zeros

# create one array for all masks, each on a different channel
masks = zeros([h, w, len(boxes)], dtype='uint8')

Each bounding box is defined by the minimum and maximum x and y coordinates of the box.

These can be used directly to define the row range (the y values) and the column range (the x values) in the array, which can then be marked as 1.

# create masks
for i in range(len(boxes)):
	box = boxes[i]
	row_s, row_e = box[1], box[3]
	col_s, col_e = box[0], box[2]
	masks[row_s:row_e, col_s:col_e, i] = 1


All objects have the same class in this dataset. We can retrieve the class index via the ‘class_names’ list, then add it to a list to be returned alongside the masks.

self.class_names.index('kangaroo')


Tying this together, the complete load_mask() function is listed below.

# load the masks for an image
def load_mask(self, image_id):
	# get details of image
	info = self.image_info[image_id]
	# define box file location
	path = info['annotation']
	# load XML
	boxes, w, h = self.extract_boxes(path)
	# create one array for all masks, each on a different channel
	masks = zeros([h, w, len(boxes)], dtype='uint8')
	# create masks
	class_ids = list()
	for i in range(len(boxes)):
		box = boxes[i]
		row_s, row_e = box[1], box[3]
		col_s, col_e = box[0], box[2]
		masks[row_s:row_e, col_s:col_e, i] = 1
		class_ids.append(self.class_names.index('kangaroo'))
	return masks, asarray(class_ids, dtype='int32')


Finally, we must implement the image_reference() function.

This function is responsible for returning the path or URL for a given ‘image_id’, which we know is just the ‘path’ property on the ‘image info’ dict.

# load an image reference
def image_reference(self, image_id):
	info = self.image_info[image_id]
	return info['path']


And that’s it. We have successfully defined a Dataset object for the mask-rcnn library for our Kangaroo dataset.

The complete listing of the class and creating a train and test dataset is provided below.

# split into train and test set
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip images numbered 150 or above if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))

# test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))

Running the example successfully loads and prepares the train and test datasets and prints the number of images in each.

Train: 131
Test: 32
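
The split depends only on the numeric part of each filename, so it can be checked without loading any images. The sketch below reimplements the filtering rules from load_dataset() as a standalone function. IDs below 150 go to the training set, IDs from 150 upward go to the test set, and '00090' is skipped. The function name split_image_ids() and the sample filenames are hypothetical and serve only as an illustration.

```python
# Hypothetical helper: reproduces the filtering rules used in load_dataset()
# so the split can be checked without touching the image files.

def split_image_ids(filenames, cutoff=150, skip=("00090",)):
    """Return (train_ids, test_ids) using the same rules as load_dataset()."""
    train_ids, test_ids = [], []
    for filename in filenames:
        image_id = filename[:-4]  # strip a 4-character extension such as '.jpg'
        if image_id in skip:
            continue  # known bad image
        if int(image_id) < cutoff:
            train_ids.append(image_id)
        else:
            test_ids.append(image_id)
    return sorted(train_ids), sorted(test_ids)


# Illustrative filenames only, not the real directory listing
example = ["00001.jpg", "00090.jpg", "00149.jpg", "00150.jpg", "00183.jpg"]
train_ids, test_ids = split_image_ids(example)
print(train_ids)  # ['00001', '00149']
print(test_ids)   # ['00150', '00183']
```

Calling split_image_ids(listdir('kangaroo/images/')) should give counts that match the Train and Test lines printed by the example. This assumes the directory holds only the image files.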

Now that we have defined the dataset, let’s confirm that the images, masks, and bounding boxes are handled correctly.

Test KangarooDataset Object

The first useful test is to confirm that the images and masks can be loaded correctly.

We can test this by creating a dataset, loading an image via a call to the load_image() function with an image_id, and then loading the mask for the image via a call to the load_mask() function with the same image_id.

# load an image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# load image mask
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)

Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph underneath can still be seen.

# plot image
pyplot.imshow(image)
# plot mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()
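
Matplotlib composites the second imshow() call over the first using the alpha value. The sketch below approximates that blend directly in NumPy, which can be handy for saving overlays without opening a plotting window. It assumes the mask contains both 0s and 1s, so that the 'gray' colormap maps 0 to black and 1 to white. The function name overlay_mask() is hypothetical, and the result is an approximation of what Matplotlib renders, not a pixel-exact copy.

```python
import numpy as np


def overlay_mask(image, mask_channel, alpha=0.5):
    """Blend a binary mask over an RGB uint8 image.

    Approximates imshow(image) followed by imshow(mask, cmap='gray', alpha=alpha):
    mask pixels equal to 1 are drawn as white and 0 as black, then mixed with the
    photo as (1 - alpha) * photo + alpha * mask_color.
    """
    # Expand the (h, w) mask to (h, w, 3) gray levels in the 0-255 range
    mask_rgb = np.repeat(mask_channel[:, :, np.newaxis], 3, axis=2).astype(np.float64) * 255.0
    blended = (1.0 - alpha) * image.astype(np.float64) + alpha * mask_rgb
    return np.clip(np.round(blended), 0, 255).astype(np.uint8)


# Tiny synthetic example: a 4x4 mid-gray image with a 2x2 box in the corner
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1
out = overlay_mask(image, mask, alpha=0.5)
print(out[0, 0], out[3, 3])  # [178 178 178] [50 50 50]
```

Inside the box, the photo is brightened toward white. Outside it, the photo is darkened by half, which matches the look of the overlaid plot.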

The complete example is listed below.

# plot one photograph and mask
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# load an image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# load image mask
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)
# plot image
pyplot.imshow(image)
# plot mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()

Running the example first prints the shape of the photograph and mask NumPy arrays.

We can confirm that both arrays have the same width and height and differ only in the number of channels. We can also see that the first photograph (i.e. image_id=0) has only one mask in this case.

(626, 899, 3)
(626, 899, 1)

A plot of the photograph is also created with the first mask overlaid.

In this case, we can see that one kangaroo is present in the photo and that the mask correctly bounds the kangaroo.

Photograph of Kangaroo With Object Detection Mask Overlaid
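
The (height, width, number of boxes) mask shape follows from how load_mask() fills the array. The XML boxes are stored as [xmin, ymin, xmax, ymax], but NumPy indexes images as [row, column], so ymin/ymax select rows and xmin/xmax select columns. The standalone sketch below repeats that conversion on a small synthetic image, using made-up box coordinates. It then checks that each channel covers exactly the box area. The name boxes_to_masks() is hypothetical.

```python
import numpy as np


def boxes_to_masks(boxes, height, width):
    """Build one binary mask channel per [xmin, ymin, xmax, ymax] box,
    mirroring the slicing used in load_mask()."""
    masks = np.zeros([height, width, len(boxes)], dtype='uint8')
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # rows are the y axis, columns are the x axis
        masks[ymin:ymax, xmin:xmax, i] = 1
    return masks


# Synthetic 10x20 image (height x width) with two made-up boxes
boxes = [[2, 1, 8, 5], [10, 3, 18, 9]]
masks = boxes_to_masks(boxes, height=10, width=20)
print(masks.shape)           # (10, 20, 2)
print(masks[:, :, 0].sum())  # 24 = (8 - 2) * (5 - 1)
print(masks[:, :, 1].sum())  # 48 = (18 - 10) * (9 - 3)
```

As in load_mask(), the slice end is exclusive. The xmax column and ymax row themselves are therefore left unfilled, which makes each mask one pixel smaller than the annotated box on its right and bottom edges.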

We could repeat this for the first nine photos in the dataset, plotting each photo in one figure as a subplot and plotting all masks for each photo.

# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	image = train_set.load_image(i)
	pyplot.imshow(image)
	# plot all masks
	mask, _ = train_set.load_mask(i)
	for j in range(mask.shape[2]):
		pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
# show the figure
pyplot.show()

Running the example shows that photos are loaded correctly and that those photos with multiple objects correctly have separate masks defined.

Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks
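
The same check can also be done numerically. The number of channels returned by load_mask() is the number of annotated kangaroos. Collapsing the channels with a maximum gives one mask that covers every object, which can be plotted in a single imshow() call instead of a loop. The helper below is a hypothetical sketch that works on any (height, width, n) mask array. In practice you would pass it the mask returned by train_set.load_mask(i).

```python
import numpy as np


def summarize_masks(mask):
    """Return (object_count, combined_mask) for a (h, w, n) mask array."""
    object_count = mask.shape[2]
    if object_count == 0:
        # max() over an empty axis would raise, so return an empty mask instead
        combined = np.zeros(mask.shape[:2], dtype=mask.dtype)
    else:
        # A pixel is foreground if any object's channel marks it
        combined = mask.max(axis=2)
    return object_count, combined


# Synthetic example with two overlapping object masks
mask = np.zeros((6, 6, 2), dtype='uint8')
mask[0:4, 0:4, 0] = 1
mask[2:6, 2:6, 1] = 1
count, combined = summarize_masks(mask)
print(count)           # 2
print(combined.sum())  # 28
```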

Another useful debugging step might be to load all of the ‘image info’ objects in the dataset and print them to the console.

This can help to confirm that all of the calls to the add_image() function in the load_dataset() function worked as expected.

# enumerate all images in the dataset
for image_id in train_set.image_ids:
	# load image info
	info = train_set.image_info[image_id]
	# display on the console
	print(info)

Running this code on the loaded training dataset prints each ‘image info’ dictionary, showing the ID, image path, and annotation path for every image in the dataset.

{'id': '00132', 'source': 'dataset', 'path': 'kangaroo/images/00132.jpg', 'annotation': 'kangaroo/annots/00132.xml'}
{'id': '00046', 'source': 'dataset', 'path': 'kangaroo/images/00046.jpg', 'annotation': 'kangaroo/annots/00046.xml'}
{'id': '00052', 'source': 'dataset', 'path': 'kangaroo/images/00052.jpg', 'annotation': 'kangaroo/annots/00052.xml'}
...
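
Reading the printed dictionaries by eye only scales so far. The check sketched below walks the same image_info records and reports any 'path' or 'annotation' entry that does not exist on disk. The function name find_missing_files() is hypothetical. It relies only on the two keys that load_dataset() passes to add_image().

```python
import os


def find_missing_files(image_info, keys=("path", "annotation")):
    """Return (image_id, key, filepath) for every referenced file that is missing."""
    missing = []
    for info in image_info:
        for key in keys:
            filepath = info.get(key)
            if filepath is None or not os.path.isfile(filepath):
                missing.append((info.get("id"), key, filepath))
    return missing


# Usage with the dataset defined above (not run here):
# problems = find_missing_files(train_set.image_info)
# print(problems if problems else 'all files found')
```

An empty list means every add_image() call pointed at a real image and annotation file.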

Finally, the mask-rcnn library provides utilities for displaying images and masks. We can use some of these built-in functions to confirm that the Dataset is operating correctly.

For example, the mask-rcnn library provides the mrcnn.visualize.display_instances() function that will show a photograph with bounding boxes, masks, and class labels. This requires that the bounding boxes are extracted from the masks via the extract_bboxes() function.

# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

For completeness, the full code listing is provided below.

# display image with masks and bounding boxes
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

Running the example creates a plot showing the photograph with the mask for each object in a separate color.

The bounding boxes match the masks exactly, by design, and are shown with dotted outlines. Finally, each object is marked with the class label, which in this case is ‘kangaroo’.

Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels
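
Each mask here is a filled rectangle, so recovering its bounding box only requires finding the first and last rows and columns that contain a foreground pixel. The NumPy sketch below illustrates that idea. It is written to mirror the behavior of mrcnn.utils.extract_bboxes() as I understand it: boxes are returned as [y1, x1, y2, x2] with exclusive end coordinates. It is an illustrative reimplementation, though, so check the library source before relying on the exact output format. The name bboxes_from_masks() is hypothetical.

```python
import numpy as np


def bboxes_from_masks(mask):
    """Return an (n, 4) int32 array of [y1, x1, y2, x2] boxes, one per mask channel.

    y2 and x2 are exclusive (one past the last foreground row/column).
    Empty channels yield [0, 0, 0, 0].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        cols = np.where(np.any(m, axis=0))[0]  # columns with any foreground pixel
        rows = np.where(np.any(m, axis=1))[0]  # rows with any foreground pixel
        if cols.size:
            boxes[i] = [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
    return boxes


# Round trip: an XML-style box [xmin, ymin, xmax, ymax] -> mask -> box
xmin, ymin, xmax, ymax = 3, 2, 9, 7
mask = np.zeros((12, 12, 1), dtype='uint8')
mask[ymin:ymax, xmin:xmax, 0] = 1  # same slicing as load_mask()
print(bboxes_from_masks(mask))  # [[2 3 7 9]]
```

The recovered box holds the original annotation coordinates, only reordered into (y, x) form. This is why the boxes drawn by display_instances() line up exactly with the masks.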

Now that we are confident that our dataset is being loaded correctly, we can use it to fit a Mask R-CNN model.

How to Train a Mask R-CNN Model for Kangaroo Detection

A Mask R-CNN model can be fit from scratch, although, as with other computer vision applications, time can be saved and performance improved by using transfer learning.

The Mask R-CNN model pre-fit on the MS COCO object detection dataset can be used as a starting point and then tailored to the specific dataset, in this case, the kangaroo dataset.

The first step is to download the weights file for the pre-fit Mask R-CNN model; the architecture itself is defined in the library code. The weights are available from the GitHub project, and the file is about 250 megabytes.

Download the model weights to a file with the name ‘mask_rcnn_coco.h5’ in your current working directory.

Download Weights (mask_rcnn_coco.h5) 246M

Next, a configuration object for the model must be defined.

This is a new class that extends the mrcnn.config.Config class and defines properties of both the prediction problem (such as name and the number of classes) and the algorithm for training the model (such as the learning rate).

The configuration must define its name via the ‘NAME’ attribute, e.g. ‘kangaroo_cfg’, which is used when saving details and models to file during the run. It must also define the number of classes in the prediction problem via the ‘NUM_CLASSES’ attribute. In this case, we have only one object type, kangaroo, although there is always an additional class for the background, giving two classes in total.

Finally, we must define the number of training steps (batches) in each epoch via the ‘STEPS_PER_EPOCH’ attribute. Here we set it to the number of photos in the training dataset, in this case, 131.

Tying this together, our custom KangarooConfig class is defined below.

# define a configuration for the model
class KangarooConfig(Config):
	# Give the configuration a recognizable name
	NAME = "kangaroo_cfg"
	# Number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# Number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare config
config = KangarooConfig()
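
STEPS_PER_EPOCH counts batches rather than photos. In the Matterport Config class, the batch size is IMAGES_PER_GPU * GPU_COUNT, which defaults to 2 * 1, so 131 steps draw roughly 262 training images per epoch. If you would rather have each epoch make about one pass over the 131 training photos, you can derive the step count as in the sketch below. steps_for_one_pass() is a hypothetical helper, not part of the library. Its defaults assume you have not overridden IMAGES_PER_GPU or GPU_COUNT.

```python
import math


def steps_for_one_pass(num_images, images_per_gpu=2, gpu_count=1):
    """Number of batches needed to see every training image roughly once.

    The defaults assume the Matterport Config defaults (IMAGES_PER_GPU = 2,
    GPU_COUNT = 1); pass your own values if you override them.
    """
    batch_size = images_per_gpu * gpu_count
    return math.ceil(num_images / batch_size)


print(steps_for_one_pass(131))                    # 66
print(steps_for_one_pass(131, images_per_gpu=1))  # 131
```

The result would be assigned to STEPS_PER_EPOCH in KangarooConfig. Keeping 131 steps, as in this tutorial, simply means each epoch covers the training set about twice.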

Next, we can define our model.

This is achieved by creating an instance of the mrcnn.model.MaskRCNN class and specifying that the model will be used for training by setting the ‘mode’ argument to ‘training’.

The ‘config’ argument must also be specified with an instance of our KangarooConfig class.

Finally, a directory is needed where configuration files can be saved and where checkpoint models can be saved at the end of each epoch. We will use the current working directory.

# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
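
With model_dir='./', each training run should write its checkpoints into a new subdirectory named after the lower-cased configuration NAME plus a timestamp. That subdirectory should hold one file per epoch, named mask_rcnn_kangaroo_cfg_0001.h5, mask_rcnn_kangaroo_cfg_0002.h5, and so on. This naming pattern is my reading of the Matterport source, which also offers a MaskRCNN.find_last() method for locating the latest checkpoint, so treat it as an assumption. The hypothetical helper latest_checkpoint() below finds the newest such file with glob, independently of the model object.

```python
import glob
import os


def latest_checkpoint(model_dir, config_name="kangaroo_cfg"):
    """Return the newest checkpoint path, or None if none exist.

    Assumes run directories named <config_name><timestamp> that contain files
    named mask_rcnn_<config_name>_<epoch>.h5 with a zero-padded epoch number.
    """
    name = config_name.lower()
    pattern = os.path.join(model_dir, name + "*", "mask_rcnn_" + name + "_*.h5")
    checkpoints = glob.glob(pattern)
    if not checkpoints:
        return None
    # Timestamped run directories and zero-padded epochs both sort lexically,
    # so the largest (run_dir, filename) pair is the latest checkpoint.
    return max(checkpoints, key=lambda p: (os.path.dirname(p), os.path.basename(p)))
```

For example, latest_checkpoint('./') after training would return the path of the final epoch's weights from the most recent run. You could then pass that path to load_weights() later.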

Next, the pre-fit weights can be loaded into the model. This can be achieved by calling the load_weights() function on the model and specifying the path to the downloaded ‘mask_rcnn_coco.h5’ file.

The pre-fit weights will be used as-is, except for the class-specific output layers. Their weights are not loaded, so those layers can be trained from scratch for our classes. This is done by specifying the ‘exclude’ argument and listing the output layers whose weights should be skipped when loading. These include the output layers for the classification label, bounding boxes, and masks.

# load weights (mscoco)
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
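
The excluded layers are those whose weight shapes depend on NUM_CLASSES, which is why the COCO weights cannot be reused for them. As far as I can tell from the Matterport source, the class logits layer has one output per class, the box-refinement layer has four outputs per class, and the mask layer has one output channel per class. mrcnn_bbox is a reshape of mrcnn_bbox_fc and holds no weights of its own. The sketch below simply tabulates those widths for COCO (80 classes plus background) and for this tutorial's configuration. The function class_dependent_widths() is a hypothetical illustration; nothing is read from the model.

```python
def class_dependent_widths(num_classes):
    """Output width of each class-dependent layer excluded when loading COCO weights."""
    return {
        "mrcnn_class_logits": num_classes,  # one logit per class
        "mrcnn_bbox_fc": num_classes * 4,   # four box-refinement values per class
        "mrcnn_mask": num_classes,          # one mask channel per class
    }


coco = class_dependent_widths(1 + 80)     # background + 80 COCO classes
kangaroo = class_dependent_widths(1 + 1)  # background + kangaroo
for layer in coco:
    print(layer, coco[layer], "->", kangaroo[layer])
```

Because 81 outputs cannot be loaded into a layer with 2, these layers must be excluded. The shared backbone and region proposal layers have the same shapes in both models and can be reused.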

Next, the model can be fit on the training dataset by calling the train() function and passing in both the training dataset and the validation dataset. We also specify the learning rate, here the default from the configuration (0.001).

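We can also specify which layers to train. In this case, we will train only the heads. In the Matterport library, the heads are the layers on top of the ResNet backbone (the region proposal network, feature pyramid, and output layers), so the backbone stays frozen.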

# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

We could follow this training with further epochs that fine-tune all of the weights in the model. This could be achieved by using a smaller learning rate and changing the ‘layers’ argument from ‘heads’ to ‘all’.
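
One detail matters when chaining calls. In the Matterport implementation, the epochs argument to train() is the total number of epochs to reach, not the number to run in that call, because epochs already completed are counted (this is my reading of its docstring). A fine-tuning stage run after the heads stage therefore needs a larger epochs value. The hypothetical helper below turns a list of stages into keyword arguments with cumulative epoch counts. The 10x learning-rate reduction for the 'all' stage is an illustrative choice, not a value from this tutorial.

```python
def build_schedule(stages, base_lr):
    """Convert (layers, extra_epochs, lr_scale) stages into train() keyword arguments.

    The 'epochs' value is cumulative because each call continues from the
    epoch the previous call finished on.
    """
    schedule = []
    total_epochs = 0
    for layers, extra_epochs, lr_scale in stages:
        total_epochs += extra_epochs
        schedule.append({
            "learning_rate": base_lr * lr_scale,
            "epochs": total_epochs,
            "layers": layers,
        })
    return schedule


# Heads for 5 epochs at the default rate, then all layers for 5 more at a tenth of it
schedule = build_schedule([("heads", 5, 1.0), ("all", 5, 0.1)], base_lr=0.001)
for kwargs in schedule:
    print(kwargs)
    # In the full script, each stage would be run as:
    # model.train(train_set, test_set, **kwargs)
```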

The complete example of training a Mask R-CNN on the kangaroo dataset is listed below.

This may take some time to execute on the CPU, even with modern hardware. I recommend running the code with a GPU, such as on Amazon EC2, where it will finish in about five minutes on P3-type hardware.

# fit a mask rcnn on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define a configuration for the model
class KangarooConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# prepare test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# prepare config
config = KangarooConfig()
config.display()
# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
# load weights (mscoco) and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

Running the example will report progress using the standard Keras progress bars.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that many different train and validation loss scores are reported, one for each of the different outputs of the network, and it can be confusing to know which ones to pay attention to.

In this example, where we are interested in object detection rather than object segmentation, I recommend paying attention to the loss for the classification output on the train and validation datasets (mrcnn_class_loss and val_mrcnn_class_loss), as well as the loss for the bounding box output on the train and validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss).

Epoch 1/5
131/131 [==============================] - 106s 811ms/step - loss: 0.8491 - rpn_class_loss: 0.0044 - rpn_bbox_loss: 0.1452 - mrcnn_class_loss: 0.0420 - mrcnn_bbox_loss: 0.2874 - mrcnn_mask_loss: 0.3701 - val_loss: 1.3402 - val_rpn_class_loss: 0.0160 - val_rpn_bbox_loss: 0.7913 - val_mrcnn_class_loss: 0.0092 - val_mrcnn_bbox_loss: 0.2263 - val_mrcnn_mask_loss: 0.2975
Epoch 2/5
131/131 [==============================] - 69s 526ms/step - loss: 0.4774 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 - mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 - val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 - val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 - val_mrcnn_mask_loss: 0.2282
Epoch 3/5
131/131 [==============================] - 67s 515ms/step - loss: 0.4471 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1153 - mrcnn_class_loss: 0.0234 - mrcnn_bbox_loss: 0.0958 - mrcnn_mask_loss: 0.2097 - val_loss: 1.2998 - val_rpn_class_loss: 0.0144 - val_rpn_bbox_loss: 0.6712 - val_mrcnn_class_loss: 0.0372 - val_mrcnn_bbox_loss: 0.2645 - val_mrcnn_mask_loss: 0.3125
Epoch 4/5
131/131 [==============================] - 66s 502ms/step - loss: 0.3934 - rpn_class_loss: 0.0026 - rpn_bbox_loss: 0.1003 - mrcnn_class_loss: 0.0171 - mrcnn_bbox_loss: 0.0806 - mrcnn_mask_loss: 0.1928 - val_loss: 0.6709 - val_rpn_class_loss: 0.0016 - val_rpn_bbox_loss: 0.2012 - val_mrcnn_class_loss: 0.0244 - val_mrcnn_bbox_loss: 0.1942 - val_mrcnn_mask_loss: 0.2495
Epoch 5/5
131/131 [==============================] - 65s 493ms/step - loss: 0.3357 - rpn_class_loss: 0.0024 - rpn_bbox_loss: 0.0804 - mrcnn_class_loss: 0.0193 - mrcnn_bbox_loss: 0.0616 - mrcnn_mask_loss: 0.1721 - val_loss: 0.8878 - val_rpn_class_loss: 0.0030 - val_rpn_bbox_loss: 0.4409 - val_mrcnn_class_loss: 0.0174 - val_mrcnn_bbox_loss: 0.1752 - val_mrcnn_mask_loss: 0.2513
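Because each epoch's losses arrive as one long progress-bar line, it can help to pull out just the four recommended values. The sketch below is a hypothetical helper (parse_keras_log_line() is not part of Keras or the Mask R-CNN library); it assumes each metric appears as ‘name: value’ separated by ‘ - ’, as in the output above.

```python
import re

# Matches "name: value" pairs such as "mrcnn_bbox_loss: 0.1134" or
# "val_rpn_class_loss: 8.9502e-04" (scientific notation is allowed).
METRIC_PATTERN = re.compile(r"(\w+): ([0-9.]+(?:e[-+]?\d+)?)")

# The losses recommended above for judging detection (not segmentation) quality.
DETECTION_LOSSES = ["mrcnn_class_loss", "val_mrcnn_class_loss",
                    "mrcnn_bbox_loss", "val_mrcnn_bbox_loss"]


def parse_keras_log_line(line):
    """Return a dict mapping each metric name in a progress-bar line to its value."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(line)}


# The epoch 2 line, copied from the output above.
line = ("131/131 [==============================] - 69s 526ms/step - loss: 0.4774 "
        "- rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 "
        "- mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 "
        "- val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 "
        "- val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 "
        "- val_mrcnn_mask_loss: 0.2282")
metrics = parse_keras_log_line(line)
# keep only the detection-related losses
print({name: metrics[name] for name in DETECTION_LOSSES})
```

Running the same function over each epoch's line gives a per-epoch table of the detection losses that is easier to compare than the raw log.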


A model file is created and saved at the end of each epoch in a subdirectory whose name starts with ‘kangaroo_cfg’ followed by the date and time the training run started.

A model must be selected for use. In this case, the training loss for the bounding boxes decreases on every epoch and the validation bounding box loss is lowest at the final epoch, so we will use the final model at the end of the run (‘mask_rcnn_kangaroo_cfg_0005.h5’).
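This choice can be made explicit in code. The sketch below uses the bounding box losses transcribed from the training output above and picks the epoch with the lowest validation bounding box loss. The file name pattern mask_rcnn_<name>_<epoch>.h5, with the epoch zero-padded to four digits, is an assumption based on the file name used in this tutorial.

```python
# Bounding box losses for epochs 1 to 5, transcribed from the training output above.
train_bbox_loss = [0.2874, 0.1134, 0.0958, 0.0806, 0.0616]
val_bbox_loss = [0.2263, 0.2148, 0.2645, 0.1942, 0.1752]


def best_epoch(losses):
    """Return the 1-based epoch number with the lowest loss."""
    return min(range(len(losses)), key=lambda i: losses[i]) + 1


def checkpoint_name(config_name, epoch):
    """Build a checkpoint file name, assuming the mask_rcnn_<name>_<epoch:04d>.h5 pattern."""
    return "mask_rcnn_%s_%04d.h5" % (config_name.lower(), epoch)


# the training bounding box loss falls on every epoch
print(all(a > b for a, b in zip(train_bbox_loss, train_bbox_loss[1:])))  # True
# the validation bounding box loss is lowest at the last epoch
epoch = best_epoch(val_bbox_loss)
print(epoch, checkpoint_name("kangaroo_cfg", epoch))  # 5 mask_rcnn_kangaroo_cfg_0005.h5
```

The validation loss is not monotonic (it rises at epoch 3), so selecting the epoch by its minimum is a little more robust than simply assuming the last epoch is best.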

Copy the model file from the config directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions.
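The copy can also be scripted. The sketch below is a hypothetical helper (copy_latest_checkpoint() is not a library function) that assumes the layout described above: run directories named ‘kangaroo_cfg’ plus a timestamp, which sorts chronologically as text, each holding one mask_rcnn_kangaroo_cfg_NNNN.h5 file per epoch. The Matterport MaskRCNN class also has a find_last() method for locating the most recent checkpoint, which may be simpler if you already have a model object.

```python
import glob
import os
import shutil


def copy_latest_checkpoint(model_dir=".", config_name="kangaroo_cfg", dest_dir="."):
    """Copy the highest-epoch checkpoint of the newest run into dest_dir.

    Assumes run directories are named <config_name><timestamp> and contain
    files named mask_rcnn_<config_name>_<epoch:04d>.h5 (config_name in lower case).
    """
    # newest run last: the timestamp suffix sorts chronologically as text
    run_dirs = sorted(path for path in glob.glob(os.path.join(model_dir, config_name + "*"))
                      if os.path.isdir(path))
    if not run_dirs:
        raise FileNotFoundError("no run directory for %r in %r" % (config_name, model_dir))
    # zero-padded epoch numbers also sort correctly as text
    pattern = os.path.join(run_dirs[-1], "mask_rcnn_%s_*.h5" % config_name)
    checkpoints = sorted(glob.glob(pattern))
    if not checkpoints:
        raise FileNotFoundError("no checkpoints match %r" % pattern)
    destination = os.path.join(dest_dir, os.path.basename(checkpoints[-1]))
    shutil.copy(checkpoints[-1], destination)
    return destination
```

Called with no arguments from the directory that was passed as model_dir during training, it would copy the final checkpoint (here, ‘mask_rcnn_kangaroo_cfg_0005.h5’) into the current working directory.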

The results suggest that more training epochs could be useful, as could fine-tuning all of the layers in the model; either might make an interesting extension to the tutorial.

Next, let’s look at evaluating the performance of this model.

How to Evaluate a Mask R-CNN Model

The performance of a model for an object recognition task is often evaluated using the mean average precision, or mAP.

We are predicting bounding boxes, so we can determine whether a bounding box prediction is good or not based on how well the predicted and actual bounding boxes overlap. This can be calculated by dividing the area of the overlap (the intersection) by the total area covered by both boxes (the union, in which the overlap is counted only once), referred to as “intersection over union,” or IoU. A perfect bounding box prediction will have an IoU of 1, and boxes that do not overlap at all have an IoU of 0.

It is common to treat a bounding box prediction as positive (correct) if its IoU with an actual bounding box is 0.5 or greater, i.e. the area of overlap is at least half the area of the union.
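The IoU calculation takes only a few lines. The sketch below assumes boxes are given as (xmin, ymin, xmax, ymax), the same order used in the annotation files, and treats coordinates as continuous (no extra pixel is added to widths). It illustrates the definition; it is not the library's own implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # corners of the overlapping region (may be empty)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # the overlap is counted only once in the union
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted, actual, threshold=0.5):
    """Treat a prediction as correct if its IoU with the actual box reaches the threshold."""
    return iou(predicted, actual) >= threshold


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))       # 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))       # 50 / 150, about 0.333
print(is_match((0, 0, 10, 10), (1, 1, 10, 10)))  # True (IoU = 81 / 100)
```

Note that two boxes shifted by half their width overlap by half of each box's area, yet their IoU is only one third, so the 0.5 threshold is stricter than it may first appear.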

Precision refers to the percentage of correctly predicted bounding boxes (IoU of 0.5 or greater) out of all bounding boxes predicted. Recall is the percentage of correctly predicted bounding boxes (IoU of 0.5 or greater) out of all objects in the photo.
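To make these definitions concrete, the sketch below ranks predictions by confidence score, greedily matches each one to the unmatched actual box it overlaps most (if the IoU is at least 0.5), and records precision and recall after each prediction. It is a simplified single-class illustration with made-up example boxes; the library's own matching differs in detail (for example, it also takes class labels into account).

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(pred_boxes, pred_scores, gt_boxes, threshold=0.5):
    """Cumulative precision and recall as predictions are added in score order."""
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched = set()
    true_positives = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        # find the unmatched actual box with the highest IoU at or above the threshold
        best_j, best_overlap = None, threshold
        for j, gt_box in enumerate(gt_boxes):
            overlap = iou(pred_boxes[i], gt_box)
            if j not in matched and overlap >= best_overlap:
                best_j, best_overlap = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(gt_boxes))
    return precisions, recalls


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
scores = [0.9, 0.8, 0.7]
precisions, recalls = precision_recall(predictions, scores, ground_truth)
print(precisions)  # [1.0, 0.5, 0.666...]
print(recalls)     # [0.5, 0.5, 1.0]
```

The second prediction does not overlap any kangaroo, so it is a false positive: precision drops while recall stays flat, exactly the behavior described next.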

As we make more predictions (ranked from most to least confident), recall will increase, but precision will drop or become erratic as we start making false positive predictions. The recall (x) can be plotted against the precision (y) for each number of predictions to create a precision-recall curve. The curve can be smoothed by replacing the precision at each point with the maximum precision at that recall or any higher recall; the average of these interpolated precision values over the range of recall gives the average precision, or AP.
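A minimal sketch of this interpolated AP follows. It pads the curve so that it runs from recall 0 to recall 1, makes precision non-increasing from right to left, and sums precision times the recall step wherever recall increases. This is one common variant (all-point interpolation); other benchmarks compute AP slightly differently. The example values are the cumulative precision and recall for three ranked predictions: a hit, a miss, then a hit, with two objects in the photo.

```python
def average_precision(precisions, recalls):
    """Area under the interpolated precision-recall curve.

    precisions and recalls are cumulative values, one per prediction,
    ordered from the most to the least confident prediction.
    """
    # pad so the curve starts at recall 0 and ends at recall 1
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # interpolate: each point takes the maximum precision at equal or higher recall
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum precision times the recall step wherever recall increases
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)) if r[i] != r[i - 1])


precisions = [1.0, 0.5, 2 / 3]
recalls = [0.5, 0.5, 1.0]
print(average_precision(precisions, recalls))  # about 0.833
```

The dip to 0.5 precision after the false positive is "filled in" by the later, higher precision of 2/3, which is why the result is 0.5 x 1.0 + 0.5 x 2/3 rather than a plain average of the raw precision values.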

Note: there are variations on how AP is calculated, e.g. the way it is calculated for the widely used PASCAL VOC dataset and the MS COCO dataset differ.

In this tutorial, the mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP; on multi-class benchmarks, the mean is usually taken across classes instead.

The Mask R-CNN library provides the mrcnn.utils.compute_ap() function to calculate the AP and other metrics for a given image. These AP scores can be collected across a dataset and the mean calculated to give an idea of how good the model is at detecting objects in a dataset.

First, we must define a new Config object to use for making predictions, instead of training. We could extend our previously defined KangarooConfig to reuse its parameters; instead, we will define a new object with the same values to keep the code compact. The config must override some GPU-related defaults that suit training but not inference (regardless of whether you are running on a GPU or CPU). In particular, the batch size is GPU_COUNT multiplied by IMAGES_PER_GPU, and setting both to 1 allows us to make a prediction for one image at a time.

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1
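To see why IMAGES_PER_GPU is set to 1, it helps to know that, in the Matterport library, Config computes BATCH_SIZE as IMAGES_PER_GPU multiplied by GPU_COUNT, and detect() expects exactly BATCH_SIZE images per call. The classes below are hypothetical stand-ins that model only this logic; they are not the library's Config, and the default of two images per GPU is an assumption taken from the library's source.

```python
class SketchConfig:
    """Hypothetical stand-in for mrcnn.config.Config; only the batch-size logic is modelled."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2  # assumed library default, suited to training

    def __init__(self):
        # effective batch size, derived from the class attributes
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class SketchPredictionConfig(SketchConfig):
    # one image per batch, so a single photo can be passed to detect()
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


def check_batch(images, config):
    """Mimic the check made before detection: the number of images must equal BATCH_SIZE."""
    if len(images) != config.BATCH_SIZE:
        raise ValueError("expected %d images, got %d" % (config.BATCH_SIZE, len(images)))


one_photo = ["photo"]
check_batch(one_photo, SketchPredictionConfig())  # passes silently
try:
    check_batch(one_photo, SketchConfig())
except ValueError as error:
    print(error)  # expected 2 images, got 1
```

Because subclasses override class attributes and BATCH_SIZE is computed in __init__, redefining IMAGES_PER_GPU in the subclass is enough to change the batch size.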


Next, we can define the model with the config and set the ‘mode’ argument to ‘inference’ instead of ‘training’.

# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)


Next, we can load the weights from our saved model.

We can do that by specifying the path to the model file. In this case, the model file is ‘mask_rcnn_kangaroo_cfg_0005.h5’ in the current working directory.

# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)


Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction for each, and calculating the AP for each prediction, before calculating the mean AP across all images.

First, the image and ground truth mask can be loaded from the dataset for a given image_id. This can be achieved using the load_image_gt() convenience function.

# load image, bounding boxes and masks for the image id
image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)


Next, the pixel values of the loaded image must be scaled in the same way as was performed on the training data, i.e. centered by subtracting the mean pixel value. This can be achieved using the mold_image() convenience function.

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
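The centering performed by mold_image() amounts to subtracting a per-channel mean pixel value. The numpy sketch below reproduces that idea; the MEAN_PIXEL values are assumed to be the library's defaults, and center_pixels() and uncenter_pixels() are hypothetical helpers, not library functions. One caution: in the Matterport source, model.detect() appears to apply the same molding to its inputs internally (via mold_inputs()), so it is worth checking whether the explicit mold_image() call subtracts the mean a second time in your version of the library.

```python
import numpy as np

# Assumed per-channel RGB means (Matterport's default Config.MEAN_PIXEL).
MEAN_PIXEL = np.array([123.7, 116.8, 103.9], dtype=np.float32)


def center_pixels(image, mean_pixel=MEAN_PIXEL):
    """Subtract the per-channel mean from an H x W x 3 image, returning float32."""
    # the (3,) mean broadcasts across every pixel
    return image.astype(np.float32) - mean_pixel


def uncenter_pixels(centered, mean_pixel=MEAN_PIXEL):
    """Undo center_pixels(), e.g. before displaying an image."""
    return (centered + mean_pixel).round().clip(0, 255).astype(np.uint8)


image = np.full((2, 2, 3), 200, dtype=np.uint8)  # tiny uniform test image
centered = center_pixels(image)
print(centered[0, 0])  # roughly [76.3, 83.2, 96.1]
print(np.array_equal(uncenter_pixels(centered), image))  # True
```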


The image then needs an extra leading dimension so that it becomes a batch containing a single sample, which can be used as input to make a prediction with the model.

# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)
# extract results for first sample
r = yhat[0]
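The effect of expand_dims() is easiest to see on array shapes. The placeholder image size below is arbitrary. Since detect() appears to iterate over the images it is given, a batch of shape (1, height, width, 3) behaves like a list containing one image, which matches the batch size of 1 set in PredictionConfig; detect() likewise returns a list with one result per image, hence yhat[0].

```python
import numpy as np

# placeholder height x width x channels image; the size is arbitrary
image = np.zeros((480, 640, 3), dtype=np.float32)

# add a leading batch axis: (480, 640, 3) -> (1, 480, 640, 3)
sample = np.expand_dims(image, 0)
print(sample.shape)  # (1, 480, 640, 3)

# iterating over (or indexing) the batch yields the original image shape again
print(len(sample), sample[0].shape)  # 1 (480, 640, 3)
```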


Next, the prediction can be compared to the ground truth and metrics calculated using the compute_ap() function.

# calculate statistics, including AP
AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])


The AP values can be added to a list, then the mean value calculated.

Tying this together, the evaluate_model() function below implements this and calculates the mAP given a dataset, model and configuration.

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP


We can now calculate the mAP for the model on the train and test datasets.

# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)


The full code listing is provided below for completeness.

# evaluate the mask rcnn model on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.utils import Dataset
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP

# load the train dataset
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the test dataset
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)


Running the example will make a prediction for each image in the train and test datasets and calculate the mAP for each.

An mAP above 90% or 95% is a good score for this task. We can see that the mAP is good on both datasets, and slightly better on the test dataset than on the train dataset.

This may be because the dataset is very small, and/or because the model could benefit from further training.

Train mAP: 0.929
Test mAP: 0.958


Now that we have some confidence that the model is sensible, we can use it to make some predictions.

How to Detect Kangaroos in New Photos

We can use the trained model to detect kangaroos in new photographs, specifically, in photos that we expect to have kangaroos.

First, we need a new photo of a kangaroo.

We could go to Flickr and find a random photo of a kangaroo. Alternatively, we can use any of the photos in the test dataset, none of which were used to train the model.

We have already seen in the previous section how to make a prediction with an image: scale the pixel values, then call model.detect(). For example:

# example of making a prediction
...
# load image
image = ...
# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)
...


Let’s take it one step further and make predictions for a number of images in a dataset, then plot each photo with its ground truth bounding boxes side by side with the same photo and its predicted bounding boxes. This will provide a visual guide to how good the model is at making predictions.

The first step is to load the image and mask from the dataset.

# load the image and mask
image = dataset.load_image(image_id)
mask, _ = dataset.load_mask(image_id)


Next, we can make a prediction for the image.

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)[0]


Next, we can create a subplot for the ground truth and plot the image with the known bounding boxes.

# define subplot
pyplot.subplot(n_images, 2, i*2+1)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Actual')
# plot masks
for j in range(mask.shape[2]):
	pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)


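We can then create a second subplot beside the first, plot the photo again, and this time draw the predicted bounding boxes in red. Note that each predicted box in ‘rois’ is given as [y1, x1, y2, x2], whereas the matplotlib Rectangle expects the (x, y) position of a corner plus a width and height.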

# get the context for drawing boxes
pyplot.subplot(n_images, 2, i*2+2)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Predicted')
ax = pyplot.gca()
# plot each box
for box in yhat['rois']:
	# get coordinates
	y1, x1, y2, x2 = box
	# calculate width and height of the box
	width, height = x2 - x1, y2 - y1
	# create the shape
	rect = Rectangle((x1, y1), width, height, fill=False, color='red')
	# draw the box
	ax.add_patch(rect)
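The only subtle step in this plotting code is the coordinate order. The helper below isolates that conversion; roi_to_rectangle() is a hypothetical name, and it assumes, as the code above does, that each row of yhat['rois'] is [y1, x1, y2, x2].

```python
def roi_to_rectangle(roi):
    """Convert a Mask R-CNN box (y1, x1, y2, x2) into Rectangle arguments.

    Returns ((x, y), width, height), where (x, y) is the corner with the
    smallest coordinates.
    """
    y1, x1, y2, x2 = roi
    return (x1, y1), x2 - x1, y2 - y1


xy, width, height = roi_to_rectangle((10, 20, 50, 80))
print(xy, width, height)  # (20, 10) 60 40
```

With matplotlib, the result could be passed straight through, e.g. Rectangle(*roi_to_rectangle(box), fill=False, color='red'); swapping x and y by mistake would draw boxes transposed across the diagonal of the photo.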


We can tie all of this together into a function that takes a dataset, model, and config and creates a plot of the first five photos in the dataset with ground truth and predicted bounding boxes.

# plot a number of photos with ground truth and predictions
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# load image and mask
	for i in range(n_images):
		# load the image and mask
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)[0]
		# define subplot
		pyplot.subplot(n_images, 2, i*2+1)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# plot masks
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# get the context for drawing boxes
		pyplot.subplot(n_images, 2, i*2+2)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# plot each box
		for box in yhat['rois']:
			# get coordinates
			y1, x1, y2, x2 = box
			# calculate width and height of the box
			width, height = x2 - x1, y2 - y1
			# create the shape
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# draw the box
			ax.add_patch(rect)
	# show the figure
	pyplot.show()


The complete example of loading the trained model and making a prediction for the first few images in the train and test datasets is listed below.

# detect kangaroos in photos with mask rcnn model
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import mold_image
from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# load all bounding boxes for an image
	def extract_boxes(self, filename):
		# load and parse the file
		root = ElementTree.parse(filename)
		boxes = list()
		# extract each bounding box
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# plot a number of photos with ground truth and predictions
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# load image and mask
	for i in range(n_images):
		# load the image and mask
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# make prediction; detect() resizes and mean-centers (molds) each image
		# itself, so pass the raw image in a list of length BATCH_SIZE (1)
		yhat = model.detect([image], verbose=0)[0]
		# define subplot
		pyplot.subplot(n_images, 2, i*2+1)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# plot masks
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# get the context for drawing boxes
		pyplot.subplot(n_images, 2, i*2+2)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# plot each box
		for box in yhat['rois']:
			# get coordinates
			y1, x1, y2, x2 = box
			# calculate width and height of the box
			width, height = x2 - x1, y2 - y1
			# create the shape
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# draw the box
			ax.add_patch(rect)
	# show the figure
	pyplot.show()

# load the train dataset
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the test dataset
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model_path = 'mask_rcnn_kangaroo_cfg_0005.h5'
model.load_weights(model_path, by_name=True)
# plot predictions for train dataset
plot_actual_vs_predicted(train_set, model, cfg)
# plot predictions for test dataset
plot_actual_vs_predicted(test_set, model, cfg)
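Before an image reaches the network, Mask R-CNN "molds" it. Part of this step subtracts a per-channel mean pixel value. The NumPy sketch below shows only that centering step and its inverse. It assumes the Matterport `Config` default `MEAN_PIXEL` of `[123.7, 116.8, 103.9]` (check `cfg.MEAN_PIXEL` for your configuration). The helper names `center_image()` and `uncenter_image()` are hypothetical.

In the Matterport library, `model.detect()` already performs this molding internally (via `mold_inputs()`). An image that the caller has already centered is therefore centered a second time inside `detect()`.

```python
import numpy as np

# Assumed per-channel RGB mean; this matches the Matterport Config default,
# but check cfg.MEAN_PIXEL for the value your configuration actually uses.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])


def center_image(image, mean_pixel=MEAN_PIXEL):
    """Subtract the per-channel mean from an RGB image of shape (H, W, 3)."""
    return image.astype(np.float32) - mean_pixel


def uncenter_image(centered, mean_pixel=MEAN_PIXEL):
    """Undo center_image() and return uint8 pixels suitable for display."""
    restored = np.round(centered + mean_pixel)
    return np.clip(restored, 0, 255).astype(np.uint8)


# a 2 x 2 image in which every pixel is close to the mean pixel
image = np.tile(np.array([124, 117, 104], dtype=np.uint8), (2, 2, 1))
centered = center_image(image)   # values near zero
twice = center_image(centered)   # centering twice pushes values far below zero
print(centered[0, 0], twice[0, 0])
```

Centering twice shifts every pixel by roughly another 100 to 124 units. The network never saw inputs like that during training.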

Running the example first creates a figure showing five photos from the training dataset. The left column shows each photo with its ground truth bounding boxes, and the right column shows the same photo with the predicted bounding boxes.

We can see that the model has done well on these examples, finding all of the kangaroos, even in the case where there are two or three in one photo. The second photo down (in the right column) does show a slip-up where the model has predicted a bounding box around the same kangaroo twice.
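In the plotting code, each row of `yhat['rois']` holds a box as `(y1, x1, y2, x2)` in pixel coordinates. Matplotlib's `Rectangle`, by contrast, expects the top-left `(x, y)` corner followed by a width and a height. The conversion can be isolated in a small helper, sketched below. `roi_to_rect()` is a hypothetical name used only for illustration.

```python
def roi_to_rect(box):
    """Convert a Mask R-CNN ROI (y1, x1, y2, x2) into matplotlib Rectangle args.

    Returns ((x, y), width, height), where (x, y) is the top-left corner in
    the column/row pixel coordinates used by pyplot.imshow().
    """
    y1, x1, y2, x2 = box
    return (x1, y1), x2 - x1, y2 - y1


# a box spanning rows 10-50 and columns 20-80
corner, width, height = roi_to_rect((10, 20, 50, 80))
print(corner, width, height)  # (20, 10) 60 40
```

Inside the loop, `ax.add_patch(Rectangle(*roi_to_rect(box), fill=False, color='red'))` would draw the same box as the listing does.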

Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding Boxes

A second figure is created showing five photos from the test dataset with ground truth bounding boxes and predicted bounding boxes.

These are images not seen during training, and again, the model has detected the kangaroo in each photo. In the second-to-last photo, a minor mistake was made: the same kangaroo was detected multiple times.

These errors could likely be reduced with more training, perhaps with a larger dataset and/or data augmentation, to encourage the model to treat people as background and to detect a given kangaroo only once.

Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes
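One way to inspect duplicate detections like those described above is to measure how much two predicted boxes overlap, using intersection over union (IoU). The sketch below computes IoU for boxes in the `(y1, x1, y2, x2)` order used by `yhat['rois']`. It then greedily keeps the highest-scoring box among heavily overlapping ones (non-maximum suppression).

This is an illustrative post-processing step and is not part of the tutorial's pipeline. The function names and the 0.5 threshold are assumptions. The Matterport `Config` class also exposes `DETECTION_NMS_THRESHOLD` (default 0.3) and `DETECTION_MIN_CONFIDENCE` (default 0.7), which `PredictionConfig` could override instead.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_duplicates(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap an already-kept box too much
        if all(iou(boxes[i], boxes[k]) < threshold for k in keep):
            keep.append(i)
    return keep


# two near-identical boxes around one kangaroo, plus a separate kangaroo
boxes = [(10, 10, 110, 110), (15, 12, 112, 108), (200, 200, 260, 300)]
scores = [0.98, 0.91, 0.95]
print(suppress_duplicates(boxes, scores))  # [0, 2]
```

With model output, the call would be `suppress_duplicates(yhat['rois'], yhat['scores'])`.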

Summary

In this tutorial, you discovered how to develop a Mask R-CNN model for kangaroo object detection in photographs.

Specifically, you learned:

How to prepare an object detection dataset ready for modeling with an R-CNN.
How to use transfer learning to train an object detection model on a new dataset.
How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

666 Responses to How to Train an Object Detection Model with Keras

Milemi June 1, 2019 at 5:38 am #

Great tutorial!
Could you give us advice on how to annotate images, please?
What is the best practice?
How many images per object are enough?
How should we annotate when there are several objects in the same image and they overlap?
Thank you.

- Jason Brownlee June 1, 2019 at 6:17 am #
  
  Great questions, thanks!
  
  I hope to cover the topic in the future.
  
  - Yajuan October 15, 2020 at 7:40 pm #
    
    Dear Dr. Jason,
    
    I am a student from China. I am dealing with a problem related to scene classification and wondering if you could provide some good methods and materials.
    
    Looking forward to hearing from you. Thank you for your time.
    
    Warm regards,
    Yajuan Xu
    
    - Jason Brownlee October 16, 2020 at 5:52 am #
      
      Thanks for the suggestion, I hope to write about the topic in the future.
      
- Usama Ahmed October 7, 2019 at 8:40 pm #
  
  Here is the image annotation tool.
  
  https://github.com/tzutalin/labelImg
  
  - Jason Brownlee October 8, 2019 at 7:59 am #
    
    Thanks for sharing.
    
- waleed June 4, 2020 at 11:51 pm #
  
  Hey Jason, I need help with my satellite building image dataset. My labels are in JSON format, and the image coordinates are polygon shapes with more than 4 points. Is Mask R-CNN suitable for this kind of dataset? The RPN needs 4 points to make a box, but my labels and annotated images have more than 4 points, so how does it work for polygons? Please help: is there any method or function to convert a polygon to 4 coordinates?
  
  - Jason Brownlee June 5, 2020 at 8:13 am #
    
    There may be, I’m not sure offhand, sorry.
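For reference, a polygon label can be reduced to a box by taking the minimum and maximum of its x and y vertex coordinates. The sketch below assumes the polygon is given as separate lists of x and y coordinates (the key names in your JSON may differ). It returns the box in the `(y1, x1, y2, x2)` order used by the Matterport library. `polygon_to_box()` is a hypothetical helper.

Note that the Matterport library derives training boxes from the masks returned by `load_mask()` (via `utils.extract_bboxes()`). Filling each polygon into its own mask channel therefore avoids needing boxes in the labels at all. The library's balloon sample does this with `skimage.draw.polygon()`.

```python
def polygon_to_box(xs, ys):
    """Reduce a polygon to its enclosing box in (y1, x1, y2, x2) order.

    xs, ys: sequences of vertex x (column) and y (row) coordinates.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    return min(ys), min(xs), max(ys), max(xs)


# a five-point polygon, e.g. an irregular building footprint
xs = [30, 80, 95, 60, 25]
ys = [40, 35, 70, 110, 90]
print(polygon_to_box(xs, ys))  # (35, 25, 110, 95)
```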
    
- harsh verma October 9, 2021 at 8:20 pm #
  
  Use the labelImg repo.
  
simonYU June 4, 2019 at 6:05 pm #

Hi Jason, when running display_instances:

display_instances(image, bbox, mask, class_ids, train_set.class_names)

An error occurred while starting the kernel:

home/user/anaconda3/bin/python: symbol lookup error: /home/user/anaconda3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so: undefined symbol: __kmpc_global_thread_num

Please help me find the solution.
Thanks

- Jason Brownlee June 5, 2019 at 8:33 am #
  
  This looks like it might be an issue with your library installation.
  
  Perhaps this post will help you to setup your development environment:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  - simonYU June 5, 2019 at 8:12 pm #
    
    Yes! Thanks for your guide! Now it works well on the Win10 platform, although it still takes much, much longer to run.
    The issue is with the MKL lib.
    
    But on Ubuntu the issue remains...
    
    - Jason Brownlee June 6, 2019 at 6:23 am #
      
      Nice work!
      
      - SimonYu June 10, 2019 at 12:16 pm #
        
        Hi Jason, thanks for your kind tutorials. For this case study,
        what is the function of train_set.prepare()?
        Could you provide more of a how-to about it? Thanks!
      - Jason Brownlee June 10, 2019 at 1:57 pm #
        
        Good question, you can see what prepare() does here:
        https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/utils.py#L294
    - Amine December 23, 2019 at 1:02 am #
      
      Hi, any developments?
      
roopesh June 4, 2019 at 6:19 pm #

Very nice steps!! How can I make predictions on real-time video (CCTV) instead of images? Thanks.

- Jason Brownlee June 5, 2019 at 8:33 am #
  
  Great suggestion, I hope to cover it in the future.
  
- Usama Ahmed September 26, 2019 at 11:59 pm #
  
  Use OpenCV to capture video from attached camera.
  
  - Jason Brownlee September 27, 2019 at 8:02 am #
    
    Agreed!
    
maryam June 8, 2019 at 5:02 am #

Hi Jason,
Thank you very much for the precious tutorial. I am facing a problem in a people-counting project: detecting people is not hard, but tracking them is.
Would you please write a tutorial about the best tracking methods, such as “deep tracking” or others?
Best
Maryam

- Jason Brownlee June 8, 2019 at 7:05 am #
  
  Thanks for the suggestion.
  

gary June 19, 2019 at 8:51 pm #

Thank you very much for such a beautiful yet detailed tutorial. It’s been great learning from you.

Jason Brownlee June 20, 2019 at 8:31 am #

Thanks, I’m glad it helped.

gary June 24, 2019 at 10:27 pm #

Hi Jason, I am trying to train on multiple objects. How can I change the code to import multiple classes?
Do I use multiple lines of:
self.add_class("dataset", 1, "kangaroo")
self.add_class("dataset", 2, "tiger")?

Jason Brownlee June 25, 2019 at 6:20 am #

You can specify all of your classes with a unique integer.
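The sketch below shows how the class name for each box can be read from VOC-style XML annotations like the kangaroo files, where each `<object>` element has a `<name>` tag. The annotation string, the class names, and the `CLASS_IDS` dictionary are hypothetical. Inside a `Dataset` subclass, you would instead register each class with `add_class()` and look up ids with `self.class_names.index(name)`.

```python
from xml.etree import ElementTree

# hypothetical class-name -> id mapping, mirroring add_class() calls such as
# self.add_class("dataset", 1, "kangaroo"); id 0 is reserved for background
CLASS_IDS = {"kangaroo": 1, "tiger": 2}

ANNOTATION = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>kangaroo</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>tiger</name>
    <bndbox><xmin>300</xmin><ymin>50</ymin><xmax>420</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def extract_labeled_boxes(root):
    """Return [(xmin, ymin, xmax, ymax, name), ...] for every <object>."""
    boxes = []
    for obj in root.findall('.//object'):
        name = obj.find('name').text
        bb = obj.find('bndbox')
        coords = [int(bb.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((*coords, name))
    return boxes


root = ElementTree.fromstring(ANNOTATION)
boxes = extract_labeled_boxes(root)
class_ids = [CLASS_IDS[box[4]] for box in boxes]
print(boxes)      # [(10, 20, 110, 220, 'kangaroo'), (300, 50, 420, 200, 'tiger')]
print(class_ids)  # [1, 2]
```

In `load_mask()`, each box would still get its own 0/1 mask channel, with the matching id appended to `class_ids`. That way, a kangaroo and a tiger in the same photo are handled the same way as two kangaroos.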

Romell Domínguez August 28, 2019 at 6:47 am #

Hi Jason, and then just add each image using:
self.add_class("dataset", 1, "kangaroo")
self.add_class("dataset", 2, "tiger")?

self.add_image('dataset', … )
What parameter do I need to set to identify ‘the class’?
Romell Domínguez August 30, 2019 at 8:25 pm #

I solved that problem for polygon shapes.
Jason Brownlee August 31, 2019 at 6:04 am #

I’m happy to hear that, well done!
Biki November 8, 2019 at 12:28 am #

Please let me know how to do this

Akshay February 12, 2020 at 3:01 am #

If we have both a kangaroo and a tiger inside a single image, then how can I load the mask?

self.add_class("dataset", 1, "kangaroo")
self.add_class("dataset", 2, "tiger")

I meant this part!!!

	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

Jason Brownlee February 12, 2020 at 5:50 am #

Sorry, I cannot review/debug your code.

marry June 20, 2019 at 12:27 pm #

ValueError: Dimension 1 in both shapes must be equal, but are 8 and 16. Shapes are [1024,8] and [1024,16]. for ‘Assign_682’ (op: ‘Assign’) with input shapes: [1024,8], [1024,16].

Hello Jason, how can I solve this error when calculating the mAP value?

- Jason Brownlee June 20, 2019 at 2:00 pm #
  
  Sorry to hear that, I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
- anish jain April 13, 2020 at 8:22 pm #
  
  If we have both Tom and Jerry inside a single image, then how can I load the mask?
  
  self.add_class("dataset", 1, "jerry")
  self.add_class("dataset", 2, "tom")
  
  Is it correct to do the same?
  
mahmoud July 10, 2019 at 6:30 pm #

hi jason,

I want to ask about the file mask_rcnn_kangaroo_cfg_0005.h5:
how can I find it? Also, why do you separate training and prediction?
I mean, the last version of the file contains only the prediction without the training, so how has the model saved the new weights after training so that they can be used in the prediction step?

- Jason Brownlee July 11, 2019 at 9:46 am #
  
  The model is fit on the training dataset, saved, loaded and used to make prediction on a hold out test dataset.
  
  Does that help?
  
  - mahmoud July 11, 2019 at 6:18 pm #
    
    Yes, but my question is: before training on the kangaroo dataset, I load weights into the model:
    # load weights (mscoco) and exclude the output layers
    model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
    then after training:
    # load model weights
    model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
    Why do we load the weights again?
    
    I have the file mask_rcnn_coco.h5, which I think holds the initial weights, but I do not know what the file mask_rcnn_kangaroo_cfg_0005.h5 contains or where I can find it.
    
    - Jason Brownlee July 12, 2019 at 8:32 am #
      
      The new set of weights is focused on only detecting kangaroos based on our own dataset.
      
      Does that help?
      
      - mahmoud July 12, 2019 at 5:58 pm #
        
        Yes, but I cannot find this new set of weights. I mean, where does it create the file mask_rcnn_kangaroo_cfg_0005.h5?
      - Jason Brownlee July 13, 2019 at 6:52 am #
        
        It will be in the same directory as the python file.
      - mahmoud July 14, 2019 at 7:25 am #
        
        Thanks for your response. Another question: how can I prepare my images in the same structure as the kangaroo dataset, so that I can train and apply the model on them?
      - Jason Brownlee July 14, 2019 at 8:18 am #
        
        It is not required, but it might be a helpful start if you are having trouble.
      - mahmoud eltaher July 15, 2019 at 7:27 pm #
        
        Yes, I need to do this because I want to apply the model to my own problem: I have some images with circles, and I want to detect these circles.
Wolverin July 13, 2019 at 3:29 am #

Same problem here; I am using Google Colab.

This ‘mask_rcnn_kangaroo_cfg_0005.h5’ file is created during training, as described in the blog, but I cannot find it anywhere in my Google Drive.

- Jason Brownlee July 13, 2019 at 7:00 am #
  
  Perhaps try running on your workstation from the command line?
  
- kevi October 8, 2019 at 1:03 am #
  
  I ran it on Google Colab, and the .h5 file was saved in the /content/Mask_RCNN/kangaroo_cfg*/ folder. Check there.
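In the Matterport library, each training run writes its checkpoints to a new subdirectory of `model_dir`. The subdirectory is named after the configuration plus a timestamp (e.g. `kangaroo_cfg20190601T1234/`) and holds one `mask_rcnn_kangaroo_cfg_NNNN.h5` file per epoch. The sketch below locates the most recent checkpoint with `glob`, assuming that naming scheme. `latest_checkpoint()` is a hypothetical helper; recent versions of the library may also offer `model.find_last()` for the same purpose.

```python
import glob
import os
import tempfile


def latest_checkpoint(model_dir, config_name):
    """Return the newest checkpoint path under model_dir, or None if absent.

    Assumes <model_dir>/<config_name><timestamp>/mask_rcnn_<config_name>_<epoch>.h5;
    timestamps and zero-padded epochs both sort correctly as strings.
    """
    pattern = os.path.join(model_dir, config_name + '*',
                           'mask_rcnn_%s_*.h5' % config_name)
    paths = sorted(glob.glob(pattern))
    return paths[-1] if paths else None


# build a throwaway directory that mimics two training runs
with tempfile.TemporaryDirectory() as tmp:
    for run in ('kangaroo_cfg20190601T1200', 'kangaroo_cfg20190602T0900'):
        os.makedirs(os.path.join(tmp, run))
        for epoch in (1, 5):
            name = 'mask_rcnn_kangaroo_cfg_%04d.h5' % epoch
            open(os.path.join(tmp, run, name), 'w').close()
    path = latest_checkpoint(tmp, 'kangaroo_cfg')
    print(os.path.relpath(path, tmp))
```

With the tutorial's `model_dir='./'`, the equivalent call would be `model.load_weights(latest_checkpoint('./', 'kangaroo_cfg'), by_name=True)`.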
  
- zakaria October 10, 2019 at 3:40 am #
  
  Hi, I am using Google Colab and I am having trouble installing the Mask R-CNN library with python setup.py.
  
  - Jason Brownlee October 10, 2019 at 7:03 am #
    
    I don’t know about colab, sorry.
    
    Perhaps try posting on stackoverflow?
    
Jeremy Immanuel Putra Tandjung July 16, 2019 at 2:25 pm #

Hello Jason,

First of all, nice tutorial! Having the overall code at the end of each step really helped keep track of where I am in the code! Keep up the good job!

I have a question. I noticed that it took you, on average, a minute per epoch to train. However, I tried doing this with a different dataset, and right now I’m on my first epoch with an ETA of 3.5 hours. My desktop is fairly fast, with a Ryzen 7 CPU and an NVIDIA 1050Ti GPU.

So is there something that I’m missing? My training dataset consists of 296 pictures of playing cards in different situations, with a total file size of 30.4 MB (I’m trying to train a model to detect playing cards).

Is that normal, or is there some setting I’m missing?

Thanks!

- Jason Brownlee July 17, 2019 at 8:15 am #
  
  It may be a factor of the number of images?
  
  It may be hardware?
  
  Perhaps experiment on some p3 EC2 instances or with a smaller dataset?
  
Choi July 16, 2019 at 4:51 pm #

Hi Jason.
This post is so helpful to me to learn R-CNN training!

As I do my work, I encounter some problems now.
First, I trained the model starting from the ‘mask_rcnn_coco.h5’ weights.

So I got the model weights file: ‘mask_rcnn_carpk_cfg_0010.h5’.
How can I add more training images and continue training from the above file?

I tried adding more images with the load_images function, and then I trained the model with load_weights('mask_rcnn_carpk_cfg_0010.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
But it did not work..

Is there any other things to set??

Thank you!!

- Jason Brownlee July 17, 2019 at 8:18 am #
  
  Good question, I don’t have an example of this sorry. You may need to dive into the mask rcnn API.
  
  - Choi July 18, 2019 at 2:05 am #
    
    hmm.. Please could you tell me some recommended papers or blogs about Mask R-CNN API for implementing my task?
    
    Thank you!
    
    - Jason Brownlee July 18, 2019 at 8:32 am #
      
      Perhaps start here:
      https://machinelearningmastery.com/how-to-perform-object-detection-in-photographs-with-mask-r-cnn-in-keras/
      
Nathan Starliper July 17, 2019 at 5:33 am #

Hi Jason,

Great tutorial. However, I am a bit confused as to why you used Mask RCNN instead of Faster RCNN. Mask RCNN is essentially Faster RCNN with segmentation added. In this example you basically converted the segmentation into bounding boxes, so it seems to me that it would have saved you quite a bit of effort and manual labor to just use a Faster RCNN model instead?

Thanks,
Nate

- Jason Brownlee July 17, 2019 at 8:31 am #
  
  Good question.
  
  Optionality. We can do object detection, which is what most people want, with the ability to do segmentation if needed.
  
SATYAM SAREEN July 22, 2019 at 7:43 pm #

Great Tutorial Sir,
I really learned a lot.
I have a doubt regarding multiclass detection. I have 2 classes: person with a helmet and person without a helmet. What changes should I make to the program, such as adding classes through the add_class function?
Huge Respect and Love.
Satyam Sareen

- Jason Brownlee July 23, 2019 at 8:00 am #
  
  Perhaps this tutorial will help you train your model:
  https://machinelearningmastery.com/how-to-train-an-object-detection-model-with-keras/
  
  - SATYAM SAREEN July 23, 2019 at 7:38 pm #
    
    Good Afternoon Sir,
    
    You have attached the link to the same blog. Can you suggest the changes to be made in your code so that it runs smoothly for multiclass object detection?
    
    Warm Regards
    Satyam Sareen
    
    - Jason Brownlee July 24, 2019 at 7:52 am #
      
      What do you mean smoothly?
      
    - AutoRoboCulture November 18, 2019 at 3:05 am #
      
      Hello Satyam Sareen,
      
      Check out the code below; I have changed it to your requirements. If you have any queries, comment below. Keep it up!
      
      Code:
      
      class KangarooDataset(Dataset):
          # load the dataset definitions
          def load_dataset(self, dataset_dir, is_train=True):
              # define two classes
              self.add_class("dataset", 1, "personWithHelmet")  # Change required
              self.add_class("dataset", 2, "personWithoutHelmet")  # Change required
              # define data locations
              images_dir = dataset_dir + '/images/'
              annotations_dir = dataset_dir + '/annots/'
              # find all images
              for filename in listdir(images_dir):
                  # extract image id
                  image_id = filename[:-4]
                  # print('IMAGE ID: ', image_id)
                  # skip all images after 90 if we are building the train set
                  if is_train and int(image_id) >= 90:  # set limit for your train and test set
                      continue
                  # skip all images before 90 if we are building the test/val set
                  if not is_train and int(image_id) < 90:
                      continue
                  img_path = images_dir + filename
                  ann_path = annotations_dir + image_id + '.xml'
                  # add to dataset
                  # for your case it is 0:BG, 1:PerWithHel.., 2:PersonWithoutHel…  # Change required
                  self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])

          # extract bounding boxes (and class names) from an annotation file
          def extract_boxes(self, filename):
              # load and parse the file
              tree = ElementTree.parse(filename)
              # get the root of the document
              root = tree.getroot()
              # extract each bounding box
              boxes = list()
              # for box in root.findall('.//bndbox'):
              for box in root.findall('.//object'):  # Change required
                  name = box.find('name').text  # Change required
                  xmin = int(box.find('./bndbox/xmin').text)
                  ymin = int(box.find('./bndbox/ymin').text)
                  xmax = int(box.find('./bndbox/xmax').text)
                  ymax = int(box.find('./bndbox/ymax').text)
                  coors = [xmin, ymin, xmax, ymax, name]  # Change required
                  boxes.append(coors)
              # extract image dimensions
              width = int(root.find('.//size/width').text)
              height = int(root.find('.//size/height').text)
              return boxes, width, height

          # load the masks for an image
          def load_mask(self, image_id):
              # get details of image
              info = self.image_info[image_id]
              # define box file location
              path = info['annotation']
              # load XML
              boxes, w, h = self.extract_boxes(path)
              # create one array for all masks, each on a different channel
              masks = zeros([h, w, len(boxes)], dtype='uint8')
              # create masks; each channel is a binary (0/1) mask, and the
              # class of each instance is recorded separately in class_ids
              class_ids = list()
              for i in range(len(boxes)):
                  box = boxes[i]
                  row_s, row_e = box[1], box[3]
                  col_s, col_e = box[0], box[2]
                  masks[row_s:row_e, col_s:col_e, i] = 1
                  if box[4] == 'personWithHelmet':  # Change required: match the <name> in your .XML files
                      class_ids.append(self.class_names.index('personWithHelmet'))  # Change required
                  else:
                      class_ids.append(self.class_names.index('personWithoutHelmet'))  # Change required
              return masks, asarray(class_ids, dtype='int32')

          # load an image reference
          def image_reference(self, image_id):
              info = self.image_info[image_id]
              return info['path']

      # define a configuration for the model
      class KangarooConfig(Config):
          # define the name of the configuration
          NAME = "kangaroo_cfg"
          # number of classes (background + personWithoutHelmet + personWithHelmet)
          NUM_CLASSES = 1 + 2  # Change required
          # number of training steps per epoch
          STEPS_PER_EPOCH = 90
      
      - Jason Brownlee November 18, 2019 at 6:50 am #
        
        Thanks for sharing.
      - Ashutosh Srivastava February 11, 2020 at 10:21 pm #
        
        You are great @AutoRoboCulture.
      - Akshay February 12, 2020 at 3:22 am #
        
        Hello, I tried to train with multiple classes in a single image, and I am getting an error like this:
        
        File "C:……….\lib\site-packages\keras\engine\training_utils.py", line 145, in standardize_input_data
        str(data_shape))
        
        ValueError: Error when checking input: expected input_image to have shape (None, None, 1) but got array with shape (1024, 1024, 3).
        
        PS: I am working with gray scale images. and 3 classes. Inside single image both classes are present
      - Ademola Okerinde February 26, 2020 at 10:47 am #
        
        Thanks
      - Nourhan March 4, 2020 at 9:25 pm #
        
        Thank you so much for sharing these changes.
        However, after I followed all of them and adjusted the whole thing to fit my dataset, I keep getting this error:
        RuntimeError: generator raised StopIteration
        
        from that training line:
        model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')
        
        Do you have any suggestions to overcome it?
      - hila March 12, 2020 at 10:14 pm #
        
        Hey, I was using your code for the training.
        Can you please show us your prediction code? I was trying to use Jason’s, but it came up with a lot of errors that I cannot solve.
      - Andy August 2, 2020 at 1:36 am #
        
        Can you share the prediction part, please?
mahmoud July 25, 2019 at 7:25 pm #

Is this model also supposed to detect the masks of the objects (the kangaroos in the images), or will we need some modification to segment the images?

- Jason Brownlee July 26, 2019 at 8:19 am #
  
  Yes, if masks are provided.
  
  In the case of kangaroos, we do not provide masks – just bounding boxes, therefore masks cannot be learned.
  
  - mahmoud July 29, 2019 at 8:47 pm #
    
    When I test an image with multiple kangaroos, it fails to detect them: if two kangaroos overlap, it detects them as only one. Any advice?
    
    - Jason Brownlee July 30, 2019 at 6:11 am #
      
      Perhaps the model requires more training on photos with multiple kangaroos?
      
      - mahmoud July 30, 2019 at 10:02 pm #
        
        Thanks for your response. Another question: is there a new version of Mask R-CNN available on GitHub?
        Also, if I need masks in my model, how can I provide them and make my model learn them too?
      - Jason Brownlee July 31, 2019 at 6:52 am #
        
        The model can learn the mask, if you provide a dataset that has masks on the images.
      - mahmoud August 22, 2019 at 7:22 pm #
        
        Thanks for your response. I am confused about something: we now train the model without masks, so what is the mask loss in this case, and how is it calculated?
      - Jason Brownlee August 23, 2019 at 6:24 am #
        
        I don’t follow, sorry. What do you mean exactly?
Nishant Gaurav July 29, 2019 at 6:39 pm #

I am getting this error. Please help
OSError Traceback (most recent call last)
in ()
----> 1 model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)

2 frames
/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
140 if swmr and swmr_support:
141 flags |= h5f.ACC_SWMR_READ
--> 142 fid = h5f.open(name, flags, fapl=fapl)
143 elif mode == 'r+':
144 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (unable to open file: name = 'mask_rcnn_kangaroo_cfg_0005.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

- Jason Brownlee July 30, 2019 at 6:06 am #
  
  The error suggests that the path to your data file is incorrect or the file is corrupted in some way?
  
  - Nishant Gaurav July 31, 2019 at 9:45 pm #
    
    Thanks for the suggestion. The problem was resolved.
    How do we handle multi-class labels? If we have to identify numbers and characters in the same image and want to label all of them, how do we apply multi-class labels?
    
    - Jason Brownlee August 1, 2019 at 6:49 am #
      
      Perhaps extract the images of detected numbers (called segmentation), then classify each segmented image.
      
- Mark November 10, 2020 at 1:59 pm #
  
  Hi, I had the same “could not find file” problem. Could you share how you resolved it?
  
  - Jason Brownlee November 11, 2020 at 6:43 am #
    
    Perhaps some of these suggestions will help:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
Dicko July 29, 2019 at 8:13 pm #

Hi there, when I copied the example exactly, I am getting a train mAP of 0.000 and a test mAP of 0.000 also. Clearly something is wrong, I was wondering if anyone knew what the issue could be and how to resolve it. Thank you.

- Jason Brownlee July 30, 2019 at 6:09 am #
  
  Sorry to hear that, I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
- Jassin February 24, 2023 at 7:45 pm #
  
  Hey, did you find out why you were getting 0.000? I’m having the same problem. Thanks in advance!
  
saka July 30, 2019 at 2:26 pm #

Dear Jason, Thanks! I really learned a lot.

I am getting this error for the line “from mrcnn.utils import Dataset”:

from mrcnn.utils import Dataset

ModuleNotFoundError: No module named 'mrcnn'

However, I checked whether the library was installed by typing “pip show mask-rcnn” and got the results below:

Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: waleed.abdulla@gmail.com
License: MIT
Location: c:\users\sakal\appdata\local\continuum\anaconda3\lib\site-packages\mask_rcnn-2.1-py3.7.egg

According to the information above, there seems to be no problem with the installed library. Could you please advise me on this? Thanks!!

- Jason Brownlee July 31, 2019 at 6:44 am #
  
  Sorry to hear that.
  
  Are you running the code from the command line instead of a notebook or IDE?
  
  - saka July 31, 2019 at 2:20 pm #
    
    I ran your example code in the Spyder IDE.
    
    - Jason Brownlee August 1, 2019 at 6:41 am #
      
      I recommend not using an IDE and instead running the code from the command line:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
      
      - saka August 1, 2019 at 9:47 am #
        
        It works now when I run the code from the command line. Thanks!
        
        Just curious: why is it different from running from an IDE?
      - Jason Brownlee August 1, 2019 at 2:11 pm #
        
        Happy to hear that.
        
        It is a very common problem, I explain more here:
        https://machinelearningmastery.com/faq/single-faq/why-dont-use-or-recommend-notebooks
    - Ashish kumar May 6, 2020 at 4:28 pm #
      
      If you want to run the code in an IDE, you have to add the path where your mrcnn library is installed. To do that, you can write:
      
      import sys
      sys.path.insert(0, 'directory where your mrcnn library is installed')
      
      By default, the library is located where you cloned the mrcnn repository.
      
      - Jason Brownlee May 7, 2020 at 6:40 am #
        
        Great tip.
        
        I recommend running from the command line:
        https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  - saka July 31, 2019 at 2:31 pm #
    
    By the way, I got the messages below when I installed it by typing “python setup.py install”:
    
    WARNING:root:Fail load requirements file, so using default ones.
    running install
    .
    .
    .
    Processing dependencies for mask-rcnn==2.1
    Finished processing dependencies for mask-rcnn==2.1
    
    Do you think the warning above matters? Thanks!!
    
    - Jason Brownlee August 1, 2019 at 6:41 am #
      
      Probably not.
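Ashish kumar’s tip above (adding the repository folder to sys.path) can be wrapped in a small guard so the path is added only once and a wrong path fails loudly. This is a minimal sketch, not part of the tutorial or the mrcnn library: add_repo_to_path is a hypothetical helper, and the path in the usage comment is a placeholder for wherever you cloned Mask_RCNN (assumed to be the folder that directly contains the mrcnn package).

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Insert repo_dir at the front of sys.path unless it is already present.

    repo_dir is assumed to be the folder that directly contains the
    mrcnn package, e.g. the root of a cloned Mask_RCNN repository.
    """
    repo_dir = os.path.abspath(repo_dir)
    if not os.path.isdir(repo_dir):
        raise FileNotFoundError("not a directory: " + repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Hypothetical usage; replace the placeholder path with your own clone:
# add_repo_to_path("/path/to/Mask_RCNN")
# from mrcnn.utils import Dataset
```

This only changes the module search path of the running Python process; it does not install anything.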
      
Dicko July 30, 2019 at 7:13 pm #

Thanks for that. I’ll look through the code and see if I made a mistake somewhere when copying.
Is there a file with the complete code so that I can just copy and paste the whole lot rather than bits at a time?

Thank you 🙂

- Jason Brownlee July 31, 2019 at 6:48 am #
  
  Each of my tutorials embeds the complete code file; you can copy and paste it directly.
  
Nishant Gaurav July 31, 2019 at 9:52 pm #

File “”, line 21
self.add_class("dataset", 2, "1")
^
IndentationError: unindent does not match any outer indentation level
Hi,
I am getting this error after adding just two new lines to the code.

def load_dataset(self, dataset_dir, is_train=True):
    # define one class
    self.add_class("dataset", 1, "N")
    self.add_class("dataset", 2, "1")  # added this new line
    # define data locations
    images_dir = dataset_dir + '/images/'
    annotations_dir = dataset_dir + '/annots/'

    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        masks[row_s:row_e, col_s:col_e, i] = 1
        class_ids.append(self.class_names.index('N'))
        class_ids.append(self.class_names.index('1'))  # added this new line
    return masks, asarray(class_ids, dtype='int32')

- Jason Brownlee August 1, 2019 at 6:50 am #
  
  Sorry to hear that. It looks like you did not copy the code with its white space intact.
  
  I show how to copy the code correctly here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
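A separate problem in the code above, and a likely cause of the “boolean index did not match indexed array” errors reported further down, is that the modified load_mask appends two class ids for every box, so class_ids ends up with twice as many entries as there are mask channels. As far as I can tell from Matterport’s model.py, load_image_gt filters class_ids with a boolean array that has one entry per mask, so the two lengths must match. The sketch below reads one label per box instead. It assumes Pascal VOC-style XML in which each object element has its own name tag, and that class_names starts with the background entry "BG" as in Matterport’s Dataset; extract_boxes_and_names and class_ids_for are hypothetical helpers, not functions from the tutorial or the library.

```python
import xml.etree.ElementTree as ET


def extract_boxes_and_names(xml_text):
    """Return (box, class_name) pairs from a VOC-style annotation string.

    box is [xmin, ymin, xmax, ymax]; class_name is read from the name tag
    of the same object element, so every box carries its own label.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for obj in root.findall(".//object"):
        name = obj.find("name").text
        bndbox = obj.find("bndbox")
        box = [int(float(bndbox.find(tag).text))
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        pairs.append((box, name))
    return pairs


def class_ids_for(pairs, class_names):
    """Exactly one class id per box, looked up from that box's own label."""
    return [class_names.index(name) for _, name in pairs]


example = (
    "<annotation>"
    "<object><name>N</name><bndbox><xmin>10</xmin><ymin>20</ymin>"
    "<xmax>50</xmax><ymax>80</ymax></bndbox></object>"
    "<object><name>1</name><bndbox><xmin>60</xmin><ymin>5</ymin>"
    "<xmax>90</xmax><ymax>40</ymax></bndbox></object>"
    "</annotation>"
)
pairs = extract_boxes_and_names(example)
# "BG" comes first, mirroring the background entry at index 0 of class_names.
print(class_ids_for(pairs, ["BG", "N", "1"]))  # [1, 2]: one id per box
```

Inside load_mask, the equivalent change is a single class_ids.append(self.class_names.index(name)) per box, where name is that box’s own label, instead of one append per class.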
  
Nishant Gaurav July 31, 2019 at 11:14 pm #

IndexError Traceback (most recent call last)
in
2 plt.imshow(image)
3 # plot mask
----> 4 plt.imshow(mask[:, :, 0], cmap='gray', alpha=0.1)
5 plt.show()

IndexError: index 0 is out of bounds for axis 2 with size 0

I get this error after I added those two extra lines.

- Jason Brownlee August 1, 2019 at 6:51 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
ahmadreza August 1, 2019 at 2:44 am #

Hi Sir,
I am getting this error. Please help

if is_train and int(image_id) >= 150:

ValueError: invalid literal for int() with base 10: 'Thumb'

- Jason Brownlee August 1, 2019 at 6:55 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
- Julio César Álvarez Iglesias September 13, 2019 at 8:55 pm #
  
  I am facing the same problem. Did you manage to resolve this issue?
  
  - Jason Brownlee September 14, 2019 at 6:17 am #
    
    Perhaps you have a thumbnail file in the folder?
    
    If so, perhaps try deleting it?
    
- Divya A September 3, 2021 at 5:22 pm #
  
  Did you resolve this problem?
  ValueError: invalid literal for int() with base 10: 'Thumb.db'
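The value ‘Thumb’ is consistent with a Windows thumbnail cache file: the tutorial derives the image id as filename[:-4], and 'Thumbs.db'[:-4] is 'Thumb'. Instead of deleting such files by hand, one option is to skip anything that is not a numbered image before calling int(). A minimal sketch; numbered_images is a hypothetical helper, and the accepted extensions are an assumption.

```python
import os


def numbered_images(filenames, extensions=(".jpg", ".jpeg", ".png")):
    """Yield (image_id, filename) pairs for image files with all-digit names.

    Non-image files such as Thumbs.db (Windows) or .DS_Store (macOS) are
    skipped, so a later int(image_id) cannot raise ValueError.
    """
    for filename in sorted(filenames):
        stem, ext = os.path.splitext(filename)
        if ext.lower() in extensions and stem.isdigit():
            yield stem, filename


print(list(numbered_images(["00002.jpg", "Thumbs.db", "00001.jpg"])))
# [('00001', '00001.jpg'), ('00002', '00002.jpg')]
```

In the tutorial’s load_dataset loop, this would replace iterating over listdir(images_dir) and slicing filename[:-4] with for image_id, filename in numbered_images(listdir(images_dir)):.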
  
Nishant Gaurav August 1, 2019 at 2:16 pm #

Hi Sir,
Could you please give some insight into where I need to make changes in the code for multi-class labels, so that I can identify the different characters and numbers in a single image?
Please include examples so that it is easier to understand.
Thanks so much for helping.

- Jason Brownlee August 2, 2019 at 6:40 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
  
mh August 6, 2019 at 6:58 pm #

Thanks for your tutorial.

But I want to ask: is there any model that can deal with objects whose color is similar to the background?

- Jason Brownlee August 7, 2019 at 7:45 am #
  
  Perhaps. You may have to do some testing, or perhaps use transfer learning to tune an existing model.
  
Tal August 9, 2019 at 12:33 am #

Thank you very much for this great and clear tutorial!
If I may ask:
Is there a way to evaluate the model while training? For example at the end of each epoch?

Thanks a million,

Tal

- Jason Brownlee August 9, 2019 at 8:15 am #
  
  Yes, you can use a hold out validation dataset:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
  - Tal August 14, 2019 at 9:37 pm #
    
    Thank you for the quick response!
    
    - Jason Brownlee August 15, 2019 at 8:09 am #
      
      No problem.
      
Selman Bozkır August 14, 2019 at 4:57 am #

Hi Jason,

I have a problem. My dataset contains only 872 training images and 15 classes. Meanwhile, my images are rather bigger than the kangaroo or Pascal VOC images: they are around 1500 pixels wide and 1000 pixels tall. I have changed the Python code to apply multi-class classification. My hardware is a 1050 Ti GPU in a system with 24 GB of memory. I ran your code on the kangaroo data and it was OK. But whenever I run it on my custom data, memory use climbs above 20 GB and Ubuntu falls back to slow swap memory until the system becomes unresponsive.

What is the problem? Is it normal? What was the RAM consumption in your case? I did not check it for the kangaroo data, but I remember that swap memory was activated by the 5th epoch.
What could be a workaround for this problem?

- Jason Brownlee August 14, 2019 at 6:46 am #
  
  Perhaps you can reduce the size of the image prior to modeling?
  
  - Selman Bozkır August 14, 2019 at 6:59 am #
    
    Well, for a fair scientific study I would not reduce it, but the only way I found was to reduce IMAGE_MIN_DIM = 400 and IMAGE_MAX_DIM = 512. However, interestingly, the total memory consumption grows with each epoch.
    
    Moreover, the training procedure always starts with warnings such as “UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.”
    
    This is actually the problem. Is it possible to solve this? I have googled it, but the solutions were not very clear to me (or they sound too technical).
    
    Currently, I can train the model for only 4 epochs; more epochs need more memory. To me this is certainly a bug, since advancing epochs should not increase memory consumption.
    
    By the way, thank you very much for your reply.
    
    As I said, this memory issue really made me sad. Is this normal?
    
    - Jason Brownlee August 14, 2019 at 2:08 pm #
      
      Perhaps you can use progressive loading and only load/yield one batch of images into memory at a time.
      
      This can be achieved with the ImageDataGenerator:
      https://machinelearningmastery.com/how-to-load-large-datasets-from-directories-for-deep-learning-with-keras/
      
      - Selman Bozkır August 14, 2019 at 7:26 pm #
        
        Dear Jason,
        
        Thanks so much for your advice. I would like to share my experience with you and others. The only solution I have found so far is setting use_multiprocessing=False in model.py and reducing the number of workers to 1. This has helped me. By the way, I am now using 384×384 images by setting IMAGE_MAX_DIM = 384 and IMAGE_MIN_DIM = 384. Now I can train for 20 epochs. This has really helped me.
        
        I hope this information helps others who run into the same problems.
        
        Cheers
      - Jason Brownlee August 15, 2019 at 8:00 am #
        
        Nice! Thanks for sharing.
- K Ramesh May 8, 2021 at 3:06 pm #
  
  Hi Selman, how did you annotate that many images, and how much time did it take? Can someone help me annotate them automatically?
  
N. Arvind August 18, 2019 at 1:58 am #

Dear Jason
Good morning!

We have used this model to detect bounding boxes and masks for ID cards.

We provided annotations in .csv files as quadrilaterals and modified the ‘load_mask’ function accordingly. We are looking for quadrilateral-shaped masks.

We are able to detect bounding boxes correctly, but not masks: the masks that show up are incorrect.

We have used the exact code. The learning rate is 0.00001. We used 800 images and 65 epochs for training; a higher learning rate gives a NaN loss. We have checked the entire dataset for any discrepancy.

Can you tell us where we are going wrong? Can we use this exact code, with exactly the same config, to generate masks from four vertices?

Warm regards,
N. Arvind

- Jason Brownlee August 18, 2019 at 6:48 am #
  
  Well done!
  
  Perhaps look into data preparation?
  
Per Nord August 22, 2019 at 12:34 am #

Great tutorial! I’ve managed to successfully train a model and now I want to use the model in Android and iOS.

I’ve learned that this requires me to convert my model.h5 file to model.pb and then to TensorFlow Lite format.

I expected this to be trivial, but alas: the Mask R-CNN issue list is riddled with people having problems with this.

Did you ever try this?
If not, it would be a great continuation to this tutorial.

- Jason Brownlee August 22, 2019 at 6:29 am #
  
  Sorry, I have not tried this.
  
Abilash August 24, 2019 at 10:03 pm #

Hi Jason,

That was a nice tutorial. I get some errors when trying multi-class detection.

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1

I have two classes (full glass and empty glass) and have set NUM_CLASSES = 1 + 2 in the config, along with self.add_class("dataset", 1, "Full Glass") and self.add_class("dataset", 2, "Not Full Glass"). I also made these changes: class_ids.append(self.class_names.index('Full'))
class_ids.append(self.class_names.index('Not Full')).

Please help me out; I have been unable to resolve the error despite many attempts.

- Jason Brownlee August 25, 2019 at 6:37 am #
  
  It’s hard to debug this for you off the cuff, sorry.
  
  Perhaps double check you made all of the required changes?
  
- Julio César Álvarez Iglesias September 14, 2019 at 4:42 am #
  
  I am facing the same problem. Did you manage to resolve this issue?
  
  - Jason Brownlee September 14, 2019 at 6:23 am #
    
    Try removing “Thumb” files from your folder.
    
- Shubhangi November 5, 2019 at 3:51 pm #
  
  Hi,
  I’m having the same error!! Did you find a solution to this?
  
Alaki September 4, 2019 at 5:14 pm #

I hope it’s not a repeated question. I wonder if you have a tutorial on training a model for custom multi-object detection? Basically, we would like to recognize multiple objects in a single image. There is no pre-trained model for these objects, and we have labeled a small set of images (again, each image is labeled with multiple rectangles, each covering one object).

Thank you again for all these nice tutorials.

- Jason Brownlee September 5, 2019 at 6:49 am #
  
  I believe you can adapt the above tutorial for this purpose; the model supports multiple objects in one image, and they can be of different types.
  
Ade September 6, 2019 at 5:45 pm #

Dear Dr Jason,

Good day sir, I am a Machine Learning Engineer. I am currently working on a logo detection system. I have tried MobileNet SSD and Faster R-CNN, and there seemed to be a high number of false positives when I tried the models out. They seem not to be very good for logos that are very small. I have also created Haar and LBP cascade models, and they seemed to perform better than the deep learning models in terms of false positives. My question: is there any other technique that does well with small logos with different contrasts and orientations? Thank you.

- Jason Brownlee September 7, 2019 at 5:22 am #
  
  I’m not sure off hand, sorry. Perhaps check the literature?
  
  I recall some interesting work on test-time augmentation that might be very helpful to you.
  
K_gao September 9, 2019 at 5:01 pm #

Dear Jason!

You’ve done great work again. Thank you for this post!

What if I want to train and add not just one object to my model? For example, I want to add 100 new classes. If I have 100 classes and every class has 500 images, how can I train the model? It is impossible to load 50,000 images into memory! Is it possible to do it with a loop, adding a new class to the model with every iteration?

Do you have a post about this?

Thanx

- Jason Brownlee September 10, 2019 at 5:37 am #
  
  Yes, you can use progressive loading with a data generator, see this post:
  https://machinelearningmastery.com/how-to-load-large-datasets-from-directories-for-deep-learning-with-keras/
  
Ankit September 13, 2019 at 4:58 am #

Hi Jason,

The notebook is very helpful and full of knowledge, but I am having problems training the model on a different dataset (fruits: apple, banana, orange).
After loading the images, annotations and masks, when I try to train the model I get the following error:
RemoteTraceback Traceback (most recent call last)

RemoteTraceback:
"""
Traceback (most recent call last):
File “/usr/lib/python3.6/multiprocessing/pool.py”, line 119, in worker
result = (True, func(*args, **kwds))
File “/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py”, line 641, in next_sample
return six.next(_SHARED_SEQUENCES[uid])
File “/content/Mask_RCNN/mrcnn/model.py”, line 1709, in data_generator
use_mini_mask=config.USE_MINI_MASK)
File “/content/Mask_RCNN/mrcnn/model.py”, line 1265, in load_image_gt
class_ids = class_ids[_idx]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 2
"""

The above exception was the direct cause of the following exception:

IndexError Traceback (most recent call last)

in ()
3 learning_rate = config.LEARNING_RATE,
4 epochs = 10,
----> 5 layers = 'all')

7 frames

/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):

IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 2

Please give me a hint about this.
Also, I am using multi-class labels for 3 fruits.

- Jason Brownlee September 13, 2019 at 5:45 am #
  
  Perhaps double check that you are loading the data correctly or as you expect?
  
- bella October 22, 2019 at 4:03 am #
  
  Hey, how did you solve this IndexError issue?
  
Shubhangi September 17, 2019 at 4:33 pm #

I trained the model on my own dataset, but the model’s predictions are not right. Can you please help me?
Actually, I trained it with the kangaroo class name, but in prediction I am getting the person class tag.

- Jason Brownlee September 18, 2019 at 5:57 am #
  
  Perhaps start with the example in the tutorial and adapt it for your specific dataset?
  
  - Shubhangi September 20, 2019 at 3:17 pm #
    
    Thank you, I solved it. But I have a total of 125 images of ID cards, and the aim is to detect the ID card in the images. After training, the object detection results are not good at all. I have done 50 epochs at 25 steps each. Can you please help me?
    
    - Jason Brownlee September 21, 2019 at 6:44 am #
      
      I have some general suggestions for diagnosing and improving deep learning model performance here that may help:
      https://machinelearningmastery.com/start-here/#better
      
      - Shubhangi October 7, 2019 at 8:50 pm #
        
        Hello. I found some issues regarding the accuracy of the model. I don’t know what is affecting its accuracy. The same configuration as described above is used in my model, but the accuracy is not good: the ROIs from the model’s predictions are not correct. Can someone please help me out?
      - Jason Brownlee October 8, 2019 at 8:00 am #
        
        Is this on your own dataset or the dataset used in the above tutorial?
        
        I have some general suggestions here that might help to diagnose and address performance issues:
        https://machinelearningmastery.com/start-here/#better
Jeorge September 21, 2019 at 4:32 pm #

Hello. I have this error and I don’t know how to solve it:

~/.ve/main/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/model.py in compile(self, learning_rate, momentum)
2197 tf.reduce_mean(layer.output, keepdims=True)
2198 * self.config.LOSS_WEIGHTS.get(name, 1.))
-> 2199 self.keras_model.metrics_tensors.append(loss)
2200
2201 def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):

AttributeError: 'Model' object has no attribute 'metrics_tensors'

- Jason Brownlee September 22, 2019 at 9:26 am #
  
  Sorry, I have not seen that error before.
  
  Are you able to confirm that your Keras/TensorFlow/RCNN libraries are up to date?
  
  Are you able to try Python 3.6 instead? I don’t think Python 3.7 is supported.
  
- Kay September 25, 2019 at 8:57 pm #
  
  You can add the line
  
  model.keras_model.metrics_tensors = []
  
  right after the model definition to circumvent the error.
  
  - Jason Brownlee September 26, 2019 at 6:34 am #
    
    Thanks for sharing.
    
  - ahasan September 29, 2019 at 2:17 am #
    
    Where exactly should I make this change in model.py?
    
Mikael October 2, 2019 at 7:32 am #

See https://github.com/matterport/Mask_RCNN/issues/1754

- Jason Brownlee October 2, 2019 at 8:07 am #
  
  Why did you share this Mikael?
  
Kevin October 8, 2019 at 1:20 am #

Hi Jason Brownlee,

Great tutorial for object detection. This is my first visit to this site, and I love the way you document your posts. I walked through each line of code and successfully implemented kangaroo detection. You have written a well-documented code guide for us. Based on your tutorial, I managed to run this model on a weed detection problem, and yes, I am able to detect weeds with it. Thanks a lot for your post.

———————————————————————
By the way, I have one question:

--> How do I save the full Keras model (architecture + weights)? I want to convert it to TensorRT, for which I need the full model.

———————————————————————–
I have tried:

1) self.keras_model.save("model_name.h5")

2) setting save_weights_only=False

but it gives an error:

[TypeError: can't pickle _thread.RLock objects]
————————————————————————-

If possible please help me on this.

Thanks,
Kevin

- Jason Brownlee October 8, 2019 at 8:07 am #
  
  Well done Kevin!
  
  model.save() should be sufficient. Perhaps there is an issue with your development environment?
  
  Perhaps try AWS:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
Reem October 8, 2019 at 2:28 am #

How can I create the annotated XML files? The VGG tool only creates CSV or JSON files. Could you please help with creating the XML files, or with converting from CSV/JSON to XML?

Thanks

- Jason Brownlee October 8, 2019 at 8:08 am #
  
  I believe there are a ton of image annotation tools available that can create the annotations with/for you.
  
- Kevin October 8, 2019 at 5:39 pm #
  
  Hi Reem,
  
  Check out this annotation tool; it creates .xml files in the format used in this model. Link: [https://github.com/tzutalin/labelImg]
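For converting existing CSV/JSON annotations instead, one option is to write minimal Pascal VOC-style XML yourself with Python’s standard library. The sketch below only emits the tags the tutorial’s extract_boxes reads (size/width, size/height and one bndbox per object), plus filename, depth and name. Two assumptions to check against your own files: that your VGG Image Annotator (VIA) CSV stores each rectangle in region_shape_attributes as JSON with x, y, width and height keys, and that you obtain each image’s width and height separately (for example by opening the image), since I don’t believe the CSV records them. voc_annotation and rect_from_via are hypothetical helpers, and depth is assumed to be 3 (RGB).

```python
import json
import xml.etree.ElementTree as ET


def rect_from_via(shape_json):
    """Convert a VIA rectangle (assumed JSON with x, y, width, height)
    into (xmin, ymin, xmax, ymax) pixel coordinates."""
    shape = json.loads(shape_json)
    x, y = shape["x"], shape["y"]
    return x, y, x + shape["width"], y + shape["height"]


def voc_annotation(filename, width, height, boxes, label="kangaroo"):
    """Return a minimal Pascal VOC-style XML string for one image.

    boxes is an iterable of (xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images
    for box in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")


box = rect_from_via('{"name": "rect", "x": 10, "y": 20, "width": 30, "height": 40}')
print(box)  # (10, 20, 40, 60)
xml_text = voc_annotation("00001.jpg", 450, 300, [box])
# Write xml_text to e.g. annots/00001.xml, next to images/00001.jpg.
```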
  
Akash Nakarmi October 8, 2019 at 3:18 am #

Jason,

Thanks for the very nice tutorial. I was able to train the model and get mask_rcnn_kangaroo_cfg_0005.h5 created. However, when I ran the model evaluation code, I got the following error. Could you help me resolve this?

AssertionError: Create model in inference mode. It complains on the line yhat = model.detect(sample, verbose=0), saying that len(images) must be equal to BATCH_SIZE.

Thanks.
Akash

- Jason Brownlee October 8, 2019 at 8:08 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
- Fuxin Hao May 22, 2020 at 6:06 pm #
  
  Add BATCH_SIZE = 1 to PredictionConfig as below:
  # define the prediction configuration
  class PredictionConfig(Config):
      # define the name of the configuration
      NAME = "id_card_cfg"
      # number of classes (background + kangaroo)
      NUM_CLASSES = 1 + 1
      # simplify GPU config
      GPU_COUNT = 1
      IMAGES_PER_GPU = 1
      BATCH_SIZE = 1
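Why this works, as far as I can tell from Matterport’s mrcnn/config.py: Config.__init__ recomputes BATCH_SIZE as IMAGES_PER_GPU * GPU_COUNT, so it is IMAGES_PER_GPU = 1 (with GPU_COUNT = 1) that makes the batch size 1, while a class-level BATCH_SIZE = 1 is overwritten and should be harmless. detect() then requires exactly BATCH_SIZE images, so passing a single photo needs a batch size of 1. The standard-library stand-in below imitates just that behaviour to illustrate the point; ConfigStandIn, PredictionConfigStandIn, WrongConfigStandIn and check_batch are hypothetical names, not the real mrcnn classes, and the default IMAGES_PER_GPU = 2 is my recollection of the library default.

```python
class ConfigStandIn:
    """Imitates one behaviour of mrcnn.config.Config (illustration only)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2  # believed to be the library default

    def __init__(self):
        # The effective batch size is derived here, overriding any class attribute.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class PredictionConfigStandIn(ConfigStandIn):
    IMAGES_PER_GPU = 1
    BATCH_SIZE = 1  # shadowed by the instance attribute set in __init__


class WrongConfigStandIn(ConfigStandIn):
    BATCH_SIZE = 1  # does not help on its own: IMAGES_PER_GPU is still 2


def check_batch(images, config):
    """Imitates the len(images) == BATCH_SIZE check made before detection."""
    if len(images) != config.BATCH_SIZE:
        raise AssertionError("len(images) must be equal to BATCH_SIZE")


one_photo = ["photo"]
check_batch(one_photo, PredictionConfigStandIn())  # passes: BATCH_SIZE is 1
print(WrongConfigStandIn().BATCH_SIZE)  # 2, so check_batch(one_photo, ...) would fail
```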
  
Shubhangi October 9, 2019 at 6:48 pm #

I have my own dataset. Thank you for the general suggestions; they are helpful, but I don’t understand why the accuracy of the model is not good even when using the same structure and configuration as suggested above.

I have also tried a different dataset and the issues are all the same, so I conclude that there is some minor issue in the script that I have not detected. Please help me out.

If you want my source code, I will share that too.

Thank you

- Jason Brownlee October 10, 2019 at 6:55 am #
  
  What problem are you having exactly?
  
  - Shubhangi October 10, 2019 at 9:24 pm #
    
    With the accuracy of the model.
    
    - Jason Brownlee October 11, 2019 at 6:18 am #
      
      You can discover general advice on diagnosing issues and improving performance with neural nets here:
      https://machinelearningmastery.com/start-here/#better
      
Dinesh Kumar October 10, 2019 at 4:58 pm #

Hello Jason,

While trying to train the model I got the following message.

File "C:\Users\userid\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 523, in _disallow_in_graph_mode
" this function with @tf.function.".format(task))

OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

Could you please advise on this?

- Jason Brownlee October 11, 2019 at 6:15 am #
  
  Sorry to hear that, are you able to confirm that you are using Python 3.6, TensorFlow 1.14, and Keras 2.3 or better?
  
  - Dinesh Kumar October 11, 2019 at 7:09 pm #
    
    Hello Jason,
    
    Thank you for your reply.
    
    I am using python 3.7, TensorFlow 2.0.0 and keras 2.3.1
    
    Regards,
    Dinesh
    
    - Jason Brownlee October 12, 2019 at 6:52 am #
      
      This example will not work with TF 2.0. You must use TF 1.14. I believe I mention this right at the top of the page:
      
      Note: This tutorial requires TensorFlow version 1.14 or higher. It currently does not work with TensorFlow 2 because some third-party libraries have not been updated at the time of writing.
      
      - Dinesh Kumar October 15, 2019 at 3:49 pm #
        
        Hello Jason,
        
        Thanks for your reply,
        
        I will use TF 1.14.
        
        Regards,
        Dinesh
mark October 18, 2019 at 6:59 am #

Hello Jason,
I just wanted to know how much time it takes to make a prediction on a new image,
that is, how long it takes to run:
yhat = model.detect(sample, verbose=0)[0]

thank you for your time.

- Jason Brownlee October 18, 2019 at 8:18 am #
  
  Fractions of a second, although it depends on the hardware, of course.
  
  - mark October 19, 2019 at 3:52 pm #
    
    Well, I need to know how many times it can run in 1 second. If you run it on your computer, can you give me an estimate of how many times it would run in 1 second? (5, 10, 20, 30, 40, 50, 60, 60+)
    
    thanks,
    mark
    
    - Jason Brownlee October 20, 2019 at 6:15 am #
      
      Perhaps you can calculate those estimates yourself on your own hardware with your data – that way they will be meaningful/useful to your project?
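Following that suggestion, a rough way to measure throughput on your own hardware is to time repeated calls. A minimal sketch: calls_per_second is a hypothetical helper, and in practice fn would wrap model.detect(sample, verbose=0) from the tutorial. The warm-up calls are there because the first predictions are often slower than later ones (graph and memory set-up), which would otherwise skew the estimate.

```python
import time


def calls_per_second(fn, arg, warmup=2, repeats=20):
    """Estimate how many times fn(arg) completes per second.

    Warm-up calls run first and are excluded from the timing.
    """
    for _ in range(warmup):
        fn(arg)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    elapsed = time.perf_counter() - start
    return repeats / elapsed if elapsed > 0 else float("inf")


# Hypothetical usage with the tutorial's model and a prepared sample:
# rate = calls_per_second(lambda s: model.detect(s, verbose=0), sample)
# print("%.1f predictions per second" % rate)

# Stand-in workload so the sketch runs on its own:
print(calls_per_second(lambda n: sum(range(n)), 100000) > 0)  # True
```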
      
Jerico October 18, 2019 at 12:31 pm #

Pretty cool tutorial; it will definitely help us.
Brother, is it possible to determine the size or dimensions of a kangaroo?

- Jason Brownlee October 18, 2019 at 2:52 pm #
  
  Thanks.
  
  In real life from a photo? Not using these models, sorry.
  
  - Jerico October 20, 2019 at 6:24 pm #
    
    Yep, from a photo. Suppose I take a picture of a kangaroo and test it on your model; your model will definitely recognize it as a kangaroo. What I’m after is the dimensions of the kangaroo. I’m sure you have a technique for determining its size using the model you created.
    
    - Jason Brownlee October 21, 2019 at 6:16 am #
      
      No idea off the cuff, sorry.
      
      It does not sound tractable as each photo has a different scale.
      
Yaroslav October 20, 2019 at 4:19 am #

Hi.
I found out that we can’t assign image ids arbitrarily (i.e. not starting from 0). Perhaps the Dataset class creates a list, not a NumPy array. I checked and realized that I can’t access an image with id, for example, 317 when I have only 100 images.
Thus, I don’t know why this “image id” field exists, when the images are numbered from 0, increasing by 1, anyway.

- Jason Brownlee October 20, 2019 at 6:25 am #
  
  Thanks for sharing.
  
  - Yaroslav October 20, 2019 at 10:33 pm #
    
    Thanks for your great article. It’s the best tutorial about object detection. It helped a lot.
    
    - Jason Brownlee October 21, 2019 at 6:18 am #
      
      Thanks!
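Yaroslav’s observation matches how I understand Matterport’s utils.Dataset: add_image appends an info dict (which keeps the id you pass in its "id" field) to the image_info list, and prepare() then numbers the images 0, 1, 2, and so on by position; those positional ids are what load_image and load_mask receive. The id you pass to add_image is therefore still useful for your own bookkeeping, e.g. to map an internal index back to the original file. The stand-in below imitates that behaviour with the standard library only; DatasetStandIn is hypothetical and is not the real class.

```python
class DatasetStandIn:
    """Imitates how mrcnn.utils.Dataset stores images (illustration only)."""

    def __init__(self):
        self.image_info = []

    def add_image(self, source, image_id, path):
        # The caller's image_id is kept, but list position is what indexes it.
        self.image_info.append({"id": image_id, "source": source, "path": path})

    def prepare(self):
        # Internal ids are simply positions: 0, 1, 2, ...
        self.image_ids = list(range(len(self.image_info)))

    def image_reference(self, image_id):
        # image_id here is the internal index, not the id passed to add_image.
        return self.image_info[image_id]["path"]


ds = DatasetStandIn()
ds.add_image("dataset", image_id="317", path="images/00317.jpg")
ds.add_image("dataset", image_id="42", path="images/00042.jpg")
ds.prepare()
print(ds.image_ids)            # [0, 1]
print(ds.image_reference(0))   # images/00317.jpg
print(ds.image_info[1]["id"])  # 42
```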
      
JuanM October 22, 2019 at 3:19 am #

Good afternoon, I have a problem with the code. When I start training, the process gets stuck in the first epoch. What can I do?

- JuanM October 22, 2019 at 3:23 am #
  
  WARNING:tensorflow:From C:\Users\Juan\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\callbacks.py:708: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
  
  Epoch 1/5
  
  This is where I have the problem.
  
  - Jason Brownlee October 22, 2019 at 5:58 am #
    
    TensorFlow 2.0 is not supported for this tutorial at the moment; try TensorFlow 1.14 instead.
    
    Reply
- Jason Brownlee October 22, 2019 at 5:57 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - JuanM October 23, 2019 at 2:51 am #
    
    At the moment my TensorFlow version is 1.14.0. Is there no solution?
    
    Reply
    - Jason Brownlee October 23, 2019 at 6:54 am #
      
      Try downgrading to 1.14? Or perhaps try a different tutorial/library?
      
      Reply
Akash Joshi October 29, 2019 at 4:47 am #

Hi Jason,
It is a great article, with thoroughly explained code.
I have a question about this tutorial. I am trying it out on my laptop, which has limited processing power. When I tried the full kangaroo dataset, the first epoch took approximately 8 hours. I stopped it partway, then reduced the dataset to about 10 images and restarted training, but it still showed 7 hours as the ETA, and each epoch still had 131 steps.

My thinking is that if I reduce the number of images in the dataset, the training time should decrease, and each epoch should have 10 steps instead of 131, since the dataset has only 10 images. I am willing to accept lower accuracy for now.

Can you let me know if my understanding is wrong?
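For context: in the tutorial, the number of steps per epoch is not derived from the dataset. It comes from STEPS_PER_EPOCH in the config class (131 in the tutorial's KangarooConfig), which would explain why a 10-image dataset still ran 131 steps. A minimal sketch of computing the value from the dataset size instead, assuming the batch size is IMAGES_PER_GPU * GPU_COUNT as in the Matterport Config; steps_per_epoch() is a hypothetical helper:

```python
import math


def steps_per_epoch(num_train_images, images_per_gpu=2, gpu_count=1):
    """Steps needed to see each training image roughly once per epoch."""
    batch_size = images_per_gpu * gpu_count
    return math.ceil(num_train_images / batch_size)


print(steps_per_epoch(131, images_per_gpu=1))  # 131
print(steps_per_epoch(10))                      # 5
```

The result could be assigned to STEPS_PER_EPOCH in your Config subclass. Each step still processes full-size images (IMAGE_MAX_DIM, 1024 by default), so the time per step will not shrink with fewer images.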

Reply
- Jason Brownlee October 29, 2019 at 5:32 am #
  
  Fewer images might hurt model performance in general.
  
  Perhaps try running on EC2?
  
  Reply
  - Akash Joshi November 2, 2019 at 12:16 am #
    
    Hi Jason,
    
    I tried using fewer images, but I cannot complete the training process because I get the following message:
    
    2019-11-01 18:37:31.547297: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:108] Allocation of 603979776 exceeds 10% of system memory.
    
    Can you tell me why I get this message?
    
    Reply
    - Jason Brownlee November 2, 2019 at 6:44 am #
      
      Try even fewer images?
      Try EC2 with more RAM?
      Try a smaller model?
      Try progressive loading?
      
      Reply
      - Akash November 2, 2019 at 7:39 am #
        
        I tried with 3 images as well but got the same issue. Can you explain, or give links for, the last two options you mentioned?
Asjad Murtaza October 29, 2019 at 9:54 am #

Hi Jason, I plan on following this tutorial for skin segmentation on the Compaq dataset. The labels are in PBM (Portable Bitmap) format. Is that fine, or do I need to do anything differently?
Regards

Reply
- Jason Brownlee October 29, 2019 at 1:49 pm #
  
  I don’t think it matters as long as the images can be loaded to numpy arrays.
  
  Reply
Juan Pablo November 1, 2019 at 8:34 am #

Hi Jason,

Thanks for this great article!

One question:

I already have my model trained and my weights (mask_rcnn_kangaroo_cfg_0019.h5).
How can I validate it with new images?

I mean without calling the test or train datasets, e.g.:

plot_actual_vs_predicted('MY PHOTO', model, cfg)

Reply
- Jason Brownlee November 1, 2019 at 1:40 pm #
  
  Load the model and use it to make predictions on a test dataset and compare predictions with the expected values.
  
  The section “How to Evaluate a Mask R-CNN Model” will provide a useful guide.
  
  Reply
Juan Pablo November 2, 2019 at 5:01 am #

Thanks Jason,

But why would I need the annotations if I want to validate the model with a new image?

Reply
- Jason Brownlee November 2, 2019 at 6:52 am #
  
  To confirm the predictions match the expectations and calculate an evaluation score.
  
  Reply
osteocyte November 5, 2019 at 8:59 pm #

Hi Jason,
Thanks a lot for this great tutorial! Could you please give me a quick hint on how to extract the total number of detected objects in each image?
Thanks a lot, osteocyte

Reply
- Jason Brownlee November 6, 2019 at 6:33 am #
  
  It will be the number of bounding boxes returned from a call to predict.
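A small sketch of that, assuming the Matterport result format, where model.detect([image]) returns one dict per image whose 'rois' array has one row per detected box. count_detections() is a hypothetical helper, and the result dict below is made-up data:

```python
import numpy as np


def count_detections(result, min_score=None):
    """Count objects in one Mask R-CNN result dict.

    Assumes the Matterport layout: result["rois"] is an N x 4 array of boxes
    and result["scores"] an array of N confidences.
    """
    if min_score is None:
        return len(result["rois"])  # one row per detected object
    return int(np.sum(np.asarray(result["scores"]) >= min_score))


# Hypothetical result for one image with three detections.
r = {
    "rois": np.array([[10, 20, 50, 60], [5, 5, 30, 40], [70, 80, 90, 120]]),
    "scores": np.array([0.98, 0.91, 0.72]),
}
print(count_detections(r))                 # 3
print(count_detections(r, min_score=0.9))  # 2
```

In practice r would be model.detect([image], verbose=0)[0]. detect() already discards detections below the config's DETECTION_MIN_CONFIDENCE, so the optional min_score only helps for stricter thresholds.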
  
  Reply
Felipe Correa November 5, 2019 at 11:12 pm #

Hi, I already have my trained model (generated with this tutorial). Is it possible to use this model for live video detection?

Do you have an example script, or anything else you could help me out with?

Best regards!

Reply
- Jason Brownlee November 6, 2019 at 6:33 am #
  
  Yes, perhaps apply to each frame of the video, or every 20th frame?
  
  I don’t have an example at this stage.
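Sampling every n-th frame can be sketched as a small generator. It assumes frames arrive as an iterable of images, for example from an OpenCV cv2.VideoCapture read loop (not shown); sample_frames() is a hypothetical name:

```python
def sample_frames(frames, every_n=20):
    """Yield (index, frame) for every n-th frame of a video stream."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index, frame


# Stand-in "video": the integers 0..99 play the role of frames here.
print([i for i, _ in sample_frames(range(100), every_n=20)])  # [0, 20, 40, 60, 80]
```

Each sampled frame would then go to model.detect([frame]). OpenCV returns frames in BGR order, so converting to RGB first (e.g. frame[:, :, ::-1]) is probably needed to match how the model was trained.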
  
  Reply
Florian Garrigues November 6, 2019 at 1:18 am #

Hello
First, thanks for this amazing tutorial!
Second, I have a question: how can we modify your code to have both a mask and a box?

Thank you

Reply
- Jason Brownlee November 6, 2019 at 6:36 am #
  
  You can define a mask and a box and then fit the model on it. In my example I treat them as the same.
  
  Reply
tejas November 6, 2019 at 4:13 pm #

self.add_class("dataset", 1, "kangaroo")
self.add_class("dataset", 2, "tiger")
self.add_class("dataset", 3, "dog")

class_ids.append(self.class_names.index('kangaroo'))
class_ids.append(self.class_names.index('tiger'))
class_ids.append(self.class_names.index('dog'))
For multi-class classification, are these changes enough, or is anything more needed?

Reply
- Jason Brownlee November 7, 2019 at 6:34 am #
  
  Looks good to me, off the cuff at least.
  
  Reply
- tejas November 8, 2019 at 12:00 am #
  
  Please tell me how to do multi-class classification.
  
  Reply
- Suave December 20, 2019 at 5:08 am #
  
  Take a look at your XML file and then modify the parsing. The goal is to match each bounding box to the right class name. Then create your masks using that correspondence between bounding boxes and class names.
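A sketch of that idea, assuming Pascal VOC-style XML like the kangaroo annotations (each object element holding a name and a bndbox) and the tutorial's convention of one rectangular mask per box. extract_labelled_boxes() and make_masks() are hypothetical helpers; class_names stands in for self.class_names, assuming index 0 is the background class added by the library:

```python
import xml.etree.ElementTree as ET

import numpy as np


def extract_labelled_boxes(xml_text):
    """Return (boxes, names, width, height) from a Pascal VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    boxes, names = [], []
    for obj in root.findall(".//object"):
        names.append(obj.find("name").text)
        bb = obj.find("bndbox")
        boxes.append([int(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, names, width, height


def make_masks(boxes, names, width, height, class_names):
    """One full-image mask channel per box; each class id comes from that box's own name."""
    masks = np.zeros((height, width, len(boxes)), dtype="uint8")
    class_ids = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1  # rows are y, columns are x
        class_ids.append(class_names.index(names[i]))
    return masks, np.asarray(class_ids, dtype="int32")


xml_text = """<annotation>
  <size><width>100</width><height>80</height><depth>3</depth></size>
  <object><name>kangaroo</name>
    <bndbox><xmin>10</xmin><ymin>5</ymin><xmax>40</xmax><ymax>60</ymax></bndbox></object>
  <object><name>dog</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>90</xmax><ymax>70</ymax></bndbox></object>
</annotation>"""

class_names = ["BG", "kangaroo", "tiger", "dog"]  # hypothetical: BG first, then add_class order
boxes, names, w, h = extract_labelled_boxes(xml_text)
masks, class_ids = make_masks(boxes, names, w, h, class_names)
print(masks.shape, class_ids.tolist())  # (80, 100, 2) [1, 3]
```

Inside a Dataset subclass, load_mask() would call these on the annotation for image_id and return masks and class_ids, so each box is paired with its own class instead of appending every class id for every image.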
  
  Reply
  - Jason Brownlee December 20, 2019 at 6:55 am #
    
    Great tip!
    
    Reply
Shubhangi November 6, 2019 at 11:35 pm #

Hello
First, thanks for this amazing tutorial!
Second, I have a question: how many epochs and steps per epoch are required for a dataset of 2 lakh (200,000) images?

Reply
- Jason Brownlee November 7, 2019 at 6:42 am #
  
  Perhaps test different configurations and see what works best for your specific dataset?
  
  Reply
shriya November 8, 2019 at 12:32 am #

boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 2

Reply
- Jason Brownlee November 8, 2019 at 6:44 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - shriya November 8, 2019 at 3:01 pm #
    
    I understand your point, but please help me. I am unable to figure it out.
    
    Reply
yamuna November 12, 2019 at 4:40 pm #

The object is being detected based on colour. How can I avoid this situation?

Reply
- Jason Brownlee November 13, 2019 at 5:34 am #
  
  Sorry, I don’t understand your question, can you elaborate please?
  
  Reply
Dave November 13, 2019 at 12:23 am #

Thanks for these tutorials, I’m making good progress on my projects.

Can I please ask: Is it solely tagged content that contributes to the training/prediction, or is it the whole image?

If I create a dataset of 100 photos (as an example), and tag the easiest elements (say people) in these photos, will untagged people in these photos work to “untrain” the model? Would I be better off creating a smaller dataset that is more thoroughly tagged, or do untagged elements not matter? Thanks.

Reply
- Jason Brownlee November 13, 2019 at 5:46 am #
  
  It is the localized object within the image. Both.
  
  Good question. Test both and compare.
  
  Reply
Saurabh November 13, 2019 at 12:24 am #

Hello Jason,

Thanks for the interesting technical blog.

I am looking for “How to train SSD based object detection on the custom dataset?”. Could you please provide a pointer?

Thanking you!

Reply
- Jason Brownlee November 13, 2019 at 5:46 am #
  
  I don’t have an example, I hope to have one in the future.
  
  Reply
  - Saurabh November 13, 2019 at 6:20 pm #
    
    Thank you!
    
    Reply
    - Jason Brownlee November 14, 2019 at 7:58 am #
      
      You’re welcome.
      
      Reply
yamuna November 13, 2019 at 7:18 pm #

I have done object detection to detect gloves.
The gloves are white.
But if a person wears a white shirt, that is also detected as gloves.

Reply
- Jason Brownlee November 14, 2019 at 7:59 am #
  
  Well done!
  
  Perhaps expand the training dataset or try data augmentation during training?
  
  Reply
  - yamuna November 15, 2019 at 4:58 am #
    
    I have augmented the images. Do I then need to annotate them separately, or is there another way?
    
    Reply
    - Jason Brownlee November 15, 2019 at 7:56 am #
      
      You can use augmentation that is “annotation-aware”, e.g. apply augmentation in a consistent way to images and annotations.
      
      Big labs might have code for this, e.g. Facebook. Otherwise, custom code will be required.
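A minimal example of annotation-aware augmentation: a horizontal flip applied to the image and its [xmin, ymin, xmax, ymax] boxes together, using NumPy only. hflip_with_boxes() is a hypothetical helper and assumes an exclusive xmax, as used when slicing masks:

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Flip an H x W x C image left-right and move its boxes to match."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new xmin comes from the old xmax
    new_boxes[:, 2] = width - boxes[:, 0]  # new xmax comes from the old xmin
    return flipped, new_boxes


image = np.zeros((50, 100, 3), dtype=np.uint8)
image[5:20, 10:40] = 255  # a white object
flipped, new_boxes = hflip_with_boxes(image, [[10, 5, 40, 20]])
print(new_boxes.tolist())                # [[60, 5, 90, 20]]
print(bool(flipped[5:20, 60:90].all()))  # True: the box still covers the object
```

The Matterport model.train() also appears to accept an augmentation argument (an imgaug augmenter) that is applied to images and masks together; check the signature in your installed version before relying on it.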
      
      Reply
bhandavi November 15, 2019 at 4:55 am #

How do I retrain the already-trained weights with more images?

Reply
- Jason Brownlee November 15, 2019 at 7:56 am #
  
  That is exactly what we do in this tutorial.
  
  Reply
  - bhandavi November 15, 2019 at 10:27 pm #
    
    I am asking about retraining the already-trained kangaroo weight file with more kangaroo images.
    
    Do I replace the COCO file with the kangaroo .h5 file?
    
    Reply
    - Jason Brownlee November 16, 2019 at 7:24 am #
      
      Yes, follow this tutorial and adapt the coco weights with your own dataset.
      
      Reply
Dave November 15, 2019 at 10:07 am #

Hi Jason,

Thanks for a great tutorial. My trained model gives many bbox predictions of different sizes for the same kangaroo, and also for random background objects. This was after training for 2 epochs. After training for further epochs, the losses all flatlined to NaN or 0. Just wondering if you’ve ever experienced this.

Thanks again,
Dave

Reply
- Jason Brownlee November 16, 2019 at 7:16 am #
  
  Not really.
  
  Perhaps try fitting the model a few times and compare results?
  
  Reply
Aqiff November 19, 2019 at 2:45 pm #

This is a great tutorial. For training, I am stuck on this line: model = MaskRCNN(mode='training', model_dir='./', config=config)
The error is: ‘NoneType’ object has no attribute ‘lower’. How can I fix this?

Reply
- Jason Brownlee November 20, 2019 at 6:07 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
yamuna November 21, 2019 at 6:34 pm #

I am trying to detect gloves and specs (glasses) using Mask R-CNN. I am facing the following issues:

1. People who are not wearing gloves are also detected as wearing gloves. I think it is picking up the shape of the hand.
2. It is completely biased by colour: wherever it finds white, it predicts gloves.

Please help me. I have 1,000 training images and have trained for nearly 50 epochs.

Reply
- Jason Brownlee November 22, 2019 at 5:59 am #
  
  Perhaps include training examples with hands and gloves in the same image to help the model tell the difference?
  
  Reply
Mursyideen November 24, 2019 at 7:27 pm #

Hello there, I am trying to execute this code using my own GPU; however, I get this error:
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[2,512,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node rpn_model_11/rpn_class_raw/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Mean_23/_13623]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[2,512,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node rpn_model_11/rpn_class_raw/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Reply
- Jason Brownlee November 25, 2019 at 6:27 am #
  
  Sorry, I don’t know about this error; perhaps try posting to StackOverflow?
  
  Reply
- Maged December 4, 2019 at 7:33 am #
  
  Hey Mursyideen, how did you solve this issue ? 🙂 I am having the same problem
  
  Reply
Ozan Veranyurt November 25, 2019 at 6:29 am #

Hi Jason, I used your tutorial to build a pistol detector. The data loads properly, but when I try to train, the epochs freeze. It sometimes freezes on random images. Here is the output:

Epoch 1/5
25/150 [====>.........................] - ETA: 3:13 - loss: 3.2542 - rpn_class_loss: 0.0182 - rpn_bbox_loss: 0.6457 - mrcnn_class_loss: 0.5098 - mrcnn_bbox_loss: 0.8810 - mrcnn_mask_loss: 1.1995

It stops on different images. I checked all the images, and the annotations are OK. I followed the suggestions here: https://github.com/matterport/Mask_RCNN/issues/287 (I made modifications in model.py under mrcnn).
My TensorFlow version is 1.15 and Keras is 2.2.4.

Any suggestions? I am working on different approaches for pistol detection, and mrcnn is one of them. It is critical for my thesis, so I will appreciate any suggestions. Perhaps a working combination of Keras and TensorFlow versions for mrcnn?

Reply
- Jason Brownlee November 25, 2019 at 6:35 am #
  
  I wonder if you are running out of memory or having a hardware fault?
  
  Perhaps try running on an AWS EC2 instance?
  
  Reply

Yussi Eikelman November 28, 2019 at 7:13 pm #

Hi Jason,
I have a set of grayscale images of shape (192, 384, 3), each with zero, one, or multiple masks of size (5, 5).
I’m able to train my model but unable to get any result: the tuple from detect() appears to be empty. In rare cases there is a prediction, but it is not good enough.
Please help, thanks!

Jason Brownlee November 29, 2019 at 6:47 am #

Perhaps the model is not detecting anything on the test images?

Yussi Eikelman December 1, 2019 at 1:44 am #

A different question:

for i in range(len(boxes)):
		box = boxes[i]
		row_s, row_e = box[1], box[3]
		col_s, col_e = box[0], box[2]
		masks[row_s:row_e, col_s:col_e, i] = 1 
		class_ids.append(self.class_names.index('kangaroo'))
	return masks, asarray(class_ids, dtype='int32')

If masks = zeros([h, w, len(boxes)], dtype='uint8'),
then in my case each mask is (h=5, w=5, i) and a bounding box is, for example, (5, 5, 10, 10).
How does masks[row_s:row_e, col_s:col_e, i] = 1 behave when the bounding-box indexes fall outside the original mask range of (5, 5)?
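A sketch of the tutorial's box-to-mask step, written as a standalone function (boxes_to_masks() is a hypothetical name). In the tutorial's load_mask(), h and w are the image's height and width taken from the annotation, so each mask channel covers the whole image and the box picks out a region inside it. If h and w are set to the box size instead, NumPy slicing past the end of an array does not raise an error; it just selects nothing, so the mask stays empty:

```python
import numpy as np


def boxes_to_masks(boxes, height, width):
    """Rectangular masks from [xmin, ymin, xmax, ymax] boxes.

    height and width are the *image* dimensions: each channel is a full-image mask.
    """
    masks = np.zeros((height, width, len(boxes)), dtype="uint8")
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # Rows are y coordinates, columns are x coordinates.
        masks[ymin:ymax, xmin:xmax, i] = 1
    return masks


box = [5, 5, 10, 10]
print(boxes_to_masks([box], 192, 384)[:, :, 0].sum())  # 25: the 5x5 box is marked
print(boxes_to_masks([box], 5, 5)[:, :, 0].sum())      # 0: slice falls outside, silently empty
```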

Jason Brownlee December 1, 2019 at 5:43 am #

Sorry, I don’t follow your question, are you able to elaborate?

Reply

bts December 1, 2019 at 6:25 pm #

Hello Jason,
I am running this code on my Mac, and I get this error when running epoch 1; the program gets stuck here:

Epoch 1/5
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/utils/data_utils.py:709: UserWarning: An input could not be retrieved. It could be because a worker has died.We do not have any information on the lost sample.

Any idea about this?
Also, should this code be run only on GPU machines?

Reply
- Jason Brownlee December 2, 2019 at 6:00 am #
  
  I have not seen that before.
  
  No, the code works fine on the GPU or CPU.
  
  Perhaps try re-installing your development environment?
  Perhaps try running on either the CPU or the GPU?
  Perhaps try posting/searching on stackoverflow?
  Perhaps try running other examples and see if they work on your workstation?
  
  Reply
- Ali April 23, 2020 at 1:06 am #
  
  I’ve seen this error before and I fixed it by lowering my tensorflow version from 2.1 to 1.12 and by installing the appropriate keras-gpu libraries for that.
  
  Reply
  - Jason Brownlee April 23, 2020 at 6:07 am #
    
    Yes, as stated at the top, the tutorial does not work with tensorflow 2 because the maskrcnn lib has not been updated.
    
    Reply
shankar December 2, 2019 at 7:19 am #

Hi Jason, what an amazing post. Well done on your hard work!

For my application, in addition to the predicted bounding box+mask+class, I also need to extract the last fully-connected layer of the mask_rcnn model (that is, the feature vector representation of the input image).

In Keras, we can save a model’s JSON and weight files, load them again, and extract the output of any intermediate layer as follows:

model.summary()
feature_extractor = tf.keras.models.Model(inputs=model.input, outputs=model.get_layer('avg_pool').output)
features = feature_extractor.predict(my_image)

In mask_rcnn, we load the pre-trained model mask_rcnn_coco.h5. Do you know how we can access and extract the last fully-connected weights?

My research is stuck because I am unable to complete this step. I shall be grateful if you can guide me (either via email, or on this forum).

Regards-Shankar

Reply
- Jason Brownlee December 2, 2019 at 1:53 pm #
  
  Thanks!
  
  Great question.
  
  Hmmm, not off hand, sorry. Some experimentation will be required.
  
  Reply
Maged December 4, 2019 at 7:45 am #

Hey @Jason thank you for a fantastic tutorial. Please keep it up :)!

Two questions if you may,

– How can we reduce the batch_size ?
– How can we reduce the image_dimensions given to the model?

Both of these are attempts to fix the “..Resource exhausted: OOM when allocating tensor with shape..” error

Reply
- Jason Brownlee December 4, 2019 at 1:56 pm #
  
  Thanks.
  
  Good question about the batch size, I’m not sure off the cuff. Perhaps check the code for the train() function?
  
  I believe you have control over the image sizes, so you can define your own fixed size.
  
  Reply
  - Maged December 5, 2019 at 9:49 pm #
    
    Thanks for your reply, Jason. Here is how I fixed it, by modifying the KangarooConfig class:
    
    from mrcnn.config import Config
    
    class KangarooConfig(Config):
        # define the name of the configuration
        NAME = "kangaroo_cfg"
        # number of classes (background + kangaroo)
        NUM_CLASSES = 1 + 1
        STEPS_PER_EPOCH = 131
        # one GPU processing one image at a time gives a batch size of 1
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
        # smaller input images use less GPU memory
        IMAGE_MIN_DIM = 400
        IMAGE_MAX_DIM = 512
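Why these settings help can be sketched with a hypothetical stand-in for Matterport's Config (SketchConfig is not the real class). The real Config appears to compute BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT in its constructor and, in the default "square" resize mode, to pad every image to IMAGE_MAX_DIM x IMAGE_MAX_DIM, which matches the printed configuration elsewhere in these comments (BATCH_SIZE 2, IMAGE_SHAPE [1024 1024 3]):

```python
class SketchConfig:
    """Hypothetical stand-in mirroring how Matterport's Config derives values.

    Not the real mrcnn.config.Config; it only illustrates the arithmetic.
    """
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    IMAGE_MAX_DIM = 1024
    IMAGE_CHANNEL_COUNT = 3

    def __init__(self):
        # Effective batch size = images per GPU times number of GPUs.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT
        # With "square" resizing, inputs are padded to MAX_DIM x MAX_DIM.
        self.IMAGE_SHAPE = (self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, self.IMAGE_CHANNEL_COUNT)


class SmallerConfig(SketchConfig):
    IMAGES_PER_GPU = 1   # halves the batch
    IMAGE_MAX_DIM = 512  # a quarter of the pixels of 1024 x 1024


default, small = SketchConfig(), SmallerConfig()
print(default.BATCH_SIZE, default.IMAGE_SHAPE)  # 2 (1024, 1024, 3)
print(small.BATCH_SIZE, small.IMAGE_SHAPE)      # 1 (512, 512, 3)
```

Halving the batch and going from 1024 to 512 pixels per side cuts the memory needed per step substantially. The library also seems to require IMAGE_MAX_DIM to be divisible by 2 at least six times (e.g. 256, 320, 384, 448, 512), so pick sizes from that series.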
    
    Reply
    - Jason Brownlee December 6, 2019 at 5:15 am #
      
      Well done, thanks for sharing!
      
      Reply
    - João Granzotti March 12, 2020 at 4:44 am #
      
      What do these parameters mean?
      
      GPU_COUNT = 1
      IMAGES_PER_GPU = 1
      
      Are they the number of GPUs I have and the batch size? I want to reduce the batch size too.
      
      Thanks for your help.
      
      Reply
    - Taki March 2, 2021 at 4:54 am #
      
      Hi Maged,
      
      You can also change the batch size before training starts, like this:
      
      config.BATCH_SIZE=1
      
      Thanks
      
      Reply
Tim December 8, 2019 at 2:05 am #

Hey Jason, when I plot the “Actual” vs “Predicted” graph, the “Actual” photos all appear very dark. Is there a way to tweak it so they look like the “Predicted” photos, with red boxes on the normal photo? Thank you,

Reply
- Jason Brownlee December 8, 2019 at 6:16 am #
  
  Yes, I intentionally darken the photo to highlight the detection.
  
  You can remove the code that does that. Just plot the photo and use the box to draw a colored rectangle.
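The tutorial draws boxes with matplotlib; as a NumPy-only alternative, here is a hypothetical helper (draw_box()) that paints a coloured outline directly onto an undarkened copy of the photo. It assumes the box uses the (y1, x1, y2, x2) order of the Mask R-CNN 'rois' output, with y2 and x2 treated as exclusive:

```python
import numpy as np


def draw_box(image, box, color=(255, 0, 0), thickness=2):
    """Return a copy of an RGB image with a rectangle outline drawn on it.

    box is (y1, x1, y2, x2); the photo itself is not darkened.
    """
    out = image.copy()
    h, w = out.shape[:2]
    y1, x1, y2, x2 = (int(v) for v in box)
    y1, x1 = max(y1, 0), max(x1, 0)
    y2, x2 = min(y2, h), min(x2, w)
    t = thickness
    out[y1:y1 + t, x1:x2] = color  # top edge
    out[y2 - t:y2, x1:x2] = color  # bottom edge
    out[y1:y2, x1:x1 + t] = color  # left edge
    out[y1:y2, x2 - t:x2] = color  # right edge
    return out


photo = np.zeros((20, 30, 3), dtype=np.uint8)
boxed = draw_box(photo, (2, 3, 10, 15))
print(boxed[2, 5].tolist(), boxed[6, 8].tolist())  # [255, 0, 0] [0, 0, 0]
```

The returned array can be shown with pyplot.imshow() like any other image.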
  
  Reply
  - amine December 21, 2019 at 6:24 am #
    
    Hi, I’m trying to train on my dataset; however, I get this error when trying to load the data.
    Help please!
    
    FileNotFoundError Traceback (most recent call last)
    in ()
    1 image_id = 1
    ----> 2 image = train_set.load_image(image_id)
    3 print(image.shape)
    4 # load image mask
    5 mask, class_ids = train_set.load_mask(image_id)
    
    6 frames
    /usr/local/lib/python3.6/dist-packages/imageio/core/request.py in _parse_uri(self, uri)
    271 # Reading: check that the file exists (but is allowed a dir)
    272 if not os.path.exists(fn):
    --> 273 raise FileNotFoundError("No such file: '%s'" % fn)
    274 else:
    275 # Writing: check that the directory to write to does exist
    
    FileNotFoundError: No such file: '/content/Mask_RCNN/Amine/imagessacdf21.JPG'
    
    Reply
    - Jason Brownlee December 21, 2019 at 7:18 am #
      
      Looks like the image you are trying to load does not exist on your workstation.
      
      Reply
    - J February 12, 2022 at 12:24 am #
      
      Have you dealt with the problem? I got the same issue: in utils.py, load_mask does not add '/' to the path, so 'images' gets concatenated directly onto the image’s name.
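The tutorial appears to build paths by string concatenation with trailing slashes (dataset_dir + '/images/'), so dropping one slash while adapting it produces names like imagessacdf21.JPG, as in the traceback above. A sketch using os.path.join, which inserts separators itself; image_and_annotation_paths() is a hypothetical helper and assumes the tutorial's images/ and annots/ folder layout:

```python
import os


def image_and_annotation_paths(dataset_dir, filename):
    """Build image and XML paths with os.path.join so separators are never dropped."""
    image_id = os.path.splitext(filename)[0]  # 'sacdf21.JPG' -> 'sacdf21'
    img_path = os.path.join(dataset_dir, "images", filename)
    ann_path = os.path.join(dataset_dir, "annots", image_id + ".xml")
    return img_path, ann_path


img_path, ann_path = image_and_annotation_paths("kangaroo", "sacdf21.JPG")
print(img_path)  # on Linux/macOS: kangaroo/images/sacdf21.JPG
```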
      
      Reply
Narottam December 12, 2019 at 9:40 pm #

Hi Jason, please confirm: for the Mask R-CNN model, do we also need to mask new images (i.e., create an .xml file)? If not, please suggest changes to the plot_actual_vs_predicted() function so I get better output, like the output we got from display_instances(image, bbox, mask, class_ids, dataset.class_names) in the evaluate_model function.

Reply
- Jason Brownlee December 13, 2019 at 6:01 am #
  
  You can just work with object boxes, use them as masks, and prepare the data any way you wish.
  
  Reply
Sally Jac December 14, 2019 at 4:32 am #

Hi Jason, when I am creating the model, I keep getting this error

/anaconda3/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/model.py in detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config)
551 positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
552 config.ROI_POSITIVE_RATIO)
--> 553 positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
554 positive_count = tf.shape(positive_indices)[0]
555 # Negative ROIs. Add enough to maintain positive:negative ratio.

AttributeError: module 'tensorflow' has no attribute 'random_shuffle'

I am unsure of how to debug this. I tried changing random_shuffle to random.shuffle in model.py but it does not work. Or have I downloaded the wrong MaskRCNN? What is the link to download the MaskRCNN? Thank you for your help.

Reply
- Jason Brownlee December 14, 2019 at 6:26 am #
  
  It looks like you are using tensorflow version 2, and the maskrcnn model requires tensorflow 1.14 or 1.15.
  
  This is mentioned right at the top of the tutorial.
  
  Reply
  - Twayne Jeremy December 15, 2019 at 5:40 pm #
    
    Hello Jason, your tutorial is really helpful. However, I’ve seen some errors while trying it.
    
    When I am evaluating the model, I received this error and I am unsure of how to debug this.
    
    ValueError: shapes (1,1048576) and (1050624,1) not aligned: 1048576 (dim 1) != 1050624 (dim 0)
    
    Thank you for your help.
    
    Reply
    - Jason Brownlee December 16, 2019 at 6:13 am #
      
      I’m sorry to hear that, I have some suggestions here:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
      - Yang-Yin May 6, 2020 at 4:01 pm #
        
        Hi Jason,
        
        I really like your detailed tutorial. Excellent work, thanks.
        
        I am able to run the example without a problem. However, when working with my own images, I got this kind of error when calculating the mAP.
        
        “ValueError: shapes (7,1048576) and (1104896,1) not aligned: 1048576 (dim 1) != 1104896 (dim 0)”.
        
        I was able to train the model and make predictions with other images. But I just cannot evaluate the model’s performance in terms of mAP via compute_ap().
        
        I checked this issue online for some days and didn’t find any solutions. Are you able to show any guidance?
        
        Thanks very much for your help.
      - Jason Brownlee May 7, 2020 at 6:40 am #
        
        Perhaps confirm the data was loaded as you expect and that the inputs to the metric are as required by the API?
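One possible cause, offered as an assumption rather than a diagnosis: the numbers in these errors factor as 1024 x 1024 = 1,048,576 versus 1024 x 1026 = 1,050,624 (and 1024 x 1079 = 1,104,896), which suggests the ground-truth and predicted masks have different spatial sizes. The tutorial's load_mask() sizes its masks from the width and height written in the XML annotation, so an annotation whose size disagrees with the actual image could produce this kind of mismatch. A quick check, sketched with a hypothetical helper, is to compare the shapes for every image before evaluating:

```python
import numpy as np


def find_mask_size_mismatches(pairs):
    """Return (id, image_hw, mask_hw) for every image whose masks differ in size.

    pairs maps an image id to (image, masks): image is H x W x C, masks is H x W x N.
    """
    bad = []
    for image_id, (image, masks) in pairs.items():
        if image.shape[:2] != masks.shape[:2]:
            bad.append((image_id, image.shape[:2], masks.shape[:2]))
    return bad


# Made-up tiny arrays standing in for load_image()/load_mask() output.
pairs = {
    0: (np.zeros((4, 4, 3)), np.zeros((4, 4, 1))),
    1: (np.zeros((4, 4, 3)), np.zeros((4, 5, 1))),  # annotation size disagrees
}
print(find_mask_size_mismatches(pairs))  # [(1, (4, 4), (4, 5))]
```

With the tutorial's dataset class, pairs could be filled by looping over dataset.image_ids and calling dataset.load_image(image_id) and dataset.load_mask(image_id)[0].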
      - Yang-Yin May 14, 2020 at 9:28 am #
        
        Hi Jason,
        
        Thanks for your comments. Yes, some of the input data were not working well for some reason (I will double-check it). I really appreciated your help!
        
        I have one more question about this model: in addition to calculating the mAP, precision, and recall, how to plot accuracy and loss during training to monitor overfit or determine the number of epochs to stop training?
        
        Thank you in advance.
      - Jason Brownlee May 14, 2020 at 1:26 pm #
        
        Well done!
        
        Good question, I don’t have an example of plotting the history of this specific model. Perhaps investigate the use of tensorboard?
    - Gary May 5, 2020 at 7:07 am #
      
      Hi Twayne,
      
      I received a similar message when trying it with my dataset. Have you figured it out? Thanks,
      
      Reply
      - Peter January 26, 2021 at 3:11 am #
        
        I have the same problem. Have you managed to solve it?
Ekrem Fatih Yılmazer December 16, 2019 at 2:18 am #

I have a dataset of liver CT scans, which are grayscale.
Is it possible to apply the same model (also with transfer learning) to grayscale images? Since the pretrained models are for RGB images, I am curious whether I can convert them for my application.

Reply
- Jason Brownlee December 16, 2019 at 6:18 am #
  
  Perhaps try it and compare to fitting a new model from scratch?
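One common way to reuse RGB-pretrained weights such as the COCO ones is to copy the grayscale channel into all three channels. A minimal sketch (gray_to_rgb() is a hypothetical name); Matterport's Dataset.load_image() appears to do a similar conversion for images without colour channels, but check your version:

```python
import numpy as np


def gray_to_rgb(image):
    """Copy one grayscale channel into three (H x W or H x W x 1 -> H x W x 3)."""
    if image.ndim == 3 and image.shape[-1] == 1:
        image = image[..., 0]
    if image.ndim != 2:
        raise ValueError("expected a single-channel image, got shape %s" % (image.shape,))
    return np.stack([image, image, image], axis=-1)


ct_slice = np.arange(12, dtype=np.uint8).reshape(3, 4)  # made-up grayscale data
rgb = gray_to_rgb(ct_slice)
print(rgb.shape)  # (3, 4, 3)
```

CT values may also need rescaling to the 0-255 range the COCO weights expect (compare MEAN_PIXEL in the config), which this sketch does not do.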
  
  Reply
Zain December 16, 2019 at 3:44 am #

Thank you very much for such an informative article.

I have created a colab notebook which walks through this article and here it is.

Reply
Yansen December 18, 2019 at 7:07 am #

Hi Jason, thanks for the tutorial. Following your instructions, I fitted a custom dataset of 200 photos with one label. I got a train mAP of 0.986 and a test mAP of 1.000. The detection results are great, and the model even finds things I would miss when labeling. My question is: is 1.000 too good to be true?

Reply
- Jason Brownlee December 18, 2019 at 1:26 pm #
  
  Wow, well done.
  
  Perhaps think of ways that you could have a misleading result and test them?
  
  e.g. more/less data? Different measures? Inspect predictions? etc.
  
  Reply
amine December 20, 2019 at 8:31 am #

Hi Mr Brownlee,
thanks for this awesome tutorial. However, when I tried to run it on my dataset (13 images, just for fun) on Colab,
I get this message:

ValueError Traceback (most recent call last)
in ()
79 # train set
80 train_set = KangarooDataset()
---> 81 train_set.load_dataset('kangaroo', is_train=True)
82 train_set.prepare()
83 print('Train: %d' % len(train_set.image_ids))

in load_dataset(self, dataset_dir, is_train)
22 continue
23 # skip all images after 150 if we are building the train set
---> 24 if is_train and int(image_id) >= 10:
25 continue
26 # skip all images before 150 if we are building the test/val set

ValueError: invalid literal for int() with base 10: 'sacdf21'
Gratefully yours

Reply
- Jason Brownlee December 20, 2019 at 1:05 pm #
  
  Sorry to hear that. Perhaps start with the working tutorial and slowly adapt it to your needs?
  
  Reply
  - amine December 21, 2019 at 7:46 am #
    
    Absolutely. The tutorial works great, but adapting it to my data keeps failing over and over again. Is there any way around this bug? It’s just the train/test splitting part that does not work. I’m running on Colab, if that helps.
    Please, I’ve been stuck for hours now.
    Very grateful
    
    Reply
    - Jason Brownlee December 21, 2019 at 8:16 am #
      
      Yes, don’t split into train and test sets or split using your own method that does not use file names.
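A sketch of splitting by your own method rather than by parsing file names: shuffle the file names with a fixed seed and cut the list at a chosen fraction. The tutorial's load_dataset() parses each name with int(), which fails for names like 'sacdf21'. split_filenames() is a hypothetical helper; the 80/20 ratio and the seed are arbitrary choices:

```python
import random


def split_filenames(filenames, train_fraction=0.8, seed=1):
    """Shuffle file names reproducibly and split them into train/test lists.

    Works for any names (e.g. 'sacdf21.JPG') because it never parses them as ints.
    """
    names = sorted(filenames)  # deterministic starting order
    random.Random(seed).shuffle(names)
    cut = int(round(len(names) * train_fraction))
    return names[:cut], names[cut:]


files = ["img%d.JPG" % i for i in range(13)]
train, test = split_filenames(files)
print(len(train), len(test))  # 10 3
```

load_dataset() could then skip a file when is_train is set and the name is not in the train list (and the reverse for the test set), which also keeps a separate validation set.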
      
      Reply
      - residence les jardins December 22, 2019 at 4:00 am #
        
        The code does not work without splitting; it treats all the data as a single block. Do we lose the validation dataset?
João Vitor Granzotti Machado December 22, 2019 at 3:05 am #

Hello, I am a student from Brazil, and I am having a problem executing the code on this line:

model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

When running the program I get the following error:

raise StopIteration ()
StopIteration

What could be the cause of this?

Reply
- Jason Brownlee December 22, 2019 at 6:15 am #
  
  Sorry, I have not seen this error before. I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - João Vitor Granzotti Machado December 24, 2019 at 3:51 am #
    
    I checked the versions of python, tensorflow and even numpy and they are all correct. The output when executing the code is as follows:
    
    C:\Users\João Vitor\trabalho>python object_detection.py
    Using TensorFlow backend.
    Train: 131
    Test: 32
    
    Configurations:
    BACKBONE resnet101
    BACKBONE_STRIDES [4, 8, 16, 32, 64]
    BATCH_SIZE 2
    BBOX_STD_DEV [0.1 0.1 0.2 0.2]
    COMPUTE_BACKBONE_SHAPE None
    DETECTION_MAX_INSTANCES 100
    DETECTION_MIN_CONFIDENCE 0.7
    DETECTION_NMS_THRESHOLD 0.3
    FPN_CLASSIF_FC_LAYERS_SIZE 1024
    GPU_COUNT 1
    GRADIENT_CLIP_NORM 5.0
    IMAGES_PER_GPU 2
    IMAGE_CHANNEL_COUNT 3
    IMAGE_MAX_DIM 1024
    IMAGE_META_SIZE 14
    IMAGE_MIN_DIM 800
    IMAGE_MIN_SCALE 0
    IMAGE_RESIZE_MODE square
    IMAGE_SHAPE [1024 1024 3]
    LEARNING_MOMENTUM 0.9
    LEARNING_RATE 0.001
    LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
    MASK_POOL_SIZE 14
    MASK_SHAPE [28, 28]
    MAX_GT_INSTANCES 100
    MEAN_PIXEL [123.7 116.8 103.9]
    MINI_MASK_SHAPE (56, 56)
    NAME kangaroo_cfg
    NUM_CLASSES 2
    POOL_SIZE 7
    POST_NMS_ROIS_INFERENCE 1000
    POST_NMS_ROIS_TRAINING 2000
    PRE_NMS_LIMIT 6000
    ROI_POSITIVE_RATIO 0.33
    RPN_ANCHOR_RATIOS [0.5, 1, 2]
    RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
    RPN_ANCHOR_STRIDE 1
    RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
    RPN_NMS_THRESHOLD 0.7
    RPN_TRAIN_ANCHORS_PER_IMAGE 256
    STEPS_PER_EPOCH 131
    TOP_DOWN_PYRAMID_SIZE 256
    TRAIN_BN False
    TRAIN_ROIS_PER_IMAGE 200
    USE_MINI_MASK True
    USE_RPN_ROIS True
    VALIDATION_STEPS 50
    WEIGHT_DECAY 0.0001
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:492: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:63: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:3630: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:3458: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:1822: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:1208: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
    Instructions for updating:
    keep_dims is deprecated, use keepdims instead
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:1242: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
    Instructions for updating:
    keep_dims is deprecated, use keepdims instead
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py:1354: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py:553: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\utils.py:202: The name tf.log is deprecated. Please use tf.math.log instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py:600: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
    Instructions for updating:
    box_ind is deprecated, use box_indices instead
    2019-12-23 13:45:34.251579: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    
    Starting at epoch 0. LR=0.001
    
    Checkpoint Path: ./kangaroo_cfg20191223T1345\mask_rcnn_kangaroo_cfg_{epoch:04d}.h5
    Selecting layers to train
    fpn_c5p5 (Conv2D)
    fpn_c4p4 (Conv2D)
    fpn_c3p3 (Conv2D)
    fpn_c2p2 (Conv2D)
    fpn_p5 (Conv2D)
    fpn_p2 (Conv2D)
    fpn_p3 (Conv2D)
    fpn_p4 (Conv2D)
    In model: rpn_model
    rpn_conv_shared (Conv2D)
    rpn_class_raw (Conv2D)
    rpn_bbox_pred (Conv2D)
    mrcnn_mask_conv1 (TimeDistributed)
    mrcnn_mask_bn1 (TimeDistributed)
    mrcnn_mask_conv2 (TimeDistributed)
    mrcnn_mask_bn2 (TimeDistributed)
    mrcnn_class_conv1 (TimeDistributed)
    mrcnn_class_bn1 (TimeDistributed)
    mrcnn_mask_conv3 (TimeDistributed)
    mrcnn_mask_bn3 (TimeDistributed)
    mrcnn_class_conv2 (TimeDistributed)
    mrcnn_class_bn2 (TimeDistributed)
    mrcnn_mask_conv4 (TimeDistributed)
    mrcnn_mask_bn4 (TimeDistributed)
    mrcnn_bbox_fc (TimeDistributed)
    mrcnn_mask_deconv (TimeDistributed)
    mrcnn_class_logits (TimeDistributed)
    mrcnn_mask (TimeDistributed)
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\optimizers.py:711: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
    
    C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py:675: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\callbacks.py:705: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.
    
    WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\callbacks.py:708: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
    
    Epoch 1/5
    Traceback (most recent call last):
      File "object_detection.py", line 109, in <module>
        model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')
      File "C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 2374, in train
      File "C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 2065, in fit_generator
        generator_output = next(output_generator)
      File "C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\data_utils.py", line 710, in get
        raise StopIteration()
    StopIteration
    
    I can’t figure out what’s wrong. I am very interested in Mask R-CNN and would like to see it working. Can you help me, please? Thank you for your attention, and greetings from Brazil!
    
    - Jason Brownlee December 24, 2019 at 6:43 am #
      
      It looks like you are using tensorflow 2.
      
      You must use tensorflow 1.15.
      
      - João Vitor Granzotti Machado December 24, 2019 at 11:32 am #
        
        The versions of the libraries I am using are these:
        
        Python: 3.6.8
        Tensorflow: 1.15.0
        Numpy: 1.16.0
        Keras: 2.1.0
        Scipy: 1.4.1
        
        So I think the error is not related to the library versions, because everything is in line with the tutorial.
      - Jason Brownlee December 24, 2019 at 4:57 pm #
        
        I recommend updating to keras 2.2, at least.
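Several problems in this thread come down to library versions. As a minimal sketch, assuming the constraints stated at the top of the tutorial (TensorFlow 1.x, with 1.14 or 1.15 suggested in these replies, and Keras 2.2.x below 2.2.5), a check like the hypothetical `check_versions` below can flag a mismatch before training starts. It only compares version strings; it cannot prove the environment actually works.

```python
import re


def parse_version(version):
    """Turn '1.15.3' (or '1.15.0-rc1') into a 3-tuple of ints such as (1, 15, 3)."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_versions(tf_version, keras_version):
    """Return a list of problems; an empty list means both versions fall in
    the ranges assumed here (TensorFlow 1.x, Keras 2.2.0 up to but not 2.2.5)."""
    problems = []
    if parse_version(tf_version)[0] != 1:
        problems.append("TensorFlow %s: use 1.14 or 1.15" % tf_version)
    if not (2, 2, 0) <= parse_version(keras_version) < (2, 2, 5):
        problems.append("Keras %s: use 2.2.x below 2.2.5, e.g. 2.2.4" % keras_version)
    return problems


# With the libraries installed you would pass the real versions, e.g.
#   import tensorflow, keras
#   print(check_versions(tensorflow.__version__, keras.__version__))
print(check_versions("1.15.0", "2.1.0"))  # the versions reported above
```

Here the Keras 2.1.0 reported above is flagged, which is consistent with the advice to move to Keras 2.2.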
      - Thanakorn March 4, 2020 at 4:46 am #
        
        Hi Jason,
        
        My project is on Google Colab. Even though my library versions are “TensorFlow: 1.15.0” and “Keras: 2.2.5”, these lines still appear. How can I fix this?
      - Jason Brownlee March 4, 2020 at 6:01 am #
        
        Perhaps colab is inappropriate.
      - Anand Nataraj June 5, 2020 at 1:14 am #
        
        Could you please let me know where we have to make changes if I want to add another label, say, Monkey?
      - Jason Brownlee June 5, 2020 at 8:17 am #
        
        Yes, in all the places where we add kangaroo.
amine December 22, 2019 at 3:56 am #

Hi, I tried to split my data into train and val folders, the way the data is split in the balloon dataset.
Here is the error:
FileNotFoundError                         Traceback (most recent call last)
in <module>()
     85 # train set
     86 train_set = KangarooDataset()
---> 87 train_set.load_dataset('Amine',"train", is_train=True)
     88 train_set.prepare()
     89 print('train: %d' % len(train_set.image_ids))

in load_dataset(self, dataset_dir, subset, is_train)
     26 #annotations_dir = dataset_dir + '/Amine/'
     27 # find all images
---> 28 for filename in listdir(images_dir):
     29 # extract image id
     30 image_id = filename[:-4]

FileNotFoundError: [Errno 2] No such file or directory: 'Amine/train'

Is there any way to get around this bug?
Thanks.

- Jason Brownlee December 22, 2019 at 6:18 am #
  
  Looks like the data is not in the required location on your workstation.
  
  Perhaps put the data in the same directory as your code, and run the code from the command line.
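The underlying issue is that a relative path such as 'Amine/train' is looked up from the directory Python is run from, not from where the script lives. A small check like the hypothetical `resolve_subset_dir` below, called at the top of `load_dataset`, turns the bare FileNotFoundError into a message that shows where Python actually looked. It is a stdlib-only sketch and assumes a dataset_dir/subset folder layout.

```python
import os


def resolve_subset_dir(dataset_dir, subset, base_dir=None):
    """Return <base_dir>/<dataset_dir>/<subset>, or raise a FileNotFoundError
    that says where Python looked and what it found there.

    A relative path such as 'Amine/train' is resolved against the current
    working directory, so base_dir defaults to os.getcwd().
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    path = os.path.join(base_dir, dataset_dir, subset)
    if not os.path.isdir(path):
        parent = os.path.join(base_dir, dataset_dir)
        if os.path.isdir(parent):
            found = "it contains %s" % sorted(os.listdir(parent))
        else:
            found = "it does not exist either"
        raise FileNotFoundError("No folder %r; checked parent %r, %s" % (path, parent, found))
    return path
```

For example, `images_dir = os.path.join(resolve_subset_dir(dataset_dir, subset), 'images')`, assuming each subset folder holds `images` and `annots` subfolders.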
  
  - Amine December 23, 2019 at 1:18 am #
    
    Hi, how could I select the dataset by file name rather than splitting it by index before and after your breakpoint (150)? It seems to me that could be a better fix for this bug, without having to move files around.
    What code would you use to change the splitting key?
    
    - Jason Brownlee December 23, 2019 at 6:55 am #
      
      Sorry, I don’t have the capacity to prepare custom code.
      
      Perhaps focus on Python basics first?
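Splitting by name instead of by index is a small change to the filtering step. A sketch, assuming image IDs are filenames without their extension (the tutorial uses `filename[:-4]`) and that you keep the validation IDs in a list or text file of your own; `split_by_names` is a hypothetical helper, not part of the tutorial or the library.

```python
import os


def split_by_names(filenames, val_names):
    """Split image filenames into (train_ids, val_ids) using an explicit
    collection of validation image IDs instead of a numeric cut-off.

    An image ID is the filename without its extension, e.g. '00001.jpg' -> '00001'.
    """
    val_names = set(val_names)
    train_ids, val_ids = [], []
    for filename in sorted(filenames):
        image_id = os.path.splitext(filename)[0]
        if image_id in val_names:
            val_ids.append(image_id)
        else:
            train_ids.append(image_id)
    return train_ids, val_ids
```

Inside `load_dataset`, you could then skip any image whose ID is not in the list for the requested subset, in place of the numeric comparison against 150.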
      
      - amine December 24, 2019 at 4:25 am #
        
        Hi,
        I figured out how to split the data into train and val folders, each with its own subfolders (annots and images), to respect your data structure.
        Here is the result:
        
        train: 8
        test: 4 # seems OK, but
        ---------------------------------------------------------------------------
        NameError                                 Traceback (most recent call last)
        in <module>()
            103 mask, class_ids = train_set.load_mask(image_id)
            104 # extract bounding boxes from the masks
        --> 105 bbox = extract_bboxes(mask)
            106 # display image with masks and bounding boxes
            107 display_instances(image, bbox, mask, class_ids, train_set.class_names)
        
        NameError: name 'extract_bboxes' is not defined
      - Jason Brownlee December 24, 2019 at 6:46 am #
        
        I have some suggestions here:
        https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
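The NameError means `extract_bboxes` was never imported. In the tutorial it comes from the Mask R-CNN library (`from mrcnn.utils import extract_bboxes`), alongside `display_instances` from `mrcnn.visualize`. To show what the function computes, here is a pure-Python sketch for a single instance mask given as a list of rows. It follows, as far as I understand it, the library's convention of returning `[y1, x1, y2, x2]` with exclusive end coordinates; `bbox_from_mask` is a hypothetical name and is not a drop-in replacement for the NumPy-based original.

```python
def bbox_from_mask(mask):
    """Return [y1, x1, y2, x2] for the pixels set in a 2D 0/1 mask.

    y2 and x2 are exclusive (one past the last set row/column); an empty
    mask gives [0, 0, 0, 0].
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0]) if mask else 0)
            if any(row[x] for row in mask)]
    if not rows:
        return [0, 0, 0, 0]
    return [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
```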
amine December 25, 2019 at 7:12 am #

Hi, I tried every suggestion without any success. I know this would take only a few minutes for a professional like you to solve; all I’m asking for is some compassion.
Thanks.

- amine December 25, 2019 at 9:58 am #
  
  Hi, I finally got it sorted out.
  Thanks.
  
  - Jason Brownlee December 25, 2019 at 10:42 am #
    
    Well done!
    
- Jason Brownlee December 25, 2019 at 10:40 am #
  
  Sorry, I don’t have the capacity to customize tutorials – I get hundreds of emails/comments per day – lots of people to help.
  
  More here:
  https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
  
  If adapting the code is challenging, perhaps start with simpler tutorials here and build up to this more advanced tutorial:
  https://machinelearningmastery.com/start-here/#dlfcv
  
  Or, perhaps hire a contractor.
  
raj January 6, 2020 at 9:07 pm #

(raj) ➜ Mask_RCNN git:(master) ✗ python start.py
Using TensorFlow backend.
Train: 131
Test: 32
WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:514: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:71: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:4076: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3900: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1982: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:341: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:399: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:423: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:720: The name tf.sets.set_intersection is deprecated. Please use tf.sets.intersection instead.

WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:722: The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.

WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Traceback (most recent call last):
  File "start.py", line 150, in <module>
    model.load_weights(model_path, by_name=True)
  File "/home/debu/raj/Mask_RCNN/mrcnn/model.py", line 2130, in load_weights
    saving.load_weights_from_hdf5_group_by_name(f, layers)
  File "/home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/engine/saving.py", line 1018, in load_weights_from_hdf5_group_by_name
    str(weight_values[i].shape) + '.')
ValueError: Layer #389 (named "mrcnn_bbox_fc"), weight has shape (1024, 8), but the saved weight has shape (1024, 324).
(raj) ➜ Mask_RCNN git:(master) ✗

- Jason Brownlee January 7, 2020 at 7:22 am #
  
  Looks like a problem with your development environment?
  
  Perhaps confirm TensorFlow 1.15 and Keras 2.2.
  
- Bence December 12, 2020 at 4:02 am #
  
  Hey there,
  
  Set the number of classes to match your input classes:
  
  # number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
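The numbers in the error above can be explained with a little arithmetic. In the Matterport implementation, the `mrcnn_bbox_fc` layer predicts 4 box-refinement values per class from a 1024-unit feature vector, so its kernel has shape (1024, NUM_CLASSES * 4). The model in the error expects (1024, 8), so it is already configured with NUM_CLASSES = 1 + 1; the saved (1024, 324) kernel matches the 81 classes (80 object classes plus background) of the MS COCO weights. The mismatch therefore seems to come from loading the COCO head layers into a 2-class model, which the tutorial avoids by passing `exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]` to `model.load_weights`. The helpers below are hypothetical and only reproduce the arithmetic.

```python
# Assumed width of the shared head layer that feeds mrcnn_bbox_fc
# (1024, matching the error message above).
FC_UNITS = 1024

# Layer names the tutorial skips via exclude= when loading the MS COCO weights.
CLASS_DEPENDENT_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def bbox_fc_weight_shape(num_classes, fc_units=FC_UNITS):
    """Kernel shape of mrcnn_bbox_fc: 4 box-refinement values for every class."""
    return (fc_units, num_classes * 4)


def classes_from_bbox_fc_shape(shape):
    """Invert the formula: the class count a saved mrcnn_bbox_fc kernel was built for."""
    return shape[1] // 4


print(bbox_fc_weight_shape(1 + 1))              # kangaroo model: (1024, 8)
print(classes_from_bbox_fc_shape((1024, 324)))  # MS COCO weights: 81
```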
  
Saurabh January 9, 2020 at 12:08 am #

Hello Jason,

First of all, Happy New Year 2020! I’m looking forward to more exciting blog posts from you.

I have one question regarding the LabelImg tool. I looked into it, but there is no way to rotate a bounding box. In my custom dataset the objects are not straight, and I can’t rotate the images.

Could you please suggest another labeling tool that allows rotating the bounding box?

Thanking you,
Saurabh

- Jason Brownlee January 9, 2020 at 7:26 am #
  
  Sorry, I don’t have good advice for image annotation tools.
  
  - Saurabh January 10, 2020 at 12:53 am #
    
    Thank you!
    
  - Siddhartha Pachhai January 21, 2020 at 7:10 am #
    
    Hi Jason, the best solution I have found for this is Jupyter Innotator. It’s very convenient and easy to use.
    
    https://github.com/ideonate/jupyter-innotater
    
    *Also: the output of the tool does not resemble the XML file structure that is often used in object detection, but the tool produces enough information that you can write an XML conversion script in Python.
    
    *I remember having some difficulties installing the package on Mac initially (something to do with ipywidgets, but I think it’s fixable); hopefully this has been mitigated.
    
    - Jason Brownlee January 21, 2020 at 7:22 am #
      
      Thanks for sharing.
      
- Suman January 14, 2020 at 7:32 am #
  
  Hi Saurab,
  
  Maybe for object detection you can use LabelImg or labelme. For segmentation you can use the CVAT tool.
  
Suman January 14, 2020 at 7:36 am #

Hi Jason Brownlee,

Great tutorial for beginners like me, thanks.
Here Mask R-CNN is saving only the weights, but I want to save the model along with the weights, like model.save('xxxx.h5'). But this function is not working here. Please reply as soon as possible.

Thanks

- Jason Brownlee January 14, 2020 at 1:46 pm #
  
  Thanks!
  
  I believe it is using the tensorflow API. Perhaps investigate an appropriate function.
  
Niall Delany January 15, 2020 at 7:53 pm #

Thanks for the great tutorial, very helpful in getting started with this kind of work and was able to apply it to a custom dataset.

My question is, suppose I also have binary mask annotations for each image (png files), how would I load them into the model instead of the xml annotations so that the model prediction is a mask rather than a bounding box?

- Jason Brownlee January 16, 2020 at 6:12 am #
  
  You’re welcome.
  
  Sure, load any custom masks you like.
  
JJ January 23, 2020 at 10:37 am #

Hi.
I’m stuck at the “Parse Annotation File” step.
Where should I type “tree = ElementTree.parse(filename)”?

- Jason Brownlee January 23, 2020 at 12:56 pm #
  
  Sorry to hear that, perhaps try copying the “complete examples” at the end of each section.
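For reference, the “Parse Annotation File” step is ordinary Python that goes in your script (typically inside a function), not at a command prompt. A minimal sketch, assuming Pascal VOC-style XML as produced by LabelImg (a `<size>` element with `<width>`/`<height>`, and one `<bndbox>` per object). To keep it self-contained it parses a string with `ElementTree.fromstring`; for a file on disk you would use `ElementTree.parse(filename).getroot()` instead. The function name `extract_boxes_from_xml` and the sample values are made up for illustration.

```python
from xml.etree import ElementTree


def extract_boxes_from_xml(xml_text):
    """Return ([[xmin, ymin, xmax, ymax], ...], width, height) from VOC-style XML."""
    root = ElementTree.fromstring(xml_text)
    boxes = []
    for box in root.findall(".//bndbox"):
        coords = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(coords)
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, width, height


# Made-up annotation, only to show the expected structure.
SAMPLE = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object><name>kangaroo</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

print(extract_boxes_from_xml(SAMPLE))
```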
  
Nelli January 23, 2020 at 2:27 pm #

After training, the displayed prediction image shows a bounding box, but the label is missing. Please reply. Thanks in advance.

- Jason Brownlee January 24, 2020 at 7:42 am #
  
  In this case there is only one label, which is kangaroo.
  
  For a more general example with box and label see this tutorial:
  https://machinelearningmastery.com/how-to-perform-object-detection-in-photographs-with-mask-r-cnn-in-keras/
  
  - Nelli January 28, 2020 at 6:07 am #
    
    Thanks for your quick reply. The suggested link helped me.
    Could you please help me convert the above example model to an APK file?
    
    - Jason Brownlee January 28, 2020 at 7:59 am #
      
      What is “apk”?
      
      - Suman January 30, 2020 at 7:56 am #
        
        I want to convert the trained model to an APK file (an Android application package) to deploy it on mobile devices. Please advise.
      - Jason Brownlee January 30, 2020 at 2:13 pm #
        
        I don’t know what APK is, sorry, or about putting it on mobile devices.
        
        Perhaps try posting your question to stackoverflow?
Samrawit January 26, 2020 at 1:30 pm #

Hi
Does Mask R-CNN only work on annotated images, or can I use normal images? And which annotation approach (automatic, manual, or semi-automatic) gives better results?

- Jason Brownlee January 27, 2020 at 7:01 am #
  
  It learns from annotated images.
  
  It is used on normal images.
  
  - Samrawit January 27, 2020 at 5:39 pm #
    
    Thank you. So is there any example of automatically annotating an image dataset, and of using it for object detection and object masking?
    
    - Jason Brownlee January 28, 2020 at 7:51 am #
      
      No, I believe it is manual at this stage.
      
      - Samrawit January 28, 2020 at 7:07 pm #
        
        Thank you very much! One more question: is there any example of Mask R-CNN without using pre-trained weights?
      - Jason Brownlee January 29, 2020 at 6:31 am #
        
        I don’t have such an example.
        
        It makes sense to use pre-trained weights as a starting point for transfer learning.
mahmoud January 29, 2020 at 9:12 am #

Hi Jason, thanks for your tutorial.
I ran Mask R-CNN on my dataset and it gives me a horrible result:

Train mAP: 0.818
Test mAP: 0.549
Can you advise me why there is such a big difference between the train and test sets?
How can I deal with this problem?

- Jason Brownlee January 29, 2020 at 1:46 pm #
  
  The model has overfit your training dataset.
  
  This might help:
  https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
  
- Samrawit March 27, 2020 at 6:48 pm #
  
  Hi, I was trying to use Mask_RCNN and I don’t know how to first train the feature map (backbone) before passing the model on to the next stage (Region Proposal Network), because I wanted to see the accuracy of the convolutional layers.
  
  Thank you!
  
  - Jason Brownlee March 28, 2020 at 6:16 am #
    
    Not sure that is possible…
    
Rohit January 30, 2020 at 5:12 pm #

Hi Jason, thank you for this wonderful article.

I am working on a case where we have multiple labels for each object in an image.
The task is similar to the one asked in the following problem:

https://stackoverflow.com/questions/49358088/does-tensorflows-object-detection-api-support-multi-class-multi-label-detection

Could you suggest how to approach this problem?

- Jason Brownlee January 31, 2020 at 7:39 am #
  
  Yes, I believe the mask rcnn can support that.
  
  Perhaps start with this model:
  https://machinelearningmastery.com/how-to-perform-object-detection-in-photographs-with-mask-r-cnn-in-keras/
  
Sai Abinesh February 5, 2020 at 8:34 pm #

Hello Jason,

Thank you very much for a great tutorial. It’s a great resource for anyone trying to get started with object detection and for people who need to check their configurations.

I am retraining just the “heads” layers of a ResNet-101 backbone on a 3D synthetic dataset generated using Unreal Engine and Python. I have 7 object classes + 1 background, and a total of 591 training images and 60 real images for validation.
Using default training config from the maskrcnn official repo, I suspect there is a case of over-fitting, as the val loss decreases while the training loss decreases.

Here they are pasted below.
https://imgur.com/a/CgYJxCs

I also constructed a training curve of my own, by calculating the AP50 (Average Precision at 50% Intersection Over Union) for all the epochs from epoch 1 to epoch 100. It seems like the network is not improving a lot. The curve can be found below.
https://imgur.com/MWzvWZz

How should I adjust my learning rate and weight decay? What heuristics or rules of thumb should I use based on the size of the dataset, number of classes, etc.? My config can be found below.

from mrcnn.config import Config


class aerial_trains_Config(Config):
    """Configuration for training on the synthetic aerial dataset.
    Derives from the base Config class and overrides values specific
    to this dataset.
    """
    # Give the configuration a recognizable name
    NAME = "Baldonnell_from_scratch_from9m"

    # Train on 1 GPU with 2 images per GPU.
    # Batch size is 2 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 7  # background + 7 object classes

    # Limits for the small and large image sides; together they
    # determine the resized image shape.
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 2048

    # Anchor side lengths in pixels (larger than the default 32 to 512)
    RPN_ANCHOR_SCALES = (64, 128, 256, 512, 1024)

    # Reduce training ROIs per image because the images have few objects.
    # Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 32

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 600

    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001
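One quantity worth checking in a config like this is how many images an epoch actually covers. Assuming the usual Matterport meaning (batch size = GPU_COUNT * IMAGES_PER_GPU, one optimizer step per batch), 600 steps with a batch of 2 show the network 1,200 images, roughly two passes over the 591 training images. The helper names below are hypothetical; the sketch only does the arithmetic.

```python
import math


def batch_size(gpu_count, images_per_gpu):
    """Images processed per training step."""
    return gpu_count * images_per_gpu


def steps_for_one_pass(num_images, gpu_count, images_per_gpu):
    """Steps per epoch needed to see every training image about once."""
    return math.ceil(num_images / batch_size(gpu_count, images_per_gpu))


def passes_per_epoch(steps_per_epoch, num_images, gpu_count, images_per_gpu):
    """How many times, on average, each training image is seen per epoch."""
    return steps_per_epoch * batch_size(gpu_count, images_per_gpu) / num_images


print(steps_for_one_pass(591, 1, 2))               # 296
print(round(passes_per_epoch(600, 591, 1, 2), 2))  # 2.03
```

This does not say what the right value is, but it makes the per-epoch AP50 curve easier to interpret: each "epoch" here is about two passes over the data.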

- Jason Brownlee February 6, 2020 at 8:23 am #
  
  Very cool!
  
  This might give you ideas:
  https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
  
Ofis Taşıma February 6, 2020 at 6:18 pm #

Thank you very much for your great article about object detection models with Keras.

- Jason Brownlee February 7, 2020 at 8:10 am #
  
  I’m happy it helped!
  
Ashutosh Srivastava February 7, 2020 at 8:15 pm #

Hi Jason,

This is really a great article. I am trying to solve my multi-object detection problem following your approach. I think it needs just a small tweak to this code, but I am stuck.

I have added multiple classes in the load_dataset function:
self.add_class("dataset", 1, "list")
self.add_class("dataset", 2, "Menu")
self.add_class("dataset", 3, "Home")

But in the load_mask function you are appending class_ids statically as 1 (“kangaroo”),
whereas I want to assign classes according to the objects found.
Kindly check and help.
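For several classes, the usual change is to read each object’s `<name>` from the annotation file and map it to the ID registered with `add_class`, instead of appending a fixed kangaroo ID. A minimal sketch, assuming VOC-style XML in which every `<object>` has a `<name>` and a `<bndbox>`, and assuming the names exactly match the ones passed to `add_class` above; `CLASS_NAMES`, `CLASS_IDS` and `objects_with_class_ids` are hypothetical. NUM_CLASSES in the config would then be 1 + 3.

```python
from xml.etree import ElementTree

# Class names in the same order as the add_class() calls, so ID = position + 1
# (ID 0 is reserved for the background class).
CLASS_NAMES = ["list", "Menu", "Home"]
CLASS_IDS = {name: i + 1 for i, name in enumerate(CLASS_NAMES)}


def objects_with_class_ids(xml_text):
    """Return [(class_id, [xmin, ymin, xmax, ymax]), ...] for every <object>."""
    root = ElementTree.fromstring(xml_text)
    results = []
    for obj in root.findall(".//object"):
        name = obj.find("name").text.strip()
        if name not in CLASS_IDS:
            raise ValueError("Unknown class %r; register it with add_class()" % name)
        box = obj.find("bndbox")
        coords = [int(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax")]
        results.append((CLASS_IDS[name], coords))
    return results
```

In `load_mask`, each returned ID would be appended to `class_ids` at the point where the tutorial appends the kangaroo ID, one entry per mask. After `prepare()`, `self.class_names.index(name)` should give the same numbers, provided the classes were added in this order from a single source.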

- Ashutosh Srivastava February 7, 2020 at 8:19 pm #
  
  And as I understand it, object detection is implemented here but classification is missing, as there is only one class. Correct me if I am wrong.
  
  - Jason Brownlee February 8, 2020 at 7:08 am #
    
    I don’t understand, sorry? Can you elaborate?
    
- Jason Brownlee February 8, 2020 at 7:08 am #
  
  Looks fine to me, perhaps test it?
  
Kavılca February 11, 2020 at 4:06 am #

Awesome article, thank you for this blog

- Jason Brownlee February 11, 2020 at 5:18 am #
  
  You’re welcome.
  
Saurabh February 11, 2020 at 11:50 pm #

Hello Jason,

Could you please share your views on “How to label overlapping objects?” What is the best practice with reference to overlapping objects? The problem is most of the labeling tools don’t support oriented bounding boxes.

How can I tell my object detector to look at only certain parts of the images without cropping them? Can I edit the images and paint a constant white/black color over some areas so that the object detector will ignore them?

Kindly share your views.

Thanking you!

- Jason Brownlee February 12, 2020 at 5:47 am #
  
  I don’t have specific advice on the topic, sorry.
  
  - Saurabh February 12, 2020 at 7:24 am #
    
    Thank you!
    
Savyasachi February 12, 2020 at 4:26 am #

Hello Dr. Brownlee!
I’m running this matterport/mrcnn code on my custom dataset (to detect comic characters), using a total of 6500 images. My model’s training loss plateaus at 0.873 (steps: 2500, batch: 2, plateau reached at epoch 9-10), and it breaks my heart. How could I tweak my code to lower the loss? (The rest of the config is default.)

Thank you so much!

- Jason Brownlee February 12, 2020 at 5:54 am #
  
  Some of the suggestions here might help:
  https://machinelearningmastery.com/start-here/#better
  
  - Savyasachi February 20, 2020 at 4:35 am #
    
    Hi again Dr. Brownlee!
    Is there a way to know whether my code will perform well or badly after the first epoch (or even sooner), rather than waiting 6 long hours to get a loss value?
    
    Every time I make some changes, I have to run the whole cycle until I see the loss plateau, after which I have to manually send a keyboard interrupt.
    
    - Jason Brownlee February 20, 2020 at 6:21 am #
      
      No.
      
Phil February 13, 2020 at 11:42 am #

Hi Jason! This is a great tutorial. It is the exact solution to the problem I’m trying to solve.
One quick question: my model trains fine, but it does not create any checkpoint models during training or at the end, so I’m basically left with a trained model object.

I’ve searched my whole system in case it was cached at some other location, but could not find it.
Could you please tell me if you’ve come across this kind of problem before and how to solve it?

I am on Windows 10
With keras==2.2.5
tensorflow==1.15
mask-rcnn-12rics==0.2.3

- Jason Brownlee February 13, 2020 at 1:24 pm #
  
  The models are saved in the current working directory I believe, under a subdirectory for the run.
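The training log earlier in this thread shows the pattern: `Checkpoint Path: ./kangaroo_cfg20191223T1345\mask_rcnn_kangaroo_cfg_{epoch:04d}.h5`, i.e. a subdirectory named after the config NAME plus a timestamp, inside the `model_dir` passed to `MaskRCNN`. A minimal stdlib sketch to list such files, assuming that naming pattern; `find_checkpoints` is a hypothetical helper (the library’s own `model.find_last()` may also be worth a look).

```python
import glob
import os


def find_checkpoints(model_dir="./", name_prefix="mask_rcnn_"):
    """Return checkpoint paths of the form <model_dir>/<run dir>/mask_rcnn_*.h5.

    Results are sorted by path; with timestamped run folders and zero-padded
    epoch numbers this puts older runs and earlier epochs first.
    """
    pattern = os.path.join(model_dir, "*", name_prefix + "*.h5")
    return sorted(glob.glob(pattern))


print(find_checkpoints("./"))
```

If this prints an empty list after training finished, the run may have been started from a different working directory, or with a different `model_dir`, than the one being searched.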
  
Juan February 14, 2020 at 11:58 pm #

Hi Jason, I’m having trouble understanding how to detect multiple classes.

During the annotation process, do we need to split each class into its own folder?
e.g. class1/annots and class1/images, class2/annots and class2/images

I don’t know if this is the right approach, since there might be images where both classes appear.

It would be great to know what the folder structure should be, and the code for the load_dataset function.

Thanks!

- Jason Brownlee February 15, 2020 at 6:31 am #
  
  The choice is yours, as long as it is presented consistently to the model during training.
  
Steven February 18, 2020 at 6:26 pm #

Hey Jason,

Thanks, Thanks, Thanks.
This is the best tutorial I found for Keras.
I had no serious problems doing this.

You did very well.

- Jason Brownlee February 19, 2020 at 7:59 am #
  
  Thanks!
  
Narottam February 18, 2020 at 9:26 pm #

Hi Jason, I built a single/multi-class classification POC project on different objects using your tutorials. Thanks for the neat explanation above.
Now, as part of the complete project, I would like your suggestions on the points below:
1. Ideally, in which case will model accuracy be higher, i.e. a single-class model or a multi-class model? (I trained both single- and multi-class models on different objects.) Accuracy on new data seems to be low after training for 100 epochs with learning_rate = 0.0001.
2. What hyperparameters can I tune, apart from the learning_rate and epochs, to get better accuracy with Mask R-CNN?
3. I’m working on an architecture project. How can I detect the line connecting different objects, like A-----B? How can I detect the line between A and B?

Your help will be very much appreciated !!!

- Jason Brownlee February 19, 2020 at 8:03 am #
  
  Well done!
  
  This might give you ideas for improving model performance generally:
  https://machinelearningmastery.com/start-here/#better
  
  Not sure about detecting lines, sorry. Sounds like classical computer vision might be useful.
  
- Andy August 2, 2020 at 1:25 am #
  
  Can you share your code for multi-class?
  
Helmy February 23, 2020 at 5:22 am #

Hey Jason, is it worth passing the images through an edge detector like Sobel, Prewitt, or Canny as a preprocessing step before sending them off to Mask R-CNN?

In an attempt to make it “easier” and increase accuracy? Any literature or references you recommend reading?

- Jason Brownlee February 23, 2020 at 7:32 am #
  
  Probably not. Perhaps try it?
  
G February 27, 2020 at 3:57 pm #

Hi! Can you explain what’s going on with:

pyplot.subplot(330 + 1 + i)

Why those numbers?

- Jason Brownlee February 28, 2020 at 5:56 am #
  
  3 rows, 3 columns and the image number from 1 to 9.
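To expand on this: a three-digit argument to `pyplot.subplot` is shorthand for (rows, columns, index), so `330 + 1 + i` produces 331 through 339 as `i` goes from 0 to 8, i.e. the nine cells of a 3x3 grid. The hypothetical helper below just decodes such a number; `pyplot.subplot(3, 3, i + 1)` is the equivalent explicit form.

```python
def decode_subplot(code):
    """Split a three-digit subplot code such as 331 into (rows, cols, index)."""
    rows, rest = divmod(code, 100)
    cols, index = divmod(rest, 10)
    return rows, cols, index


for i in range(9):
    print(330 + 1 + i, "->", decode_subplot(330 + 1 + i))
```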
  
Runist February 28, 2020 at 6:04 pm #

The code gives me a lot of warnings, such as “Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.”
Maybe I should switch to a better computer. But is there a cheap method?

- Jason Brownlee February 29, 2020 at 7:09 am #
  
  Perhaps try running on a machine with more RAM, e.g. EC2?
  
  - Runist February 29, 2020 at 1:03 pm #
    
    Do you mean more GPU RAM or more CPU RAM?
    
    - Jason Brownlee March 1, 2020 at 5:20 am #
      
      Perhaps.
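A cheaper route than new hardware may be to reduce how much goes onto the GPU at once. As an untested sketch, assuming the standard Matterport `Config` attributes (`IMAGES_PER_GPU`, `IMAGE_MIN_DIM`, `IMAGE_MAX_DIM`, `BACKBONE`, `TRAIN_ROIS_PER_IMAGE`), a lower-memory variant of the kangaroo config might look like this. `LowMemoryKangarooConfig` is a hypothetical name and the values are guesses; smaller images or a smaller backbone will likely cost some accuracy.

```python
from mrcnn.config import Config


class LowMemoryKangarooConfig(Config):
    NAME = "kangaroo_cfg"
    NUM_CLASSES = 1 + 1          # background + kangaroo
    STEPS_PER_EPOCH = 131
    IMAGES_PER_GPU = 1           # one image per step
    IMAGE_MIN_DIM = 512          # smaller inputs than the defaults (800/1024)
    IMAGE_MAX_DIM = 512          # should stay divisible by 64
    BACKBONE = "resnet50"        # lighter than the default resnet101
    TRAIN_ROIS_PER_IMAGE = 100   # default is 200
```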
      
He March 3, 2020 at 12:29 am #

Hi Jason, could you kindly create a tutorial on estimating the speed of objects in the detected boxes? Or do you have any references to such tutorials?

- Jason Brownlee March 3, 2020 at 6:00 am #
  
  Thanks for the suggestion.
  
Nourhan March 4, 2020 at 9:21 pm #

Hello Mr. Jason, thank you for the very beneficial and informative tutorials you are making. I appreciate your great effort. I would like to suggest a similar tutorial on object detection with multiple classes, not only one, if possible. Thanks again.

- Jason Brownlee March 5, 2020 at 6:34 am #
  
  Thanks.
  
  Great suggestion.
  
Dimitrios Politikos March 6, 2020 at 10:13 pm #

Hi Jason,

When I try to evaluate the model using PredictionConfig in a case study with marine litter images:

cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model.load_weights('mask_rcnn_train_config_0005.h5', by_name=True)
# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)

I get this message:

“re-start from epoch 5” and the run gets stuck there.

Should I wait, or is there a bug in my code?

Thanks,

Dimitris

- Jason Brownlee March 7, 2020 at 7:17 am #
  
  No, you can ignore the warning I think.
  
  - Dimitrios Politikos March 7, 2020 at 8:03 am #
    
    Thank you for your response. Really appreciated!
    
    D.
    
    - Jason Brownlee March 8, 2020 at 6:00 am #
      
      You’re welcome.
      
Steven March 7, 2020 at 12:48 am #

Hey Jason,

I successfully ran your project on my CPU.
Now I want to run it on my GPU.

I installed the latest versions of all libraries:
TF-gpu : 2.1.0
keras-gpu: 2.3.1
cudnn: 7.6.5
cudatoolkit: 10.1.243

The problem is that model.py throws many errors, like renaming tf.log(x) to tf.math.log(x)…

My questions are:
1. Can you publish a version of the project for the latest library versions?
2. Can you tell me which library versions I have to install to use your project on a GPU?
As I said, it works fine in an environment without GPU usage, but never with it.

I hope you can help me.

- Jason Brownlee March 7, 2020 at 7:19 am #
  
  The example will not work with TensorFlow 2 because the Mask RCNN library has not yet been updated to support it.
  
  - Steven March 7, 2020 at 12:52 pm #
    
    Could you please give me an example of working settings?
    
    Which versions do you use, or which are available?
    I have been stuck on this for 2 days now…
    
    - Jason Brownlee March 8, 2020 at 6:01 am #
      
      Yes, I mention this at the top of the page.
      
      You can use TensorFlow 1.14 or 1.15.
      
      - Steven March 11, 2020 at 2:56 am #
        
        I tried, but it doesn’t work as it should.
        
        I now have:
        tf-gpu: 1.14
        keras: 2.2.5
        cuda: 10.0
        cudnn: 7.4.1.5
        
        The script runs until Epoch 1/20:
        Image 1/100 [………]
        
        and then it makes no progress.
        
        Could you please give me your versions of these 4 things, so I can get it working?
      - Jason Brownlee March 11, 2020 at 5:28 am #
        
        Perhaps there is something going on with your workstation.
        
        Perhaps try running other code to confirm your libraries can fit a basic model.
        Perhaps try running the code on another machine to confirm you have everything you need?
jackson March 12, 2020 at 2:43 am #

Hello Jason,

I have a question!
Why is it that when training the model, the loss for the classification output on the train set is usually lower than that of the validation datasets (e.g. mrcnn_class_loss and val_mrcnn_class_loss), as well as why is the loss for the bounding box output for the train lower than that of the validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss)?

Thank you.

- Jason Brownlee March 12, 2020 at 8:52 am #
  
  Some difference between the two sets is to be expected, see this:
  https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
  
Steve March 16, 2020 at 7:46 pm #

Hey Jason,
Is it possible that my training on the GPU (8GB) does not work because the network is too big for this problem?
I tried to use resnet50 but I got allocation problems.

How much GPU memory do you have?

- Jason Brownlee March 17, 2020 at 8:13 am #
  
  Maybe.
  
  I generally recommend training on AWS EC2:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
Steven March 16, 2020 at 10:50 pm #

Hey Jason,

I’m not sure if I can train on 1024×1024 images with resnet50 on my GPU.

I’ve got a GeForce RTX 2070 and I can’t run it on the GPU.
I don’t get any exceptions. The command prompt just hangs.
In a GPU monitor I can see that it wants to use all of the memory, but I think it’s not enough.

Can you help me?

Another question: how can I change resnet50 to a smaller network (if that’s the solution to my problem)?

- Jason Brownlee March 17, 2020 at 8:16 am #
  
  Might be too large. Perhaps try smaller images first.
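
If GPU memory is the limit, a minimal configuration sketch for reducing it follows. It assumes the Matterport Config attributes GPU_COUNT, IMAGES_PER_GPU, BACKBONE, IMAGE_MIN_DIM, IMAGE_MAX_DIM and TRAIN_ROIS_PER_IMAGE; the class name and the values are illustrative starting points to tune, not tested settings. It is a config fragment that needs the mrcnn package installed; keep the tutorial's other settings (e.g. STEPS_PER_EPOCH) alongside it.

```python
from mrcnn.config import Config


class LowMemoryKangarooConfig(Config):
    """Hypothetical lower-memory variant of the tutorial's training config."""
    NAME = "kangaroo_cfg"
    NUM_CLASSES = 1 + 1           # background + kangaroo
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1            # batch size = GPU_COUNT * IMAGES_PER_GPU
    BACKBONE = "resnet50"         # lighter than the default "resnet101"
    IMAGE_MIN_DIM = 512           # smaller inputs; keep divisible by 64
    IMAGE_MAX_DIM = 512
    TRAIN_ROIS_PER_IMAGE = 100    # fewer sampled ROIs per image for the heads


config = LowMemoryKangarooConfig()
config.display()                  # print every setting to confirm the overrides
```

Two caveats, both worth verifying: the library requires image sizes divisible by 64 (2 ** 6), and the COCO weights were trained with a ResNet-101 backbone, so switching BACKBONE to "resnet50" may weaken the transfer learning.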
  
  - Steven March 18, 2020 at 2:39 am #
    
    I tried to use 64×64 images but it still doesn’t work.
    I also tried just one picture per epoch.
    
    The problem occurred for others too: https://github.com/matterport/Mask_RCNN/issues/287
    
    Now I’m wondering if it’s a problem with the generator.
    
    I don’t think it’s a GPU memory problem, because the script gets stuck without any errors.
    
    Could you please give me some advice?
    
    - Jason Brownlee March 18, 2020 at 6:12 am #
      
      Sorry to hear that, I don’t have any good advice.
      
      - Steven March 18, 2020 at 11:27 pm #
        
        Did you run your project on a GPU?
        Is it even possible?
        
        And which libraries do you use?
        
        Please tell me your versions of:
        Tensorflow
        Keras
        Cudnn
        Cuda
        Python
        
        I really have no ideas left other than trying your versions and hoping it works.
      - Jason Brownlee March 19, 2020 at 6:27 am #
        
        Yes.
        
        Tensorflow 1.14 or 1.15, Python 3.6 and an EC2 instance:
        https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
        
        It also works just fine on CPU with the same libraries.
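
Since many of the problems in this thread come down to library versions, a quick sketch for printing what is actually installed may help. It only reads each module's __version__ attribute; the limits in the checks (TensorFlow 1.x, Keras 2.2.4 or older) come from the note at the top of the tutorial, and installed_versions and version_tuple are hypothetical helper names.

```python
import importlib


def installed_versions(packages):
    """Map each package name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def version_tuple(version):
    """Turn '1.15.3' into (1, 15, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())


if __name__ == "__main__":
    found = installed_versions(["tensorflow", "keras"])
    print(found)
    tf_version, keras_version = found["tensorflow"], found["keras"]
    if tf_version and version_tuple(tf_version) >= (2,):
        print("TensorFlow 2.x found; the tutorial expects 1.x (e.g. 1.15.3)")
    if keras_version and version_tuple(keras_version) > (2, 2, 4):
        print("Keras newer than 2.2.4 found; the tutorial expects 2.2.4")
```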
chiraz March 18, 2020 at 11:52 pm #

Hello Jason

I am working with Faster R-CNN for defect detection and I would like your help on how to train a detector on my own dataset, using of course a pretrained CNN like VGG16 or ResNet. How do I prepare the data and load it in a Jupyter notebook or an Anaconda virtual environment? I would be very thankful.

Thanks

- Jason Brownlee March 19, 2020 at 6:28 am #
  
  This tutorial will help you to setup your development environment:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
Thanakorn March 20, 2020 at 12:36 am #

Hi Jason,

I would like to ask how I can add the name of the label along with the AP at the top left of the rectangle.

- Jason Brownlee March 20, 2020 at 8:46 am #
  
  You can draw text directly onto the image. Perhaps review the Pillow API or the matplotlib API.
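
A minimal sketch of building the label text, assuming the result dict from model.detect() with its 'class_ids' and 'scores' arrays. Note that the number per box is the detection confidence score, not the AP, which is computed over a whole dataset. make_captions is a hypothetical helper.

```python
def make_captions(class_ids, scores, class_names):
    """One 'label score' string per detected instance."""
    return ["{} {:.3f}".format(class_names[cid], score)
            for cid, score in zip(class_ids, scores)]


class_names = ["BG", "kangaroo"]          # index 0 is the background class
captions = make_captions([1, 1], [0.9987, 0.75], class_names)
print(captions)  # ['kangaroo 0.999', 'kangaroo 0.750']
```

The display_instances() signature quoted in a traceback further down this thread includes a captions argument, so the list could be passed as display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'], captions=captions). With the tutorial's own Matplotlib plotting, ax.text(x1, y1, caption) at the top-left corner of each rectangle does the same job.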
  
João Vitor Granzotti Machado March 24, 2020 at 2:55 am #

Hi Jason, I’m trying to make a traffic light detector. I have a very large dataset of images known as DTLD and I would like to use it with this tutorial.
The images in the dataset have dimensions of 2048×1024 and the objects to be detected are very small. When performing training and validation for the first time, the result was very bad. I imagine it is due to the resizing performed on the images.
If I change the IMAGE_RESIZE_MODE parameter from “square” to “none”, can I continue using transfer learning normally? Or would it be necessary to train the network from scratch?

The config.py file provides the following information, however I don’t know if I can change this parameter freely.

# Input image resizing
# Generally, use the "square" resizing mode for training and predicting
# and it should work well in most cases. In this mode, images are scaled
# up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
# scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
# padded with zeros to make it a square so multiple images can be put
# in one batch.
# Available resizing modes:
# none:   No resizing or padding. Return the image unchanged.
# square: Resize and pad with zeros to get a square image
#         of size [max_dim, max_dim].
# pad64:  Pads width and height with zeros to make them multiples of 64.
#         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
#         up before padding. IMAGE_MAX_DIM is ignored in this mode.
#         The multiple of 64 is needed to ensure smooth scaling of feature
#         maps up and down the 6 levels of the FPN pyramid (2 ** 6 = 64).
# crop:   Picks random crops from the image. First, scales the image based
#         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
#         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
#         IMAGE_MAX_DIM is not used in this mode.
IMAGE_RESIZE_MODE = "square"
IMAGE_MIN_DIM = 800
IMAGE_MAX_DIM = 1024

- Jason Brownlee March 24, 2020 at 6:08 am #
  
  I wonder if you can use smaller images.
  
  It might be worth looking in the literature for models that are appropriate for this specific problem or detecting small objects generally.
  
  - João Vitor Granzotti Machado March 24, 2020 at 9:38 am #
    
    In this case, an interesting preprocessing step would be to crop the 2048×1024 images to 2048×512, cutting off the lower half of each image, since it is a known fact that there are no traffic lights below the horizon line.
    Using the default values for the minimum and maximum image size (IMAGE_MIN_DIM = 800, IMAGE_MAX_DIM = 1024) I didn’t get a good result, so I was wondering if it would be possible to increase IMAGE_MIN_DIM and IMAGE_MAX_DIM and continue using transfer learning.
    
    - Jason Brownlee March 24, 2020 at 1:44 pm #
      
      Good question, perhaps try it and compare results?
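
To see why the default settings hurt small objects, here is a sketch of the "square" rule as the config.py comment above describes it: scale up (never down) until the short side reaches IMAGE_MIN_DIM, cap the scale so the long side stays within IMAGE_MAX_DIM, then zero-pad to a square. It is a re-implementation for illustration, not the library's own resize code, whose rounding details may differ; square_resize is a hypothetical name.

```python
def square_resize(h, w, min_dim=800, max_dim=1024):
    """Return (new_h, new_w, scale) under the "square" rule from config.py.

    Scale up (never down) so the short side reaches min_dim, then cap the
    scale so the long side does not exceed max_dim. The library then
    zero-pads the result to max_dim x max_dim.
    """
    scale = max(1.0, min_dim / min(h, w))
    if round(max(h, w) * scale) > max_dim:
        scale = max_dim / max(h, w)
    return round(h * scale), round(w * scale), scale


print(square_resize(1024, 2048))              # full DTLD frame: (512, 1024, 0.5)
print(square_resize(512, 2048))               # top half only:   (256, 1024, 0.5)
print(square_resize(1024, 2048, 1024, 2048))  # larger limits:   (1024, 2048, 1.0)
```

Under the defaults a 2048×1024 frame is halved, and cropping it to 2048×512 does not help on its own, because the long side is still capped at 1024. Raising IMAGE_MAX_DIM to 2048 keeps the native resolution but pads every image to 2048×2048, four times the pixels of 1024×1024, which fits the memory error reported below. Because the backbone is convolutional, the pretrained weights should generally still load at a different input size, but that is worth confirming on your own setup.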
      
      - João Vitor Granzotti Machado March 25, 2020 at 3:27 am #
        
        Changing the values of IMAGE_MIN_DIM and IMAGE_MAX_DIM, I get the following error:
        OSError: [Errno 12] Cannot allocate memory
        I’m running the code on Google Colab, as I don’t have the processing power needed to train on the dataset in a reasonable time on my computer.
        So there are two possible explanations for this error: either it is related to the excessive size of the images, or it is not possible to carry out transfer learning after changing these parameters.
      - Jason Brownlee March 25, 2020 at 6:39 am #
        
        Perhaps try an AWS EC2 instance with more memory, say 64GB?
  - João Vitor Granzotti Machado March 24, 2020 at 10:11 am #
    
    The fact that my images are in BGR (OpenCV) format rather than RGB may be sabotaging my training.
    
    - Jason Brownlee March 24, 2020 at 1:44 pm #
      
      Perhaps you can convert some and see if it makes a difference?
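
A minimal sketch of the conversion, assuming H × W × 3 NumPy arrays as returned by cv2.imread(). Matterport's Dataset.load_image() reads images with scikit-image, which returns RGB, and the COCO weights were trained on RGB input, so frames loaded with OpenCV probably need flipping. cv2.cvtColor(image, cv2.COLOR_BGR2RGB) is the equivalent OpenCV call; bgr_to_rgb is a hypothetical helper.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 array (BGR <-> RGB)."""
    return np.ascontiguousarray(image[..., ::-1])


# A single blue pixel as OpenCV would store it: (B, G, R) = (255, 0, 0).
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])  # [  0   0 255]
```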
      
Prashanth Mariappan March 28, 2020 at 2:50 am #

Hey, this is a great tutorial and very helpful. Could you please tell me all the changes required to train on multiple classes? I tried on my own and I am getting some errors in the load_mask() function.

- Jason Brownlee March 28, 2020 at 6:26 am #
  
  Very few changes, just to the definition of the model – e.g. how the dataset is loaded and classes are defined.
  
  - Prashanth Mariappan March 28, 2020 at 3:14 pm #
    
    I’ve made the changes, but I am getting the following error in the part where we check our dataset with masks. Please help me out.
    
    AssertionError Traceback (most recent call last)
    in ()
    105 bbox = extract_bboxes(mask)
    106 # display image with masks and bounding boxes
    --> 107 display_instances(image, bbox, mask, class_ids, train_set.class_names)
    
    /content/drive/My Drive/masked rcnn/Mask_RCNN/mrcnn/visualize.py in display_instances(image, boxes, masks, class_ids, class_names, scores, title, figsize, ax, show_mask, show_bbox, colors, captions)
    103 print("\n*** No instances to display *** \n")
    104 else:
    --> 105 assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
    106
    107 # If no axis is passed, create one and automatically call show()
    
    AssertionError:
    
    In load_dataset I have added the 2nd class using self.add_class, and in the load_mask function I have added it to class_ids. What else should I do?
    
    - Jason Brownlee March 29, 2020 at 5:49 am #
      
      Sorry, I don’t know the cause of your fault. Perhaps try posting your code and issue to StackOverflow?
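
The failing assertion compares three counts: boxes, mask channels and class IDs must all match. A minimal sketch of a multi-class load_mask() body follows, assuming the tutorial's (xmin, ymin, xmax, ymax) box format and one label per box parsed from the XML; build_masks, class_map and the "koala" class are hypothetical. It also shows the row/column ranges asked about elsewhere in this thread: rows run from ymin to ymax and columns from xmin to xmax.

```python
import numpy as np


def build_masks(height, width, boxes, labels, class_map):
    """Build Matterport-style masks (H, W, N) and N class IDs from labelled boxes.

    boxes:     list of (xmin, ymin, xmax, ymax) in pixels
    labels:    one class name per box, e.g. ["kangaroo", "koala"]
    class_map: class name -> integer ID registered with add_class (0 = background)
    """
    masks = np.zeros((height, width, len(boxes)), dtype=np.uint8)
    class_ids = []
    for i, ((xmin, ymin, xmax, ymax), name) in enumerate(zip(boxes, labels)):
        masks[ymin:ymax, xmin:xmax, i] = 1   # rows are y, columns are x
        class_ids.append(class_map[name])    # one ID per box, not per image
    class_ids = np.asarray(class_ids, dtype=np.int32)
    # The same invariant that display_instances() asserts on.
    assert masks.shape[-1] == class_ids.shape[0]
    return masks, class_ids


class_map = {"kangaroo": 1, "koala": 2}      # hypothetical second class
masks, class_ids = build_masks(100, 120, [(10, 20, 50, 60), (60, 5, 110, 40)],
                               ["kangaroo", "koala"], class_map)
print(masks.shape, class_ids)  # (100, 120, 2) [1 2]
```

In the tutorial's Dataset subclass, class_map would mirror the add_class() calls, and the returned pair is what load_mask() hands back. Appending one class ID per image instead of one per box is a likely way to trip this assertion.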
      
Rajesh March 30, 2020 at 6:14 pm #

Can we use other weights for training? If so, where can we download them?

- Jason Brownlee March 31, 2020 at 7:59 am #
  
  I’m not aware of other pretrained weights.
  
Dmitry Kroytor April 6, 2020 at 12:10 am #

Thanks for the tutorial!

I tried to run this on Google Colab and I got this error:
module 'tensorflow' has no attribute 'placeholder'

I finally solved it by calling these lines before everything else:
%tensorflow_version 1.x
import tensorflow
print(tensorflow.__version__)

- Jason Brownlee April 6, 2020 at 6:08 am #
  
  Sorry, I don’t know about google colab.
  
kaka April 6, 2020 at 10:07 pm #

Hi, great tutorial. What about images with no kangaroo? It seems that all the images contain a kangaroo. How should I set up the model for images that don’t contain a kangaroo, in the training data or in the test data?

- Jason Brownlee April 7, 2020 at 5:49 am #
  
  Good question. You could include some images with no objects during training.
  
  - kaka April 7, 2020 at 10:49 am #
    
    Thanks for your answer. How do I input the data that have no kangaroo? I mean, how do the XML and image files with no kangaroo become model input? Could you give details on how to deal with this? Thanks in advance!
    
    - Jason Brownlee April 7, 2020 at 1:29 pm #
      
      Not sure, I have not done it. Perhaps try experimenting.
      
Teh April 6, 2020 at 11:19 pm #

Hi, may I know how to save the model, and what the code is to make predictions on new images?

- Jason Brownlee April 7, 2020 at 5:50 am #
  
  We do exactly this in the tutorial above.
  
Arsalan April 7, 2020 at 4:16 am #

Sir, I didn’t understand the ranges of rows and columns you’ve set when creating the masks. Could you kindly explain them?

- Jason Brownlee April 7, 2020 at 5:57 am #
  
  Which part doesn’t make sense to you?
  
kaka April 7, 2020 at 6:08 pm #

Hi, how do I set the code to use one GPU?

- Jason Brownlee April 8, 2020 at 7:48 am #
  
  Configure tensorflow on your workstation to use GPU, then the example will run in the GPU.
  
  - kaka April 8, 2020 at 5:19 pm #
    
    Thanks for the answer. I mean, how do I revise the code to use one GPU? I mainly use a GCP GPU.
    
    - Jason Brownlee April 9, 2020 at 7:57 am #
      
      No change to the code, only a change to your tensorflow library.
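
For completeness, a sketch of one common way to restrict a process to a single GPU, assuming an NVIDIA setup: the CUDA_VISIBLE_DEVICES environment variable, set before TensorFlow is imported. The Mask R-CNN side is handled by the config; GPU_COUNT and IMAGES_PER_GPU are Matterport Config attributes, and I believe GPU_COUNT already defaults to 1.

```python
import os

# Must run before TensorFlow is imported: the CUDA runtime then only sees
# the first GPU (index "0"), however many are installed on the machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Matching Mask R-CNN config attributes (set in your Config subclass):
#     GPU_COUNT = 1
#     IMAGES_PER_GPU = 1   # lower this first if memory is tight
print("visible GPUs:", os.environ["CUDA_VISIBLE_DEVICES"])
```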
      
kaka April 7, 2020 at 9:27 pm #

Hi, if my image IDs are not of int type, how should I change the code of load_image?

- Jason Brownlee April 8, 2020 at 7:51 am #
  
  Sorry, I don’t have the capacity to help you customize the code.
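
For anyone with the same problem: as far as I can tell, only the tutorial's load_dataset() needs integer IDs, because it splits train and test with int(image_id). Matterport's add_image() stores the ID as given, and load_image() is called with the library's own integer index, so string IDs should work once the split no longer uses int(). A sketch of a position-based split; split_by_position is a hypothetical helper.

```python
import os


def split_by_position(filenames, n_train):
    """Split image files into train/test by sorted position, not by int(id).

    Works for any file stem ("img_a", "0001", "cat-7"), because the ID is
    only used as a string key.
    """
    stems = sorted(os.path.splitext(f)[0] for f in filenames)
    return stems[:n_train], stems[n_train:]


train_ids, test_ids = split_by_position(["b.jpg", "a.jpg", "c.png"], 2)
print(train_ids, test_ids)  # ['a', 'b'] ['c']
```

In load_dataset(), the int(image_id) >= 150 checks would become membership tests such as image_id in train_ids.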
  
zenith April 10, 2020 at 4:10 pm #

Hi, I am building a model for image recognition.
The model should be able to identify which image is provided by the user.
I have two (2) sets of images:
passport images and driving license images.
I am building a model using these images.
I have only 119 passport images for training.
I am training on passport images.
After training, when I test the model it gives a higher probability to driving license images than to passport images.
What could the issue be?
How do I do it by adding bounding boxes to the training images?

- Jason Brownlee April 11, 2020 at 6:08 am #
  
  You can prepare the data with bounding boxes defined and the model will learn how to localize the items in new photos.
  
zenith April 10, 2020 at 4:12 pm #

I have a model for image recognition.
I am using passport images for training.
When I test it using license images, it gives a higher probability to the license images.
What could the issue be?

- Jason Brownlee April 11, 2020 at 6:08 am #
  
  Perhaps the model has overfit?
  
  Some of these tutorials will help:
  https://machinelearningmastery.com/start-here/#better
  
anish jain April 13, 2020 at 10:39 pm #

Starting at epoch 0. LR=0.001

Checkpoint Path: ./content/kangaroo_cfg20200413T1223/mask_rcnn_kangaroo_cfg_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5 (Conv2D)
fpn_c4p4 (Conv2D)
fpn_c3p3 (Conv2D)
fpn_c2p2 (Conv2D)
fpn_p5 (Conv2D)
fpn_p2 (Conv2D)
fpn_p3 (Conv2D)
fpn_p4 (Conv2D)
In model: rpn_model
rpn_conv_shared (Conv2D)
rpn_class_raw (Conv2D)
rpn_bbox_pred (Conv2D)
mrcnn_mask_conv1 (TimeDistributed)
mrcnn_mask_bn1 (TimeDistributed)
mrcnn_mask_conv2 (TimeDistributed)
mrcnn_mask_bn2 (TimeDistributed)
mrcnn_class_conv1 (TimeDistributed)
mrcnn_class_bn1 (TimeDistributed)
mrcnn_mask_conv3 (TimeDistributed)
mrcnn_mask_bn3 (TimeDistributed)
mrcnn_class_conv2 (TimeDistributed)
mrcnn_class_bn2 (TimeDistributed)
mrcnn_mask_conv4 (TimeDistributed)
mrcnn_mask_bn4 (TimeDistributed)
mrcnn_bbox_fc (TimeDistributed)
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
----> 1 model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

1 frames
/content/Mask_RCNN/mrcnn/model.py in compile(self, learning_rate, momentum)
2197 tf.reduce_mean(layer.output, keepdims=True)
2198 * self.config.LOSS_WEIGHTS.get(name, 1.))
-> 2199 self.keras_model.metrics_tensors.append(loss)
2200
2201 def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):

AttributeError: 'Model' object has no attribute 'metrics_tensors'

This error appears when I train the model with TensorFlow 1.15 in Google Colab.

- Jason Brownlee April 14, 2020 at 6:18 am #
  
  Perhaps try running on your workstation or on ec2. Perhaps colab is the issue?
  
  - Akash Kumar April 24, 2020 at 7:54 am #
    
    Colab is not the issue. I have trained and achieved results using TensorFlow 1.15.0 and Keras 2.2.4. However, I want to run detection on video after training on images. How should I achieve that?
    
    - Jason Brownlee April 24, 2020 at 8:03 am #
      
      Perhaps you can extract frames of the video and pass them to your model?
      
Williana April 16, 2020 at 2:54 am #

Hi, I’m a Brazilian student!
I am replicating your tutorial with my own dataset. I’m also using Mask R-CNN for object detection only. During training, only two metrics are displayed: loss and val_loss. The metrics you talked about (e.g. mrcnn_class_loss and val_mrcnn_class_loss, mrcnn_bbox_loss and val_mrcnn_bbox_loss) are not displayed during training. Do you know why this happens? I’m using verbose = 1.

- Jason Brownlee April 16, 2020 at 6:04 am #
  
  Well done.
  
  I don’t know why there is a difference.
  
Ghafour April 17, 2020 at 11:24 am #

When I execute the complete code above, I get the error “module ‘dask.dataframe’ has no attribute ‘Series’” and I cannot solve the problem.
What happened?

- Jason Brownlee April 17, 2020 at 1:31 pm #
  
  I’m sorry to hear that, perhaps this will give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  - Ghafour April 18, 2020 at 12:43 am #
    
    Thanks for your reply. I uninstalled flask and reinstalled it, and the “module ‘dask.dataframe’ has no attribute ‘Series’” problem was solved.
    
    - Jason Brownlee April 18, 2020 at 6:03 am #
      
      Sorry, I don’t know. It says “dask”, not “flask”. Are they related?
      
      Try posting/searching on stackoverflow.
      
Ghafour April 18, 2020 at 9:02 pm #

You’re right, it is dask. Sorry about my mistake.

- Jason Brownlee April 19, 2020 at 5:55 am #
  
  No problem.
  
saipavankumar April 23, 2020 at 2:38 am #

Hi Jason, I have searched your blog for text detection and recognition and haven’t found anything. Can I use R-CNN for text detection? And what about Faster R-CNN?

- Jason Brownlee April 23, 2020 at 6:11 am #
  
  I don’t think I have tutorials on that topic, sorry.
  
Lam Thanh April 23, 2020 at 6:46 am #

Background class detection

Hi, when I input an image without a kangaroo, the model outputs yhat as empty arrays. I think it should be 0s (with no kangaroo, it should predict the background class).

yhat[0]['class_ids']

>> array([], dtype=int32)

Is it true that I’m supposed to have an image dataset without kangaroos so that the model can learn to detect the background class?

Thanks a lot,
I look forward to your response.

- Jason Brownlee April 23, 2020 at 7:45 am #
  
  You could change the model to operate that way if you wish.
  
  - Lam Thanh April 23, 2020 at 9:14 pm #
    
    Hi, I’m thinking of some ways to do that:
    
    1. Add a background dataset
    2. Change the source code
    
    Could you please tell me which direction is better?
    Thanks a lot.
    
    - Jason Brownlee April 24, 2020 at 5:41 am #
      
      Perhaps explore both and see what works/makes sense?
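
For reference, empty arrays are the normal way Mask R-CNN reports an image with nothing in it: as I understand the Matterport detection layer, background (class ID 0) detections are filtered out before results are returned. A minimal sketch of handling that case in the calling code, assuming one result dict from model.detect(); describe_detections is a hypothetical helper.

```python
def describe_detections(result, class_names):
    """Summarise one element of model.detect() output.

    The background class (ID 0) is never returned as a detection, so an
    image with no objects simply yields empty arrays.
    """
    class_ids = list(result["class_ids"])
    if not class_ids:
        return "no objects detected (background only)"
    labels = [class_names[cid] for cid in class_ids]
    return "{} object(s): {}".format(len(labels), ", ".join(labels))


names = ["BG", "kangaroo"]
print(describe_detections({"class_ids": []}, names))
print(describe_detections({"class_ids": [1, 1]}, names))
```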
      
Akash Kumar April 24, 2020 at 7:51 am #

I have trained the model on images and want to test on videos and label objects in the video. Any suggestions and links are appreciated.

- Jason Brownlee April 24, 2020 at 8:03 am #
  
  Perhaps you can extract frames of the video and pass them to your model?
  
Lorenzo Gabrielli April 24, 2020 at 6:19 pm #

Hi,
I’m trying your code on Colab to use the GPU, but when training starts it says that I’m not using a GPU.

Do you know if I have to run something different or it is simply a system problem?

Thanks a lot.

- Jason Brownlee April 25, 2020 at 6:42 am #
  
  I don’t know about Colab; I recommend running on AWS EC2:
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
shradha April 24, 2020 at 8:41 pm #

Hi, I am trying to use my own images for training with 1 class. When the object isn’t present in an image, I do not generate the XML. Because of this, model.train is throwing the error ‘No such file or directory’. How do I handle this situation?
(In particular, I’m trying to solve the Kaggle competition for table detection in document images.)

- Jason Brownlee April 25, 2020 at 6:46 am #
  
  Perhaps define a new class as “none”.
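
An alternative to a "none" class, sketched under assumptions: pair each image with its annotation file when building the dataset, and record None when no XML exists, instead of letting extract_boxes() fail later. The directory layout (an images folder and an annots folder with matching file stems) follows the tutorial; pair_images_with_annotations is a hypothetical helper.

```python
import os


def pair_images_with_annotations(images_dir, annots_dir):
    """Return (image_path, annotation_path or None) for every image file.

    Images with no matching <stem>.xml get None instead of causing
    "No such file or directory" later in extract_boxes().
    """
    pairs = []
    for filename in sorted(os.listdir(images_dir)):
        stem = os.path.splitext(filename)[0]
        ann_path = os.path.join(annots_dir, stem + ".xml")
        if not os.path.isfile(ann_path):
            ann_path = None      # no objects annotated for this image
        pairs.append((os.path.join(images_dir, filename), ann_path))
    return pairs


# Hypothetical usage inside the tutorial's load_dataset(): skip unannotated images.
# for img_path, ann_path in pair_images_with_annotations(images_dir, annotations_dir):
#     if ann_path is None:
#         continue
#     ...
```

Pairs with None can simply be skipped in load_dataset(). If you keep them, load_mask() would need to return an empty mask of shape (height, width, 0) and an empty class_ids array; as far as I can tell, Matterport's training data generator skips images with no instances anyway, so such images mostly matter for evaluation.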
  
Ketil April 28, 2020 at 12:16 am #

So… did this post copy this one: https://towardsdatascience.com/object-detection-using-mask-r-cnn-on-a-custom-dataset-4f79ab692f6d

Or did she copy you?

Not that it matters much, but I can’t find any attribution in either piece, which would be normal courtesy. Credit where it’s due.

- Jason Brownlee April 28, 2020 at 6:47 am #
  
  They copied me, check the publication dates.
  
  I get ripped off every day. It sucks.
  
hard May 9, 2020 at 2:55 pm #

Hi Jason, can this code be applied to other objects, like blood, or to a single object, and work from an Anaconda environment?

- Jason Brownlee May 10, 2020 at 5:55 am #
  
  Maybe. Perhaps prototype a model on your dataset and see how it goes. Also, perhaps check the literature for other solutions to the type of problem you are working on and see what types of models they use.
  
Williana May 10, 2020 at 2:16 pm #

Could you please explain how to calculate mAR from the “utils.compute_recall” function? I understand that it returns the AR, but how should I calculate the mAR? Please help me!

- Jason Brownlee May 10, 2020 at 4:10 pm #
  
  Sorry, I don’t have an example. Thanks for the suggestion!
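
In the absence of an example, here is a sketch of one reasonable definition: recall per image at an IoU threshold, then the mean over images, mirroring how the tutorial averages AP into mAP. It re-implements the idea in plain Python rather than calling utils.compute_recall(), whose matching details differ slightly; box_iou, image_recall and mean_recall are hypothetical helpers, and boxes use the (y1, x1, y2, x2) order of Mask R-CNN's rois.

```python
def box_iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def image_recall(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Fraction of ground-truth boxes hit by at least one prediction."""
    if not gt_boxes:
        return 1.0  # convention: nothing to miss; adjust to your needs
    hits = sum(1 for g in gt_boxes
               if any(box_iou(p, g) >= iou_threshold for p in pred_boxes))
    return hits / len(gt_boxes)


def mean_recall(per_image_recalls):
    """mAR as the average of per-image recall, like mAP over per-image AP."""
    return sum(per_image_recalls) / len(per_image_recalls)


recalls = [image_recall([(0, 0, 10, 10)], [(0, 0, 10, 10), (20, 20, 30, 30)]),
           image_recall([(5, 5, 15, 15)], [(5, 5, 15, 15)])]
print(mean_recall(recalls))  # 0.75
```

COCO-style AR additionally averages recall over IoU thresholds from 0.5 to 0.95, which would be one more loop around image_recall(). With the library itself, I believe utils.compute_recall(pred_boxes, gt_boxes, iou) returns (recall, positive_ids), so collecting its first value per image inside evaluate_model() and taking the mean should give the same kind of number.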
  
tunnn May 17, 2020 at 8:13 pm #

Hi Jason,
I got a problem when I try to run this:

from mrcnn.model import MaskRCNN

The output is:

ModuleNotFoundError: No module named 'keras'

How do I fix it?

- Jason Brownlee May 18, 2020 at 6:11 am #
  
  You need to install Keras:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
Ellie May 19, 2020 at 6:05 pm #

Hey Jason, thank you for the tutorial! I have two questions:

1) I’m getting a terrible train and test mAP. Is the only way to improve this via more data, or do you have any other ideas? (Dataset includes 95 photos for training, and 26 photos for testing)
Train mAP: 0.423
Test mAP: 0.546

2) How could I use this for video multi-object recognition/tracking? Should I just run the video frame by frame? Is there a way I can use my segmentation data to show objects moving across the screen, for example?

Thanks!

- Ellie May 19, 2020 at 6:31 pm #
  
  Sorry one more question: In relation to the question above about using this for a frame by frame video, I’m wondering if you have any tutorials or ideas on doing a “total object count” for a video. For example, if you were tracking kangaroos across the screen in a video, how could you assign a unique identifier to a newly recognized kangaroo and then report the total number of kangaroos in the video at the end, even if some appeared and left during the video?
  
  Thanks!
  
  - Jason Brownlee May 20, 2020 at 6:22 am #
    
    Sorry, I do not have tutorials on video or object counts.
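
As a starting point for counting, a naive greedy IoU tracker is sketched below: each detection in a frame takes the ID of the best-overlapping box from the previous frame, or a new ID if nothing overlaps enough, and the total count is the number of IDs ever issued. This is not part of Mask R-CNN; CountingTracker and its threshold are assumptions. Because a track is dropped as soon as it is missed for one frame, occlusions and missed detections will inflate the count; dedicated trackers with motion models handle that better.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class CountingTracker:
    """Greedy IoU matching between consecutive frames (hypothetical helper)."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}    # track ID -> box seen in the previous frame
        self.next_id = 0    # also the number of distinct objects seen so far

    def update(self, boxes):
        """Assign a track ID to each box in one frame; return the IDs in order."""
        previous = dict(self.tracks)
        current = {}
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for track_id, last_box in previous.items():
                overlap = iou(box, last_box)
                if overlap >= best_iou:
                    best_id, best_iou = track_id, overlap
            if best_id is None:           # nothing overlaps enough: new object
                best_id = self.next_id
                self.next_id += 1
            else:                         # each old track matches at most once
                del previous[best_id]
            current[best_id] = box
        self.tracks = current             # unmatched tracks are dropped
        return list(current)

    @property
    def total_count(self):
        return self.next_id


tracker = CountingTracker()
frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)],
          [(1, 1, 11, 11), (50, 50, 60, 60)], []]
for boxes in frames:
    print(tracker.update(boxes))
print("total objects:", tracker.total_count)
```

The boxes per frame would come from r['rois'] of each model.detect() result.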
    
- Jason Brownlee May 20, 2020 at 6:21 am #
  
  Perhaps try data augmentation?
  Perhaps try changing the model?
  Perhaps try changing learning parameters?
  
  For video, perhaps try applying the model to each frame or a subset of frames?
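
A sketch of the frame-by-frame approach, assuming OpenCV (opencv-python) for decoding. sample_frames and read_video_frames are hypothetical helpers and "video.mp4" is a placeholder. Frames are converted from OpenCV's BGR order to RGB before detection; the detect call is left as a comment because it needs a trained model.

```python
def sample_frames(frames, every_n=5):
    """Yield (frame_index, frame) for every n-th item of any iterable."""
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index, frame


def read_video_frames(path):
    """Yield RGB frames from a video file using OpenCV."""
    import cv2  # imported lazily so sample_frames works without OpenCV
    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    finally:
        capture.release()


# Usage with the tutorial's inference model (not run here):
# for i, frame in sample_frames(read_video_frames("video.mp4"), every_n=10):
#     r = model.detect([frame], verbose=0)[0]
#     print(i, len(r["class_ids"]))
```

Sampling every n-th frame trades temporal detail for speed; every_n=1 processes the whole video.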
  
Baran May 22, 2020 at 9:31 am #

Hi Jason,

Thanks for the amazing tutorial. I’ve got some brief questions.

Firstly, just to confirm, the masks are passed to the model in the form of an array of shape (H, W, num_masks), correct? This appears to be what’s going on in the load_mask method.

Secondly, I can’t quite identify where the sizes of the training images come into play. For example, you haven’t specified an image size that the model should expect. So, does the model expect a particular input size (i.e. H x W x num. channels), and if so, what is it?

Thanks!

- Jason Brownlee May 22, 2020 at 1:20 pm #
  
  You can plot the image with the mask to confirm they are as you expect. I show this in the tutorial.
  
  Good question. From memory, I believe the model expects fixed-size images and the library around it handles image resizing.
  
Remi May 24, 2020 at 1:33 pm #

Awesome tutorial ! Thank you.

- Jason Brownlee May 25, 2020 at 5:43 am #
  
  Thanks!
  
kevinn May 25, 2020 at 11:38 pm #

Wonderful work! Thank you!

- Jason Brownlee May 26, 2020 at 6:25 am #
  
  Well done!
  
utkarsh May 27, 2020 at 6:47 pm #

Hi Jason,
I have trained my model successfully, but it is producing many more masks than I expect. Can you please tell me how to solve this?

- Jason Brownlee May 28, 2020 at 6:12 am #
  
  You will have to debug your code to discover the answer.
  
Ardhika Nofardiansa June 5, 2020 at 4:29 am #

Hi Jason,
I have trained my model successfully based on your tutorial (my model is for motorbike detection). Now, how can I get the output file of this training? Is the .h5 file the output? I mean, I just want to load the trained model when I use other source code for motorbike detection, so that I don’t need to train it from the beginning again every time I want to detect motorbikes.

Thank you.

- Jason Brownlee June 5, 2020 at 8:23 am #
  
  You can make a prediction by calling the predict() function on the model:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  You can save the model to file, load it later and make predictions on new data. No need to retrain:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
Anand Nataraj June 5, 2020 at 6:36 am #

Whatever I do, I’m getting the error below while implementing multi-class detection:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1

I would kindly ask the author to help me implement multi-class object detection.

- Jason Brownlee June 5, 2020 at 8:26 am #
  
  Sorry to hear that. This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
  
Donald June 5, 2020 at 8:37 am #

Hi Jason, thank you for the tutorial. I’d like to ask something about anchor boxes. If I have anchor_box_scales = [32, 64, 128], what do these values mean exactly? Are they square pixels (area), or are they scalar values? If I have very small objects that range from 20×20 to 40×40 pixels, what should I put as values? Can I put only two? I would love some guidance and insight, if possible.
Thank you again.

- Jason Brownlee June 5, 2020 at 1:40 pm #
  
  The average sizes of objects in the dataset used to train the model, I believe.
  
  You can try smaller boxes and see if it makes a difference for your dataset.
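
A small sketch of how anchor shapes are usually derived from a scale and an aspect ratio: the scale is the side length in pixels of the square anchor, and each ratio (width / height) keeps the area at scale squared. This matches how I understand Matterport's RPN_ANCHOR_SCALES and RPN_ANCHOR_RATIOS (defaults (32, 64, 128, 256, 512) and [0.5, 1, 2]); anchor_box_scales looks like a setting from a different library, so check its documentation. anchor_shapes is a hypothetical helper.

```python
import math


def anchor_shapes(scales, ratios=(0.5, 1.0, 2.0)):
    """List (scale, ratio, width, height) for every scale/ratio pair.

    scale: side length in pixels of the square (ratio 1) anchor
    ratio: width / height; the area stays at scale * scale
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            shapes.append((scale, ratio, round(width, 1), round(height, 1)))
    return shapes


for row in anchor_shapes([32]):
    print(row)
```

Two things to keep in mind, both worth verifying in your library: sizes are measured in the resized input image, so if resizing halves a photo, 20 to 40 pixel objects become 10 to 20 pixels; and in Matterport each scale is tied to one level of the feature pyramid, so I believe the tuple needs five values, e.g. (8, 16, 32, 64, 128), rather than two.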
  
Anand June 5, 2020 at 1:48 pm #

Is Mask R-CNN better than RetinaNet? Which is the best of all the models available for object detection?

- Jason Brownlee June 6, 2020 at 7:40 am #
  
  It may depend on the specifics of your dataset.
  
  Perhaps test a suite of techniques on your dataset and discover which best meets your needs.
  
Reki Dian June 6, 2020 at 6:00 pm #

Hi jason,

In this tutorial, the last 32 files are used for testing, right? But what should I do if I want a random test set, so that the test data are not the last 32 files but randomly chosen files? Thank you.

- Jason Brownlee June 7, 2020 at 6:20 am #
  
  Sorry, I don’t understand. Can you please rephrase or elaborate your question?
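
If the question is how to choose the test images at random rather than taking the last 32, a sketch with a seeded shuffle follows. random_split and the zero-padded IDs are hypothetical; in practice the IDs would be the file stems the tutorial builds in load_dataset().

```python
import random


def random_split(ids, test_size, seed=1):
    """Shuffle a copy of the IDs with a fixed seed and split off a test set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


all_ids = ["%05d" % i for i in range(1, 165)]   # hypothetical file stems
train_ids, test_ids = random_split(all_ids, test_size=32)
print(len(train_ids), len(test_ids))  # 132 32
```

Use the same seed when building both the train and the test Dataset; otherwise the two shuffles differ and images can leak between the sets.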
  
Anand Nataraj June 7, 2020 at 12:47 am #

Is there any way to print the Train and Validation accuracy in the callback?

- Jason Brownlee June 7, 2020 at 6:28 am #
  
  Probably not a good idea to print from a callback, but perhaps try it directly and see.
  
Siddhant K. Sancheti June 7, 2020 at 5:35 am #

Hello Jason,
Thanks a lot for such a great tutorial.
First, how long does it take to calculate mAP scores? It’s been half an hour and it’s still processing. I think I am stuck in a loop!
I also have a question: why do you need to use the scaled image for evaluation during prediction, as in your code?
from numpy import expand_dims, mean
from mrcnn.model import load_image_gt, mold_image
from mrcnn.utils import compute_ap

def evaluate_model(dataset, model, cfg):
    APs = list()
    for image_id in dataset.image_ids:
        # load image, bounding boxes and masks for the image id
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # convert pixel values (e.g. center)
        scaled_image = mold_image(image, cfg)
        # convert image into one sample
        sample = expand_dims(scaled_image, 0)
        # make prediction
        yhat = model.detect(sample, verbose=0)
        # extract results for first sample
        r = yhat[0]
        # calculate statistics, including AP
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r["masks"])
        # store
        APs.append(AP)
    # calculate the mean AP across all images
    mAP = mean(APs)
    return mAP

Also, I didn’t understand the following statements:

1. The pixel values of the loaded image must be scaled in the same way as was performed on the training data, e.g. centered. This can be achieved using the mold_image() convenience function.

2. The dimensions of the image then need to be expanded one sample in a dataset and used as input to make a prediction with the model.

Thanks in advance!!!

- Jason Brownlee June 7, 2020 at 6:30 am #
  
  That sounds too long. Perhaps try running on a faster machine or double check your code.
  
  Yes, any data prep applied to the training data must also be applied to new data, like test data. This often means scaling the pixels in the same way.
  
  Yes, the model expects one or more samples as input, in this case images. We need to ensure the input has appropriate dimension to meet the expectations of the model.
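
A sketch of both steps with NumPy, assuming Matterport's default MEAN_PIXEL of [123.7, 116.8, 103.9]: mold_image() subtracts the per-channel mean, and expand_dims() adds a leading batch axis so a single H × W × 3 image becomes a batch of shape 1 × H × W × 3. mold is a hypothetical stand-in for mold_image(). One thing worth double-checking: in the Matterport source, MaskRCNN.detect() appears to call mold_inputs(), which resizes and molds each image itself, so molding beforehand may subtract the mean twice; comparing mAP with and without the explicit call would settle it.

```python
import numpy as np

# Matterport's default per-channel mean (R, G, B); check MEAN_PIXEL in your Config.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])


def mold(image, mean_pixel=MEAN_PIXEL):
    """Center pixel values by subtracting the training mean, like mold_image()."""
    return image.astype(np.float32) - mean_pixel


image = np.full((4, 6, 3), 128, dtype=np.uint8)  # dummy 4 x 6 RGB image
molded = mold(image)                               # each channel minus its mean
sample = np.expand_dims(molded, 0)                 # batch containing one image
print(image.shape, "->", sample.shape)             # (4, 6, 3) -> (1, 4, 6, 3)
```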
  
Siddhant Sancheti June 7, 2020 at 1:33 pm #

Hello Jason,
How can I improve my mAP scores? I’ve been getting scores as follows
Train mAP: 0.760
Test mAP: 0.657
don’t know why such low score b’cause prediction its predicting each and every defined object i.e. gun, knife, and sword in my case very accurately.

Thanks in advance!!

Also, I think something is wrong with this particular webpage. It loads slowly and lags, while other pages on machinelearningmastery and other websites work perfectly fine.

- Jason Brownlee June 8, 2020 at 6:02 am #
  
  Good question.
  
  As a first step, perhaps try tuning the model and/or getting more data.
  
  Beyond that, the tutorials here will teach you how to get more out of your model:
  https://machinelearningmastery.com/start-here/#better
  
- Andy August 2, 2020 at 1:16 am #
  
  Can you show the code for multiple classes?
  
Ali June 11, 2020 at 1:03 am #

Hello. I have this error and I don’t know how to solve it:

Traceback (most recent call last):
  File "", line 47, in
    train_mAP = evaluate_model(train_set, model, cfg)
  File "", line 32, in evaluate_model
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
  File "C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\utils.py", line 739, in compute_ap
    iou_threshold)
  File "C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\utils.py", line 691, in compute_matches
    overlaps = compute_overlaps_masks(pred_masks, gt_masks)
  File "C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\utils.py", line 107, in compute_overlaps_masks
    masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
  File "C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\numpy\core\fromnumeric.py", line 257, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\numpy\core\fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 0 into shape (0)

Can you help, please?

- Jason Brownlee June 11, 2020 at 5:58 am #
  
  Sorry to hear that, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
Shrey June 22, 2020 at 11:16 pm #

Hello Jason,

First of all, I wanted to thank you for this tutorial. So far, it has been really thorough, especially in helping an ML beginner like me to grasp relatively complex ideas.

I have been following the tutorial step by step, running the code on Google Colab with Python 3.6 installed. I am using this for a school project that uses object detection to recognize traffic lights in images.

When I train the model, I get an error that others have experienced before: AttributeError: 'Model' object has no attribute 'metrics_tensors'

I’m not exactly sure what is wrong here. I have looked at the model.py file, and there seems to be only a single instance of metrics_tensors, where the metrics are added for the losses. Were you or the others who faced similar errors able to identify the source of the error?

Cheers

- Jason Brownlee June 23, 2020 at 6:26 am #
  
  Well done on your progress.
  
  Perhaps there is a library version problem with your environment. Maybe try and run the example locally instead?
  
Samek July 7, 2020 at 7:16 pm #

I am getting this error when running

model = MaskRCNN(mode='training', model_dir='./', config=config)

Error:

The following Variables were created within a Lambda layer (anchors)
but are not tracked by said layer:

The layer cannot safely ensure proper Variable reuse across multiple
calls, and consquently this behavior is disallowed for safety. Lambda
layers are not well suited to stateful computation; instead, writing a
subclassed Layer is the recommend way to define layers with
Variables.

- Samek July 7, 2020 at 7:40 pm #
  
  If I use
  
  model = modellib.MaskRCNN(mode="inference", config=config, model_dir='./')
  
  I get the error
  
  ValueError: Tried to convert ‘shape’ to a tensor and failed. Error: None values not supported.
  
  - Bob Masters July 9, 2020 at 5:58 am #
    
    Matterport’s Mask R-CNN code is incompatible with the latest versions of tensorflow and keras. I eliminated such errors by installing TensorFlow 1.5.1 and Keras 2.0.8
    
    - Jason Brownlee July 9, 2020 at 6:44 am #
      
      Correct.
      
- Jason Brownlee July 8, 2020 at 6:30 am #
  
  Sorry to hear that, perhaps some of these suggestions will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
Anand July 10, 2020 at 2:51 am #

mAP always comes out as 1.0, and in some cases it exceeds one (1.00000876). Could you please suggest what the likely cause is?

- Jason Brownlee July 10, 2020 at 6:05 am #
  
  Not sure off the cuff, perhaps try experimenting with the model and specific inputs.
  
SaiManikanta Vuppala July 17, 2020 at 7:23 pm #

I have data that consists of images and their corresponding annotation files. I have to detect two classes, and I intend to use my own neural network. Can you explain how to load the data into the network?

- Jason Brownlee July 18, 2020 at 6:00 am #
  
  You may need to write custom code to load your dataset.
  
SaiManikanta Vuppala July 19, 2020 at 4:27 am #

Could you please suggest an article to follow?

- Jason Brownlee July 19, 2020 at 6:35 am #
  
  Yes, the above tutorial shows how to load a custom dataset, perhaps you can adapt it to load your custom dataset.
  
Rishikesh Pathak July 19, 2020 at 10:43 pm #

Great tutorial..!
I used this to make a multi-class model (with face mask and without mask). I trained it and got the output.
But it is not differentiating between the two classes, as this code is for a single class.
How can I get it to classify the classes separately, so that it detects faces without masks? Please help.

- Jason Brownlee July 20, 2020 at 6:13 am #
  
  Well done!
  
  Good question, you can prepare and load the data to have two classes instead of one. Note the location where we define the classes when loading and defining the dataset.
  
  - Rishikesh Pathak July 20, 2020 at 5:45 pm #
    
    Yes, I did that. I updated the code for 2 classes and trained it, but how do I differentiate between the two classes in the predictions (output)? It is showing boxes on all faces (with and without masks). I want to know which faces are without a mask and which are with a mask,
    so that I can write a script to detect face masks in a photograph.
    
    - Jason Brownlee July 21, 2020 at 5:55 am #
      
      Nice work.
      
      Yes, the model output will indicate the box and the label.
      
      - Rishikesh Pathak July 22, 2020 at 2:52 am #
        
        Yes, it’s working, thank you so much. Please keep up the good work. This tutorial helped a lot.
      - Jason Brownlee July 22, 2020 at 5:43 am #
        
        Well done!
- Jonas June 7, 2022 at 4:12 pm #
  
  Hi, I am also trying to build a multi-class model (bicycle and car), but I can’t seem to get it to work. What changes did you make to the code? My epochs just run forever without finishing.
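On the multi-class question in this thread: each detection already carries a class ID, so separating "with mask" from "without mask" is a lookup. A minimal sketch, assuming the result dict returned by model.detect() has "rois", "class_ids" and "scores" as in the tutorial, and assuming a hypothetical class list where index 0 is the background and the remaining names follow the order of your add_class() calls (with NUM_CLASSES in the config set to 1 + the number of classes). The fake_result values are made up.

```python
# Hypothetical class list: index 0 is the background class in Mask R-CNN,
# followed by one entry per add_class() call, in class-id order.
CLASS_NAMES = ["BG", "with_mask", "without_mask"]


def label_detections(result, class_names=CLASS_NAMES, min_score=0.5):
    """Pair each predicted box with its class name and score.

    `result` is assumed to look like one entry of model.detect(...):
    a dict with "rois" (y1, x1, y2, x2), "class_ids" and "scores".
    """
    labelled = []
    for box, class_id, score in zip(result["rois"], result["class_ids"],
                                    result["scores"]):
        if score >= min_score:
            labelled.append((class_names[class_id], float(score), tuple(box)))
    return labelled


# Made-up output for two faces, one per class.
fake_result = {
    "rois": [(10, 20, 60, 70), (15, 100, 65, 150)],
    "class_ids": [1, 2],
    "scores": [0.98, 0.91],
}
for name, score, box in label_detections(fake_result):
    print(name, score, box)
```

The min_score filter is optional: the library config also has a DETECTION_MIN_CONFIDENCE setting that drops low-confidence boxes before they reach you (0.7 by default, if I remember correctly).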
  
Narottam Saini August 3, 2020 at 3:50 pm #

Hi Jason, I’m facing an issue while trying to run Mask R-CNN in the Google Colab environment: the first epoch never completes. I tried to solve it by following multiple steps mentioned by people on various forums, but I am still facing the issue. This included trying various versions of TensorFlow from 1.14 to 1.5.1 and Keras from 2.0.8 to 2.1.0.

Would it be possible for you to run the code again on your end for multiple epochs and then share the requirements.txt file?

Waiting for your reply…

- Jason Brownlee August 4, 2020 at 6:34 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/do-code-examples-run-on-google-colab
  
Michelangiolo August 6, 2020 at 1:35 am #

Hi, great tutorial! I am stuck at

from mrcnn.model import MaskRCNN

output:
ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow

The issue is that MaskRCNN seems incompatible with the latest version of tf. I have been installing tensorflow 1.5 to avoid issues with model_dir not recognized.

Thank you

- Jason Brownlee August 6, 2020 at 6:15 am #
  
  You must use TF 1.14 and Keras 2.2.
  
  - Nils September 8, 2020 at 9:02 pm #
    
    Hey Jason,
    
    I have the same problem, but, as the error below shows, I’m unable to install an earlier version of TF.
    
    ERROR: Could not find a version that satisfies the requirement tensorflow==1.15 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
    ERROR: No matching distribution found for tensorflow-gpu==1.15
    
    Thank you for all your great work
    
    - Nils September 8, 2020 at 10:54 pm #
      
      Hey all,
      
      Python 3.8 does not support older TensorFlow versions, but Python 3.7 does.
      
      The following steps can be used when working in Anaconda:
      
      Python 3.7 can be installed via conda with the command conda install -c anaconda python=3.7, as per https://anaconda.org/anaconda/python.
      
      Though not all packages support 3.7 yet, running conda update --all may resolve some dependency failures.
      
      - Jason Brownlee September 9, 2020 at 6:51 am #
        
        Thanks for sharing.
        
        Generally, Python 3.6 is recommended.
    - Jason Brownlee September 9, 2020 at 6:48 am #
      
      Sorry to hear that.
      
Peter August 8, 2020 at 11:04 pm #

Hello Jason!
Love your articles.
What if I want to train from scratch? What changes have to be made to the code?
Thanks Peter.

- Jason Brownlee August 9, 2020 at 5:44 am #
  
  Do not load the pre-trained weights.
  
  - Peter August 9, 2020 at 4:34 pm #
    
    Ok thanks
    Another question: must all inputs be of the same shape?
    
    - Jason Brownlee August 10, 2020 at 5:45 am #
      
      It is common to reshape images to the same size/shape prior to modeling.
      
      The model will do this for you I believe.
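On the question of input shapes: if I read Matterport's utils.resize_image() correctly, the default config uses IMAGE_RESIZE_MODE = "square" with IMAGE_MIN_DIM = 800 and IMAGE_MAX_DIM = 1024, so every image is scaled and zero-padded to 1024 x 1024 before it reaches the network. The sketch below only computes the resulting size and padding for a given input, as an illustration of that rule; square_resize_plan() is a hypothetical helper that resizes no pixels, and the defaults are assumptions to check against your config.

```python
def square_resize_plan(height, width, min_dim=800, max_dim=1024):
    """Sketch of the "square" resize rule: scale up so the short side reaches
    min_dim (never shrinking), cap the long side at max_dim, then zero-pad
    to max_dim x max_dim. Returns (new_h, new_w, (top, bottom, left, right))."""
    scale = max(1.0, min_dim / min(height, width))
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = (top, max_dim - new_h - top, left, max_dim - new_w - left)
    return new_h, new_w, padding


print(square_resize_plan(400, 600))  # (683, 1024, (170, 171, 0, 0))
print(square_resize_plan(128, 128))  # (800, 800, (112, 112, 112, 112))
```

One consequence: a small 128 x 128 image is upscaled to 800 x 800 and padded to 1024 x 1024, which matters for memory use.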
      
Anand Nataraj August 13, 2020 at 2:04 pm #

Hi Jason,

Is this architecture capable of working with multiple classes, e.g. predicting kangaroo, lion, tiger, etc.?

I tried it but am getting very low accuracy. Your advice would help.

Thanks,
Anand.

- Jason Brownlee August 14, 2020 at 5:53 am #
  
  Yes, see this example:
  https://machinelearningmastery.com/how-to-perform-object-detection-in-photographs-with-mask-r-cnn-in-keras/
  
Dan September 14, 2020 at 11:18 pm #

Hi Jason and thanks a million for the tutorial. I implemented it and it is fully functional.

It’s been 24 hours since I started to learn ML, bear in mind.

I want to adapt your code to detect certain photos of items in scanned images, so it is not kangaroos. I need to train the model on a completely new type of object.

I can create the training set folders with images and annotations as you defined them, no problem.

But what would I have to change in the code in order to train it on a completely new type of object? I started by eliminating the load_weights instruction.

- Jason Brownlee September 15, 2020 at 5:26 am #
  
  Well done!
  
  Good question. Load the weights as before. The change is focused on how you load your custom dataset – to ensure that the class, image, and mask are represented correctly using the Mask RCNN API – use the existing code as a guide.
  
  - Dan September 17, 2020 at 8:01 pm #
    
    Thanks for the help, Jason. I now understand some more about the topic.
    
    When I try to train the model with your code it always gives me this error at the end. Have you noticed this before? I changed the number of epochs to run the program faster and trigger the error sooner for debugging. The error appears with the kangaroo dataset as well as with my dataset.
    
    5/6 [========================>.....] - ETA: 49s - loss: 3.1602 - rpn_class_loss: 0.0097 - rpn_bbox_loss: 0.5310 - mrcnn_class_loss: 0.2470 - mrcnn_bbox_loss: 1.4792 - mrcnn_mask_loss: 0.8933
    C:\Python\lib\site-packages\skimage\transform\_warps.py:830: FutureWarning: Input image dtype is bool. Interpolation is not defined with bool data type. Please set order to 0 or explicitely cast input image to another data type. Starting from version 0.19 a ValueError will be raised instead of this warning.
      order = _validate_interpolation_order(image.dtype, order)
    C:\Python\lib\site-packages\skimage\transform\_warps.py:830: FutureWarning: Input image dtype is bool. Interpolation is not defined with bool data type. Please set order to 0 or explicitely cast input image to another data type. Starting from version 0.19 a ValueError will be raised instead of this warning.
      order = _validate_interpolation_order(image.dtype, order)
    Traceback (most recent call last):
      File "C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 1692, in data_generator
    ZeroDivisionError: integer division or modulo by zero
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:/Users/GABI/PycharmProjects/Object_Recognition/main.py", line 109, in
        model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')
      File "C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 2374, in train
      File "C:\Python\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "C:\Python\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
        initial_epoch=initial_epoch)
      File "C:\Python\lib\site-packages\keras\engine\training_generator.py", line 234, in fit_generator
        workers=0)
      File "C:\Python\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "C:\Python\lib\site-packages\keras\engine\training.py", line 1472, in evaluate_generator
        verbose=verbose)
      File "C:\Python\lib\site-packages\keras\engine\training_generator.py", line 330, in evaluate_generator
        generator_output = next(output_generator)
      File "C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 1810, in data_generator
    UnboundLocalError: local variable 'image_id' referenced before assignment
    
    - Jason Brownlee September 18, 2020 at 6:44 am #
      
      Sorry to hear that, I have not seen this error.
      
      Do your Keras and TF versions match the expected versions listed at the top of the tutorial?
      Did you copy all of the code?
      Are you running from the command line?
      
    - Fabian January 4, 2021 at 8:14 am #
      
      @Dan I have the same error, did you manage to get rid of it?
      
      - Mauricio June 29, 2021 at 6:46 am #
        
        Same thing, did you ever fix it Fabian?
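For the ZeroDivisionError above, the traceback offers a clue: the failure happens at the end of the epoch inside evaluate_generator(), i.e. while Keras pulls validation batches from test_set. If I'm reading Matterport's data_generator() correctly, it advances its index with a modulo by len(image_ids), which divides by zero when a dataset has no images, and its error handler then refers to image_id before it was ever set, which would explain the UnboundLocalError. So a likely (unconfirmed) cause is an empty test/validation set, e.g. when the train/test split by file name puts every image in the training set. The sketch below uses hypothetical helpers, and the cutoff of 150 is an assumed stand-in for the split in load_dataset(); with the real objects, the equivalent check is len(train_set.image_ids) and len(test_set.image_ids) after calling prepare().

```python
def split_image_ids(filenames, cutoff=150):
    """Hypothetical train/test split by numeric file name: IDs below
    `cutoff` go to train, the rest to test."""
    train, test = [], []
    for name in filenames:
        image_id = name.rsplit(".", 1)[0]  # "00042.jpg" -> "00042"
        if int(image_id) < cutoff:
            train.append(image_id)
        else:
            test.append(image_id)
    return train, test


def check_splits(**splits):
    """Fail early with a clear message if any split is empty, instead of
    failing later inside the training data generator."""
    for split_name, ids in splits.items():
        if not ids:
            raise ValueError(f"{split_name} split is empty; check how images are assigned")


# 100 images named 00001.jpg ... 00100.jpg all fall below the cutoff,
# so nothing is left for the test/validation set.
files = [f"{i:05d}.jpg" for i in range(1, 101)]
train_ids, test_ids = split_image_ids(files)
print(len(train_ids), len(test_ids))  # 100 0
try:
    check_splits(train=train_ids, test=test_ids)
except ValueError as err:
    print(err)  # test split is empty; check how images are assigned
```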
Laura September 17, 2020 at 3:31 am #

Hi Jason!

Similar to the last question, I want to train on a new dataset in which the objects that I want to detect are not even closely related to the objects the pre-trained model has seen. Is this actually possible or do we have to have images that are similar to the pre-trained model? I guess the root of my question is – how flexible is transfer learning? Can I really take a pretrained model trained on kangaroos and get it to learn to detect random shapes in a new image?

Can Mask R-CNN detect overlapping objects?

- Jason Brownlee September 17, 2020 at 6:51 am #
  
  It is critical to train on data close to what you want to make predictions on in the future.
  
Richie September 22, 2020 at 8:53 am #

Hello, and thank you for such a great tutorial! I am stuck on a certain part, though. I can’t seem to find the mask_rcnn_kangaroo_cfg files that are supposed to be generated. They are supposed to be saved in the working directory, right? Is there another place the .h5 files could be saved?

- Jason Brownlee September 22, 2020 at 1:35 pm #
  
  The mask_rcnn_kangaroo_cfg files are model files created when the code is run.
  
  They are in a subdirectory in the same directory as the code file, when running the code from the command line:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
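A small sketch for locating the saved weights, assuming the Matterport layout: each training run writes to a timestamped subdirectory of model_dir (e.g. kangaroo_cfg20200922T1300) holding files like mask_rcnn_kangaroo_cfg_0005.h5, with zero-padded epoch numbers. find_latest_checkpoint() is a hypothetical helper, not part of mrcnn; the library also has a MaskRCNN.find_last() method that serves a similar purpose. The demo builds a throwaway directory tree, so the folder name is made up.

```python
import tempfile
from pathlib import Path


def find_latest_checkpoint(root=".", config_name="kangaroo_cfg"):
    """Return the newest mask_rcnn_<config_name>_*.h5 file under root, or None.

    Assumes one timestamped subdirectory per training run and zero-padded
    epoch numbers, so sorting the paths as strings puts the latest last.
    """
    pattern = f"mask_rcnn_{config_name.lower()}_*.h5"
    checkpoints = sorted(Path(root).rglob(pattern), key=str)
    return checkpoints[-1] if checkpoints else None


# Demo on a throwaway directory that mimics one training run.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "kangaroo_cfg20200922T1300"  # hypothetical run folder
    run_dir.mkdir()
    for epoch in ("0001", "0005"):
        (run_dir / f"mask_rcnn_kangaroo_cfg_{epoch}.h5").touch()
    (Path(tmp) / "mask_rcnn_coco.h5").touch()  # pre-trained weights are ignored
    latest = find_latest_checkpoint(tmp)
    missing = find_latest_checkpoint(tmp, config_name="other_cfg")
    print(latest.name)  # mask_rcnn_kangaroo_cfg_0005.h5
```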
  
Manikanteswar September 25, 2020 at 8:34 pm #

Great Tutorial Jason.

I understood your code, but I have one question.

You didn’t include any background images in training (I mean images without kangaroos).

So how can I include background images in training, given that background images don’t have any masks?

Please tell me how to do this.

Thanks,
Manikanteswar.

- Jason Brownlee September 26, 2020 at 6:18 am #
  
  I don’t think it is needed.
  
  But perhaps you can provide images without kangaroos and see if the API/model accepts them.
  
chadi September 30, 2020 at 4:59 pm #

Hi Jason,

Thanks for this great tutorial. How can I have the class name, i.e. ‘kangaroo’, displayed on the picture? More importantly, how can I extract it and save it in a list?

many thanks

- Jason Brownlee October 1, 2020 at 6:24 am #
  
  You can use matplotlib to write text on images:
  https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.text.html
  
  You can save a list to file:
  https://machinelearningmastery.com/how-to-save-a-numpy-array-to-file-for-machine-learning/
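A minimal sketch of the "save it in a list" part, assuming the result dict from model.detect() has "rois" as (y1, x1, y2, x2), "class_ids" and "scores", and a hypothetical CLASS_NAMES list with the background at index 0. It writes one CSV row per detected box with the standard csv module; io.StringIO stands in for a real file so the sketch runs as is. For drawing the label on the image, besides matplotlib's text(), Matterport's visualize.display_instances() takes a class_names list and draws the labels for you.

```python
import csv
import io

CLASS_NAMES = ["BG", "kangaroo"]  # index 0 is background; assumed class list


def detections_to_rows(image_name, result, class_names=CLASS_NAMES):
    """Flatten one detect() result into rows of (image, label, score, y1, x1, y2, x2)."""
    rows = []
    for (y1, x1, y2, x2), class_id, score in zip(
            result["rois"], result["class_ids"], result["scores"]):
        rows.append([image_name, class_names[class_id], round(float(score), 3),
                     int(y1), int(x1), int(y2), int(x2)])
    return rows


def write_rows(rows, handle):
    """Write a header plus the rows to any text file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["image", "label", "score", "y1", "x1", "y2", "x2"])
    writer.writerows(rows)


# Made-up result for one photo; for a real file, pass
# open("detections.csv", "w", newline="") instead of the StringIO buffer.
fake_result = {"rois": [(34, 50, 210, 180)], "class_ids": [1], "scores": [0.9978]}
rows = detections_to_rows("00001.jpg", fake_result)
labels = [row[1] for row in rows]  # just the class names, as a plain list
buffer = io.StringIO()
write_rows(rows, buffer)
print(labels)  # ['kangaroo']
print(buffer.getvalue())
```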
  
Shriram October 6, 2020 at 5:20 am #

Hey Jason,

Lets just say you saved my life.

- Jason Brownlee October 6, 2020 at 7:00 am #
  
  Happy to hear that.
  
Jan Beneš October 12, 2020 at 8:27 am #

Hi, is there a way for me to see validation/training accuracy in each epoch? Will model.history.keys() even show something? Is there a way? Thank you.

- Jason Brownlee October 12, 2020 at 9:16 am #
  
  Good question, the performance is reported on the command line.
  
  Perhaps check the API for the train() function to see if it returns a history object.
  
  - Jan Beneš October 14, 2020 at 11:52 pm #
    
    Thank you, I found that I can use TensorBoard for a graphical view. I have another question/problem. With my custom dataset of around 1000 images at 128x128 pixels, I somehow manage to run out of memory. Is there a fix for that?
    
    - Jan Beneš October 15, 2020 at 5:49 am #
      
      Well, I managed to fix that by reducing the learning rate to 0.00001 and the steps per epoch to 50.
      
      - Jason Brownlee October 15, 2020 at 6:20 am #
        
        Well done!
    - Jason Brownlee October 15, 2020 at 6:10 am #
      
      Use a machine with more memory, like AWS EC2.
      Use a smaller dataset.
      Use smaller images.
      
Edward October 14, 2020 at 9:37 pm #

Hi Jason,

Thank you for this great tutorial. Would you have a similar tutorial using YOLO for Keras instead of R-CNN for Keras?

Thank you very much.

- Jason Brownlee October 15, 2020 at 6:09 am #
  
  Not at this stage.
  
Anand October 16, 2020 at 10:05 pm #

Is Mask R-CNN better than YOLOv3? I’m trying to build a model that can detect stamps on bank forms.

- Jason Brownlee October 17, 2020 at 6:03 am #
  
  I believe it is. It might be a good idea to test a suite of models and discover what works best for your specific dataset.
  
Jan Beneš October 19, 2020 at 11:21 pm #

Hi, is it possible to predict in real time? Or would it be possible to show each image right after its prediction, so I don’t have to wait for the whole batch to finish?

Thanks.

- Jan Beneš October 20, 2020 at 12:05 am #
  
  Oh, sorry for asking so many questions. What if I just wanted to predict on the picture and save it with the bounding box drawn on it?
  
  - Jason Brownlee October 20, 2020 at 6:25 am #
    
    Sure, you can save anything you like.
    
- Jason Brownlee October 20, 2020 at 6:25 am #
  
  Yes, you can call predict() with one image in real time.
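A minimal sketch of calling the model one frame at a time and handling each result as soon as it is ready, with per-frame timing. detect_fn is a placeholder for something like lambda frame: model.detect([frame], verbose=0)[0] (assuming an inference config with one image per batch); fake_detect() exists only so the sketch runs without a model. For reference, the 0.54 s per frame reported further down works out to 1 / 0.54 ≈ 1.85 frames per second, and a 0.1 s target would be 10 frames per second.

```python
import time


def detect_stream(frames, detect_fn):
    """Run detection one frame at a time and yield each result immediately,
    together with how long that frame took, instead of waiting for a batch."""
    for frame in frames:
        start = time.perf_counter()
        result = detect_fn(frame)
        elapsed = time.perf_counter() - start
        yield frame, result, elapsed


def fake_detect(frame):
    # Placeholder detector so the sketch runs without a trained model.
    return {"rois": [], "class_ids": [], "scores": []}


timings = []
for frame, result, seconds in detect_stream(range(3), fake_detect):
    timings.append(seconds)
    # Show or save `frame` with its boxes here, e.g. with OpenCV or matplotlib.
print(len(timings), "frames processed")
```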
  
Nassif October 21, 2020 at 7:08 am #

For people facing memory problems when running the code in the training part, add IMAGES_PER_GPU = 1 in the “#define a configuration for the model” section.

Thanks for the tutorial, it’s really helpful.

- Jason Brownlee October 21, 2020 at 7:50 am #
  
  Thanks for sharing!
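Putting the memory tips from this thread into a config class: a minimal sketch, assuming the Matterport Config attribute names (NAME, NUM_CLASSES, GPU_COUNT, IMAGES_PER_GPU, IMAGE_MIN_DIM, IMAGE_MAX_DIM). Lowering IMAGE_MIN_DIM and IMAGE_MAX_DIM for small inputs such as the 128 x 128 images mentioned above is my own suggestion, not something tested in the tutorial; as I understand it, the library requires image sides divisible by 2**6 = 64. The try/except falls back to a bare stand-in class only so the sketch runs without mrcnn installed.

```python
try:
    from mrcnn.config import Config  # Matterport Mask R-CNN
except ImportError:
    class Config:  # minimal stand-in so this sketch runs without mrcnn
        pass


class SmallImageConfig(Config):
    """Hypothetical training config aimed at lowering memory use."""
    NAME = "kangaroo_cfg"
    NUM_CLASSES = 1 + 1   # background + kangaroo
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1    # Nassif's tip: one image per batch
    # For small inputs (e.g. 128x128), keep the resized size small instead
    # of the library default of roughly 800-1024 pixels per side.
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128


def size_is_valid(dim):
    """Check that an image side is divisible by 2**6 = 64."""
    return dim % 64 == 0


print(size_is_valid(SmallImageConfig.IMAGE_MAX_DIM))  # True
```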
  
amin November 1, 2020 at 3:55 pm #

Hello, my good friends.
I want to identify a car’s brand using YOLO.
Thank you for your help.

09174286232 WhatsApp
asadi.amin.ai@gmail.com

Nils November 3, 2020 at 1:49 am #

Hello Jason,

Based on this link, “mAP (mean Average Precision) for Object Detection, 2018”, I cannot really figure out which method is used to calculate the mAP or where I can find it. Is the Pascal VOC or the MS COCO method used? If MS COCO is used, the 101-point interpolation is meant, right? Where could I find this myself next time?

Thank you for your great work,

Nils

- Jason Brownlee November 3, 2020 at 6:55 am #
  
  mAP is calculated in the rcnn library.
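On which mAP definition is used: reading mrcnn/utils.py, compute_ap() appears to compute an all-point interpolated AP (the PASCAL VOC 2010+ style) at a single IoU threshold of 0.5 by default, rather than COCO's 101-point interpolation averaged over IoU thresholds 0.5 to 0.95; evaluate_model() then averages these per-image APs into the mAP. The sketch below re-implements that style of calculation in plain Python as an illustration, not the library code, starting from a hypothetical list saying whether each score-sorted prediction matched a ground-truth box.

```python
def average_precision(pred_is_match, num_ground_truth):
    """All-point interpolated AP for one image.

    pred_is_match[i] is True if the i-th highest-scoring prediction matched
    an unused ground-truth box at the chosen IoU threshold.
    """
    if num_ground_truth == 0:
        raise ValueError("need at least one ground-truth box")
    precisions, recalls, hits = [], [], 0
    for rank, is_match in enumerate(pred_is_match, start=1):
        hits += is_match
        precisions.append(hits / rank)
        recalls.append(hits / num_ground_truth)

    # Pad so the curve starts at recall 0 and ends at recall 1.
    precisions = [0.0] + precisions + [0.0]
    recalls = [0.0] + recalls + [1.0]

    # Each precision becomes the best precision at any equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)) if recalls[i] != recalls[i - 1])


# Three predictions, two ground-truth objects: hit, miss, hit.
print(round(average_precision([True, False, True], 2), 4))  # 0.8333
```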
  
Alma November 3, 2020 at 6:10 pm #

Thanks for the great writeup. I was able to implement this successfully. Question:

If training on new images, I assume we have to create an XML file describing where in an image each object is. What is the best way to generate that file?

Also, I did this in TensorFlow 2.0, so there must have been an update. That said, I used your recommended Keras version. Perhaps you want to add this information to your article.

- Jason Brownlee November 4, 2020 at 6:37 am #
  
  Thanks for the suggestion.
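On generating annotation files: the kangaroo dataset uses Pascal VOC-style XML, which labelling tools such as LabelImg can export directly. If you already have box coordinates from another source, a minimal sketch with the standard library's xml.etree.ElementTree could write them in that layout. make_voc_annotation() is a hypothetical helper, the field set is an assumption (the tutorial's loader reads the image size and the bndbox coordinates), and the file name and coordinates are made up.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, objects):
    """Build a minimal Pascal VOC-style annotation tree.

    objects: list of (label, xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.ElementTree(root)


# Made-up image size and box; tree.write("annots/00001.xml") would save it.
tree = make_voc_annotation("00001.jpg", 450, 320, [("kangaroo", 100, 50, 300, 250)])
print(ET.tostring(tree.getroot(), encoding="unicode"))
```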
  
Alma November 20, 2020 at 9:46 am #

Is there an easy way to convert your program’s output .h5 file to a .pb file for use with TensorRT?

- Jason Brownlee November 20, 2020 at 1:05 pm #
  
  I don’t know about those formats, sorry.
  
Saad Khan December 6, 2020 at 3:01 am #

Currently, I’m getting 0.0 train and test mAP accuracy. What could potentially be the issue?

- Jason Brownlee December 6, 2020 at 7:08 am #
  
  Sorry to hear that, some of these tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
darkrider97 December 27, 2020 at 2:37 am #

Can you be a bit clearer about why mold_image is required?
I can see in the source code that mold_image normalizes the image, but I haven’t seen the same normalization done for the kangaroo dataset used for training the model.
So, why are we normalizing while predicting?

- Jason Brownlee December 27, 2020 at 5:03 am #
  
  If I recall, it is because the data prep is performed automatically when training the model, and when predicting/evaluating we are loading new data and must perform data prep manually.
  
Sohini Mallick February 9, 2021 at 10:41 pm #

Traceback (most recent call last):
  File "C:\Users\User\anaconda3-38\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 8, in
    train_mAP = evaluate_model(train_set, model, cfg)
  File "", line 60, in evaluate_model
    AP, _, _, _ = mrcnn.utils.compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
  File "C:\Users\User\PycharmProjects\Test2\mrcnn\utils.py", line 727, in compute_ap
    gt_match, pred_match, overlaps = compute_matches(
  File "C:\Users\User\PycharmProjects\Test2\mrcnn\utils.py", line 682, in compute_matches
    overlaps = compute_overlaps_masks(pred_masks, gt_masks)
  File "C:\Users\User\PycharmProjects\Test2\mrcnn\utils.py", line 115, in compute_overlaps_masks
    intersections = np.dot(masks1.T, masks2)
  File "", line 5, in dot
ValueError: shapes (2,1048576) and (3136,2) not aligned: 1048576 (dim 1) != 3136 (dim 0)

Is there a solution to this error?

- Jason Brownlee February 10, 2021 at 8:08 am #
  
  Sorry to hear that you’re having trouble, these tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
- Sofia December 15, 2021 at 10:34 pm #
  
  Hi! I am having this error too using tf 2.0. Do you have any updates?
  
  thanks
  
  - Adrian Tam December 17, 2021 at 7:09 am #
    
    What is your error message?
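A possible reading of the shapes in Sohini's error: 1048576 = 1024 x 1024 (full-size predicted masks) and 3136 = 56 x 56, which matches Matterport's default MINI_MASK_SHAPE. If that is right, the ground-truth masks were loaded as compressed mini masks. The tutorial's evaluate_model() avoids this by passing use_mini_mask=False to load_image_gt(); in copies of mrcnn where that argument differs, the USE_MINI_MASK config setting is the place I would check first (an assumption, not verified here). The hypothetical helpers below only decode the numbers.

```python
import math

# Matterport's default Config.MINI_MASK_SHAPE (assumed unchanged).
MINI_MASK_SHAPE = (56, 56)


def square_side(num_pixels):
    """Side length of a square mask flattened to num_pixels, or None."""
    side = int(round(math.sqrt(num_pixels)))
    return side if side * side == num_pixels else None


def diagnose_mask_mismatch(pred_pixels, gt_pixels):
    """Explain a 'shapes not aligned' error from compute_overlaps_masks()."""
    pred_side, gt_side = square_side(pred_pixels), square_side(gt_pixels)
    if gt_side is not None and (gt_side, gt_side) == MINI_MASK_SHAPE:
        return (f"predicted masks are {pred_side}x{pred_side} but ground-truth masks "
                f"are {gt_side}x{gt_side} mini masks; load them at full size")
    return f"predicted side {pred_side}, ground-truth side {gt_side}"


# Numbers taken from the error above: (2, 1048576) vs (3136, 2).
print(diagnose_mask_mismatch(1048576, 3136))
```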
    
Jan Beneš March 7, 2021 at 5:34 am #

Hello, thanks for the tutorial. I managed to run real-time recognition through OpenCV.
But I have a problem and I hope I can get it resolved.
When I call model.predict(), it takes 0.54 seconds to finish, which is very slow, about 2 frames per second. How can I speed it up?

Thank you.

- Jason Brownlee March 7, 2021 at 9:30 am #
  
  You’re welcome!
  
  Perhaps some of these suggestions will help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-speed-up-the-training-of-my-model
  
  - Jan Beneš March 7, 2021 at 10:34 am #
    
    Thanks, but I don’t have a problem with training; the prediction rate is too slow. I would like to get it to at least 0.1 s per image. Is there a way to reduce the time it takes to compute where the bounding box is? Thank you.
    
    - Jason Brownlee March 8, 2021 at 4:37 am #
      
      I understand. The suggestions in the link may help, e.g. run on a faster machine, find an alternate implementation, etc.
      
Neha March 24, 2021 at 11:52 pm #

Hi Jason, did you get a chance to write about image annotation?
Although I am aware of a couple of tools, like makesense.ai, they all work on a single image at a time. This is cumbersome when there are thousands of images.

Are you aware of any platform where we can upload a list of images and their corresponding labels to generate annotations for all of them in one go?

- Jason Brownlee March 25, 2021 at 4:45 am #
  
  No, sorry. I have not taken a close look at image annotation.
  
Noushin April 1, 2021 at 9:32 pm #

A question has been bothering me for a while, and that is about the limitations of neural networks in general (CNNs, RNNs, or other structures) for detecting small objects. I know small object detection is itself a challenging topic. However, is there any limit on the size of object that these models can detect, given that the smallest kernel size that can be used is 3 by 3? Please correct me if I am looking at this issue from the wrong point of view (relating kernel size and object size).

- Jason Brownlee April 2, 2021 at 5:38 am #
  
  Yes, really small or really large objects can be missed and may require specialized handling of the data or custom models that can operate at multiple scales in parallel.
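One concrete place where object size matters in Mask R-CNN is the set of anchor boxes used by the region proposal network. Assuming Matterport's default RPN_ANCHOR_SCALES of (32, 64, 128, 256, 512) pixels on the resized image, the sketch below estimates the best IoU a square object could get with a square anchor sharing its center. It is a simplification (real anchors also vary in aspect ratio and sit on a grid), and best_square_anchor() is a hypothetical helper, but it shows why objects far below 32 pixels overlap every anchor poorly, and why adjusting RPN_ANCHOR_SCALES or upscaling the images are common levers for small objects.

```python
# Default anchor side lengths in Matterport's Config.RPN_ANCHOR_SCALES
# (assumed unchanged), in pixels of the resized image.
ANCHOR_SCALES = (32, 64, 128, 256, 512)


def best_square_anchor(object_side, scales=ANCHOR_SCALES):
    """Return (anchor_side, iou) for the square anchor that best overlaps a
    square object of the given side, assuming both share a center.
    For concentric squares, IoU = (smaller side / larger side) ** 2."""
    def iou(anchor):
        small, large = sorted((object_side, anchor))
        return (small / large) ** 2
    best = max(scales, key=iou)
    return best, iou(best)


for side in (10, 20, 64, 100):
    anchor, overlap = best_square_anchor(side)
    print(f"{side:>4}px object -> {anchor}px anchor, IoU {overlap:.2f}")
```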
  
  - Noushin April 13, 2021 at 11:16 pm #
    
    Thanks a lot, very straightforward and clear.
    
    - Jason Brownlee April 14, 2021 at 6:26 am #
      
      You’re welcome.
      
Neha April 6, 2021 at 6:58 pm #

Hi Jason,

In the code above, at the time of model evaluation or when running prediction on a single image, the function mold_image(...) is used to perform pixel centering. This step wasn’t explicit in model training. Is this step taken care of behind the scenes by Mask R-CNN model training?

Thanks in advance!!!

- Jason Brownlee April 7, 2021 at 5:08 am #
  
  Yes.
  
  - Neha April 8, 2021 at 5:18 pm #
    
    Thanks, Jason, for the quick response. I have a follow-up question.
    My dataset is images of emergency and non-emergency vehicles. After model training and evaluation, when running it on test set images, the model couldn’t detect the vehicle in one image. When I commented out the mold_image(…) step, it successfully detected the vehicle.
    
    So, is it right to say that the pre-processing step (centering the image) should not be done on this dataset? If so, how do I turn it off during model training?
    
    Reply
    - Jason Brownlee April 9, 2021 at 5:20 am #
      
      You’re welcome.
      
      Interesting. If the pre-processing was used during training, it should be used on new data.
      
      Perhaps confirm it was applied during training.
      Perhaps confirm any other assumptions.
      
Amb April 8, 2021 at 11:04 pm #

Hi Jason,

Thanks for your guide!

How would I build a data set of the images that aren’t flagged as difficult?

Thanks

- Jason Brownlee April 9, 2021 at 5:25 am #
  
  Perhaps exclude all images from train and test that cannot be predicted with a simple model?
  
  But why?
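For the question about objects flagged as difficult: Pascal VOC-style annotations can carry a <difficult> tag per object, so one option is to skip those objects while reading the boxes, and to drop an image from the dataset if none of its objects remain. A minimal sketch with the standard library; easy_boxes() is a hypothetical helper, and treating a missing tag as "not difficult" is an assumption.

```python
import xml.etree.ElementTree as ET


def easy_boxes(xml_text):
    """Return [xmin, ymin, xmax, ymax] boxes for objects NOT flagged as
    difficult in a Pascal VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("difficult", default="0").strip() == "1":
            continue  # skip objects marked difficult
        box = obj.find("bndbox")
        boxes.append([int(float(box.findtext(tag)))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


sample = """<annotation>
  <object><difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object><difficult>1</difficult>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>15</xmax><ymax>15</ymax></bndbox>
  </object>
</annotation>"""
print(easy_boxes(sample))  # [[10, 20, 110, 120]]
```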
  
amb April 9, 2021 at 12:33 am #

Actually, I’ve just worked it out.

wooo!

- Jason Brownlee April 9, 2021 at 5:26 am #
  
  Well done!
  
James Chang April 23, 2021 at 7:18 pm #

According to https://www.tensorflow.org/hub/tutorials/object_detection,

the bounding-box coordinates should be in the form [ymin, xmin, ymax, xmax], which is different from yours. I am a bit confused.

- Jason Brownlee April 24, 2021 at 5:18 am #
  
  Perhaps it is a different API.
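The difference seems to be in the ordering convention rather than the content: the tutorial's XML boxes are read as (xmin, ymin, xmax, ymax), while Mask R-CNN's r["rois"] output is ordered (y1, x1, y2, x2), the same order as the TF Hub tutorial's [ymin, xmin, ymax, xmax] (which, if I recall correctly, is also normalized to the 0-1 range). A few hypothetical helper functions for converting between them, using made-up coordinates:

```python
def xyxy_to_yxyx(box):
    """(xmin, ymin, xmax, ymax) as read from VOC XML -> (y1, x1, y2, x2)."""
    xmin, ymin, xmax, ymax = box
    return ymin, xmin, ymax, xmax


def yxyx_to_xyxy(box):
    """(y1, x1, y2, x2) as in Mask R-CNN's r["rois"] -> (xmin, ymin, xmax, ymax)."""
    y1, x1, y2, x2 = box
    return x1, y1, x2, y2


def normalize_yxyx(box, height, width):
    """Scale pixel (y1, x1, y2, x2) to 0-1 [ymin, xmin, ymax, xmax] values."""
    y1, x1, y2, x2 = box
    return y1 / height, x1 / width, y2 / height, x2 / width


voc_box = (100, 50, 300, 250)  # made-up xmin, ymin, xmax, ymax
rcnn_box = xyxy_to_yxyx(voc_box)
print(rcnn_box)                            # (50, 100, 250, 300)
print(normalize_yxyx(rcnn_box, 500, 400))  # (0.1, 0.25, 0.5, 0.75)
```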
  
Aishwarya G April 28, 2021 at 8:52 pm #

Hello Jason,

Greetings for the day!

While training the model, I am receiving the following errors:

1. File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
     attr = getattr(self._tfmw_wrapped_module, name)
   AttributeError: module 'tensorflow' has no attribute 'name_scope'

2. File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/applications/__init__.py", line 22, in
     import keras_applications
   ModuleNotFoundError: No module named 'keras_applications'

Could you please help me solve these errors?

Thank you!!

- Jason Brownlee April 29, 2021 at 6:25 am #
  
  Sorry to hear that, these suggestions may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
Johan May 30, 2021 at 5:03 am #

AttributeError                            Traceback (most recent call last)
in ()
      6 from mrcnn.utils import Dataset
      7 from mrcnn.config import Config
----> 8 from mrcnn.model import MaskRCNN
      9
     10

/content/Mask-Rcnn/mrcnn/model.py in ()
    253
    254
--> 255 class ProposalLayer(KE.Layer):
    256     """Receives anchor scores and selects a subset to pass as proposals
    257     to the second stage. Filtering is done based on anchor scores and

AttributeError: module 'keras.engine' has no attribute 'Layer'

I am getting this error. Can you help, please?

- Jason Brownlee May 30, 2021 at 5:52 am #
  
  Perhaps ensure you are using the versions of TensorFlow and Keras described at the top of the tutorial.
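Since most errors in this thread come from version mismatches, here is a small sketch that checks version strings against the note at the top of the tutorial (TensorFlow 1.15.3 and Keras 2.2.4, not TensorFlow 2.0+ or Keras 2.2.5+), plus the Python 3.8 limitation Nils reported further up. check_environment() is a hypothetical helper that takes the version strings as arguments; with the real packages installed, you would pass tensorflow.__version__ and keras.__version__.

```python
import sys


def check_environment(tf_version, keras_version, python_version=None):
    """Return a list of problems found in the given version strings, judged
    against the tutorial's note: TensorFlow 1.15.3 and Keras 2.2.4, and
    neither TensorFlow 2.0+ nor Keras 2.2.5+."""
    if python_version is None:
        python_version = sys.version_info[:2]

    def parts(version):
        # "1.15.3" -> (1, 15, 3); non-numeric pieces like "0rc1" are dropped.
        return tuple(int(p) for p in version.split(".")[:3] if p.isdigit())

    problems = []
    if parts(tf_version) >= (2,):
        problems.append(f"TensorFlow {tf_version} is 2.x; expected 1.15.3")
    if parts(keras_version) >= (2, 2, 5):
        problems.append(f"Keras {keras_version} is newer than 2.2.4")
    if tuple(python_version) >= (3, 8):
        problems.append("No TensorFlow 1.x pip wheels exist for Python 3.8+")
    return problems


print(check_environment("1.15.3", "2.2.4", (3, 6)))  # []
print(check_environment("2.3.0", "2.4.3", (3, 8)))
```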
  
Radhika June 3, 2021 at 10:21 pm #

Hello Jason,
Thank you for this comprehensive tutorial!

Issues encountered:
(1) AttributeError: module 'keras.engine' has no attribute 'Layer': This issue (also reported by somebody else in the comments section) gets resolved (as you suggested above in another comment) after installing the TensorFlow and Keras versions that you mentioned.

However, after resolving the above issue, there is another error:

AttributeError: module 'tensorflow' has no attribute 'name_scope'

Any suggestions on resolving this error, please?

- Jason Brownlee June 4, 2021 at 6:53 am #
  
  You’re welcome.
  
  Perhaps ensure you are using the versions of Keras and TensorFlow listed above.
  
Radhika June 3, 2021 at 11:28 pm #

Please ignore/delete this comment; the issue is resolved.

- Jason Brownlee June 4, 2021 at 6:54 am #
  
  I’m happy to hear that.
  
Steve June 8, 2021 at 6:02 pm #

Hey Jason,

Very nice work.

I have a question about the output.
We get a single class output with a confidence score for that class.

Is it possible to get a class vector for each box?
Example: 2 classes (Dog, Cat)
Box[x, y, width, height, class: [0.3, 0.7]]
So is it possible to say this box is 30% dog and 70% cat, or something like that?

Even better would be if each class could be scored from 0% to 100% on its own, so 65% it’s a dog and 80% it’s a cat.

I want to decide myself which class it should take.

- Jason Brownlee June 9, 2021 at 5:40 am #
  
  Thanks!
  
  Yes, I believe it provides a box for each item discovered in the image and probabilities for all known classes.
  
  - Steve June 9, 2021 at 4:57 pm #
    
    Hey Jason,
    
    Unfortunately, it doesn’t.
    I just get one class for each box, not a multiclass vector.
    
    My problem is that I can’t find any way to customize the code to output a vector…
    
    If you believe it does, how can I get this vector?
    
    Thanks for your answer. 🙂
    It’s nice that you’re still helping people.
    
    - Jason Brownlee June 10, 2021 at 5:23 am #
      
      I believe it does, see the kangaroo example where two “objects” are found in one image.
      
      - Steve June 18, 2021 at 4:42 pm #
        
        I understand that I can detect multiple objects in one image.
        But I need a multiclass “vector” for one object.
        
        I need, for example: “This object is 60% a dog and 40% a cat.”
      - Jason Brownlee June 19, 2021 at 5:48 am #
        
        Yes, a given prediction gives a vector of probabilities across all known classes, I believe. You can sort by probability and report the top 5. I think I have an example of this for pre-trained image classification models on the blog.
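Jason’s sort-and-report suggestion can be sketched with NumPy. This assumes you already have one probability vector per box (one entry per class, background included), for example from a softmax classifier head. How to pull such a vector out of the Matterport model is not covered here: the `detect()` results used in the tutorial expose a single `class_ids` entry and a single `scores` entry per box, which matches what Steve describes. `top_k_classes` is a hypothetical helper.

```python
import numpy as np


def top_k_classes(probs, class_names, k=5):
    """Return up to k (class_name, probability) pairs, most probable first."""
    probs = np.asarray(probs, dtype=float)
    k = min(k, probs.shape[0])
    order = np.argsort(probs)[::-1][:k]  # indices by descending probability
    return [(class_names[i], float(probs[i])) for i in order]


# Hypothetical 3-entry vector for one box: background, dog, cat.
print(top_k_classes([0.05, 0.35, 0.60], ["BG", "dog", "cat"], k=2))
# → [('cat', 0.6), ('dog', 0.35)]
```

Steve’s second idea, an independent 0-100% score per class, corresponds to per-class sigmoid outputs rather than a softmax, which would mean changing the classifier head rather than just post-processing its output.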
Angel0 June 24, 2021 at 8:04 pm #

Hi everyone, I’m a beginner with deep learning… I trained a model using a CNN with Keras and saved the model.h5. Starting from this, how can I do object detection to find objects in images? Can someone help me carry out this next step? Thanks a lot

- Jason Brownlee June 25, 2021 at 6:13 am #
  
  You can load the model and call model.predict() with an input image to perform object detection.
  
  Perhaps this will help:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
Simeon July 17, 2021 at 5:50 am #

Hello Jason,
I have a different dataset from this one. It has a CSV annotation file and more than one class, such as person, car, cat, etc. The bounding box coordinates are in x_min, y_min, x_max, y_max format, where (x_min, y_min) is the top-left corner and (x_max, y_max) is the bottom-right corner. The class names are text, and I think I need to convert them into an integer representation. I have seen some datasets arranged by class, but in this dataset all the images are in one folder, and each image contains more than one object. I want to parse the CSV file and preprocess it before loading it into the object detection model. How can I parse my dataset? I used the pandas read_csv() function, but I ended up with an error saying that the number of input images and the number of images in the annotation file are not equal. This is because the image name is repeated for each bounding box in the annotation file.
I really appreciate your suggestions and help.
Simeon.

- Jason Brownlee July 18, 2021 at 5:19 am #
  
  You may have to write some custom code to load your dataset.
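Because the CSV has one row per box, the key step is to group rows by filename so each image is added to the Dataset once, with all of its boxes; that also avoids the length mismatch described above. A standard-library sketch, assuming hypothetical column names `filename, x_min, y_min, x_max, y_max, class`. Class names are mapped to integers starting at 1, keeping 0 free for the background class that Mask R-CNN reserves. `load_csv_annotations` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def load_csv_annotations(csv_file):
    """Group one-row-per-box CSV annotations into one record per image.

    Assumed (hypothetical) columns: filename, x_min, y_min, x_max, y_max, class.
    Returns (annotations, class_to_id), where annotations maps
    filename -> list of (x_min, y_min, x_max, y_max, class_id).
    """
    annotations = defaultdict(list)
    class_to_id = {}
    for row in csv.DictReader(csv_file):
        name = row["class"]
        if name not in class_to_id:
            # Start at 1 so that id 0 stays free for the background class.
            class_to_id[name] = len(class_to_id) + 1
        box = tuple(int(float(row[k])) for k in ("x_min", "y_min", "x_max", "y_max"))
        annotations[row["filename"]].append(box + (class_to_id[name],))
    return dict(annotations), class_to_id


sample = io.StringIO(
    "filename,x_min,y_min,x_max,y_max,class\n"
    "img1.jpg,10,20,110,220,person\n"
    "img1.jpg,50,60,90,100,car\n"
    "img2.jpg,5,5,40,40,person\n"
)
boxes, classes = load_csv_annotations(sample)
print(len(boxes), classes)  # → 2 {'person': 1, 'car': 2}
```

From there, each `class_to_id` entry would be registered once with `add_class()`, and each filename once with `add_image()`, passing its box list along so `load_mask()` can read it back.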
  
Simeon July 17, 2021 at 6:04 am #

Correction: The class names are categorical data. Sorry, I wrote it as text data.

Nghia Nguyen July 22, 2021 at 12:25 am #

I have reused the code above, but I got the error below. Has anyone had the same issue, and how can it be solved? Thanks.

Traceback (most recent call last):
File ".\kangaroo_detection2.py", line 124, in
model = MaskRCNN(mode='training', model_dir='./', config=config)
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\model.py", line 1849, in __init__
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\model.py", line 1978, in build
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\model.py", line 323, in call
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\utils.py", line 820, in batch_slice
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\model.py", line 321, in
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\model.py", line 263, in clip_boxes_graph
File "C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 645, in set_shape
raise ValueError(str(e))
ValueError: Shapes must be equal rank, but are 3 and 2

- Jason Brownlee July 22, 2021 at 5:36 am #
  
  Sorry to hear that, perhaps some of these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
Minkle July 26, 2021 at 8:12 am #

Hi! Firstly, thank you so much for this guide! It has been insanely helpful and I really appreciate it!

I was wondering if I could get your help with something. I’m currently trying to train the R-CNN to detect insects in my backyard, and the network is picking up other things like chairs, vases, and people and classifying them as insects. I believe this comes from the base network, which was trained on 80 object classes.

Is there any way I can separate these 80 objects from the network, prevent it from detecting other things, and only detect the new insect classes I want?

Thank you!

- Jason Brownlee July 27, 2021 at 5:04 am #
  
  Perhaps you can write some code to interpret the prediction from the model and only report relevant objects to the user.
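Jason’s suggestion can be sketched as a small post-processing step. It assumes the per-image dictionary returned by Matterport’s `model.detect()` (`rois`, `class_ids`, `scores`, and `masks` stacked on the last axis); `keep_classes` is a hypothetical helper, and the class ids are whatever you registered with `add_class()` in your Dataset. If the unwanted objects are being labeled as your insect class, filtering by class id will not remove them; raising `min_score` may help there, at the cost of some missed insects.

```python
import numpy as np


def keep_classes(result, wanted_ids, min_score=0.0):
    """Keep detections whose class id is in wanted_ids and whose score >= min_score.

    Assumes the layout of one element of Matterport's model.detect() output:
    'rois' (N, 4), 'class_ids' (N,), 'scores' (N,), 'masks' (H, W, N).
    """
    keep = np.isin(result["class_ids"], list(wanted_ids)) & (result["scores"] >= min_score)
    return {
        "rois": result["rois"][keep],
        "class_ids": result["class_ids"][keep],
        "scores": result["scores"][keep],
        "masks": result["masks"][:, :, keep],  # instance masks live on the last axis
    }


# Synthetic example: three detections, keep class 1 with score >= 0.5.
r = {
    "rois": np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 3, 3]]),
    "class_ids": np.array([1, 2, 1]),
    "scores": np.array([0.9, 0.8, 0.4]),
    "masks": np.zeros((32, 32, 3), dtype=bool),
}
print(keep_classes(r, {1}, min_score=0.5)["class_ids"])  # → [1]
```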
  
  - Minkle July 27, 2021 at 5:25 pm #
    
    Thank you!
    
    - Jason Brownlee July 28, 2021 at 5:24 am #
      
      You’re welcome!
      
Fancy August 12, 2021 at 6:53 am #

Hi Jason,
I love this tutorial; it has a very detailed explanation.
I’m using Keras 2.2.4, TF 1.15, and h5py 3.3.0. My problem is that I’m stuck at ‘model.load_weights’; the error message says:

ImportError: dlopen(/Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/defs.cpython-37m-darwin.so, 2): Symbol not found: _H5Pget_fapl_ros3
Referenced from: /Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/defs.cpython-37m-darwin.so
Expected in: /Users/fancy/opt/anaconda3/envs/myenv/lib/libhdf5.103.dylib
in /Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/defs.cpython-37m-darwin.so

I searched a lot but can’t figure out the solution, could you help?

Thank you so much!

- Adrian Tam August 12, 2021 at 7:34 am #
  
  Errors like this are usually due to library installation. Maybe you have a conflicting mix of libraries? Try uninstalling h5py and reinstalling it. That may help.
  
  - Fatma Mazen September 9, 2021 at 1:09 am #
    
    This worked for me:
    !pip install 'h5py==2.10.0' --force-reinstall
    
Fatma Mazen September 9, 2021 at 1:14 am #

Thank you for this informative tutorial
I have a question about image size
I am having this error
IndexError: index 2048 is out of bounds for axis 0 with size 2048
because my image height is 2048
What to do with large-size images?
Thanks in advance.

- Adrian Tam September 9, 2021 at 4:43 am #
  
  There are many ways to handle large images, but your error seems to come from accessing an index outside the array. It sounds to me more like a coding mistake than anything else.
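One common pitfall fits Adrian’s reading; treat it as an assumption, since the loading code isn’t shown. In a 2048-pixel-tall image, valid row indices run from 0 to 2047, so a polygon point stored at y = 2048 (exactly on the border) indexes one row past the end when the mask is filled. A minimal NumPy sketch that clips indices before filling; `fill_mask` is a hypothetical helper. If you rasterize polygons with `skimage.draw.polygon`, passing the image shape through its `shape` argument should have a similar clipping effect.

```python
import numpy as np


def fill_mask(mask, rows, cols, instance):
    """Set mask[rows, cols, instance] = 1 after clipping indices to the image.

    rows/cols are integer pixel coordinates, e.g. from rasterizing a VIA polygon.
    A point drawn exactly on the border can give row == height, one past the
    last valid index (height - 1), which raises IndexError without clipping.
    """
    height, width = mask.shape[:2]
    rows = np.clip(np.asarray(rows), 0, height - 1)
    cols = np.clip(np.asarray(cols), 0, width - 1)
    mask[rows, cols, instance] = 1
    return mask


mask = np.zeros((2048, 1536, 1), dtype=np.uint8)
fill_mask(mask, rows=[2046, 2047, 2048], cols=[10, 11, 12], instance=0)  # no IndexError
print(int(mask.sum()))  # → 3
```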
  
  - Fatma Mazen September 15, 2021 at 5:52 pm #
    
    Thank you for your reply.
    I have already annotated the large images using the VIA tool.
    The largest image is 2322x4096 and the smallest one is 720x1280.
    Should the image resize mode be “square” with max_size=1024 and min_size=800, the default values?
    Or should I modify them according to my dataset’s image sizes?
    Can you kindly tell me what the optimum values for max_size and min_size should be?
    
    - Adrian Tam September 16, 2021 at 12:50 am #
      
      I would prefer to rescale the image rather than modify the model, because modifying the model means retraining, which is very time-consuming.
      
      - Fatma Mazen September 16, 2021 at 3:05 am #
        
        I have already annotated the dataset, which was a time-consuming task. I think I will have to re-annotate it for the new resized images.
        I have three questions now:
        1. Should I set image_resize_Mode to “square” or “crop”?
        2. Do you think that setting max_size and min_size to larger values like 2048 or 4096 rather than 1024 will give better results?
        3. Are there any parameters that need to be modified if I change min_size and max_size?
        Thanks in advance
      - Adrian Tam September 16, 2021 at 11:49 pm #
        
        I don’t think re-annotating is necessary. There should be a tool to resize or crop images together with their annotations. Between square and crop, I would prefer whichever keeps the aspect ratio. As for the size, I would make it as small as possible while you can still identify the object. You shouldn’t be greedy here; rather, keep the minimum information for the model so it will not learn from the noise and will converge faster.
Fatma Mazen September 17, 2021 at 7:58 am #

Thank you for your reply.
I would be grateful if you could tell me the name of a tool to crop images together with their annotations.
I have used VGG Image Annotator (VIA).
Thanks in advance

- Adrian Tam September 19, 2021 at 6:08 am #
  
  It should not be difficult to write your own program to do the cropping. You may also check whether this is helpful for you: https://github.com/italojs/resize_dataset_pascalvoc
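Resizing an image and its annotations together only requires applying the same scale factor to both. A minimal sketch, assuming (x_min, y_min, x_max, y_max) boxes and that you only want to shrink (never enlarge) images so their longer side is at most `max_side`; `resize_with_boxes` is a hypothetical helper, and the pixel resizing itself is left to an image library such as PIL. VIA polygon points can be scaled the same way, point by point.

```python
def resize_with_boxes(width, height, boxes, max_side=1024):
    """Scale an image size and its boxes so the longer side is at most max_side.

    boxes: list of (x_min, y_min, x_max, y_max) in pixels.
    Returns (new_width, new_height, scaled_boxes). Only shrinks; smaller images
    are returned unchanged. The pixel resize itself is left to an image library.
    """
    # One factor for both axes, so the aspect ratio is preserved.
    scale = min(1.0, max_side / max(width, height))
    new_w, new_h = round(width * scale), round(height * scale)
    scaled = [tuple(round(v * scale) for v in box) for box in boxes]
    return new_w, new_h, scaled


print(resize_with_boxes(4096, 2048, [(400, 800, 1200, 1600)]))
# → (1024, 512, [(100, 200, 300, 400)])
```

With PIL, the matching pixel step would be `img.resize((new_w, new_h))`. Rounding can shift a box edge by up to a pixel, which is usually harmless for training.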
  
Erfan Hatefi September 25, 2021 at 5:09 am #

Hi Jason,
Thanks for sharing such a great article!
Every part of it is well described.

Personally, I ran into some errors.
I’m using Colab and installed TensorFlow and Keras with the specific versions mentioned at the beginning of the article.
First, after running the cell in which the training starts, I got the error
ModuleNotFoundError: No module named ‘keras_applications’
which is described in one of the comments above (from April 28th).

After a little searching, I supposed that it was all because of the h5py version, so I tried installing h5py v2.8.0. However, the funny thing is that I then got a completely different error (the last traceback):

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 1132, in get_records
return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 313, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 358, in _fixed_getinnerframes
records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
File "/usr/lib/python3.7/inspect.py", line 1502, in getinnerframes
frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
File "/usr/lib/python3.7/inspect.py", line 1460, in getframeinfo
filename = getsourcefile(frame) or getfile(frame)
File "/usr/lib/python3.7/inspect.py", line 696, in getsourcefile
if getattr(getmodule(object, filename), '__loader__', None) is not None:
File "/usr/lib/python3.7/inspect.py", line 733, in getmodule
if ismodule(module) and hasattr(module, '__file__'):
File "/usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'tensorflow_core.estimator'

Have you got any ideas for dealing with this?
I appreciate your response

- Adrian Tam September 27, 2021 at 10:23 am #
  
  I believe that is more of a TensorFlow 1.x vs. 2.x issue. This major version upgrade brought a lot of changes that break a lot of old code.
  
Sultan October 2, 2021 at 5:10 pm #

Got an error in /Mask_RCNN/mrcnn/model.py, line 20:

ImportError: cannot import name ‘get_config’ from ‘tensorflow.python.eager.context’.
TensorFlow version issues, as always.

Josh Blumer October 25, 2021 at 6:46 pm #

Hey Jason, thanks so much for all the great tutorials. I’m getting an “AttributeError: ‘str’ object has no attribute ‘decode’” error when trying to execute the “model.load_weights” block. The error line reads “original_keras_version = f.attrs[‘keras_version’].decode(‘utf8’)”. Google suggests dropping the “.decode(‘utf8’)” because it’s no longer necessary in Python 3, but that’s not possible because it’s in the library’s source code. I’m using Python 3.6 and force-installed TensorFlow 1.15.3 and Keras 2.2.4 as you directed at the beginning of the tutorial. Any advice is greatly appreciated, thank you.

- Sofia December 7, 2021 at 10:45 am #
  
  Hi Josh,
  I have the same problem as you. Did you manage to fix it?
  
  Thank you
  
Adrian Tam December 8, 2021 at 7:51 am #

Try using a newer TensorFlow (e.g., 2.x), which I believe has better support for Python 3.

- Chiedozie February 7, 2022 at 12:35 am #
  
  Hi Sofia and Josh,
  
  Did any of you successfully fix this issue? Please do let me know.
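On the `'str' object has no attribute 'decode'` error in this thread: the usual explanation is that Keras 2.2.4 calls `.decode('utf8')` on the `keras_version` attribute of the HDF5 weights file. That worked when h5py 2.x returned such attributes as `bytes`, but h5py 3.x returns them as `str`, which has no `.decode()`. This is why pinning h5py below 3.0 (for example, `h5py==2.10.0`, as reported in an earlier comment) is the commonly cited workaround. The snippet below only illustrates the bytes-versus-str difference with a hypothetical `as_text` helper; it is not a patch for Keras.

```python
def as_text(value, encoding="utf8"):
    """Return an HDF5 string attribute as str, whether it arrived as bytes or str."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # h5py 2.x style: bytes need decoding
    return value  # h5py 3.x style: already str, and str has no .decode()


print(as_text(b"2.2.4"), as_text("2.2.4"))  # → 2.2.4 2.2.4
```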
  
Eric Yi December 16, 2021 at 6:43 am #

Hi James, I am facing an issue when trying out the code.
When using functions like .image_reference() and .load_image(), which internally look up self.image_info[image_id],

I get this error:

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15376/2053193333.py in
1 # load an image
2 image_id = '30_days_01.jpg'
----> 3 dataset_train.image_reference(image_id)
4
5 # image = dataset_train.load_image(image_id)

~\AppData\Local\Temp/ipykernel_15376/2509526515.py in image_reference(self, image_id)
116 def image_reference(self, image_id):
117 """Return the path of the image."""
--> 118 info = self.image_info[image_id]
119 if info["source"] == "object":
120 return info["path"]

TypeError: list indices must be integers or slices, not str

The Keras version I’m using is 2.2.5 and the TensorFlow version is 1.13.1.

I did try the versions mentioned in your article, but I’m still getting the same error.

Hope to hear from you soon.
Thank you

- Adrian Tam December 17, 2021 at 7:18 am #
  
  The error message tells it all – you need image_id to be an integer to make it work.
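In Matterport’s `Dataset`, each `add_image()` call appends a dict to the `image_info` list, storing the `image_id` you pass (here, a filename) under the `"id"` key. Methods such as `load_image()` and `image_reference()` index that list, so they expect the integer position, not the filename. A minimal sketch of looking up the position from a filename; `index_for_filename` is a hypothetical helper, and the list below is a stand-in for a real `dataset.image_info` with made-up paths.

```python
def index_for_filename(image_info, filename):
    """Return the integer position of the image whose stored 'id' equals filename."""
    for index, info in enumerate(image_info):
        if info.get("id") == filename:
            return index
    raise KeyError(f"no image with id {filename!r}")


# Stand-in for dataset.image_info after two add_image() calls.
image_info = [
    {"id": "30_days_00.jpg", "source": "object", "path": "images/30_days_00.jpg"},
    {"id": "30_days_01.jpg", "source": "object", "path": "images/30_days_01.jpg"},
]
print(index_for_filename(image_info, "30_days_01.jpg"))  # → 1
```

With a prepared dataset, the call would be along the lines of `dataset_train.image_reference(index_for_filename(dataset_train.image_info, '30_days_01.jpg'))`.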
  
Eric Yi January 14, 2022 at 1:24 pm #

Thanks, Adrian,
It is just as what you mentioned. 😀

John February 2, 2022 at 6:33 am #

Hello, great tutorial! I ran the 1st part of the code you have at the beginning of the tutorial. How can I modify this line of code: “display_instances(image, bbox, mask, class_ids, train_set.class_names)” in order to print the image in original dimensions? Because I use very big UAV images and the squares seem very small.
Great work!!!

John February 2, 2022 at 11:27 pm #

One of the greatest tutorials on the internet!! Very understandable!!! What modifications should I make to the above code to make it train and work with my custom dataset? Once again, THANKS for the great tutorial and the information… You are awesome!

Jeffrey March 29, 2022 at 11:32 am #

Thanks for providing this article… it might be a start for me. I just got started with object detection, and I’m working on a project that detects bank cheques in images. Can I use these procedures to train on my dataset for my project?
I would be very grateful for a response.
Best regards

- James Carmichael March 30, 2022 at 3:50 am #
  
  Hi Jeffrey…Yes, but understand that all code and material on my site and in my books was developed and provided for educational purposes only.
  
  I take no responsibility for the code, what it might do, or how you might use it.
  
  If you use my code or material in your own project, please reference the source, including:
  
  The Name of the author, e.g. “Jason Brownlee”.
  The Title of the tutorial or book.
  The Name of the website, e.g. “Machine Learning Mastery”.
  The URL of the tutorial or book.
  The Date you accessed or copied the code.
  For example:
  
  Jason Brownlee, Machine Learning Algorithms in Python, Machine Learning Mastery, Available from https://machinelearningmastery.com/machine-learning-with-python/, accessed April 15th, 2018.
  Also, if your work is public, contact me, I’d love to see it out of general interest.
  
Kiran Agashe April 16, 2022 at 1:31 pm #

Hi Jason,
Thanks for the great tutorial, with an excellent explanation.
While using the example, I am facing the following issue (mentioned above by others too).

model.load_weights("mask_rcnn_coco.h5", by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

Traceback (most recent call last):
File "/home/kiran/cds2_cp_team5/mrcnn/COCO_creator/CioccaDataset.py", line 324, in
model.load_weights(str, by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
File "/home/kiran/.pyenv/versions/3.7.13/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/model.py", line 2130, in load_weights
File "/home/kiran/.pyenv/versions/3.7.13/lib/python3.7/site-packages/keras/engine/saving.py", line 1083, in load_weights_from_hdf5_group_by_name
original_keras_version = f.attrs['keras_version'].decode('utf8')

I am using TF version 1.15.3 and Keras 2.2.4
Python version: 3.7.13

Can you please help figure out the issue?

- James Carmichael April 17, 2022 at 8:00 am #
  
  Hi Kiran…I would highly recommend that you run your code in Google Colab to determine if there could be versioning issues on your local machine.
  
  - Kiran Agashe May 3, 2022 at 4:19 pm #
    
    Thanks James!
    Finally, I got around the version problems by:
    1. using a TF 2.x-compatible version (https://github.com/ahmedfgad/Mask-RCNN-TF2.git)
    2. using the following versions, which worked for me:
    tensorflow==2.2.0
    keras==2.3.1
    
rizwan October 26, 2022 at 11:36 pm #

This was lovely to read.

- James Carmichael October 27, 2022 at 7:39 am #
  
  Thank you for your support and feedback!
  
R K November 28, 2022 at 12:43 am #

Hi guys,
Thanks for the fantastic blog.
A quick question, please: let’s suppose I’ve trained on another dataset in exactly this way with this model, and let’s assume I’ve done everything right. In spite of that, if I am not getting good results, how should I deal with this?

How can I tune this model (this particular Matterport implementation) if need be? Is there an option to tune it?
Do I have the option of training more than just the top layers? Do I have the option to change hyperparameters?

If yes, would I need to modify the Matterport source code for all of the above, or is there any way around this?

- R K November 28, 2022 at 1:03 am #
  
  Please ignore/delete this comment. I missed the section where you already mention in the blog that we can fine-tune more layers; it looks like this is an available configuration option. Thanks!
  
Mahmoud March 16, 2023 at 3:08 am #

Why does running the model on a GPU return NaN values?

Regards,

Hassan Said June 14, 2023 at 4:51 am #

If I have to detect defects in images, but not all images contain defects, do I need to train with defect-free images as well?

- James Carmichael June 14, 2023 at 7:58 am #
  
  Hi Hassan…Best practices can be found here:
  
  https://medium.com/swlh/how-to-detect-defects-on-images-16d6cf3ddc1a
  
  https://aaron-hkheng.medium.com/defect-detection-using-image-recognition-9873236545b8
  
Anil July 25, 2023 at 1:39 am #

Thanks for the nice tutorial.
When I run the training on my GPU, I see that some of the RPN losses are NaN. What could be the reason?

loss: nan - rpn_class_loss: nan - rpn_bbox_loss: nan - mrcnn_class_loss: 0.3515 - mrcnn_bbox_loss: 0.0023 - mrcnn_mask_loss: 0.0010 - val_loss: nan - val_rpn_class_loss: nan - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.1193 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00

I also get no object detections for images after training on my custom dataset.

- James Carmichael July 25, 2023 at 8:37 am #
  
  Hi Anil…You are very welcome! We have not experienced that issue. Perhaps you could try an experiment with executing your code in Google Colab with the GPU option or AWS. Let us know what you find out!
  
  https://machinelearningmastery.com/google-colab-for-machine-learning-projects/
  
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
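Neither reply pins down the cause, and what follows is only one thing worth ruling out, not a diagnosis: ground-truth boxes with zero or negative width or height, or boxes extending past the image. R-CNN-style box-regression targets take the logarithm of the ratio between ground-truth and anchor sizes, so a zero-size box produces log(0) = -inf, which can propagate into NaN losses. (A learning rate that is too high is another frequent culprit.) The sketch checks boxes in the (xmin, ymin, xmax, ymax) pixel order used by Pascal VOC-style XML; `find_bad_boxes` is a hypothetical helper.

```python
def find_bad_boxes(boxes, width, height):
    """Return (index, reason) pairs for boxes likely to destabilize training.

    boxes: iterable of (xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    problems = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if xmax <= xmin or ymax <= ymin:
            problems.append((i, "zero or negative width/height"))
        elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append((i, "outside the image"))
    return problems


print(find_bad_boxes([(10, 10, 50, 60), (30, 30, 30, 80), (0, 0, 700, 40)], width=640, height=480))
# → [(1, 'zero or negative width/height'), (2, 'outside the image')]
```

Running this over every annotation file before training is cheap and rules out one source of NaNs quickly.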
  
Geoffrey Peart September 11, 2023 at 6:30 am #

Hi, I’m trying to get this up and running with my daughter for a science fair project. I will admit to having spent most of my development career in Java, so I didn’t need to worry as much about hardware.

We are finding ourselves locked in a loop: we are trying to get TensorFlow and Keras versions (recommended above) that work with this code, with each other, and with the M1 chip. When I check the TensorFlow GitHub, they only have a Mac x86 build of the older TensorFlow version. Has anyone tackled this one?

Also, just a shout out to the commenters and writers, this has been a really great tutorial and community, so thank you all!

- James Carmichael September 12, 2023 at 10:35 am #
  
  Hi Geoffrey…You may consider investigating Google Colab to get started to avoid complications of a local environment:
  
  https://machinelearningmastery.com/google-colab-for-machine-learning-projects/
  
Tobi January 21, 2024 at 5:27 am #

Hey there,
is there a way to retrain/fine-tune a pretrained Mask R-CNN on a custom dataset?
Thanks in advance.

- James Carmichael January 21, 2024 at 10:27 am #
  
  Hi Tobi…Absolutely! The following resource may be of interest to you:
  
  https://pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/
  
Tobias January 21, 2024 at 9:39 pm #

Thanks for your answer, but as far as I can see, this resource only shows fine-tuning of classification models, not of an object detector.

- James Carmichael January 22, 2024 at 10:27 am #
  
  Hi Tobias…Here are some additional thoughts:
  
  Fine-tuning an object detector involves several key steps, and it’s a process used to adapt a pre-trained model to your specific task, improving its accuracy on your particular dataset. Here’s a step-by-step guide on how to fine-tune an object detection model:
  
  ### 1. Choose a Pre-trained Model
  Start with a pre-trained object detection model that has been trained on a large and general dataset like COCO, Pascal VOC, or ImageNet. Popular architectures include YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN.
  
  ### 2. Collect and Prepare Your Dataset
  – **Collect a dataset** that is relevant to your specific task. Your dataset should include images that represent the kind of objects you want to detect.
  – **Annotate your images** by drawing bounding boxes around the objects of interest and labeling them. There are various annotation tools available, such as LabelImg or CVAT.
  – **Split your dataset** into training, validation, and test sets. A common split ratio is 70% for training, 15% for validation, and 15% for testing.
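The 70/15/15 split in step 2 can be sketched in a few lines of standard-library Python. This assumes you split at the image level (by image id or filename), so all boxes from one photo land in the same subset; `split_dataset` and the fixed seed are illustrative choices, not part of any library.

```python
import random


def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle items and split them into (train, val, test) lists.

    Whatever remains after the train and val fractions goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed gives a reproducible split
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


train_ids, val_ids, test_ids = split_dataset(range(100))
print(len(train_ids), len(val_ids), len(test_ids))  # → 70 15 15
```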
  
  ### 3. Configure the Model for Your Dataset
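  – **Modify the model’s head** if necessary, to match the number of classes in your dataset. For instance, if you’re detecting three types of objects, the final layer should output three classes. Some detectors, including Mask R-CNN, also count a background class, so Matterport’s NUM_CLASSES setting is 1 + the number of object classes.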
  – **Adjust the configuration settings** of the model, such as the learning rate, batch size, and the number of epochs. You might start with the configuration of the pre-trained model and adjust based on your dataset size and complexity.
  
  ### 4. Augment Your Data (Optional)
  Data augmentation involves artificially increasing the size and diversity of your training dataset by applying various transformations like flipping, scaling, cropping, and color variation. This can help improve the robustness of your model.
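For object detection, geometric augmentation has to move the boxes along with the pixels. A minimal NumPy sketch of a horizontal flip, assuming (xmin, ymin, xmax, ymax) boxes in pixel-edge coordinates (a box covers columns from xmin up to, but not including, xmax); `hflip_with_boxes` is a hypothetical helper. For Mask R-CNN specifically, Matterport’s `model.train()` accepts an `augmentation` argument that takes an imgaug pipeline, which is likely the more practical route since the masks must stay in sync too.

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Mirror an image left-right and move its (xmin, ymin, xmax, ymax) boxes to match."""
    width = image.shape[1]
    flipped = image[:, ::-1]  # reverse the column (width) axis
    boxes = np.asarray(boxes)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new xmin comes from old xmax
    new_boxes[:, 2] = width - boxes[:, 0]  # new xmax comes from old xmin
    return flipped, new_boxes  # y coordinates are unchanged by a horizontal flip


image = np.zeros((50, 100, 3), dtype=np.uint8)
_, flipped_boxes = hflip_with_boxes(image, [(10, 5, 30, 25)])
print(flipped_boxes.tolist())  # → [[70, 5, 90, 25]]
```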
  
  ### 5. Fine-Tune the Model
  – **Load the pre-trained model** and modify it for your dataset.
  – **Freeze the early layers** of the model to retain learned features that are generally applicable to most visual tasks. Only train the latter layers that are more specific to the detection task.
  – **Train the model** on your dataset. Use the training set to train the model and the validation set to tune the hyperparameters and avoid overfitting.
  
  ### 6. Evaluate the Model
  – **Use the test set** to evaluate the model’s performance. Common metrics for object detection include Precision, Recall, and the mean Average Precision (mAP).
  – **Iterate** on your training process by adjusting model configurations, augmenting your data differently, or even collecting more data based on the performance on the test set.
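Precision, recall, and mAP in step 6 all rest on intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU and a single-threshold precision/recall with greedy one-to-one matching. It assumes (xmin, ymin, xmax, ymax) boxes and predictions already sorted by descending score; the function names are hypothetical. Full mAP additionally averages precision over recall levels (and over classes); Matterport’s `mrcnn.utils` provides a `compute_ap()` function for that.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, gt_boxes, threshold=0.5):
    """Greedily match each prediction to the best unmatched ground truth with IoU >= threshold."""
    matched = set()
    tp = 0
    for p in pred_boxes:
        best, best_iou = None, threshold
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall


# One prediction overlaps a ground truth (IoU 0.81); the other hits nothing.
print(precision_recall([(1, 1, 10, 10), (50, 50, 60, 60)],
                       [(0, 0, 10, 10), (20, 20, 30, 30)]))  # → (0.5, 0.5)
```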
  
  ### 7. Deploy the Model
  Once satisfied with the model’s performance, deploy it for real-world usage or further testing.
  
  ### Tools and Libraries
  You can use deep learning frameworks like TensorFlow (with its object detection API), PyTorch (with libraries like Detectron2 or Torchvision), or even higher-level APIs like Keras for fine-tuning object detection models.
  
  Fine-tuning is an iterative process. It might take several rounds of adjustment and training to get the desired accuracy and performance from your object detector.
  

666 Responses to How to Train an Object Detection Model with Keras
