How to Train an Object Detection Model with Keras

Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type of objects were detected.

The Mask Region-based Convolutional Neural Network, or Mask R-CNN, model is one of the state-of-the-art approaches for object recognition tasks. The Matterport Mask R-CNN project provides a library that allows you to develop and train Mask R-CNN Keras models for your own object detection tasks. Using the library can be tricky for beginners and requires the careful preparation of the dataset, although it allows fast training via transfer learning with top performing models trained on challenging object detection tasks, such as MS COCO.

In this tutorial, you will discover how to develop a Mask R-CNN model for kangaroo object detection in photographs.

After completing this tutorial, you will know:

  • How to prepare an object detection dataset ready for modeling with an R-CNN.
  • How to use transfer learning to train an object detection model on a new dataset.
  • How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

Let’s get started.

How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)

Photo by Ronnie Robertson, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. How to Install Mask R-CNN for Keras
  2. How to Prepare a Dataset for Object Detection
  3. How to Train a Mask R-CNN Model for Kangaroo Detection
  4. How to Evaluate a Mask R-CNN Model
  5. How to Detect Kangaroos in New Photos

Note: This tutorial requires TensorFlow version 1.15.3 and Keras 2.2.4. It does not work with TensorFlow 2.0+ or Keras 2.2.5+ because a third-party library has not been updated at the time of writing.

You can install these specific versions of the libraries with pip, for example: pip install tensorflow==1.15.3 keras==2.2.4

How to Install Mask R-CNN for Keras

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given image.

It is a challenging problem that involves building upon methods for object recognition (e.g. where are they), object localization (e.g. what is their extent), and object classification (e.g. what are they).

The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick, et al. There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. The Mask R-CNN introduced in the 2018 paper titled “Mask R-CNN” is the most recent variation of the family of models and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image but also specifies a mask for the image, indicating exactly which pixels in the image belong to the object.

Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model. Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.

The best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN Project developed by Matterport. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.

The first step is to install the library.

At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.

Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions buried in the library’s readme file.

Step 1. Clone the Mask R-CNN GitHub Repository

This is as simple as running the following command from your command line: git clone https://github.com/matterport/Mask_RCNN.git

This will create a new local directory with the name Mask_RCNN.

Step 2. Install the Mask R-CNN Library

The library can be installed directly via pip.

Change directory into the Mask_RCNN directory and run the installation script.

From the command line, type cd Mask_RCNN followed by python setup.py install

On Linux or MacOS, you may need to install the software with sudo permissions, for example if the installation fails with a permissions error.

In that case, install the software with sudo: sudo python setup.py install

If you are using a Python virtual environment (virtualenv), such as on an EC2 Deep Learning AMI instance (recommended for this tutorial), you can install Mask_RCNN into your environment as follows:

The library will then install directly and you will see a lot of successful installation messages ending with the following:

This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.

Step 3. Confirm the Library Was Installed

It is always a good idea to confirm that the library was installed correctly.

You can confirm that the library was installed correctly by querying it via the pip command; for example: pip show mask-rcnn

You should see output informing you of the version and installation location.

We are now ready to use the library.

How to Prepare a Dataset for Object Detection

Next, we need a dataset to model.

In this tutorial, we will use the kangaroo dataset, made available by Huynh Ngoc Anh (experiencor). The dataset is comprised of 183 photographs that contain kangaroos, and XML annotation files that provide bounding boxes for the kangaroos in each photograph.

The Mask R-CNN is designed to learn to predict both bounding boxes for objects as well as masks for those detected objects, and the kangaroo dataset does not provide masks. As such, we will use the dataset to learn a kangaroo object detection task, and ignore the masks and not focus on the image segmentation capabilities of the model.

There are a few steps required in order to prepare this dataset for modeling and we will work through each in turn in this section, including downloading the dataset, parsing the annotations file, developing a KangarooDataset object that can be used by the Mask_RCNN library, then testing the dataset object to confirm that we are loading images and annotations correctly.

Install Dataset

The first step is to download the dataset into your current working directory.

This can be achieved by cloning the GitHub repository directly, as follows: git clone https://github.com/experiencor/kangaroo.git

This will create a new directory called ‘kangaroo’ with a subdirectory called ‘images/’ that contains all of the JPEG photos of kangaroos and a subdirectory called ‘annots/’ that contains all of the XML files that describe the locations of kangaroos in each photo.

Looking in each subdirectory, you can see that the photos and annotation files use a consistent naming convention, with filenames using a 5-digit zero-padded numbering system; for example, images/00001.jpg and annots/00001.xml.

This makes matching photographs and annotation files together very easy.

We can also see that the numbering system is not contiguous, that there are some photos missing, e.g. there is no ‘00007‘ JPG or XML.

This means that we should focus on loading the list of actual files in the directory rather than using a numbering system.

Parse Annotation File

The next step is to figure out how to load the annotation files.

First, open the first annotation file (annots/00001.xml) and take a look; you should see:

We can see that the annotation file contains a “size” element that describes the shape of the photograph, and one or more “object” elements that describe the bounding boxes for the kangaroo objects in the photograph.

The size and the bounding boxes are the minimum information that we require from each annotation file. We could write some careful XML parsing code to process these annotation files, and that would be a good idea for a production system. Instead, we will short-cut development and use XPath queries to directly extract the data that we need from each file, e.g. a //size query to extract the size element and a //object or a //bndbox query to extract the bounding box elements.

Python provides the ElementTree API that can be used to load and parse an XML file and we can use the find() and findall() functions to perform the XPath queries on a loaded document.

First, the annotation file must be loaded and parsed as an ElementTree object.

Once loaded, we can retrieve the root element of the document from which we can perform our XPath queries.

We can use the findall() function with a query for ‘.//bndbox‘ to find all ‘bndbox‘ elements, then enumerate each to extract the x and y, min and max values that define each bounding box.

The element text can also be parsed to integer values.

We can then collect the definition of each bounding box into a list.

The dimensions of the image may also be helpful, which can be queried directly.

We can tie all of this together into a function that will take the annotation filename as an argument, extract the bounding box and image dimension details, and return them for use.

The extract_boxes() function below implements this behavior.
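A minimal sketch of such a function follows, assuming the annotation layout described above: a size element with width and height children, and one bndbox element per object with xmin, ymin, xmax, and ymax children. The sample annotation written to a temporary file is hypothetical, with made-up values, and only stands in for a real file such as annots/00001.xml.

```python
from tempfile import NamedTemporaryFile
from xml.etree import ElementTree


def extract_boxes(filename):
    """Return ([[xmin, ymin, xmax, ymax], ...], width, height) for one annotation file."""
    # load and parse the file, then work from the root element
    root = ElementTree.parse(filename).getroot()
    boxes = []
    # one bndbox element per annotated object
    for box in root.findall('.//bndbox'):
        coords = [int(box.find(name).text) for name in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(coords)
    # image dimensions come from the size element
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height


# hypothetical annotation with made-up values, used only to exercise the function
SAMPLE_XML = """<annotation>
  <size><width>400</width><height>300</height><depth>3</depth></size>
  <object><name>kangaroo</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

with NamedTemporaryFile('w', suffix='.xml', delete=False) as handle:
    handle.write(SAMPLE_XML)

boxes, w, h = extract_boxes(handle.name)
print(boxes, w, h)
```

With the real dataset, the same function would be called with a path such as 'kangaroo/annots/00001.xml'.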

We can test out this function on our annotation files, for example, on the first annotation file in the directory.

The complete example is listed below.

Running the example returns a list that contains the details of each bounding box in the annotation file, as well as two integers for the width and height of the photograph.

Now that we know how to load the annotation file, we can look at using this functionality to develop a Dataset object.

Develop KangarooDataset Object

The mask-rcnn library requires that train, validation, and test datasets be managed by a mrcnn.utils.Dataset object.

This means that a new class must be defined that extends the mrcnn.utils.Dataset class, defines a function to load the dataset (with any name you like, such as load_dataset()), and overrides two functions: load_mask() for loading a mask and image_reference() for loading an image reference (path or URL).

To use a Dataset object, it is instantiated, then your custom load function must be called, then finally the built-in prepare() function is called.

For example, we will create a new class called KangarooDataset that will be used as follows:

The custom load function, e.g. load_dataset(), is responsible for both defining the classes and for defining the images in the dataset.

Classes are defined by calling the built-in add_class() function and specifying the ‘source’ (the name of the dataset), the ‘class_id’ or integer for the class (e.g. 1 for the first class, as 0 is reserved for the background class), and the ‘class_name’ (e.g. ‘kangaroo’).

Images are defined by a call to the built-in add_image() function and specifying the ‘source’ (the name of the dataset), a unique ‘image_id’ (e.g. the filename without the file extension like ‘00001’), and the path for where the image can be loaded (e.g. ‘kangaroo/images/00001.jpg’).

This will define an “image info” dictionary for the image that can be retrieved later via the index or order in which the image was added to the dataset. You can also specify other arguments that will be added to the image info dictionary, such as an ‘annotation‘ to define the annotation path.

For example, we can implement a load_dataset() function that takes the path to the dataset directory and loads all images in the dataset.

Note, testing revealed that there is an issue with image number ‘00090‘, so we will exclude it from the dataset.

We can go one step further and add one more argument to the function to define whether the Dataset instance is for training or test/validation. We have about 160 photos, so we can use about 20%, or the last 32 photos, as a test or validation dataset and the first 131, or 80%, as the training dataset.

This division can be made using the integer in the filename, where all photos numbered below 150 are used for training and those numbered 150 or above are used for testing. The updated load_dataset() with support for train and test datasets is provided below.
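One way to express this selection is a small helper that works on the list of filenames actually present in the images directory. This is only a sketch: the helper name select_image_ids() and its parameters are hypothetical, and the filenames in the usage lines are made up for illustration.

```python
from os.path import splitext


def select_image_ids(filenames, is_train, split_at=150, skip=('00090',)):
    """Pick the image ids for the train or test split from a list of image filenames."""
    selected = []
    for filename in sorted(filenames):
        image_id = splitext(filename)[0]  # e.g. '00001.jpg' -> '00001'
        if image_id in skip:
            continue  # image excluded because of a known issue
        # photos numbered below split_at are train, the rest are test
        belongs_to_train = int(image_id) < split_at
        if belongs_to_train == is_train:
            selected.append(image_id)
    return selected


# made-up filenames, standing in for the result of os.listdir() on the images directory
names = ['00001.jpg', '00090.jpg', '00149.jpg', '00150.jpg', '00183.jpg']
train_ids = select_image_ids(names, is_train=True)
test_ids = select_image_ids(names, is_train=False)
print(train_ids, test_ids)
```

Working from the directory listing rather than a counter means that gaps in the numbering are handled without any special cases.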

Next, we need to define the load_mask() function for loading the mask for a given ‘image_id‘.

In this case, the ‘image_id‘ is the integer index for an image in the dataset, assigned based on the order that the image was added via a call to add_image() when loading the dataset. The function must return an array of one or more masks for the photo associated with the image_id, and the classes for each mask.

We don’t have masks, but we do have bounding boxes. We can load the bounding boxes for a given photo and return them as masks. The library will then infer bounding boxes from our “masks” which will be the same size.

First, we must load the annotation file for the image_id. This involves first retrieving the ‘image info‘ dict for the image_id, then retrieving the annotations path that we stored for the image via our prior call to add_image(). We can then use the path in our call to extract_boxes() developed in the previous section to get the list of bounding boxes and the dimensions of the image.

We can now define a mask for each bounding box, and an associated class.

A mask is a two-dimensional array with the same dimensions as the photograph with all zero values where the object isn’t and all one values where the object is in the photograph.

We can achieve this by creating a NumPy array with all zero values for the known size of the image and one channel for each bounding box.

Each bounding box is defined as min and max, x and y coordinates of the box.

These can be used directly to define row and column ranges in the array that can then be marked as 1.
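These steps can be sketched with NumPy as follows. The function name boxes_to_masks() is hypothetical, and the box and image size in the usage lines are made-up values chosen to keep the result easy to check by hand.

```python
import numpy as np


def boxes_to_masks(boxes, width, height):
    """Build a (height, width, n_boxes) array with ones inside each bounding box."""
    masks = np.zeros((height, width, len(boxes)), dtype='uint8')
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # rows follow the y axis and columns follow the x axis
        masks[ymin:ymax, xmin:xmax, i] = 1
    return masks


masks = boxes_to_masks([[1, 2, 4, 5]], width=6, height=8)
print(masks.shape, int(masks.sum()))
```

Note that the array is indexed as rows then columns, so the y range comes first when slicing.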

All objects have the same class in this dataset. We can retrieve the class index via the ‘class_names’ list, then add it to a list to be returned alongside the masks.

Tying this together, the complete load_mask() function is listed below.

Finally, we must implement the image_reference() function.

This function is responsible for returning the path or URL for a given ‘image_id‘, which we know is just the ‘path‘ property on the ‘image info‘ dict.

And that’s it. We have successfully defined a Dataset object for the mask-rcnn library for our Kangaroo dataset.

The complete listing of the class and creating a train and test dataset is provided below.
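A sketch of how the pieces described in this section might fit together is given below. It assumes the directory layout and annotation format described earlier, and it relies on the add_class(), add_image(), image_info, and class_names members of mrcnn.utils.Dataset as I understand them. The fallback base class exists only so that the sketch can be imported without the library installed; it is not part of the library.

```python
from os import listdir
from os.path import join, splitext
from xml.etree import ElementTree

import numpy as np

try:
    from mrcnn.utils import Dataset
except ImportError:
    Dataset = object  # stand-in so the sketch can be read and imported without the library


class KangarooDataset(Dataset):
    def load_dataset(self, dataset_dir, is_train=True):
        """Register the single class and every usable image for one split."""
        self.add_class('dataset', 1, 'kangaroo')
        images_dir = join(dataset_dir, 'images')
        annotations_dir = join(dataset_dir, 'annots')
        for filename in sorted(listdir(images_dir)):
            image_id = splitext(filename)[0]
            if image_id == '00090':
                continue  # image with a known issue
            if (int(image_id) < 150) != is_train:
                continue  # belongs to the other split
            self.add_image('dataset', image_id=image_id,
                           path=join(images_dir, filename),
                           annotation=join(annotations_dir, image_id + '.xml'))

    def extract_boxes(self, filename):
        """Return the bounding boxes and the image width and height."""
        root = ElementTree.parse(filename).getroot()
        boxes = [[int(box.find(name).text) for name in ('xmin', 'ymin', 'xmax', 'ymax')]
                 for box in root.findall('.//bndbox')]
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return boxes, width, height

    def load_mask(self, image_id):
        """Return one rectangular mask per bounding box, plus the class id of each."""
        info = self.image_info[image_id]
        boxes, width, height = self.extract_boxes(info['annotation'])
        masks = np.zeros((height, width, len(boxes)), dtype='uint8')
        class_ids = []
        for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
            masks[ymin:ymax, xmin:xmax, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))
        return masks, np.asarray(class_ids, dtype='int32')

    def image_reference(self, image_id):
        """Return the path of the image."""
        return self.image_info[image_id]['path']


# Intended use, assuming the library is installed and the dataset was cloned to ./kangaroo:
# train_set = KangarooDataset()
# train_set.load_dataset('kangaroo', is_train=True)
# train_set.prepare()
# print('Train: %d' % len(train_set.image_ids))
```

The same three calls with is_train=False would give the test set.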

Running the example successfully loads and prepares the train and test dataset and prints the number of images in each.

Now that we have defined the dataset, let’s confirm that the images, masks, and bounding boxes are handled correctly.

Test KangarooDataset Object

The first useful test is to confirm that the images and masks can be loaded correctly.

We can test this by creating a dataset and loading an image via a call to the load_image() function with an image_id, then load the mask for the image via a call to the load_mask() function with the same image_id.

Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph underneath can still be seen.

The complete example is listed below.

Running the example first prints the shape of the photograph and mask NumPy arrays.

We can confirm that both arrays have the same width and height and only differ in terms of the number of channels. We can also see that the first photograph (e.g. image_id=0) in this case only has one mask.

A plot of the photograph is also created with the first mask overlaid.

In this case, we can see that one kangaroo is present in the photo and that the mask correctly bounds the kangaroo.

Photograph of Kangaroo With Object Detection Mask Overlaid

We could repeat this for the first nine photos in the dataset, plotting each photo in one figure as a subplot and plotting all masks for each photo.

Running the example shows that photos are loaded correctly and that those photos with multiple objects correctly have separate masks defined.

Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks

Another useful debugging step might be to load all of the ‘image info‘ objects in the dataset and print them to the console.

This can help to confirm that all of the calls to the add_image() function in the load_dataset() function worked as expected.

Running this code on the loaded training dataset will then show all of the ‘image info‘ dictionaries, showing the paths and ids for each image in the dataset.

Finally, the mask-rcnn library provides utilities for displaying images and masks. We can use some of these built-in functions to confirm that the Dataset is operating correctly.

For example, the mask-rcnn library provides the mrcnn.visualize.display_instances() function that will show a photograph with bounding boxes, masks, and class labels. This requires that the bounding boxes are extracted from the masks via the extract_bboxes() function.

For completeness, the full code listing is provided below.

Running the example creates a plot showing the photograph with the mask for each object in a separate color.

The bounding boxes match the masks exactly, by design, and are shown with dotted outlines. Finally, each object is marked with the class label, which in this case is ‘kangaroo‘.

Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels

Now that we are confident that our dataset is being loaded correctly, we can use it to fit a Mask R-CNN model.

How to Train a Mask R-CNN Model for Kangaroo Detection

A Mask R-CNN model can be fit from scratch, although like other computer vision applications, time can be saved and performance can be improved by using transfer learning.

The Mask R-CNN model pre-fit on the MS COCO object detection dataset can be used as a starting point and then tailored to the specific dataset, in this case, the kangaroo dataset.

The first step is to download the model file (architecture and weights) for the pre-fit Mask R-CNN model. The weights are available from the GitHub project and the file is about 250 megabytes.

Download the model weights to a file with the name ‘mask_rcnn_coco.h5‘ in your current working directory.

Next, a configuration object for the model must be defined.

This is a new class that extends the mrcnn.config.Config class and defines properties of both the prediction problem (such as name and the number of classes) and the algorithm for training the model (such as the learning rate).

The configuration must define the name of the configuration via the ‘NAME‘ attribute, e.g. ‘kangaroo_cfg‘, that will be used to save details and models to file during the run. The configuration must also define the number of classes in the prediction problem via the ‘NUM_CLASSES‘ attribute. In this case, we only have one object type of kangaroo, although there is always an additional class for the background.

Finally, we must define the number of samples (photos) used in each training epoch. This will be the number of photos in the training dataset, in this case, 131.

Tying this together, our custom KangarooConfig class is defined below.
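A sketch of what such a configuration could look like is given below. NAME, NUM_CLASSES, and STEPS_PER_EPOCH are attributes of mrcnn.config.Config as I understand the library; the fallback base class is only a stand-in so that the sketch can be run without the library installed.

```python
try:
    from mrcnn.config import Config
except ImportError:
    Config = object  # stand-in so the sketch can be read and imported without the library


class KangarooConfig(Config):
    # name used when saving details and models during the run
    NAME = 'kangaroo_cfg'
    # background class plus the single kangaroo class
    NUM_CLASSES = 1 + 1
    # one step per photo in the training dataset
    STEPS_PER_EPOCH = 131


print(KangarooConfig.NAME, KangarooConfig.NUM_CLASSES, KangarooConfig.STEPS_PER_EPOCH)
```

All other settings, including the learning rate, keep the defaults inherited from the base class.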

Next, we can define our model.

This is achieved by creating an instance of the mrcnn.model.MaskRCNN class and specifying that the model will be used for training by setting the ‘mode’ argument to ‘training’.

The ‘config‘ argument must also be specified with an instance of our KangarooConfig class.

Finally, a directory is needed where configuration files can be saved and where checkpoint models can be saved at the end of each epoch. We will use the current working directory.

Next, the pre-defined model architecture and weights can be loaded. This can be achieved by calling the load_weights() function on the model and specifying the path to the downloaded ‘mask_rcnn_coco.h5‘ file.

The model will be used as-is, although the class-specific output layers will be removed so that new output layers can be defined and trained. This can be done by specifying the ‘exclude‘ argument and listing all of the output layers to exclude or remove from the model after it is loaded. This includes the output layers for the classification label, bounding boxes, and masks.

Next, the model can be fit on the training dataset by calling the train() function and passing in both the training dataset and the validation dataset. We can also specify the learning rate as the default learning rate in the configuration (0.001).

We can also specify what layers to train. In this case, we will only train the heads, that is the output layers of the model.

We could follow this training with further epochs that fine-tune all of the weights in the model. This could be achieved by using a smaller learning rate and changing the ‘layers’ argument from ‘heads’ to ‘all’.

The complete example of training a Mask R-CNN on the kangaroo dataset is listed below.
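The training steps described above can be sketched as a small helper that receives an already constructed model. The helper name fit_heads() is hypothetical. The layer names in the exclude list are the ones I believe the Matterport project uses for its class-specific output layers, so treat them as an assumption and check them against the library. Five epochs are assumed here because the final checkpoint mentioned in this section ends in 0005.

```python
# class-specific output layers, skipped when loading the pre-fit COCO weights (assumed names)
HEAD_LAYERS = ['mrcnn_class_logits', 'mrcnn_bbox_fc', 'mrcnn_bbox', 'mrcnn_mask']


def fit_heads(model, train_set, val_set, config, weights_path='mask_rcnn_coco.h5', epochs=5):
    """Load the pre-fit weights without the output layers, then train only the heads."""
    model.load_weights(weights_path, by_name=True, exclude=HEAD_LAYERS)
    model.train(train_set, val_set,
                learning_rate=config.LEARNING_RATE,
                epochs=epochs,
                layers='heads')
    return model


# Intended use, assuming the library is installed and the datasets and config are prepared:
# from mrcnn.model import MaskRCNN
# config = KangarooConfig()
# model = MaskRCNN(mode='training', model_dir='./', config=config)
# fit_heads(model, train_set, test_set, config)
```

Passing the model in, rather than building it inside the helper, keeps the construction arguments (mode, config, and model directory) visible at the call site.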

This may take some time to execute on the CPU, even with modern hardware. I recommend running the code with a GPU, such as on Amazon EC2, where it will finish in about five minutes on P3-type hardware.

Running the example will report progress using the standard Keras progress bars.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that there are many different train and test loss scores reported for each of the output heads of the network. It can be quite confusing as to which loss to pay attention to.

In this example where we are interested in object detection instead of object segmentation, I recommend paying attention to the loss for the classification output on the train and validation datasets (e.g. mrcnn_class_loss and val_mrcnn_class_loss), as well as the loss for the bounding box output for the train and validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss).

A model file is created and saved at the end of each epoch in a subdirectory that starts with ‘kangaroo_cfg‘ followed by random characters.

A model must be selected for use; in this case, the loss continues to decrease for the bounding boxes on each epoch, so we will use the final model at the end of the run (‘mask_rcnn_kangaroo_cfg_0005.h5‘).

Copy the model file from the config directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions.

The results suggest that perhaps more training epochs could be useful, perhaps fine-tuning all of the layers in the model; this might make an interesting extension to the tutorial.

Next, let’s look at evaluating the performance of this model.

How to Evaluate a Mask R-CNN Model

The performance of a model for an object recognition task is often evaluated using the mean average precision, or mAP.

We are predicting bounding boxes so we can determine whether a bounding box prediction is good or not based on how well the predicted and actual bounding boxes overlap. This can be calculated by dividing the area of the overlap by the total area of both bounding boxes, or the intersection divided by the union, referred to as “intersection over union,” or IoU. A perfect bounding box prediction will have an IoU of 1.
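The IoU calculation can be sketched in a few lines of plain Python. The function name and the (xmin, ymin, xmax, ymax) box layout are assumptions made for this illustration, and the boxes in the usage lines are made-up values.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes, each given as (xmin, ymin, xmax, ymax)."""
    # width and height of the overlapping region, zero if the boxes do not overlap
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # the union counts the overlapping region only once
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


same = intersection_over_union((0, 0, 10, 10), (0, 0, 10, 10))
shifted = intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10))
apart = intersection_over_union((0, 0, 10, 10), (20, 20, 30, 30))
print(same, shifted, apart)
```

Identical boxes give an IoU of 1, disjoint boxes give 0, and the box shifted by half its width gives one third, because the overlap of 50 is divided by a union of 150.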

It is standard to assume a positive prediction of a bounding box if the IoU is greater than 0.5, e.g. they overlap by 50% or more.

Precision refers to the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all bounding boxes predicted. Recall is the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all objects in the photo.

As we make more predictions, the recall percentage will increase, but precision will drop or become erratic as we start making false positive predictions. The recall (x) can be plotted against the precision (y) for each number of predictions to create a curve or line. We can maximize the value of each point on this line and calculate the average value of the precision or AP for each value of recall.
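One common way to turn this description into code is sketched below. It is an illustration under stated assumptions rather than the library's own routine: predictions are assumed to be sorted from most to least confident, each one is already marked as correct (IoU > 0.5 with an unmatched object) or not, and the function name average_precision() is hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the precision-recall curve after making precision non-increasing."""
    if num_ground_truth <= 0:
        raise ValueError('at least one ground truth object is required')
    precisions, recalls = [], []
    true_positives = 0
    # precision and recall after each additional prediction
    for count, hit in enumerate(is_true_positive, start=1):
        true_positives += 1 if hit else 0
        precisions.append(true_positives / count)
        recalls.append(true_positives / num_ground_truth)
    # pad so the curve starts at recall 0 and ends at recall 1
    precisions = [0.0] + precisions + [0.0]
    recalls = [0.0] + recalls + [1.0]
    # replace each precision with the maximum precision at any higher recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # sum the rectangles under the curve wherever recall increases
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


# made-up example: three predictions for a photo with two kangaroos, the second one wrong
ap = average_precision([True, False, True], num_ground_truth=2)
print(round(ap, 4))
```

In the made-up example, the first kangaroo is found at a precision of 1 and the second at a precision of 2/3, so the AP is 0.5 × 1 + 0.5 × 2/3, or about 0.8333.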

Note: there are variations on how AP is calculated, e.g. the way it is calculated for the widely used PASCAL VOC dataset and the MS COCO dataset differ.

The average or mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.

The mask-rcnn library provides the mrcnn.utils.compute_ap() function to calculate the AP and other metrics for a given image. These AP scores can be collected across a dataset and the mean calculated to give an idea of how good the model is at detecting objects in a dataset.

First, we must define a new Config object to use for making predictions, instead of training. We could extend our previously defined KangarooConfig to reuse the parameters; instead, we will define a new object with the same values to keep the code compact. The config must change some of the defaults around using the GPU for inference that are different from how they are set for training a model (regardless of whether you are running on the GPU or CPU).
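A sketch of such a prediction configuration is given below. GPU_COUNT and IMAGES_PER_GPU are the attributes of mrcnn.config.Config that I understand control the inference batch size; setting both to 1 means one image is processed per call. The fallback base class is only a stand-in so that the sketch runs without the library installed.

```python
try:
    from mrcnn.config import Config
except ImportError:
    Config = object  # stand-in so the sketch can be read and imported without the library


class PredictionConfig(Config):
    # same name and class count as used for training
    NAME = 'kangaroo_cfg'
    NUM_CLASSES = 1 + 1
    # predict one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


print(PredictionConfig.GPU_COUNT * PredictionConfig.IMAGES_PER_GPU)
```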

Next, we can define the model with the config and set the ‘mode‘ argument to ‘inference‘ instead of ‘training‘.

Next, we can load the weights from our saved model.

We can do that by specifying the path to the model file. In this case, the model file is ‘mask_rcnn_kangaroo_cfg_0005.h5‘ in the current working directory.

Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction, and calculating the AP for the prediction before calculating the mean AP across all images.

First, the image and ground truth mask can be loaded from the dataset for a given image_id. This can be achieved using the load_image_gt() convenience function.

Next, the pixel values of the loaded image must be scaled in the same way as was performed on the training data, e.g. centered. This can be achieved using the mold_image() convenience function.

The dimensions of the image then need to be expanded so that it represents one sample in a dataset, and the result is used as input to make a prediction with the model.

Next, the prediction can be compared to the ground truth and metrics calculated using the compute_ap() function.

The AP values can be added to a list, then the mean value calculated.

Tying this together, the evaluate_model() function below implements this and calculates the mAP given a dataset, model and configuration.
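A sketch of such a function, following the steps listed above, is given below. It assumes the load_image_gt() and mold_image() functions in mrcnn.model and the compute_ap() function in mrcnn.utils, with the argument order as I understand it from the library. The imports are placed inside the function only so that the sketch can be loaded on a machine without the library installed.

```python
import numpy as np


def evaluate_model(dataset, model, cfg):
    """Return the mean of the per-image AP scores for a prepared dataset."""
    # imported here so the module can be loaded without the library installed
    from mrcnn.model import load_image_gt, mold_image
    from mrcnn.utils import compute_ap

    aps = []
    for image_id in dataset.image_ids:
        # image plus ground truth class ids, bounding boxes, and masks
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(
            dataset, cfg, image_id, use_mini_mask=False)
        # scale pixel values as was done for the training data
        scaled_image = mold_image(image, cfg)
        # add a leading dimension so the image becomes a batch of one sample
        sample = np.expand_dims(scaled_image, 0)
        result = model.detect(sample, verbose=0)[0]
        # compare the prediction to the ground truth
        ap, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 result['rois'], result['class_ids'],
                                 result['scores'], result['masks'])
        aps.append(ap)
    return float(np.mean(aps))


# Intended use, assuming a model in inference mode with trained weights loaded:
# train_map = evaluate_model(train_set, model, cfg)
# test_map = evaluate_model(test_set, model, cfg)
```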

We can now calculate the mAP for the model on the train and test datasets.

The full code listing is provided below for completeness.

Running the example will make a prediction for each image in the train and test datasets and calculate the mAP for each.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

A mAP above 90% or 95% is a good score. We can see that the mAP score is good on both datasets, and perhaps slightly better on the test dataset than on the train dataset.

This may be because the dataset is very small, and/or because the model could benefit from further training.

Now that we have some confidence that the model is sensible, we can use it to make some predictions.

How to Detect Kangaroos in New Photos

We can use the trained model to detect kangaroos in new photographs, specifically, in photos that we expect to have kangaroos.

First, we need a new photo of a kangaroo.

We could go to Flickr and find a random photo of a kangaroo. Alternately, we can use any of the photos in the test dataset that were not used to train the model.

We have already seen in the previous section how to make a prediction with an image. Specifically, scaling the pixel values and calling model.detect(). For example:

Let’s take it one step further and make predictions for a number of images in a dataset, then plot the photo with bounding boxes side-by-side with the photo and the predicted bounding boxes. This will provide a visual guide to how good the model is at making predictions.

The first step is to load the image and mask from the dataset.

Next, we can make a prediction for the image.

Next, we can create a subplot for the ground truth and plot the image with the known bounding boxes.

We can then create a second subplot beside the first, plot the photo again, and this time draw the predicted bounding boxes in red.

We can tie all of this together into a function that takes a dataset, model, and config and creates a plot of the first five photos in the dataset with ground truth and predicted bounding boxes.

The complete example of loading the trained model and making a prediction for the first few images in the train and test datasets is listed below.

Running the example first creates a figure showing five photos from the training dataset with the ground truth bounding boxes, with the same photo and the predicted bounding boxes alongside.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model has done well on these examples, finding all of the kangaroos, even in the case where there are two or three in one photo. The second photo down (in the right column) does show a slip-up where the model has predicted a bounding box around the same kangaroo twice.

Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding Boxes

A second figure is created showing five photos from the test dataset with ground truth bounding boxes and predicted bounding boxes.

These are images not seen during training, and again, in each photo, the model has detected the kangaroo. We can see that a minor mistake was made in the second-to-last photo. Specifically, the same kangaroo was detected multiple times.

No doubt these differences can be ironed out with more training, perhaps with a larger dataset and/or data augmentation, to encourage the model to detect people as background and to detect a given kangaroo once only.

Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes

Summary

In this tutorial, you discovered how to develop a Mask R-CNN model for kangaroo object detection in photographs.

Specifically, you learned:

  • How to prepare an object detection dataset ready for modeling with an R-CNN.
  • How to use transfer learning to train an object detection model on a new dataset.
  • How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

666 Responses to How to Train an Object Detection Model with Keras

  1. Avatar
    Milemi June 1, 2019 at 5:38 am #

    Great tutorial !
    Could you give us advice how to annotate images, please ?
    What is the best practice ?
    How many images per object is enough ?
    How to annotate when there are several objects in the same image and they overlap ?
    Thank you.

    • Avatar
      Jason Brownlee June 1, 2019 at 6:17 am #

      Great questions, thanks!

      I hope to cover the topic in the future.

      • Avatar
        Yajuan October 15, 2020 at 7:40 pm #

        Dear Dr. Jason,

        I am a student from China. I am dealing with a problem related to scene classification and wondering if you could provide some good methods and materials.

        best, looking forward to hearing from you, Thank you for your time.

        Warm regards,
        Yajuan Xu

        • Avatar
          Jason Brownlee October 16, 2020 at 5:52 am #

          Thanks for the suggestion, I hope to write about the topic in the future.

    • Avatar
      Usama Ahmed October 7, 2019 at 8:40 pm #

      Here is the image annotation tool.

    • Avatar
      waleed June 4, 2020 at 11:51 pm #

      Hey Jason, I need help with my satellite building image dataset. My labels are in JSON format, and each building is given as a polygon with more than 4 points. Is Mask R-CNN suitable for this kind of dataset? The RPN needs 4 coordinates to make a box, but my labels and annotated images have more than 4 points, so how does it work for polygons? Is there any method or function to convert a polygon to 4 coordinates?
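
One common way to get four box coordinates from a polygon is to take the smallest axis-aligned rectangle that encloses all of its vertices. A minimal sketch, assuming each polygon is a list of `(x, y)` pixel coordinates; `polygon_to_bbox` and the sample points are hypothetical and not taken from any particular annotation format.

```python
def polygon_to_bbox(points):
    """Return (xmin, ymin, xmax, ymax) of the axis-aligned box enclosing a polygon."""
    if not points:
        raise ValueError("polygon has no points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# hypothetical five-sided building outline
building = [(120, 40), (180, 55), (200, 130), (150, 160), (110, 100)]
print(polygon_to_bbox(building))
```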

    • Avatar
      harsh verma October 9, 2021 at 8:20 pm #

      use labelimg repo

  2. Avatar
    simonYU June 4, 2019 at 6:05 pm #

    hi, Jason, while display_instances:

    running : display_instances(image, bbox, mask, class_ids, train_set.class_names)

    An error occurred while starting the kernel:

    home/user/anaconda3/bin/python: symbol lookup error: /home/user/anaconda3/lib/python3.6/site‑packages/numpy/core/../../../../ undefined symbol: __kmpc_global_thread_num

    Pls find the solution .

  3. Avatar
    roopesh June 4, 2019 at 6:19 pm #

    very nice steps !! How to predict with real time video (CCTV) instead of images, Thanks.

    • Avatar
      Jason Brownlee June 5, 2019 at 8:33 am #

      Great suggestion, I hope to cover it in the future.

    • Avatar
      Usama Ahmed September 26, 2019 at 11:59 pm #

      Use OpenCV to capture video from attached camera.

  4. Avatar
    maryam June 8, 2019 at 5:02 am #

    Hi Jason,
    Thank you very much for the precious tutorial. I face a problem in people counting project when I am going to track people though detecting them is not hard.
    would you please give me a tutorial about the best tracking methods such as “deep tracking” or other else?

  5. Avatar
    gary June 19, 2019 at 8:51 pm #

    Thank you very much for such a beautiful yet detailed tutorial. It's been great learning from you.

    • Avatar
      Jason Brownlee June 20, 2019 at 8:31 am #

      Thanks, I’m glad it helped.

      • Avatar
        gary June 24, 2019 at 10:27 pm #

        Hi Jason, I am trying to train on multiple objects. How can I change the code to import multiple classes?
        Do I use multiple lines of:
        self.add_class("dataset", 1, "kangaroo")
        self.add_class("dataset", 2, "tiger")?

        • Avatar
          Jason Brownlee June 25, 2019 at 6:20 am #

          You can specify all of your classes with a unique integer.
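
A minimal sketch of that numbering, assuming id 0 is left for the background class and the custom classes count up from 1. `number_classes` is a hypothetical helper, and the class names are only examples.

```python
def number_classes(names):
    """Assign each class name a unique integer id starting at 1 (0 is left for background)."""
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    return {name: i for i, name in enumerate(names, start=1)}


class_ids = number_classes(["kangaroo", "tiger"])
num_classes = 1 + len(class_ids)  # background + the custom classes

# inside load_dataset(), each pair would be registered with
# self.add_class("dataset", class_id, name)
for name, class_id in class_ids.items():
    print(class_id, name)
```

The value of `num_classes` is what `NUM_CLASSES` in the configuration would need to match.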

          • Avatar
            Romell Domínguez August 28, 2019 at 6:47 am #

            Hi Jason, and then just add each image using:
            self.add_class("dataset", 1, "kangaroo")
            self.add_class("dataset", 2, "tiger")?

            self.add_image('dataset', … )
            Which parameter do I need to set to identify the class?

          • Avatar
            Romell Domínguez August 30, 2019 at 8:25 pm #

            i solve that problem for polygons shapes

          • Avatar
            Jason Brownlee August 31, 2019 at 6:04 am #

            I’m happy to hear that, well done!

          • Avatar
            Biki November 8, 2019 at 12:28 am #

            Please let me know how to do this

          • Avatar
            Akshay February 12, 2020 at 3:01 am #

            If we have both a kangaroo and a tiger in a single image, how can I load the mask?

            self.add_class("dataset", 1, "kangaroo")
            self.add_class("dataset", 2, "tiger")

            I meant this part!!!

          • Avatar
            Jason Brownlee February 12, 2020 at 5:50 am #

            Sorry, I cannot review/debug your code.

  6. Avatar
    marry June 20, 2019 at 12:27 pm #

    ValueError: Dimension 1 in both shapes must be equal, but are 8 and 16. Shapes are [1024,8] and [1024,16]. for ‘Assign_682’ (op: ‘Assign’) with input shapes: [1024,8], [1024,16].

    hello,Jason,How to solve this error when calculating the mAP value?

  7. Avatar
    mahmoud July 10, 2019 at 6:30 pm #

    hi jason,

    I want to ask about the file mask_rcnn_kangaroo_cfg_0005.h5.
    Where can I find it, and why do you separate the training and the prediction?
    I mean, the last version of the file contains only the prediction without the training, so how does the model save the new weights after training so that they can be used in the prediction step?

    • Avatar
      Jason Brownlee July 11, 2019 at 9:46 am #

      The model is fit on the training dataset, saved, loaded and used to make prediction on a hold out test dataset.

      Does that help?

      • Avatar
        mahmoud July 11, 2019 at 6:18 pm #

        Yes, but my question is this: before training on the kangaroo dataset, I load weights into the model:
        # load weights (mscoco) and exclude the output layers
        model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
        Then, after training:
        # load model weights
        model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
        Why do we load the weights again?

        I have the file mask_rcnn_coco.h5, which I think holds the initial weights, but I do not know what the file mask_rcnn_kangaroo_cfg_0005.h5 contains or where I can find it.

        • Avatar
          Jason Brownlee July 12, 2019 at 8:32 am #

          The new set of weights is focused on only detecting kangaroos based on our own dataset.

          Does that help?

          • Avatar
            mahmoud July 12, 2019 at 5:58 pm #

            Yes, but I cannot find this new set of weights. I mean, when does it create the file mask_rcnn_kangaroo_cfg_0005.h5?

          • Avatar
            Jason Brownlee July 13, 2019 at 6:52 am #

            It will be in the same directory as the python file.

          • Avatar
            mahmoud July 14, 2019 at 7:25 am #

            thnx for your response ,another question how i can prepare my images to be on same structure of Kangaroo dataset to train and apply the model on it

          • Avatar
            Jason Brownlee July 14, 2019 at 8:18 am #

            It is not required, but it might be a helpful start if you are having trouble.

          • Avatar
            mahmoud eltaher July 15, 2019 at 7:27 pm #

            ya I need to do this because I want to implement the model on my problem so I have some images with some circles and I want to detect these circles

  8. Avatar
    Wolverin July 13, 2019 at 3:29 am #

    same problem with me, i am using google colab.

    this ‘mask_rcnn_kangaroo_cfg_0005.h5’ file is created while training as said in the blog. but i cannot find anywhere in my gdrive.

    • Avatar
      Jason Brownlee July 13, 2019 at 7:00 am #

      Perhaps try running on your workstation from the command line?

    • Avatar
      kevi October 8, 2019 at 1:03 am #

      I ran it on Google Colab and the .h5 file was saved in the (/content/Mask_RCNN/kangaroo_cfg*/) folder. Check there.
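
Because the weights end up in a run-specific subdirectory, it can be easier to search for the file than to guess its path. A minimal sketch, assuming the checkpoints follow the `mask_rcnn_kangaroo_cfg_NNNN.h5` naming seen in this thread; `find_latest_weights` is a hypothetical helper, not part of the Mask R-CNN library.

```python
import glob
import os


def find_latest_weights(model_dir, pattern="mask_rcnn_kangaroo_cfg_*.h5"):
    """Return the path of the checkpoint with the highest epoch number, or None."""
    # "**" with recursive=True also searches every subdirectory of model_dir
    matches = glob.glob(os.path.join(model_dir, "**", pattern), recursive=True)
    if not matches:
        return None
    # file names end in a zero-padded epoch number, so the base name sorts by epoch;
    # the modification time breaks ties between runs
    return max(matches, key=lambda p: (os.path.basename(p), os.path.getmtime(p)))


path = find_latest_weights("./")
if path is None:
    print("no checkpoint found; has training finished at least one epoch?")
else:
    print("latest weights:", path)
```

The returned path can then be passed to `model.load_weights(path, by_name=True)` instead of a hard-coded file name.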

    • Avatar
      zakaria October 10, 2019 at 3:40 am #

      Hi plz i am using colab google, i am having trouble installing RCNN Librery in python

      • Avatar
        Jason Brownlee October 10, 2019 at 7:03 am #

        I don’t know about colab, sorry.

        Perhaps try posting on stackoverflow?

  9. Avatar
    Jeremy Immanuel Putra Tandjung July 16, 2019 at 2:25 pm #

    Hello Jason,

    First of all, nice tutorial! Having the overall code at the end of each step really helped keep track of where I am in the code! Keep up the good job!

    I have a question, I notice that it took you on average a minute per epoch to train. However, I tried doing this with a different dataset and right now i’m on my first epoch and it’s ETA 3.5 hours. My desktop is fairly fast with a ryzen 7 cpu and a nvidia 1050Ti gpu.

    So is there something that I’m missing? My training dataset consist of 296 pictures of playing cards in different situations with a total file size of 30.4 MB (I’m trying to train a model to detect playing cards)

    Or is that a normal? Or is there some setting I’m missing?


    • Avatar
      Jason Brownlee July 17, 2019 at 8:15 am #

      It may be a factor of the number of images?

      It may be hardware?

      Perhaps experiment on some p3 EC2 instances or with a smaller dataset?

  10. Avatar
    Choi July 16, 2019 at 4:51 pm #

    Hi Jason.
    This post is so helpful to me to learn R-CNN training!

    As I do my work, I encounter some problems now.
    First I train the model based on ‘mask_rcnn_coco.h5’ weight first.

    So i got the model weight : ‘mask_rcnn_carpk_cfg_0010.h5’ file
    how can i append more training images and train based on above file?

    I tried to append more images with the load_images function, and then trained the model after calling load_weights('mask_rcnn_carpk_cfg_0010.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
    But it did not work..

    Is there any other things to set??

    Thank you!!

  11. Avatar
    Nathan Starliper July 17, 2019 at 5:33 am #

    Hi Jason,

    Great tutorial. However, I am bit confused as to why you used Mask RCNN instead of Faster RCNN? Mask RCNN is essentially Faster RCNN except with segmentation added. Here in this example you basically converted the segmentation into bounding boxes so it seems to me that it would have saved you quite a bit of effort and manual labor to just use Faster RCNN model instead?


    • Avatar
      Jason Brownlee July 17, 2019 at 8:31 am #

      Good question.

      Optionality. We can do object detection which is what most people want, with ability to do segmentation if needed.

  12. Avatar
    SATYAM SAREEN July 22, 2019 at 7:43 pm #

    Great Tutorial Sir,
    I really learned a lot.
    I have a doubt regarding multiclass detection. I have 2 classes: person with a helmet, person without a helmet. what changes should I make in the program? Like adding classes through add_class function.
    Huge Respect and Love.
    Satyam Sareen

    • Avatar
      Jason Brownlee July 23, 2019 at 8:00 am #

      Perhaps this tutorial will help you train your model:

      • Avatar
        SATYAM SAREEN July 23, 2019 at 7:38 pm #

        Good Afternoon Sir,

        You have attached the link to the same blog. Can you suggest the changes to be made in your code so that it runs smoothly for multiclass object detection?

        Warm Regards
        Satyam Sareen

        • Avatar
          Jason Brownlee July 24, 2019 at 7:52 am #

          What do you mean smoothly?

        • Avatar
          AutoRoboCulture November 18, 2019 at 3:05 am #

          Hello Satyam Sareen,

          Check out the code below, I have changed it to your requirement. If any query comment it down. Keep it up!


          # imports needed by the classes below
          from os import listdir
          from xml.etree import ElementTree
          from numpy import zeros, asarray
          from mrcnn.utils import Dataset
          from mrcnn.config import Config

          class KangarooDataset(Dataset):
              # load the dataset definitions
              def load_dataset(self, dataset_dir, is_train=True):
                  # define two classes
                  self.add_class("dataset", 1, "personWithHelmet") #Change required
                  self.add_class("dataset", 2, "personWithoutHelmet") #Change required
                  # define data locations
                  images_dir = dataset_dir + '/images/'
                  annotations_dir = dataset_dir + '/annots/'
                  # find all images
                  for filename in listdir(images_dir):
                      # extract image id
                      image_id = filename[:-4]
                      #print('IMAGE ID: ', image_id)
                      # skip all images after 90 if we are building the train set
                      if is_train and int(image_id) >= 90: #set limit for your train and test set
                          continue
                      # skip all images before 90 if we are building the test/val set
                      if not is_train and int(image_id) < 90:
                          continue
                      img_path = images_dir + filename
                      ann_path = annotations_dir + image_id + '.xml'
                      # add to dataset
                      # for your case it is 0:BG, 1:personWithHelmet, 2:personWithoutHelmet #Change required
                      self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])

              # extract bounding boxes from an annotation file
              def extract_boxes(self, filename):
                  # load and parse the file
                  tree = ElementTree.parse(filename)
                  # get the root of the document
                  root = tree.getroot()
                  # extract each bounding box
                  boxes = list()
                  #for box in root.findall('.//bndbox'):
                  for box in root.findall('.//object'): #Change required
                      name = box.find('name').text #Change required
                      xmin = int(box.find('./bndbox/xmin').text)
                      ymin = int(box.find('./bndbox/ymin').text)
                      xmax = int(box.find('./bndbox/xmax').text)
                      ymax = int(box.find('./bndbox/ymax').text)
                      #coors = [xmin, ymin, xmax, ymax, name]
                      coors = [xmin, ymin, xmax, ymax, name] #Change required
                      # store the box together with its class name
                      boxes.append(coors)
                  # extract image dimensions
                  width = int(root.find('.//size/width').text)
                  height = int(root.find('.//size/height').text)
                  return boxes, width, height

              # load the masks for an image
              def load_mask(self, image_id):
                  # get details of image
                  info = self.image_info[image_id]
                  # define box file location
                  path = info['annotation']
                  # load XML
                  boxes, w, h = self.extract_boxes(path)
                  # create one array for all masks, each on a different channel
                  masks = zeros([h, w, len(boxes)], dtype='uint8')
                  # create masks
                  class_ids = list()
                  for i in range(len(boxes)):
                      box = boxes[i]
                      row_s, row_e = box[1], box[3]
                      col_s, col_e = box[0], box[2]
                      # append exactly one class id per box
                      if (box[4] == 'personWithHelmet'): #Change required #change this to your .XML file
                          masks[row_s:row_e, col_s:col_e, i] = 2 #Change required #assign number to your class_id
                          class_ids.append(self.class_names.index('personWithHelmet')) #Change required
                      else:
                          masks[row_s:row_e, col_s:col_e, i] = 1 #Change required
                          class_ids.append(self.class_names.index('personWithoutHelmet')) #Change required

                  return masks, asarray(class_ids, dtype='int32')

              # load an image reference
              def image_reference(self, image_id):
                  info = self.image_info[image_id]
                  return info['path']

          # define a configuration for the model
          class KangarooConfig(Config):
              # define the name of the configuration
              NAME = "kangaroo_cfg"
              # number of classes (background + personWithoutHelmet + personWithHelmet)
              NUM_CLASSES = 1 + 2 #Change required
              # number of training steps per epoch
              STEPS_PER_EPOCH = 90

          • Avatar
            Jason Brownlee November 18, 2019 at 6:50 am #

            Thanks for sharing.

          • Avatar
            Ashutosh Srivastava February 11, 2020 at 10:21 pm #

            You are great @AutoRoboCulture.

          • Avatar
            Akshay February 12, 2020 at 3:22 am #

            Hello, I tried to train with multiple classes in a single image, and I am getting an error like this:

            File “C:……….\lib\site-packages\keras\engine\”, line 145, in standardize_input_data

            ValueError: Error when checking input: expected input_image to have shape (None, None, 1) but got array with shape (1024, 1024, 3).

            PS: I am working with gray scale images. and 3 classes. Inside single image both classes are present

          • Avatar
            Ademola Okerinde February 26, 2020 at 10:47 am #


          • Avatar
            Nourhan March 4, 2020 at 9:25 pm #

            Thank you so much for sharing these changes.
            However, after I followed all of them and adjusted the whole thing to fit my dataset, I keep getting this error:
            RuntimeError: generator raised StopIteration

            from that training line:
            model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

            Do you have any suggestions to overcome it?

          • Avatar
            hila March 12, 2020 at 10:14 pm #

            hey i was using your code for the training.
            can you please show us your prediction code? i was trying to use Jason’s but it came with a lot of errors which i cannot solve.

          • Avatar
            Andy August 2, 2020 at 1:36 am #

            Can you share the prediction part, please?

  13. Avatar
    mahmoud July 25, 2019 at 7:25 pm #

    is this model also suppose to detect the mask of the objects ,for the kangaroo on the images or we will need some modification to segment the images.

    • Avatar
      Jason Brownlee July 26, 2019 at 8:19 am #

      Yes, if masks are provided.

      In the case of kangaroos, we do not provide masks – just bounding boxes, therefore masks cannot be learned.

      • Avatar
        mahmoud July 29, 2019 at 8:47 pm #

        When I try to test an image with multiple kangaroos, it fails to detect them all. If two kangaroos overlap, it detects them as only one. Any advice?

        • Avatar
          Jason Brownlee July 30, 2019 at 6:11 am #

          Perhaps the model requires more training on photos with multiple kangaroos?

          • Avatar
            mahmoud July 30, 2019 at 10:02 pm #

            thanks for your response, another question is there a new version of Mask RCNN avilable on github .
            also what i need to have mask on my model how i can provide the model and make my model learn it also

          • Avatar
            Jason Brownlee July 31, 2019 at 6:52 am #

            The model can learn the mask, if you provide a dataset that has masks on the images.

          • Avatar
            mahmoud August 22, 2019 at 7:22 pm #

            thanks for your response i confused about some thing ,now we train model without mask ,so what is the mask loss on this case,and how it is calculated??

          • Avatar
            Jason Brownlee August 23, 2019 at 6:24 am #

            I don’t follow sorry, what do you mean exactly?

  14. Avatar
    Nishant Gaurav July 29, 2019 at 6:39 pm #

    I am getting this error. Please help
    OSError Traceback (most recent call last)
    in ()
    —-> 1 model.load_weights(‘mask_rcnn_kangaroo_cfg_0005.h5’, by_name=True)

    2 frames
    /usr/local/lib/python3.6/dist-packages/h5py/_hl/ in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    140 if swmr and swmr_support:
    141 flags |= h5f.ACC_SWMR_READ
    –> 142 fid =, flags, fapl=fapl)
    143 elif mode == ‘r+’:
    144 fid =, h5f.ACC_RDWR, fapl=fapl)

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

    h5py/h5f.pyx in

    OSError: Unable to open file (unable to open file: name = ‘mask_rcnn_kangaroo_cfg_0005.h5’, errno = 2, error message = ‘No such file or directory’, flags = 0, o_flags = 0)

    • Avatar
      Jason Brownlee July 30, 2019 at 6:06 am #

      The error suggests that the path to your data file is incorrect or the file is corrupted in some way?

      • Avatar
        Nishant Gaurav July 31, 2019 at 9:45 pm #

        Thanks for the suggestion. The problem was resolved.
        How do we resolve the problem with the multiclass label? If we have to identify numbers and characters given in the same image and want to label all the characters and images, then how do we apply the multiclass label.

        • Avatar
          Jason Brownlee August 1, 2019 at 6:49 am #

          Perhaps extract the images of detected numbers (called segmentation), then classify each segmented image.

    • Avatar
      Mark November 10, 2020 at 1:59 pm #

      Hi I had the same “could not find file” problem. May you share how you resolve the problem?

  15. Avatar
    Dicko July 29, 2019 at 8:13 pm #

    Hi there, when I copied the example exactly, I am getting a train mAP of 0.000 and a test mAP of 0.000 also. Clearly something is wrong, I was wondering if anyone knew what the issue could be and how to resolve it. Thank you.

  16. Avatar
    saka July 30, 2019 at 2:26 pm #

    Dear Jason, Thanks! I really learned a lot.

    I am getting this error for the coding line “from mrcnn.utils import Dataset”.

    ” from mrcnn.utils import Dataset

    ModuleNotFoundError: No module named ‘mrcnn’ “.

    However, I checked if the library was installed by typing “show mask-rcnn” and got the results below,

    Name: mask-rcnn
    Version: 2.1
    Summary: Mask R-CNN for object detection and instance segmentation
    Author: Matterport
    License: MIT
    Location: c:\users\sakal\appdata\local\continuum\anaconda3\lib\site-packages\mask_rcnn-2.1-py3.7.egg

    According the information above, It seems no problem about the library installed. Could you please advise me about this. Thanks!!

  17. Avatar
    Dicko July 30, 2019 at 7:13 pm #

    Thanks for that I’ll have a look through the code and see if I’ve made a mistake somewhere when copying.
    Is there a file which has the complete code written so that i can just copy and past the whole lot rather than bits at a time?

    Thank you 🙂

    • Avatar
      Jason Brownlee July 31, 2019 at 6:48 am #

      Each of my tutorials has the complete file embedded, you can copy-paste it directly.

  18. Avatar
    Nishant Gaurav July 31, 2019 at 9:52 pm #

    File “”, line 21
    self.add_class(“dataset”, 2, “1”)
    IndentationError: unindent does not match any outer indentation level
    I am getting this error when I added just two new lines, in the code.

    def load_dataset(self, dataset_dir, is_train=True):
        # define one class
        self.add_class("dataset", 1, "N")
        self.add_class("dataset", 2, "1")  # added this new line
        # define data locations
        images_dir = dataset_dir + '/images/'
        annotations_dir = dataset_dir + '/annots/'

    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        masks[row_s:row_e, col_s:col_e, i] = 1
        class_ids.append(self.class_names.index('1'))  # added this new line
    return masks, asarray(class_ids, dtype='int32')

  19. Avatar
    Nishant Gaurav July 31, 2019 at 11:14 pm #

    IndexError Traceback (most recent call last)
    2 plt.imshow(image)
    3 # plot mask
    —-> 4 plt.imshow(mask[:, :, 0], cmap=’gray’, alpha=0.1)

    IndexError: index 0 is out of bounds for axis 2 with size 0

    I am getting this error after I added those two extra lines.

  20. Avatar
    ahmadreza August 1, 2019 at 2:44 am #

    hi Sir
    I am getting this error. Please help

    if is_train and int(image_id) >= 150:

    ValueError: invalid literal for int() with base 10: ‘Thumb’

  21. Avatar
    Nishant Gaurav August 1, 2019 at 2:16 pm #

    Hi Sir,
    Could you please give some insight where do I need to make changes for the multi-class label in the code so that I could identify the different characters and numbers in a single image?
    Please give some insight with examples so that it is easier to understand.
    Thanks so much for helping.

  22. Avatar
    mh August 6, 2019 at 6:58 pm #

    Thanks for your tutorial.

    But I want to ask: is there any model that can deal with objects that have a similar color to the background?

    • Avatar
      Jason Brownlee August 7, 2019 at 7:45 am #

      Perhaps. You may have to do some testing, or perhaps use transfer learning to tune an existing model.

  23. Avatar
    Tal August 9, 2019 at 12:33 am #

    Thank you very much for this great and clear tutorial!
    If I may ask:
    Is there a way to evaluate the model while training? For example at the end of each epoch?

    Thanks a million,


  24. Avatar
    Selman Bozkır August 14, 2019 at 4:57 am #

    Hi Jason,

    I have a problem. My dataset contains only 872 training images and 15 classes. My images are also considerably bigger than the kangaroo or Pascal VOC files: around 1500 pixels wide and 1000 pixels tall. I have changed the Python code to apply multi-class classification. My equipment is a 1050 Ti on a system with 24 GB of memory. I ran your code on the kangaroo data and it was OK. But whenever I run it on my custom data, the memory requirement climbs above 20 GB and forces Ubuntu onto slow swap memory, leaving the machine unresponsive.

    What is the problem? Is it normal? What was the RAM consumption in your case? I did not check it for the kangaroo data, but I remember that it activated the swap memory on the 5th epoch.
    What could be a workaround for this problem?

    • Avatar
      Jason Brownlee August 14, 2019 at 6:46 am #

      Perhaps you can reduce the size of the image prior to modeling?

      • Avatar
        Selman Bozkır August 14, 2019 at 6:59 am #

        Well, for a fair scientific study I would rather not reduce it, but the only way I found is to reduce IMAGE_MIN_DIM = 400 and IMAGE_MAX_DIM = 512. However, it is interesting that the total memory consumption gets higher with each epoch.

        Moreover, I need to say that, the training procedure always starts with giving warnings such as “UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.”

        This is the problem actually. Is it possible to solve this? I have googled it but the solutions did not come so clear to me (or it sound so technical).

        Currently, I can train the model for only 4 epochs. More needs more memory. This is for me, a certain bug since, the advancing epochs should not increase the memory consumption.

        Btw, I really thank for your reply.

        As I told, this memory issue really made me sad. Is this normal?

        • Avatar
          Jason Brownlee August 14, 2019 at 2:08 pm #

          Perhaps you can use progressive loading and only load/yield one batch of images into memory at a time.

          This can be achieved with the ImageDataGenerator:

          • Avatar
            Selman Bozkır August 14, 2019 at 7:26 pm #

            Dear Jason;

            Thanks so much for your advice. Here, I would like to share my experience with you and others. The only solution I have found so far is setting
            use_multiprocessing=False and reducing the number of workers to 1. This has helped me. By the way, I am now using 384×384 images by setting IMAGE_MAX_DIM = 384 and IMAGE_MIN_DIM = 384. Now I can train for 20 epochs.

            I hope this information helps others who run into the same problems.
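
When picking a smaller `IMAGE_MAX_DIM` as described above, the value cannot be arbitrary: as far as is known, the Matterport implementation expects the padded image size to be divisible by 2 at least six times, that is, a multiple of 64, which both 384 and 512 satisfy. A minimal sketch, assuming the default square resize mode and assuming that constraint holds for your version of the library; `round_to_valid_dim` is a hypothetical helper.

```python
def round_to_valid_dim(size, multiple=64):
    """Round a requested image size down to a multiple of `multiple` (never below one multiple)."""
    if size < multiple:
        return multiple
    return (size // multiple) * multiple


for requested in (384, 400, 512, 1000):
    print(requested, "->", round_to_valid_dim(requested))
```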


          • Avatar
            Jason Brownlee August 15, 2019 at 8:00 am #

            Nice! Thanks for sharing.

    • Avatar
      K Ramesh May 8, 2021 at 3:06 pm #

      Hi Selman, How you annotated that many images, and how much time it took? Can someone help me to annotate automatically

  25. Avatar
    N. Arvind August 18, 2019 at 1:58 am #

    Dear Jason
    Good morning!

    We have used this model to detect bounding boxes and masks for id cards.

    We provided annotations in .csv files as quadrilaterals and modified ‘load_mask’ function accordingly. We are looking for quadrilateral shaped masks.

    We are able to detect bounding boxes correctly. We are not able to detect masks correctly. Although incorrect masks do show up.

    We have used the exact code. Learning rate is 0.00001. We have used 800 images and 65 epochs for training. A higher learning rate gives NaN loss. We have checked the entire dataset for any discrepancy.

    Can you guide where we are going wrong ? Can we use this exact code with exactly the same config with four vertices to generate masks ?

    Warm regards,
    N. Arvind

  26. Avatar
    Per Nord August 22, 2019 at 12:34 am #

    Great tutorial! I’ve managed to successfully train a model and now I want to use the model in Android and iOS.

    I’ve learned that this requires me to convert my model.h5 file to model.pb and then to a TensorFlow Lite format.

    I expected this to be trivial, but alas. The MaskRCNN issue list is riddled with people having problems with this.

    Did you ever try this?
    If not, it would be a great continuation to this tutorial.

  27. Avatar
    Abilash August 24, 2019 at 10:03 pm #

    Hi Jason,

    That was a nice tutorial. I get an error when trying it with multiple classes:

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1

    I have two classes (full glass and empty glass) and have set NUM_CLASSES = 1 + 2 in the config, along with self.add_class("dataset", 1, "Full Glass") and self.add_class("dataset", 2, "Not Full Glass"). I also changed the code to call class_ids.append(self.class_names.index('Full'))
    class_ids.append(self.class_names.index('Not Full')).

    Please help me out, i am unable to resolve the error since many attempts.

    • Avatar
      Jason Brownlee August 25, 2019 at 6:37 am #

      It’s hard to debug this for you off the cuff, sorry.

      Perhaps double check you made all of the required changes?
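
One possible cause of this IndexError, assuming the code follows the tutorial's `load_mask()`, is that the number of class ids no longer matches the number of mask channels, for example when two ids are appended for every box. The name looked up must also match the name passed to `add_class()` exactly. The sketch below shows one id per box; `class_ids_for_boxes` and the box layout with the label as the last element are hypothetical.

```python
def class_ids_for_boxes(boxes, class_names):
    """Return exactly one class id per box; each box is [xmin, ymin, xmax, ymax, name]."""
    ids = []
    for box in boxes:
        name = box[4]
        if name not in class_names:
            raise ValueError("unknown class name: %r" % name)
        ids.append(class_names.index(name))
    return ids


# class names as the dataset would hold them: index 0 is the background
class_names = ["BG", "Full Glass", "Not Full Glass"]
boxes = [[10, 20, 110, 220, "Full Glass"], [150, 30, 240, 210, "Not Full Glass"]]
ids = class_ids_for_boxes(boxes, class_names)
assert len(ids) == len(boxes)  # must equal the number of mask channels
print(ids)
```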

    • Avatar
      Julio César Álvarez Iglesias September 14, 2019 at 4:42 am #

      I am facing the same problem. Did you manage to resolve this issue?

      • Avatar
        Jason Brownlee September 14, 2019 at 6:23 am #

        Try removing “Thumb” files from your folder.
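
Stray files such as thumbnail caches break the `int(image_id)` conversion in `load_dataset()`. Instead of deleting them by hand, the directory listing can be filtered. A minimal sketch, assuming image files are named with a numeric id plus an image extension, as in the kangaroo dataset; `numeric_image_ids` is a hypothetical helper.

```python
import os


def numeric_image_ids(filenames, extensions=(".jpg", ".jpeg", ".png")):
    """Yield (image_id, filename) for files named <digits>.<image extension>, skipping the rest."""
    for filename in sorted(filenames):
        stem, ext = os.path.splitext(filename)
        if ext.lower() in extensions and stem.isdigit():
            yield stem, filename


# hypothetical directory listing containing files that are not images
listing = ["00001.jpg", "00002.jpg", "Thumbs.db", ".DS_Store", "notes.txt"]
print(list(numeric_image_ids(listing)))
```

Inside `load_dataset()`, the loop could then iterate over `numeric_image_ids(listdir(images_dir))` rather than over the raw listing.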

    • Avatar
      Shubhangi November 5, 2019 at 3:51 pm #

      Having same error!! Could you find some solutions regarding this.

  28. Avatar
    Alaki September 4, 2019 at 5:14 pm #

    I hope its not a repeated question. I wonder if you have tutorial on training a model for custom multi-object detection ? basically, an image taken where we would like to recognize multiple images in an image. There is no pre-trained model on these objects, and we have labeled a few set of images. (again each image, is labeled with multiple rectangular which are covering each object).

    Thank you again for all these nice tutorials.

    • Avatar
      Jason Brownlee September 5, 2019 at 6:49 am #

      I believe you can adapt the above tutorial for this purpose; the model supports multiple objects in one image, and they can be of different types.

  29. Avatar
    Ade September 6, 2019 at 5:45 pm #

    Dear Dr Jason,

    Good day sir, I am a Machine Learning Engineer. I am currently working on a logo detection system. I have tried MobileNet SSD and Faster R-CNN, and there seemed to be a high number of false positives when I tried the models out. They do not seem to work well for logos that are very small. I have also created Haar and LBP cascade models, and they seemed to perform better than the deep learning models in terms of false positives. My question: is there any other technique that does well with small logos with different contrasts and orientations? Thank you.

    • Avatar
      Jason Brownlee September 7, 2019 at 5:22 am #

      I’m not sure off hand, sorry. Perhaps check the literature?

      I recall some interesting work on test-time augmentation that might be very helpful to you.

  30. Avatar
    K_gao September 9, 2019 at 5:01 pm #

    Dear Jason!

    You’ve done great work again. Thank you for this post!

    What if I want to train and add not just one object to my model? For example, I want to add 100 new classes. If I have 100 classes and every class has 500 images, how can I train the model? It is impossible to load 50,000 images into memory! Is it possible to do it in a loop, adding a new class to the model with each iteration?

    Do you have a post about this?


  31. Avatar
    Ankit September 13, 2019 at 4:58 am #

    Hi Jason,

    The notebook is very helpful and full of knowledge, but I am having problems while training the model on a different dataset (fruits: apple, banana, orange).
    After loading the images, annotations, and masks, when I try to train the model I get the following error:
    RemoteTraceback Traceback (most recent call last)

    Traceback (most recent call last):
    File “/usr/lib/python3.6/multiprocessing/”, line 119, in worker
    result = (True, func(*args, **kwds))
    File “/usr/local/lib/python3.6/dist-packages/keras/utils/”, line 641, in next_sample
    File “/content/Mask_RCNN/mrcnn/”, line 1709, in data_generator
    File “/content/Mask_RCNN/mrcnn/”, line 1265, in load_image_gt
    class_ids = class_ids[_idx]
    IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 2

    The above exception was the direct cause of the following exception:

    IndexError Traceback (most recent call last)

    in ()
    3 learning_rate = config.LEARNING_RATE,
    4 epochs = 10,
    —-> 5 layers = ‘all’ )

    7 frames

    /usr/lib/python3.6/multiprocessing/ in get(self, timeout)
    642 return self._value
    643 else:
    –> 644 raise self._value
    646 def _set(self, i, obj):

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 2

    Please provide a hint about the same.
    Also, I am using multiclass for 3 fruits.

    • Avatar
      Jason Brownlee September 13, 2019 at 5:45 am #

      Perhaps double check that you are loading the data correctly or as you expect?

    • Avatar
      bella October 22, 2019 at 4:03 am #

      Hey, How did you solve this issue of IndexError?
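
For readers hitting the same IndexError: the traceback above fails at `class_ids = class_ids[_idx]` inside `load_image_gt`, which suggests that the mask array and the `class_ids` array returned by a custom `load_mask()` have different lengths. The sketch below reproduces that kind of mismatch with plain NumPy. The array sizes and the way `_idx` is built are assumptions made for illustration, not code copied from the library.

```python
import numpy as np

# Hypothetical image with 2 annotated objects: one mask channel per object.
mask = np.zeros((8, 8, 2), dtype='uint8')
mask[1:4, 1:4, 0] = 1
mask[5:7, 5:7, 1] = 1

# Assumed bug: class_ids ended up with 6 entries instead of 2,
# so it no longer lines up with the 2 mask channels.
bad_class_ids = np.array([1, 2, 3, 1, 2, 3], dtype='int32')

# Assumed filtering step: keep only the channels whose mask is not empty.
keep = np.sum(mask, axis=(0, 1)) > 0  # boolean array of length 2

try:
    bad_class_ids[keep]
    mismatch_error = None
except IndexError as err:
    # Same family of error as in the traceback above.
    mismatch_error = str(err)
print(mismatch_error)

# Consistent data: exactly one class id per mask channel.
good_class_ids = np.array([1, 3], dtype='int32')
assert mask.shape[-1] == len(good_class_ids)
filtered_ids = good_class_ids[keep]
print(filtered_ids.tolist())  # [1, 3]
```

A quick check such as `assert mask.shape[-1] == len(class_ids)` at the end of a custom `load_mask()` may help locate the image that triggers the problem.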

  32. Avatar
    Shubhangi September 17, 2019 at 4:33 pm #

    I trained the model on my own dataset, but the model's predictions are not right. Can you please help me?
    I trained for the kangaroo class name, but in prediction I am getting the person class tag.

    • Avatar
      Jason Brownlee September 18, 2019 at 5:57 am #

      Perhaps start with the example in the tutorial and adapt it for your specific dataset?

      • Avatar
        Shubhangi September 20, 2019 at 3:17 pm #

        Thank you, I solved it. But I have a total of 125 images of ID cards, and the aim is to extract the ID card from each image. I am not getting correct output after training; the object detection results are not good at all. I trained for 50 epochs at 25 steps. Can you please help me?

        • Avatar
          Jason Brownlee September 21, 2019 at 6:44 am #

          I have some general suggestions for diagnosing and improving deep learning model performance here that may help:

          • Avatar
            Shubhangi October 7, 2019 at 8:50 pm #

            Hello. I found some issues regarding the accuracy of the model. I don't know what is affecting it. My model uses the same configuration as described above, but the accuracy is not good. The ROI returned by the model's prediction is not correct. Can someone please help me out?

          • Avatar
            Jason Brownlee October 8, 2019 at 8:00 am #

            Is this on your own dataset or the dataset used in the above tutorial?

            I have some general suggestions here that might help to diagnose and address performance issues:

  33. Avatar
    Jeorge September 21, 2019 at 4:32 pm #

    Hello. I have this error. I don't know how to solve it:

    ~/.ve/main/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/ in compile(self, learning_rate, momentum)
    2197 tf.reduce_mean(layer.output, keepdims=True)
    2198 * self.config.LOSS_WEIGHTS.get(name, 1.))
    -> 2199 self.keras_model.metrics_tensors.append(loss)
    2201 def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):

    AttributeError: ‘Model’ object has no attribute ‘metrics_tensors’

    • Avatar
      Jason Brownlee September 22, 2019 at 9:26 am #

      Sorry, I have not seen that error before.

      Are you able to confirm that your Keras/TensorFlow/RCNN libraries are up to date?

      Are you able to try Python 3.6 instead, I don’t think Python 3.7 is supported?

    • Avatar
      Kay September 25, 2019 at 8:57 pm #

      You can add the line

      model.keras_model.metrics_tensors = []

      right after the model definition to circumvent the error.

      • Avatar
        Jason Brownlee September 26, 2019 at 6:34 am #

        Thanks for sharing.

      • Avatar
        ahasan September 29, 2019 at 2:17 am #

        exactly where should I change in the

  34. Avatar
    Mikael October 2, 2019 at 7:32 am #


  35. Avatar
    Kevin October 8, 2019 at 1:20 am #

    Hi Jason Brownlee,

    Great tutorial for object detection. This is the first time I have visited this site, and I love the way you document your posts. I have walked through each line of code and successfully implemented kangaroo detection. You have developed a well-documented code guide for us. Based on your tutorial, I have managed to run this model on a weed detection problem, and yes, I am able to detect weeds with it. Thanks a lot for your post.

    By the way, I have one question:

    How can I save the full Keras model (architecture + weights)? I want to convert it to TensorRT, and for that I need the full model.

    I have tried:


    2) save weights only = False

    but it gives an error:

    [TypeError: can’t pickle _thread.RLock objects]

    If possible please help me on this.


  36. Avatar
    Reem October 8, 2019 at 2:28 am #

    How can I create the annotated xml file? The VGG tool only creates a csv file or json file. Could you please assist in the way of creating the xml file or the conversion from csv/json to xml?


    • Avatar
      Jason Brownlee October 8, 2019 at 8:08 am #

      I believe there are a ton of image annotation tools available that can create the annotations with/for you.

    • Avatar
      Kevin October 8, 2019 at 5:39 pm #

      Hi Reem,

      Check out this annotation tool, this will create .xml file for you. As used in this model. Link: []

  37. Avatar
    Akash Nakarmi October 8, 2019 at 3:18 am #


    Thanks for the very nice tutorial. I was able to train the model and get mask_rcnn_kangaroo_cfg_0005.h5 created. However, when I ran the model evaluation code, I got the following error. Could you help me resolve this?

    AssertionError: Create model in inference mode, and it is complaining on line yhat=model.detect(sample, verbose=0) saying that len(images) must be equal to BATCH_SIZE.


  38. Avatar
    Shubhangi October 9, 2019 at 6:48 pm #

    I have my own dataset. Thank you for the general suggestions, they are helpful, but I don’t understand why the accuracy of the model is not good even when using the same structure and configuration as suggested above.

    I have also tried different datasets, and the issue is the same for all of them, so I conclude that there is some minor issue in the script that I have not detected. Please help me out.

    If you want my source code, I will share that also.


  39. Avatar
    Dinesh Kumar October 10, 2019 at 4:58 pm #

    Hello Jason,

    While trying to train the model I got the following message.

    File “C:\Users\userid\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow_core\python\framework\”, line 523, in _disallow_in_graph_mode
    ” this function with @tf.function.”.format(task))

    OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

    Could you please suggest how to fix this?

    • Avatar
      Jason Brownlee October 11, 2019 at 6:15 am #

      Sorry to hear that, are you able to confirm that you are using Python 3.6, TensorFlow 1.14, and Keras 2.3 or better?

      • Avatar
        Dinesh Kumar October 11, 2019 at 7:09 pm #

        Hello Jason,

        Thank you for your reply.

        I am using python 3.7, TensorFlow 2.0.0 and keras 2.3.1


        • Avatar
          Jason Brownlee October 12, 2019 at 6:52 am #

          This example will not work with TF 2.0. You must use TF 1.14. I believe I mention this right at the top of the page:

          Note: This tutorial requires TensorFlow version 1.14 or higher. It currently does not work with TensorFlow 2 because some third-party libraries have not been updated at the time of writing.

          • Avatar
            Dinesh Kumar October 15, 2019 at 3:49 pm #

            Hello Jason,

            Thanks for your reply,

            I will use TF 1.14.
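
Since several errors in this thread trace back to running under TensorFlow 2, a version guard at the top of a script can fail early with a clearer message. The sketch below only inspects version strings; the helper name `is_tf1` is hypothetical, and in a real script you would pass `tensorflow.__version__` to it.

```python
def major_version(version_string):
    """Return the leading integer of a version string such as '1.14.0'."""
    return int(version_string.split('.')[0])

def is_tf1(version_string):
    """True when the version string belongs to the TensorFlow 1.x series."""
    return major_version(version_string) == 1

# Versions mentioned in this thread and in the note at the top of the page.
checks = {v: is_tf1(v) for v in ('1.14.0', '1.15.3', '2.0.0')}
for version, ok in checks.items():
    print(version, 'looks usable' if ok else 'is expected to fail')
```

This checks only the major version. It does not confirm that a particular 1.x release works with a particular Keras release.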


  40. Avatar
    mark October 18, 2019 at 6:59 am #

    Hello Jason,
    I just wanted to know how much time it takes to make a prediction on a new image.
    Basically, how long does it take to run
    yhat = model.detect(sample, verbose=0)[0]

    Thank you for your time.

    • Avatar
      Jason Brownlee October 18, 2019 at 8:18 am #

      Fractions of a second, although depends on hardware of course.

      • Avatar
        mark October 19, 2019 at 3:52 pm #

        Well, I need to know how many times it can be run in 1 second. If run on your computer, can you give me an estimate of how many times it would run in 1 second (5, 10, 20, 30, 40, 50, 60, 60+)?


        • Avatar
          Jason Brownlee October 20, 2019 at 6:15 am #

          Perhaps you can calculate those estimates yourself on your own hardware with your data – that way they will be meaningful/useful to your project?
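
One way to produce such an estimate on your own hardware is to time repeated calls. The sketch below is a minimal example that uses a stand-in function in place of the real model. `measure_calls_per_second` and `fake_detect` are hypothetical names, and the sleep only simulates work.

```python
import time

def measure_calls_per_second(predict_fn, sample, n_runs=20, n_warmup=2):
    """Return the average number of predict_fn(sample) calls per second."""
    # Warm-up calls are excluded because the first calls are often slower.
    for _ in range(n_warmup):
        predict_fn(sample)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(sample)
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

def fake_detect(sample):
    """Stand-in for a call such as model.detect(sample, verbose=0)."""
    time.sleep(0.01)  # simulated work, not a real measurement
    return [{'rois': []}]

rate = measure_calls_per_second(fake_detect, sample=None, n_runs=5)
print('%.1f predictions per second' % rate)
```

To time the real model, you would pass something like `lambda s: model.detect(s, verbose=0)` as `predict_fn`, with `sample` prepared as in the tutorial.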

  41. Avatar
    Jerico October 18, 2019 at 12:31 pm #

    Pretty cool tutorial, definitely will help us.
    Brother is it possible to determine the size or dimension of kangaroo?

    • Avatar
      Jason Brownlee October 18, 2019 at 2:52 pm #


      In real life from a photo? Not using these models, sorry.

      • Avatar
        Jerico October 20, 2019 at 6:24 pm #

        Yep, from a photo! Assume I take a picture of a kangaroo and test it on your model. Your model will definitely recognize it as a kangaroo. What I am after is the dimensions of the kangaroo; I’m sure you have a technique for determining its size using the model that you created.

        • Avatar
          Jason Brownlee October 21, 2019 at 6:16 am #

          No idea off the cuff, sorry.

          It does not sound tractable as each photo has a different scale.

  42. Avatar
    Yaroslav October 20, 2019 at 4:19 am #

    I found out that we can’t assign image ids arbitrarily (not starting from 0). Perhaps the Dataset class creates a list, not a NumPy array. I checked this myself and realized that I can’t access an image with id 317, for example, while I have only 100 images.
    Thus, I don’t know why the “image id” field exists when images are numbered from 0, increasing by 1, anyway.

    • Avatar
      Jason Brownlee October 20, 2019 at 6:25 am #

      Thanks for sharing.

      • Avatar
        Yaroslav October 20, 2019 at 10:33 pm #

        Thanks for your great article. It’s the best tutorial about object detection. It helped a lot.

  43. Avatar
    JuanM October 22, 2019 at 3:19 am #

    Good afternoon, I have a problem with the code. When I start the training, the process gets stuck in the first epoch. What can I do?

    • Avatar
      JuanM October 22, 2019 at 3:23 am #

      WARNING:tensorflow:From C:\Users\Juan\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\ The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

      Epoch 1/5

      This is the moment when I have the problem.

      • Avatar
        Jason Brownlee October 22, 2019 at 5:58 am #

        TensorFlow 2.0 is not supported for this tutorial at the moment, try TensorFlow 1.14 instead.

    • Avatar
      Jason Brownlee October 22, 2019 at 5:57 am #

      Sorry to hear that, I have some suggestions here:

      • Avatar
        JuanM October 23, 2019 at 2:51 am #

        At the moment my TensorFlow is 1.14.0. Is there no solution?

        • Avatar
          Jason Brownlee October 23, 2019 at 6:54 am #

          Try down-grading to 1.14? Or perhaps try a different tutorial/library?

  44. Avatar
    Akash Joshi October 29, 2019 at 4:47 am #

    Hi Jason,
    It was a great article with thoroughly explained code.
    I have a question regarding this tutorial. I am trying it out on my laptop, which has limited processing power. When I tried the full kangaroo dataset, the first epoch took approximately 8 hours. I stopped it partway through, then reduced the dataset to about 10 images and started the training process again, but it still showed 7 hours as the ETA, and each epoch had 131 steps.

    My thinking is that if I reduce the number of images in the dataset, the training time should fall, and each epoch should have 10 steps instead of 131, as the dataset has only 10 images. I am currently willing to accept lower accuracy.

    Can you let me know if my understanding is wrong?

    • Avatar
      Jason Brownlee October 29, 2019 at 5:32 am #

      Fewer images might impact model performance generally.

      Perhaps try running on EC2?
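
On the step count: the 131 steps probably come from the STEPS_PER_EPOCH value in the configuration class (131 in the KangarooConfig quoted elsewhere in these comments) rather than from the number of images loaded, which would explain why a 10-image dataset still shows 131 steps. Under that assumption, the sketch below computes how many steps make one pass over a dataset. The helper name is hypothetical.

```python
import math

def steps_for_one_pass(n_images, images_per_gpu=1, gpu_count=1):
    """Steps needed to see every image once, given the batch size."""
    batch_size = images_per_gpu * gpu_count
    return math.ceil(n_images / batch_size)

print(steps_for_one_pass(131))                    # 131
print(steps_for_one_pass(10))                     # 10
print(steps_for_one_pass(10, images_per_gpu=2))   # 5
```

If the assumption holds, setting STEPS_PER_EPOCH in the config to the value returned for your reduced dataset should shorten each epoch accordingly.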

      • Avatar
        Akash Joshi November 2, 2019 at 12:16 am #

        Hi Jason,

        I tried using fewer images, but I cannot complete the training process as I am getting the following message:

        2019-11-01 18:37:31.547297: W T:\src\github\tensorflow\tensorflow\core\framework\] Allocation of 603979776 exceeds 10% of system memory.

        Can you tell me why I get this message?

        • Avatar
          Jason Brownlee November 2, 2019 at 6:44 am #

          Try even fewer images?
          Try EC2 with more RAM?
          Try a smaller model?
          Try progressive loading?

          • Avatar
            Akash November 2, 2019 at 7:39 am #

            I tried with 3 images as well but got the same issue. Can you explain or give links for the last two options you mentioned?

  45. Avatar
    Asjad Murtaza October 29, 2019 at 9:54 am #

    Hi Jason, I plan on following this tutorial for skin segmentation on the Compaq dataset. The labels are in PBM (Portable Bitmap) format. Is that fine, or do I need to do something differently?

    • Avatar
      Jason Brownlee October 29, 2019 at 1:49 pm #

      I don’t think it matters as long as the images can be loaded to numpy arrays.

  46. Avatar
    Juan Pablo November 1, 2019 at 8:34 am #

    Hi Jason,

    Thanks for this great article!

    One question:

    I already have my model trained and my weights (mask_rcnn_kangaroo_cfg_0019.h5).
    How can I validate this with new images?

    I mean without calling the test or train datasets:

    plot_actual_vs_predicted(‘MY PHOTO’, model, cfg)

    • Avatar
      Jason Brownlee November 1, 2019 at 1:40 pm #

      Load the model and use it to make predictions on a test dataset and compare predictions with the expected values.

      The section “How to Evaluate a Mask R-CNN Model” will provide a useful guide.

  47. Avatar
    Juan Pablo November 2, 2019 at 5:01 am #

    Thanks Jason,

    But why would I need the annots if I want to validate the model with a new image?

    • Avatar
      Jason Brownlee November 2, 2019 at 6:52 am #

      To confirm the predictions match the expectations and calculate an evaluation score.

  48. Avatar
    osteocyte November 5, 2019 at 8:59 pm #

    Hi Jason,
    thanks a lot for this great tutorial! Could you please give me a quick hint how one can extract the total number of detected objects in each image?
    Thanks a lot, osteocyte

    • Avatar
      Jason Brownlee November 6, 2019 at 6:33 am #

      It will be the number of bounding boxes returned from a call to predict.
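
As a minimal sketch of that count: the result of `model.detect()` is assumed here to be a list with one dictionary per image, where the `'rois'` entry is an array with one row per detected box. The dictionaries below are hand-made stand-ins, not real model output.

```python
import numpy as np

# Stand-in for the list returned by model.detect(); values are made up.
results = [
    {'rois': np.array([[10, 20, 110, 220], [30, 40, 90, 140]]),
     'class_ids': np.array([1, 1])},
    {'rois': np.zeros((0, 4), dtype='int32'),
     'class_ids': np.array([], dtype='int32')},
]

# One row in 'rois' per detected object, so the row count is the object count.
counts = [r['rois'].shape[0] for r in results]
print(counts)  # [2, 0]
```

With the tutorial's single-image call, `yhat = model.detect(sample, verbose=0)[0]`, the equivalent would be `yhat['rois'].shape[0]`.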

  49. Avatar
    Felipe Correa November 5, 2019 at 11:12 pm #

    Hi, I already have my trained model (generated with this tutorial). Is it possible to use this model for live video detection?

    Do you have a script example or something that you could help me out with?

    Best regards!

    • Avatar
      Jason Brownlee November 6, 2019 at 6:33 am #

      Yes, perhaps apply to each frame of the video, or every 20th frame?

      I don’t have an example at this stage.
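
A rough sketch of the "every 20th frame" idea follows. It runs a detector on sampled frames only and reuses the most recent result for the frames in between. The function names are hypothetical, `fake_detect` stands in for the model call, and reading frames from a camera or video file is left out.

```python
def detect_every_nth(frames, detect_fn, every_n=20):
    """Yield (frame_index, detections), running detect_fn on every n-th frame
    and reusing the most recent detections for the frames in between."""
    last = None
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last = detect_fn(frame)
        yield index, last

calls = []

def fake_detect(frame):
    """Stand-in for the real model call; records which frames were processed."""
    calls.append(frame)
    return {'rois': []}

# Integers stand in for 45 video frames.
results = list(detect_every_nth(range(45), fake_detect, every_n=20))
print(calls)  # [0, 20, 40]
```

Whether reusing stale boxes between sampled frames is acceptable depends on how fast the objects move in your video.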

  50. Avatar
    Florian Garrigues November 6, 2019 at 1:18 am #

    First, thanks for this amazing tutorial!
    Second, I have a question: how can we modify your code to have both a mask and a box?

    Thank you

    • Avatar
      Jason Brownlee November 6, 2019 at 6:36 am #

      You can define a mask and a box and then fit the model on it. In my example I treat them as the same.

  51. Avatar
    tejas November 6, 2019 at 4:13 pm #

    self.add_class(“dataset”, 1, “kangaroo”)
    self.add_class(“dataset”, 2, “tiger”)
    self.add_class(“dataset”, 3, “dog”)

    For multi-class classification, are these changes enough, or is anything more needed?

    • Avatar
      Jason Brownlee November 7, 2019 at 6:34 am #

      Looks good to me, off the cuff at least.

    • Avatar
      tejas November 8, 2019 at 12:00 am #

      Please tell me how to do multi-class classification.

    • Avatar
      Suave December 20, 2019 at 5:08 am #

      Take a look at your XML file and then modify the parsing. The goal is to match the right bounding boxes to the right class names. Then create your masks with the correct bounding box to class name correlation.
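
The parsing change described above can be sketched as follows, assuming the annotations follow the Pascal VOC layout (an `object` element containing `name` and `bndbox`). The function name, the sample XML, and the `CLASS_IDS` mapping are made up for illustration; the ids must match those passed to `add_class()`.

```python
import io
from xml.etree import ElementTree

# Hypothetical mapping; must agree with the add_class() calls in the dataset.
CLASS_IDS = {'kangaroo': 1, 'tiger': 2, 'dog': 3}

def extract_boxes_with_names(xml_source):
    """Return ([(name, [xmin, ymin, xmax, ymax]), ...], width, height)."""
    root = ElementTree.parse(xml_source).getroot()
    boxes = []
    for obj in root.findall('.//object'):
        name = obj.find('name').text
        box = obj.find('bndbox')
        coords = [int(box.find(tag).text)
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((name, coords))
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height

SAMPLE_XML = """<annotation>
  <size><width>450</width><height>319</height><depth>3</depth></size>
  <object><name>kangaroo</name>
    <bndbox><xmin>233</xmin><ymin>89</ymin><xmax>386</xmax><ymax>262</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>60</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""

boxes, width, height = extract_boxes_with_names(io.StringIO(SAMPLE_XML))
# One class id per box, in the same order as the mask channels.
class_ids = [CLASS_IDS[name] for name, _ in boxes]
print(class_ids)  # [1, 3]
```

With three object classes, the config would also need NUM_CLASSES = 1 + 3 (background plus the three classes), following the pattern used for the single kangaroo class.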

  52. Avatar
    Shubhangi November 6, 2019 at 11:35 pm #

    First, thanks for this amazing tutorial!
    Second, I have a question: how many epochs and time steps are required for a dataset of 2 lakh (200,000) images?

    • Avatar
      Jason Brownlee November 7, 2019 at 6:42 am #

      Perhaps test different configurations and see what works best for your specific dataset?

  53. Avatar
    shriya November 8, 2019 at 12:32 am #

    boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 2

  54. Avatar
    yamuna November 12, 2019 at 4:40 pm #

    Objects are being detected based on colour. How can I avoid this type of situation?

    • Avatar
      Jason Brownlee November 13, 2019 at 5:34 am #

      Sorry, I don’t understand your question, can you elaborate please?

  55. Avatar
    Dave November 13, 2019 at 12:23 am #

    Thanks for these tutorials, I’m making good progress on my projects.

    Can I please ask: Is it solely tagged content that contributes to the training/prediction, or is it the whole image?

    If I create a dataset of 100 photos (as an example), and tag the easiest elements (say people) in these photos, will untagged people in these photos work to “untrain” the model? Would I be better off creating a smaller dataset that is more thoroughly tagged, or do untagged elements not matter? Thanks.

    • Avatar
      Jason Brownlee November 13, 2019 at 5:46 am #

      It is the localized object within the image. Both.

      Good question. Test both and compare.

  56. Avatar
    Saurabh November 13, 2019 at 12:24 am #

    Hello Jason,

    Thanks for the interesting technical blog.

    I am looking for “How to train SSD based object detection on the custom dataset?”. Could you please provide a pointer?

    Thanking you!

  57. Avatar
    yamuna November 13, 2019 at 7:18 pm #

    I have built an object detector to detect gloves.
    The gloves are white in colour.
    But if a person wears a white shirt, it is also detected as gloves.

    • Avatar
      Jason Brownlee November 14, 2019 at 7:59 am #

      Well done!

      Perhaps expand the training dataset or try data augmentation during training?

      • Avatar
        yamuna November 15, 2019 at 4:58 am #

        I have augmented the images; do I then need to annotate them separately, or is there another way?

        • Avatar
          Jason Brownlee November 15, 2019 at 7:56 am #

          You can use augmentation that is “annotation-aware”, e.g. apply augmentation in a consistent way to images and annotations.

          Big labs might have code for this, e.g. facebook. Otherwise, custom code will be required.
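
As a small illustration of what "annotation-aware" means, the sketch below flips an image horizontally and applies the matching transform to its bounding boxes, so no re-annotation is needed. It is custom code written for this example, not part of any library, and it assumes boxes are `[xmin, ymin, xmax, ymax]` in pixels with `xmax` and `ymax` exclusive.

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Flip an image left-right and mirror its boxes to match.

    boxes: list of [xmin, ymin, xmax, ymax], with xmax/ymax exclusive.
    """
    width = image.shape[1]
    flipped_image = image[:, ::-1]
    flipped_boxes = [[width - xmax, ymin, width - xmin, ymax]
                     for xmin, ymin, xmax, ymax in boxes]
    return flipped_image, flipped_boxes

# Tiny synthetic image with one bright rectangle marking the "object".
image = np.zeros((4, 10, 3), dtype='uint8')
image[1:3, 2:5] = 1
boxes = [[2, 1, 5, 3]]

flipped_image, flipped_boxes = flip_horizontal(image, boxes)
print(flipped_boxes)  # [[5, 1, 8, 3]]
```

The same idea extends to other geometric transforms (crop, scale, rotate), though the box arithmetic becomes more involved.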

  58. Avatar
    bhandavi November 15, 2019 at 4:55 am #

    How can I retrain the already trained weights with more images?

    • Avatar
      Jason Brownlee November 15, 2019 at 7:56 am #

      That is exactly what we do in this tutorial.

      • Avatar
        bhandavi November 15, 2019 at 10:27 pm #

        I am asking about training the already trained kangaroo weight file on more kangaroo images.

        Do I replace the coco file with the kangaroo .h5 file?

        • Avatar
          Jason Brownlee November 16, 2019 at 7:24 am #

          Yes, follow this tutorial and adapt the coco weights with your own dataset.

  59. Avatar
    Dave November 15, 2019 at 10:07 am #

    Hi Jason,

    Thanks for a great tutorial. My trained model gives many bbox predictions of different sizes for the same kangaroo, and also for random background objects. This was after training for 2 epochs. After training for further epochs, the losses all flatlined to NaN or 0. Just wondering if you’ve ever experienced this.

    Thanks again,

    • Avatar
      Jason Brownlee November 16, 2019 at 7:16 am #

      Not really.

      Perhaps try fitting the model a few times and compare results?

  60. Avatar
    Aqiff November 19, 2019 at 2:45 pm #

    This is a very great tutorial. For the training, I am stuck with this line model = MaskRCNN(mode=’training’, model_dir=’./’, config = config)
    The error is: ‘NoneType’ object has no attribute ‘lower’. How can I fix this?

  61. Avatar
    yamuna November 21, 2019 at 6:34 pm #

    I am trying to predict hand gloves and spectacles using Mask R-CNN. I am facing the following issues:

    1. Hands of people who are not wearing gloves are also detected as gloves. I think the model is picking up on the hand structure.
    2. It is completely biased by colour. Wherever it finds white, it predicts gloves.

    Please help me. I have 1000 images for training, and I have trained for nearly 50 epochs.

    • Avatar
      Jason Brownlee November 22, 2019 at 5:59 am #

      Perhaps include training examples with hands and gloves in the same image to help the model tell the difference?

  62. Avatar
    Mursyideen November 24, 2019 at 7:27 pm #

    Hello there, I am trying to execute this code using my own GPU; however, I get this error:
    ResourceExhaustedError: 2 root error(s) found.
    (0) Resource exhausted: OOM when allocating tensor with shape[2,512,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    [[{{node rpn_model_11/rpn_class_raw/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

    (1) Resource exhausted: OOM when allocating tensor with shape[2,512,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    [[{{node rpn_model_11/rpn_class_raw/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

    0 successful operations.
    0 derived errors ignored.

    • Avatar
      Jason Brownlee November 25, 2019 at 6:27 am #

      Sorry, I don’t know about this error, perhaps try posting to stackoverflow?

    • Avatar
      Maged December 4, 2019 at 7:33 am #

      Hey Mursyideen, how did you solve this issue ? 🙂 I am having the same problem

  63. Avatar
    Ozan Veranyurt November 25, 2019 at 6:29 am #

    Hi Jason, I used your tutorial to prepare a Pistol detector. The data is properly loaded and when I try to train the epochs are frozen. It sometimes freezes on images randomly. Here is output

    Epoch 1/5
    25/150 [====>…………………….] – ETA: 3:13 – loss: 3.2542 – rpn_class_loss: 0.0182 – rpn_bbox_loss: 0.6457 – mrcnn_class_loss: 0.5098 – mrcnn_bbox_loss: 0.8810 – mrcnn_mask_loss: 1.1995

    It stops on different images. I checked all images and annotations, and they are OK. I followed the suggestions here : (Made modifications in the under mrcnn )
    My TensorFlow is 1.15
    and Keras: 2.2.4

    Any suggestions? I am working on different approaches for pistol detection and Mask R-CNN is one of them. It is critical for my thesis, so I will appreciate any suggestions. Maybe a working combination of Keras and TensorFlow with mrcnn.

    • Avatar
      Jason Brownlee November 25, 2019 at 6:35 am #

      I wonder if you are running out of memory or having a hardware fault?

      Perhaps try running on an AWS EC2 instance?

  64. Avatar
    Yussi Eikelman November 28, 2019 at 7:13 pm #

    Jason Hi,
    I have a set of grayscale images of shape(192,384,3) with none/one/multiple masks in each of size (5,5).
    I’m able to train my model, but unable to receive any result – the tuple from the detect() appears to be empty. In rare cases there is a prediction, which is not good enough.
    Please help, thanks!

    • Avatar
      Jason Brownlee November 29, 2019 at 6:47 am #

      Perhaps the model is not detecting anything on the test images?

      • Avatar
        Yussi Eikelman December 1, 2019 at 1:44 am #

        A different question:

        if the masks = zeros([h, w, len(boxes)], dtype=’uint8′),
        in my case each mask is (h = 5,w = 5, i) and the bounding box, for example, is (5, 5, 10, 10).
        How masks[row_s:row_e, col_s:col_e, i] = 1, where the indexes are not in the original mask range is (5,5), are affected by the bounding box indexes?

        • Avatar
          Jason Brownlee December 1, 2019 at 5:43 am #

          Sorry, I don’t follow your question, are you able to elaborate?
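
One possible reading of the question is that the mask array was sized like the object (5 by 5) rather than like the image. In the snippet quoted above, `masks` has shape `[h, w, len(boxes)]` with `h` and `w` being the image height and width, and each box is painted into its own channel by slicing. The sketch below illustrates that, and what happens when the array is too small for the box. The helper name is hypothetical.

```python
import numpy as np

def boxes_to_masks(boxes, height, width):
    """Build one image-sized mask channel per [xmin, ymin, xmax, ymax] box."""
    masks = np.zeros([height, width, len(boxes)], dtype='uint8')
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1
    return masks

# Image-sized array: the 5x5 box region is set to 1 inside a 192x384 channel.
masks = boxes_to_masks([[5, 5, 10, 10]], height=192, width=384)
print(masks.shape)       # (192, 384, 1)
print(int(masks.sum()))  # 25

# Array sized like the object instead: the slice falls outside the array,
# NumPy selects nothing, and the mask silently stays empty.
tiny = boxes_to_masks([[5, 5, 10, 10]], height=5, width=5)
print(int(tiny.sum()))   # 0
```

An all-zero mask of this kind may be one reason a model trains without error yet detects nothing, although that is only a guess for the case described above.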

  65. Avatar
    bts December 1, 2019 at 6:25 pm #

    Hello Jason,
    I am running this code on my Mac and I get this error when running epoch 1, and the program gets stuck here.

    Epoch 1/5
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/utils/ UserWarning: An input could not be retrieved. It could be because a worker has died.We do not have any information on the lost sample.

    Any idea about this?
    Also, should this code be run only on GPU machines?

    • Avatar
      Jason Brownlee December 2, 2019 at 6:00 am #

      I have not seen that before.

      No, the code works fine on the GPU or CPU.

      Perhaps try re-installing your development environment?
      Perhaps try running either on the cpu or gpu?
      Perhaps try posting/searching on stackoverflow?
      Perhaps try running other examples and see if they work on your workstation?

    • Avatar
      Ali April 23, 2020 at 1:06 am #

      I’ve seen this error before and I fixed it by lowering my tensorflow version from 2.1 to 1.12 and by installing the appropriate keras-gpu libraries for that.

      • Avatar
        Jason Brownlee April 23, 2020 at 6:07 am #

        Yes, as stated at the top, the tutorial does not work with tensorflow 2 because the maskrcnn lib has not been updated.

  66. Avatar
    shankar December 2, 2019 at 7:19 am #

    Hi Jason, what an amazing post..well done on your hard work!

    For my application, in addition to the predicted bounding box+mask+class, I also need to extract the last fully-connected layer of the mask_rcnn model (that is, the feature vector representation of the input image).

    In keras, we can save a model’s json and weight files. And then load them again. And extract the output of any intermediate layer as:

    1. model.summary()
    2. feature_extractor = tf.keras.models.Model(inputs=model.input, outputs=model.get_layer(‘avg_pool’).output)
    3. features = feature_extractor.predict(my_image)

    In mask_rcnn, we load the pre-trained model mask_rcnn_coco.h5.. Do you know how we can access and extract the last fully-connected weights?

    My research is stuck because I am unable to complete this step. I shall be grateful if you can guide me (either via email, or on this forum).


    • Avatar
      Jason Brownlee December 2, 2019 at 1:53 pm #


      Great question.

      Hmmm, not off hand, sorry. Some experimentation will be required.

  67. Avatar
    Maged December 4, 2019 at 7:45 am #

    Hey @Jason thank you for a fantastic tutorial. Please keep it up :)!

    Two questions if you may,

    – How can we reduce the batch_size ?
    – How can we reduce the image_dimensions given to the model?

    Both of these are attempts to fix the “..Resource exhausted: OOM when allocating tensor with shape..” error

    • Avatar
      Jason Brownlee December 4, 2019 at 1:56 pm #


      Good question about the batch size, I’m not sure off the cuff. Perhaps check the code for the train() function?

      I believe you have control over the images sizes – so you can define your own fixed size.

      • Avatar
        Maged December 5, 2019 at 9:49 pm #

        Thanks Jason for your reply, Here is how I fixed it by modifying the KangarooConfig class

        class KangarooConfig(Config):
            # define the name of the configuration
            NAME = "kangaroo_cfg"
            # number of classes (background + kangaroo)
            NUM_CLASSES = 1 + 1
            STEPS_PER_EPOCH = 131

            # one GPU with one image per GPU, to reduce memory use
            GPU_COUNT = 1
            IMAGES_PER_GPU = 1

            # smaller input images, to reduce memory use
            IMAGE_MIN_DIM = 400
            IMAGE_MAX_DIM = 512

        • Avatar
          Jason Brownlee December 6, 2019 at 5:15 am #

          Well done, thanks for sharing!

        • Avatar
          João Granzotti March 12, 2020 at 4:44 am #

          What do these parameters mean?

          GPU_COUNT = 1
          IMAGES_PER_GPU = 1

          Are they the number of GPUs that I have and the batch size? I want to reduce the batch size too.

          Thanks for your help.
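
As far as I understand the library's configuration, the effective batch size is not set directly but derived as IMAGES_PER_GPU multiplied by GPU_COUNT, so lowering IMAGES_PER_GPU lowers the batch size. The sketch below mimics that relationship with a stand-in class. `SketchConfig` is hypothetical and is not the real `Config` class, so check the library source before relying on it.

```python
class SketchConfig:
    """Hypothetical stand-in for the library's Config class."""
    GPU_COUNT = 1        # number of GPUs to use
    IMAGES_PER_GPU = 2   # images processed on each GPU per step

    def __init__(self):
        # Assumed relationship: batch size is derived, not set directly.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT

class SmallBatchConfig(SketchConfig):
    """Same settings, but one image per GPU to reduce memory use."""
    IMAGES_PER_GPU = 1

print(SketchConfig().BATCH_SIZE)      # 2
print(SmallBatchConfig().BATCH_SIZE)  # 1
```

The same relationship may explain the "len(images) must be equal to BATCH_SIZE" assertion reported earlier in the comments: passing a single image to `detect()` would need a configuration whose derived batch size is 1.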

        • Avatar
          Taki March 2, 2021 at 4:54 am #

          Hi Maged,

          You can also change the batch size before the train starts like this.



  68. Avatar
    Tim December 8, 2019 at 2:05 am #

    Hey Jason, When I plot the graph “Actual” vs “Predicted”, the “actual” photos all appear so dark. Is there a way to tweak it so it appears similar to the “Predicted Photo” with red boxes on a transparent photo. Thank you,

    • Avatar
      Jason Brownlee December 8, 2019 at 6:16 am #

      Yes, I intentionally darken the photo to highlight the detection.

      You can remove the code to do that. Just plot the photo and use the box to drop a colored rectangle.

      • Avatar
        amine December 21, 2019 at 6:24 am #

        Hi, I am trying to train on my dataset; however, I get this error when trying to load the data.
        Help please.

        FileNotFoundError Traceback (most recent call last)
        in ()
        1 image_id = 1
        —-> 2 image = train_set.load_image(image_id)
        3 print(image.shape)
        4 # load image mask
        5 mask, class_ids = train_set.load_mask(image_id)

        6 frames
        /usr/local/lib/python3.6/dist-packages/imageio/core/ in _parse_uri(self, uri)
        271 # Reading: check that the file exists (but is allowed a dir)
        272 if not os.path.exists(fn):
        –> 273 raise FileNotFoundError(“No such file: ‘%s'” % fn)
        274 else:
        275 # Writing: check that the directory to write to does exist

        FileNotFoundError: No such file: ‘/content/Mask_RCNN/Amine/imagessacdf21.JPG’

        • Avatar
          Jason Brownlee December 21, 2019 at 7:18 am #

          Looks like the image you are trying to load does not exist on your workstation.

        • Avatar
          J February 12, 2022 at 12:24 am #

          Have you dealt with the problem? I got the same issue: in, the load_mask does not add ‘/’ to the path, which results in /images being concatenated to the image’s name.
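The path in the error above (‘.../Amine/imagessacdf21.JPG’) is what plain string concatenation produces when a separator is missing. A minimal sketch of a safer way to build the path is shown below; image_path is a hypothetical helper, and the directory layout (an images sub-folder inside the dataset folder) is assumed from the tutorial's dataset structure.

```python
import os


def image_path(dataset_dir, image_id, ext=".jpg"):
    # os.path.join inserts the separator, so 'images' and the file name
    # cannot run together as in '.../imagessacdf21.JPG'.
    return os.path.join(dataset_dir, "images", image_id + ext)


# Plain concatenation with a missing '/' reproduces the broken path.
broken = "/content/Mask_RCNN/Amine" + "/images" + "sacdf21.JPG"
fixed = image_path("/content/Mask_RCNN/Amine", "sacdf21", ext=".JPG")
print(broken)
print(fixed)
```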

  69. Avatar
    Narottam December 12, 2019 at 9:40 pm #

    Hi Jason, please confirm for mask RCNN model do we need to mask new images also (i.e need to create .xml file) ? If no, then please suggest changes in function ‘def plot_actual_vs_predicted’ for me to get better output the way we got after using ‘display_instances(image, bbox, mask, class_ids, dataset.class_names)’ under evaluate_model function.

    • Avatar
      Jason Brownlee December 13, 2019 at 6:01 am #

      You can just work with object boxes and use them as masks, and prepare the data any way you wish.

  70. Avatar
    Sally Jac December 14, 2019 at 4:32 am #

    Hi Jason, when I am creating the model, I keep getting this error

    /anaconda3/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/ in detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config)
    551 positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
    552 config.ROI_POSITIVE_RATIO)
    –> 553 positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
    554 positive_count = tf.shape(positive_indices)[0]
    555 # Negative ROIs. Add enough to maintain positive:negative ratio.

    AttributeError: module ‘tensorflow’ has no attribute ‘random_shuffle’

    I am unsure of how to debug this. I tried changing random_shuffle to random.shuffle in but it does not work. Or have I downloaded the wrong MaskRCNN? What is the link to download the MaskRCNN? Thank you for your help.

    • Avatar
      Jason Brownlee December 14, 2019 at 6:26 am #

      It looks like you are using tensorflow version 2, and the maskrcnn model requires tensorflow 1.14 or 1.15.

      This is mentioned right at the top of the tutorial.
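A quick way to catch this before building the model is to check the installed versions up front. The sketch below is one possible check, using the thresholds from the note at the top of the tutorial (TensorFlow 1.x, Keras below 2.2.5); version_tuple and is_supported are hypothetical helper names, and in practice you would pass in tensorflow.__version__ and keras.__version__.

```python
import re


def version_tuple(version):
    # "1.15.3" -> (1, 15, 3); suffixes such as "rc0" are ignored.
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_supported(tf_version, keras_version):
    # Thresholds taken from the note at the top of the tutorial.
    tf_ok = version_tuple(tf_version)[0] == 1
    keras_ok = version_tuple(keras_version) < (2, 2, 5)
    return tf_ok and keras_ok


print(is_supported("1.15.3", "2.2.4"))
print(is_supported("2.0.0", "2.2.4"))
```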

      • Avatar
        Twayne Jeremy December 15, 2019 at 5:40 pm #

        Hello Jason, your tutorial is really helpful. However, I’ve seen some errors while trying it.

        When I am evaluating the model, I received this error and I am unsure of how to debug this.

        ValueError: shapes (1,1048576) and (1050624,1) not aligned: 1048576 (dim 1) != 1050624 (dim 0)

        Thank you for your help.

        • Avatar
          Jason Brownlee December 16, 2019 at 6:13 am #

          I’m sorry to hear that, I have some suggestions here:

          • Avatar
            Yang-Yin May 6, 2020 at 4:01 pm #

            Hi Jason,

            I really like your detailed tutorial. Excellent work, thanks.

            I am able to run the example without a problem. However, when working with my own images, I got this kind of error when calculating the mAP.

            “ValueError: shapes (7,1048576) and (1104896,1) not aligned: 1048576 (dim 1) != 1104896 (dim 0)”.

            I was able to train the model and make predictions with other images. But I just cannot evaluate the model’s performance in terms of mAP via compute_ap().

            I checked this issue online for some days and didn’t find any solutions. Are you able to show any guidance?

            Thanks very much for your help.

          • Avatar
            Jason Brownlee May 7, 2020 at 6:40 am #

            Perhaps confirm the data was loaded as you expect and that the inputs to the metric are as required by the API?
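One observation that may help narrow this down: the two sizes in the error messages look like flattened masks of different heights and widths. The sketch below does the arithmetic, assuming one side is 1024 pixels (the IMAGE_MAX_DIM value in the configuration printed elsewhere in this thread); infer_shape is a hypothetical helper, and which of the two masks is the ground truth and which is the prediction cannot be told from the message alone.

```python
def infer_shape(flat_length, side=1024):
    # A flattened H x W mask has H * W elements; if one side is known,
    # the other follows by division (None if it does not divide evenly).
    if flat_length % side != 0:
        return None
    return (flat_length // side, side)


for flat_length in (1048576, 1050624, 1104896):
    print(flat_length, infer_shape(flat_length))
```

If the assumption holds, 1048576 corresponds to a 1024 x 1024 mask while 1050624 and 1104896 correspond to 1026 x 1024 and 1079 x 1024, so it may be worth checking that the masks being compared were resized the same way.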

          • Avatar
            Yang-Yin May 14, 2020 at 9:28 am #

            Hi Jason,

            Thanks for your comments. Yes, some of the input data were not working well for some reason (I will double-check it). I really appreciated your help!

            I have one more question about this model: in addition to calculating the mAP, precision, and recall, how to plot accuracy and loss during training to monitor overfit or determine the number of epochs to stop training?

            Thank you in advance.

          • Avatar
            Jason Brownlee May 14, 2020 at 1:26 pm #

            Well done!

            Good question, I don’t have an example of plotting the history of this specific model. Perhaps investigate the use of tensorboard?

        • Avatar
          Gary May 5, 2020 at 7:07 am #

          Hi Twayne,

          I received a similar message when trying it with my dataset. Have you figured it out? Thanks,

          • Avatar
            Peter January 26, 2021 at 3:11 am #

            I have the same problem. Have you managed to solve it?

  71. Avatar
    Ekrem Fatih Yılmazer December 16, 2019 at 2:18 am #

    I have a data set of liver CT which is grayscale.
    Is it possible for me to apply the same model (also transfer learning) to grayscale images? Since the pretrained models are for RGB images, I am curious about whether I can convert them for my application purpose.

    • Avatar
      Jason Brownlee December 16, 2019 at 6:18 am #

      Perhaps try it and compare to fitting a new model from scratch?

  72. Avatar
    Zain December 16, 2019 at 3:44 am #

    Thank you very much for such an informative article.

    I have created a colab notebook which walks through this article and here it is.

  73. Avatar
    Yansen December 18, 2019 at 7:07 am #

    Hi Jason, thanks for the tutorial. Following your instruction I fitted a custom dataset of 200 photos with one label. I got a Train mAP of 0.986 and a Test mAP of 1.000. The detection results are great and even see things I would miss if I do labeling. My question is that: is 1.000 too good to be true?

    • Avatar
      Jason Brownlee December 18, 2019 at 1:26 pm #

      Wow, well done.

      Perhaps think of ways that you could have a misleading result and test them?

      e.g. more/less data? Different measures? Inspect predictions? etc.

  74. Avatar
    amine December 20, 2019 at 8:31 am #

    Hi Mr Brownlee,
    Thanks for this awesome tutorial. However, when I tried to run it on my dataset (13 images, just for fun) on Colab,
    I get this message:

    ValueError Traceback (most recent call last)
    in ()
    79 # train set
    80 train_set = KangarooDataset()
    —> 81 train_set.load_dataset(‘kangaroo’, is_train=True)
    82 train_set.prepare()
    83 print(‘Train: %d’ % len(train_set.image_ids))

    in load_dataset(self, dataset_dir, is_train)
    22 continue
    23 # skip all images after 150 if we are building the train set
    —> 24 if is_train and int(image_id) >= 10:
    25 continue
    26 # skip all images before 150 if we are building the test/val set

    ValueError: invalid literal for int() with base 10: ‘sacdf21’
    gratefuly yours

    • Avatar
      Jason Brownlee December 20, 2019 at 1:05 pm #

      Sorry to hear that. Perhaps start with the working tutorial and slowly adapt it to your needs?

      • Avatar
        amine December 21, 2019 at 7:46 am #

        Absolutely, the tutorial worked great, but adapting it keeps failing over and over again. Is there any other way to avoid this bug? It is just the train/test splitting part that does not work. I am running on Colab, if that helps.
        Please, I have been stuck for hours now.
        Very grateful.

        • Avatar
          Jason Brownlee December 21, 2019 at 8:16 am #

          Yes, don’t split into train and test sets or split using your own method that does not use file names.

          • Avatar
            residence les jardins December 22, 2019 at 4:00 am #

            The code does not work without splitting; it considers all the data as a single block. Do we lose the val dataset?
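The ValueError in this thread comes from calling int() on a file name such as ‘sacdf21’. A split that does not depend on numeric file names can be sketched as follows; split_by_position is a hypothetical helper, and the 80/20 ratio is an arbitrary choice for illustration, not the tutorial's cut-off at image 150.

```python
def split_by_position(filenames, train_fraction=0.8):
    # Sort for a reproducible order, then cut by position instead of
    # converting the file name to an integer.
    ordered = sorted(filenames)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


files = ["sacdf21.JPG", "img_b.JPG", "00003.JPG", "kang.JPG", "zebra.JPG"]
train_files, test_files = split_by_position(files)
print(len(train_files), len(test_files))
```

Inside a load_dataset function, the same idea would mean checking whether a file is in the train list (or the test list) rather than comparing int(image_id) against a threshold, which keeps a separate validation set.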

  75. Avatar
    João Vitor Granzotti Machado December 22, 2019 at 3:05 am #

    Hello, I am a student from Brazil and I am having a problem executing the code. on the line:

    model.train (train_set, test_set, learning_rate = config.LEARNING_RATE, epochs = 5, layers = ‘heads’)

    When running the program I get the following error:

    raise StopIteration ()

    What could be the cause of this ??

    • Avatar
      Jason Brownlee December 22, 2019 at 6:15 am #

      Sorry, I have not seen this error before. I have some suggestions here that might help:

      • Avatar
        João Vitor Granzotti Machado December 24, 2019 at 3:51 am #

        I checked the versions of python, tensorflow and even numpy and they are all correct. The output when executing the code is as follows:

        C:\Users\João Vitor\trabalho>python
        Using TensorFlow backend.
        Train: 131
        Test: 32

        BACKBONE resnet101
        BACKBONE_STRIDES [4, 8, 16, 32, 64]
        BATCH_SIZE 2
        BBOX_STD_DEV [0.1 0.1 0.2 0.2]
        GPU_COUNT 1
        IMAGES_PER_GPU 2
        IMAGE_MAX_DIM 1024
        IMAGE_META_SIZE 14
        IMAGE_MIN_DIM 800
        IMAGE_RESIZE_MODE square
        IMAGE_SHAPE [1024 1024 3]
        LEARNING_RATE 0.001
        LOSS_WEIGHTS {‘rpn_class_loss’: 1.0, ‘rpn_bbox_loss’: 1.0, ‘mrcnn_class_loss’: 1.0, ‘mrcnn_bbox_loss’: 1.0, ‘mrcnn_mask_loss’: 1.0}
        MASK_POOL_SIZE 14
        MASK_SHAPE [28, 28]
        MAX_GT_INSTANCES 100
        MEAN_PIXEL [123.7 116.8 103.9]
        MINI_MASK_SHAPE (56, 56)
        NAME kangaroo_cfg
        NUM_CLASSES 2
        POOL_SIZE 7
        PRE_NMS_LIMIT 6000
        RPN_ANCHOR_RATIOS [0.5, 1, 2]
        RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
        RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
        STEPS_PER_EPOCH 131
        TRAIN_BN False
        USE_MINI_MASK True
        USE_RPN_ROIS True
        WEIGHT_DECAY 0.0001

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
        Instructions for updating:
        keep_dims is deprecated, use keepdims instead
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
        Instructions for updating:
        keep_dims is deprecated, use keepdims instead
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\ add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
        Instructions for updating:
        Use tf.where in 2.0, which has the same broadcast rule as np.where
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\ The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\ The name tf.log is deprecated. Please use tf.math.log instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\ calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
        Instructions for updating:
        box_ind is deprecated, use box_indices instead
        2019-12-23 13:45:34.251579: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

        Starting at epoch 0. LR=0.001

        Checkpoint Path: ./kangaroo_cfg20191223T1345\mask_rcnn_kangaroo_cfg_{epoch:04d}.h5
        Selecting layers to train
        fpn_c5p5 (Conv2D)
        fpn_c4p4 (Conv2D)
        fpn_c3p3 (Conv2D)
        fpn_c2p2 (Conv2D)
        fpn_p5 (Conv2D)
        fpn_p2 (Conv2D)
        fpn_p3 (Conv2D)
        fpn_p4 (Conv2D)
        In model: rpn_model
        rpn_conv_shared (Conv2D)
        rpn_class_raw (Conv2D)
        rpn_bbox_pred (Conv2D)
        mrcnn_mask_conv1 (TimeDistributed)
        mrcnn_mask_bn1 (TimeDistributed)
        mrcnn_mask_conv2 (TimeDistributed)
        mrcnn_mask_bn2 (TimeDistributed)
        mrcnn_class_conv1 (TimeDistributed)
        mrcnn_class_bn1 (TimeDistributed)
        mrcnn_mask_conv3 (TimeDistributed)
        mrcnn_mask_bn3 (TimeDistributed)
        mrcnn_class_conv2 (TimeDistributed)
        mrcnn_class_bn2 (TimeDistributed)
        mrcnn_mask_conv4 (TimeDistributed)
        mrcnn_mask_bn4 (TimeDistributed)
        mrcnn_bbox_fc (TimeDistributed)
        mrcnn_mask_deconv (TimeDistributed)
        mrcnn_class_logits (TimeDistributed)
        mrcnn_mask (TimeDistributed)
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\ The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

        C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\ UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
        “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
        C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\ UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
        “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
        C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\ UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
        “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\ calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
        Instructions for updating:
        Call initializer instance with the dtype argument instead of passing it to the constructor
        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\ The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

        WARNING:tensorflow:From C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\ The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

        Epoch 1/5
        Traceback (most recent call last):
        File “”, line 109, in
        model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers=’heads’)
        File “C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\”, line 2374, in train
        File “C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\legacy\”, line 87, in wrapper
        return func(*args, **kwargs)
        File “C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\”, line 2065, in fit_generator
        generator_output = next(output_generator)
        File “C:\Users\João Vitor\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\”, line 710, in get
        raise StopIteration()

        I can’t figure out what’s wrong. I am very interested in Mask R-CNN and would like to see it working. Can you help me plz ?? Thank you for your attention and greetings from Brazil

        • Avatar
          Jason Brownlee December 24, 2019 at 6:43 am #

          It looks like you are using tensorflow 2.

          You must use tensorflow 1.15.

          • Avatar
            João Vitor Granzotti Machado December 24, 2019 at 11:32 am #

            The versions of the libraries I am using are these:

            Python: 3.6.8
            Tensorflow: 1.15.0
            Numpy: 1.16.0
            Keras: 2.1.0
            Scipy: 1.4.1

            So I think the error is not related to the libraries version, because everything is in line with the tutorial.

          • Avatar
            Jason Brownlee December 24, 2019 at 4:57 pm #

            I recommend updating to keras 2.2, at least.

          • Avatar
            Thanakorn March 4, 2020 at 4:46 am #

            Hi Jason,

            My project is on Google Colab. Even though the versions of my libraries are “Tensorflow: 1.15.0” and “Keras: 2.2.5”, these same lines still appear, so how can I fix this?

          • Avatar
            Jason Brownlee March 4, 2020 at 6:01 am #

            Perhaps colab is inappropriate.

          • Avatar
            Anand Nataraj June 5, 2020 at 1:14 am #

            Could you please let me know where we have to make changes if I wanted to add another label, say: Monkey?

          • Avatar
            Jason Brownlee June 5, 2020 at 8:17 am #

            Yes, in all the places where we add kangaroo.

  76. Avatar
    amine December 22, 2019 at 3:56 am #

    Hi, I tried to subset my data into train and val files, the way data is split in the balloon dataset.
    Here is the error:
    FileNotFoundError Traceback (most recent call last)
    in ()
    85 # train set
    86 train_set = KangarooDataset()
    —> 87 train_set.load_dataset(‘Amine’,”train”, is_train=True)
    88 train_set.prepare()
    89 print(‘train: %d’ % len(train_set.image_ids))

    in load_dataset(self, dataset_dir, subset, is_train)
    26 #annotations_dir = dataset_dir + ‘/Amine/’
    27 # find all images
    —> 28 for filename in listdir(images_dir):
    29 # extract image id
    30 image_id = filename[:-4]

    FileNotFoundError: [Errno 2] No such file or directory: ‘Amine/train’

    any way to get out of this bug

    • Avatar
      Jason Brownlee December 22, 2019 at 6:18 am #

      Looks like the data is not in the required location on your workstation.

      Perhaps put the data in the same directory as your code, and run the code from the command line.

      • Avatar
        Amine December 23, 2019 at 1:18 am #

        Hi, how could I select the dataset by names rather than I split them by index before and after your breakup point,(150), it seems to me that could be a better fixer for this bug without having to manipulate files?
        What would be your code to change the splitting key?.

        • Avatar
          Jason Brownlee December 23, 2019 at 6:55 am #

          Sorry, I don’t have the capacity to prepare custom code.

          Perhaps focus on Python basics first?

          • Avatar
            amine December 24, 2019 at 4:25 am #

            I figured out how to split the data into train and val, with each folder containing sub-folders (annots and images) to respect your data structure.
            Here is the result:

            train: 8
            test: 4 # seems ok but
            NameError Traceback (most recent call last)
            in ()
            103 mask, class_ids = train_set.load_mask(image_id)
            104 # extract bounding boxes from the masks
            –> 105 bbox = extract_bboxes(mask)
            106 # display image with masks and bounding boxes
            107 display_instances(image, bbox, mask, class_ids, train_set.class_names)

            NameError: name ‘extract_bboxes’ is not defined

  77. Avatar
    amine December 25, 2019 at 7:12 am #

    hi, I tried all clues without any success , I know this would not take few minutes to get solved with a professional like you, all I m asking for is some compassion

  78. Avatar
    raj January 6, 2020 at 9:07 pm #

    (raj) ➜ Mask_RCNN git:(master) ✗ python
    Using TensorFlow backend.
    Train: 131
    Test: 32
    WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/ The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

    WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/ The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

    WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/ The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

    WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/ The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

    WARNING:tensorflow:From /home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/backend/ The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ The name tf.log is deprecated. Please use tf.math.log instead.

    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
    Instructions for updating:
    box_ind is deprecated, use box_indices instead
    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ The name tf.sets.set_intersection is deprecated. Please use tf.sets.intersection instead.

    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.

    WARNING:tensorflow:From /home/debu/raj/Mask_RCNN/mrcnn/ to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.
    Traceback (most recent call last):
    File “”, line 150, in
    model.load_weights(model_path, by_name=True)
    File “/home/debu/raj/Mask_RCNN/mrcnn/”, line 2130, in load_weights
    saving.load_weights_from_hdf5_group_by_name(f, layers)
    File “/home/debu/.virtualenvs/raj/lib/python3.6/site-packages/keras/engine/”, line 1018, in load_weights_from_hdf5_group_by_name
    str(weight_values[i].shape) + ‘.’)
    ValueError: Layer #389 (named “mrcnn_bbox_fc”), weight has shape (1024, 8), but the saved weight has shape (1024, 324).
    (raj) ➜ Mask_RCNN git:(master) ✗

    • Avatar
      Jason Brownlee January 7, 2020 at 7:22 am #

      Looks like a problem with your development environment?

      Perhaps confirm TensorFlow 1.15 and Keras 2.2.

    • Avatar
      Bence December 12, 2020 at 4:02 am #

      Hey there,

      Set the number of Classes to match your input class:

      # number of classes (background + kangaroo)
      NUM_CLASSES = 1 + 1
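The two shapes in the error above can be explained by the number of classes each set of weights was built for. The sketch below works through the arithmetic, assuming the box-regression head has four outputs per class; bbox_head_units is a hypothetical helper used only for illustration.

```python
def bbox_head_units(num_classes):
    # Four box-refinement values are predicted per class.
    return num_classes * 4


coco_units = bbox_head_units(1 + 80)     # background + 80 MS COCO classes
kangaroo_units = bbox_head_units(1 + 1)  # background + kangaroo
print(coco_units, kangaroo_units)

# With the Matterport library, the class-specific layers can be skipped
# when loading pre-trained weights into a model with fewer classes, e.g.:
# model.load_weights(path, by_name=True,
#     exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
```

Under that assumption, a saved shape of (1024, 324) points to MS COCO weights being loaded into a two-class model (1024, 8) without excluding the output layers, so it may be worth checking which weights file model_path refers to.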

  79. Avatar
    Saurabh January 9, 2020 at 12:08 am #

    Hello Jason,

    First of all Happy New Year 2020 and looking forward for more exciting blogs from you.

    I have one question regarding labelimg tool. As I looked into labelimg tool but there is no way to rotate bounding box. In custom dataset, object is not straight and I can’t rotate images.

    Could you please suggest me any other labeling tool which allows to rotate even bounding box?

    Thanking you,

    • Avatar
      Jason Brownlee January 9, 2020 at 7:26 am #

      Sorry, I don’t have good advice for image annotation tools.

      • Avatar
        Saurabh January 10, 2020 at 12:53 am #

        Thank you!

      • Avatar
        Siddhartha Pachhai January 21, 2020 at 7:10 am #

        Hi, Jason, the best solution I have found for this is Jupyter Innotator, Its very convenient and easy to use.

        *Also: The output of the tool does not resemble the xml file structure that is often used in object detection, but the tool produces enough so that you can generate a xml conversion script in python.

        *I remember having some difficulties installing the package in Mac initially (something to do with icy-widgets, but I think its fixable), hopefully this was mitigated.

    • Avatar
      Suman January 14, 2020 at 7:32 am #

      Hi Saurab,

      May be for object detection you can use labelImg or labelme. For segmentation you can use CVAT tool.

  80. Avatar
    Suman January 14, 2020 at 7:36 am #

    Hi Jason Brownlee,

    Great tutorial for beginners like me, thanks.
    Here the mask-rcnn is saving weights, but i want to save the model along with the weights like‘xxxx.h5) . But this function is not working here. Please reply me as soon as possible.


    • Avatar
      Jason Brownlee January 14, 2020 at 1:46 pm #


      I believe it is using the tensorflow API. Perhaps investigate an appropriate function.

  81. Avatar
    Niall Delany January 15, 2020 at 7:53 pm #

    Thanks for the great tutorial, very helpful in getting started with this kind of work and was able to apply it to a custom dataset.

    My question is, suppose I also have binary mask annotations for each image (png files), how would I load them into the model instead of the xml annotations so that the model prediction is a mask rather than a bounding box?

    • Avatar
      Jason Brownlee January 16, 2020 at 6:12 am #

      You’re welcome.

      Sure, load any custom masks you like.

  82. Avatar
    JJ January 23, 2020 at 10:37 am #

    I’m stuck at “Parse Annotation File” step.
    Where could I type the “tree = ElementTree.parse(filename)”?

    • Avatar
      Jason Brownlee January 23, 2020 at 12:56 pm #

      Sorry to hear that, perhaps try copying the “complete examples” at the end of each section.

  83. Avatar
    Nelli January 23, 2020 at 2:27 pm #

    After training, in the prediction, the displayed image is showing with a bounding box but the label is not there. Please reply to me. Thanks in advance

  84. Avatar
    Samrawit January 26, 2020 at 1:30 pm #

    Does Mask R-CNN only work on annotated images, or can I use normal images? And which annotation approach (automatic, manual or semi-automatic) could give better results?

    • Avatar
      Jason Brownlee January 27, 2020 at 7:01 am #

      It learns from annotated images.

      It is used on normal images.

      • Avatar
        Samrawit January 27, 2020 at 5:39 pm #

        Thank you. So is there any example of automatically annotating an image dataset and how to use it for object detection and masking an object?

        • Avatar
          Jason Brownlee January 28, 2020 at 7:51 am #

          No, I believe it is manual at this stage.

          • Avatar
            Samrawit January 28, 2020 at 7:07 pm #

            Thank you very much! One more question: is there any example of Mask R-CNN without using pre-trained weights?

          • Avatar
            Jason Brownlee January 29, 2020 at 6:31 am #

            I don’t have such an example.

            It makes sense to use pre-trained weights as a starting point for transfer learning.

  85. Avatar
    mahmoud January 29, 2020 at 9:12 am #

    hi Jason,thanks for your illustration
    i run the MaskRCNN on my dataset and it gives me horrible result

    Train mAP: 0.818
    Test mAP: 0.549
    Can you advise me why there is such a big difference between the Train and Test sets?
    How can I address this problem?

  86. Avatar
    Rohit January 30, 2020 at 5:12 pm #

    Hi Jason, thank you for this wonderful article.

    I am working on a case where we have multiple labels for each object in an image.
    The task is similar to the one asked in the following problem:

    Could you suggest how to approach to this problem?

  87. Avatar
    Sai Abinesh February 5, 2020 at 8:34 pm #

    Hello Jason,

    Thank you very much for a great tutorial. It’s a great resource for anyone trying to get started with object detection and for people who need to check their configurations.

    I am retraining just the “heads” layer of a resnet101 backbone, on a 3d synthetic dataset generated using Unreal Engine and python. I have 7 object classes + 1 background, and a total of 591 training images and 60 real images for validation.
    Using default training config from the maskrcnn official repo, I suspect there is a case of over-fitting, as the val loss increases while the training loss decreases.

    Here they are pasted below.

    I also constructed a training curve of my own, by calculating the AP50 (Average Precision at 50% Intersection Over Union) for all the epochs from epoch 1 to epoch 100. It seems like the network is not improving a lot. The curve can be found below.

    How should I adjust my learning rates, weight decays? What kind of heuristics/rules of thumb to use based on the size of the dataset, number of classes etc? My config can be found below.

    class aerial_trains_Config(Config):
        """Configuration for training on the toy shapes dataset.
        Derives from the base Config class and overrides values specific
        to the toy shapes dataset.
        """
        # Give the configuration a recognizable name
        NAME = "Baldonnell_from_scratch_from9m"

        # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
        # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
        GPU_COUNT = 1

        # Number of classes (including background)
        NUM_CLASSES = 1 + 7  # background + 7 object classes

        # Use small images for faster training. Set the limits of the small side
        # the large side, and that determines the image shape.
        IMAGE_MIN_DIM = 256
        IMAGE_MAX_DIM = 2048

        # Use smaller anchors because our image and objects are small
        RPN_ANCHOR_SCALES = (64, 128, 256, 512, 1024)  # anchor side in pixels

        # Reduce training ROIs per image because the images are small and have
        # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.

        # Use a small epoch since the data is simple

        LEARNING_RATE = 0.001

        # Weight decay regularization
        WEIGHT_DECAY = 0.0001

  88. Avatar
    Ofis Taşıma February 6, 2020 at 6:18 pm #

    Thank you very much for your great article about Object Detection Model with Keras.

  89. Avatar
    Ashutosh Srivastava February 7, 2020 at 8:15 pm #

    Hi Jason,

    This is really a great article. I am trying to solve my multi-object detection problem following your approach. I think it only needs a small tweak to this code, but I am stuck.

    I have added multiple classes in the load_dataset function:
    self.add_class("dataset", 1, "list")
    self.add_class("dataset", 2, "Menu")
    self.add_class("dataset", 3, "Home")

    but in the load_mask function you append class_ids statically as 1 ("kangaroo"),
    and I want to add the class ids according to the objects found.
    Kindly check and help.

    • Avatar
      Ashutosh Srivastava February 7, 2020 at 8:19 pm #

      As I see it, object detection is implemented here but classification is missing, as there is only one object class. Correct me if I am wrong.

      • Avatar
        Jason Brownlee February 8, 2020 at 7:08 am #

        I don’t understand, sorry? Can you elaborate?

    • Avatar
      Jason Brownlee February 8, 2020 at 7:08 am #

      Looks fine to me, perhaps test it?
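For readers with the same multi-class question, here is a minimal sketch of reading the class name of each object from the annotation rather than hard-coding it. It assumes Pascal VOC style XML in which every `<object>` element carries a `<name>` and a `<bndbox>`; the helper names and the `CLASS_IDS` map are hypothetical and must be kept in step with your own `self.add_class(...)` calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical class map: ids must match the self.add_class(...) calls.
CLASS_IDS = {"list": 1, "Menu": 2, "Home": 3}


def extract_labeled_boxes(xml_text):
    """Return [(class_name, [xmin, ymin, xmax, ymax]), ...] for one annotation."""
    root = ET.fromstring(xml_text)
    labeled = []
    for obj in root.findall(".//object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        coords = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        labeled.append((name, coords))
    return labeled


def class_ids_for(labeled_boxes):
    """Map each labeled box to its integer class id, in the same order."""
    return [CLASS_IDS[name] for name, _ in labeled_boxes]
```

Inside a `load_mask` implementation, the list returned by `class_ids_for` would take the place of the single hard-coded kangaroo id, with one entry per mask channel.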

  90. Avatar
    Kavılca February 11, 2020 at 4:06 am #

    Awesome article, thank you for this blog

  91. Avatar
    Saurabh February 11, 2020 at 11:50 pm #

    Hello Jason,

    Could you please share your views on “How to label overlapping objects?” What is the best practice with reference to overlapping objects? The problem is most of the labeling tools don’t support oriented bounding boxes.

    How can I inform my object detector that it should look at only certain part of images without cropping images? Can I edit images and put white/black (constant) color so that object detector will ignore such areas?

    Kindly share your views.

    Thanking you!

    • Avatar
      Jason Brownlee February 12, 2020 at 5:47 am #

      I don’t have specific advice on the topic, sorry.

      • Avatar
        Saurabh February 12, 2020 at 7:24 am #

        Thank you!

  92. Avatar
    Savyasachi February 12, 2020 at 4:26 am #

    Hello Dr. Brownlee!
    I’m running this matterport/mrcnn code on my custom dataset (to detect comic characters). I’m using a total of 6500 images. My training model saturates with a loss of 0.873 (Steps: 2500, Batch: 2, Epoch(at which saturation happens):9th-10th) and it breaks my heart. What are the ways I could tweak my code to lower the loss? (Rest of the config is default)

    Thank you so much!

    • Avatar
      Jason Brownlee February 12, 2020 at 5:54 am #

      Some of the suggestions here might help:

      • Avatar
        Savyasachi February 20, 2020 at 4:35 am #

        Hi again Dr. Brownlee!
        Is there a way to know if my code will perform well/worse in the first epoch (or some time sooner) rather than waiting for 6 long hours to get a loss value?

        Every time I make some changes, I have to run it through the whole cycle till I see the saturation (in loss) after which, I have to manually perform a ‘Keyboard Interrupt’

  93. Avatar
    Phil February 13, 2020 at 11:42 am #

    Hi Jason! This is a great tutorial. This is the exact solution to the problem I’m trying to solve.
    One quick question. My model gets trained fine but it is not creating the checkpoint models at any point during the training or at the end. So I’m basically left with a trained model object.

    I’ve searched my whole system in the case it was cached at some other location. Could not find it though.
    Could you please tell me if you’ve come across this kind of a problem before and how to solve it?

    I am on windows 10
    With keras==2.2.5

    • Avatar
      Jason Brownlee February 13, 2020 at 1:24 pm #

      The models are saved in the current working directory I believe, under a subdirectory for the run.
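A minimal sketch for locating saved checkpoints, assuming they follow the `mask_rcnn_<name>_<epoch>.h5` naming that appears in the training log quoted further down this thread and sit somewhere below the `model_dir` passed to `MaskRCNN`. The function name `find_checkpoints` is hypothetical.

```python
import glob
import os


def find_checkpoints(model_dir="."):
    """Return every mask_rcnn_*.h5 file at or below model_dir, sorted by path."""
    pattern = os.path.join(model_dir, "**", "mask_rcnn_*.h5")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    for path in find_checkpoints("."):
        print(path)
```

If the list comes back empty, it may be worth printing `os.getcwd()` to confirm which directory the training script was actually run from.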

  94. Avatar
    Juan February 14, 2020 at 11:58 pm #

    Hi Jason, I’m having trouble understanding how to detect multiple classes.

    During the annotation process, do we need to split each class in its own folder?
    e.g. class1/annots and class1/images, class2/annots and class2/images

    I don’t know if this is the good approach, since there might be images where both classes appear.

    It would be great to know how should the folder structure be and the code for the load_dataset function.


    • Avatar
      Jason Brownlee February 15, 2020 at 6:31 am #

      The choice is yours, as long as it is presented consistently to the model during training.

  95. Avatar
    Steven February 18, 2020 at 6:26 pm #

    Hey Jason,

    Thanks, Thanks, Thanks.
    This is the best tutorial i found for keras.
    I had no hard problems to do this.

    You did very well.

  96. Avatar
    Narottam February 18, 2020 at 9:26 pm #

    Hi Jason, I built a single-class/multi-class classification proof-of-concept project on different objects using your tutorials. Thanks for the neat explanation above.
    Now, as part of the complete project, I would like your suggestions on the points below:
    1. In which case should model accuracy ideally be higher: a single-class model or a multi-class model? I tried both on different objects, and accuracy on new data seems low after 100 training epochs with learning_rate = 0.0001.
    2. What hyperparameters can I tune, apart from learning_rate and the number of epochs, to get better accuracy with Mask R-CNN?
    3. I’m working on an architecture project. How can I detect a line connecting different objects, like A-----B, that is, the line between A and B?

    Your help will be very much appreciated !!!

  97. Avatar
    Helmy February 23, 2020 at 5:22 am #

    Hey Jason, is it worth it to pass the images through an edge detector like Sobel, prewitt, canny as a pre-processing step before sending them off to Mask RCNN ?

    In an attempt to make it “Easier” to increase accuracy ? Any literature or references you recommend reading ?

  98. Avatar
    G February 27, 2020 at 3:57 pm #

    Hi! Can you explain what’s going on with:

    pyplot.subplot(330 + 1 + i)

    Why those numbers?

    • Avatar
      Jason Brownlee February 28, 2020 at 5:56 am #

      3 rows, 3 columns and the image number from 1 to 9.
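To make the arithmetic concrete, here is a small sketch that splits the three-digit code back into its parts. The helper name `decode_subplot_code` is hypothetical; the equivalent explicit call in matplotlib is `pyplot.subplot(3, 3, i + 1)`.

```python
def decode_subplot_code(code):
    """Split a three-digit subplot code into (rows, cols, index)."""
    rows, remainder = divmod(code, 100)
    cols, index = divmod(remainder, 10)
    return rows, cols, index


if __name__ == "__main__":
    # 330 + 1 + i walks through positions 1..9 of a 3x3 grid.
    for i in range(9):
        print(i, decode_subplot_code(330 + 1 + i))
```

The three-digit shorthand only works while rows, columns and index are each below 10, which is why the explicit three-argument form is often easier to read.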

  99. Avatar
    Runist February 28, 2020 at 6:04 pm #

    The code give me a lot of warning such as “Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.”
    Maybe I should switch to a better computer. But is there a cheaper option?

    • Avatar
      Jason Brownlee February 29, 2020 at 7:09 am #

      Perhaps try running on a machine with more RAM, e.g. EC2?

      • Avatar
        Runist February 29, 2020 at 1:03 pm #

        Do you mean more GPU memory or more CPU RAM?

  100. Avatar
    He March 3, 2020 at 12:29 am #

    Hi Jason, can you kindly create a tutorial to Estimate the Speed of Object in the detected boxes? Or have any reference to such tutorials?

  101. Avatar
    Nourhan March 4, 2020 at 9:21 pm #

    Hello Mr. Jason, thank you for the very beneficial and informative tutorials you are making. I appreciate your great effort. I would like to suggest having a similar tutorial in multiple classes object detection not only a one, if possible. Thanks again.

  102. Avatar
    Dimitrios Politikos March 6, 2020 at 10:13 pm #

    Hi Jason,

    When I’m trying to evaluate the PredictionConfig in a case study with marine litter images

    cfg = PredictionConfig()
    # define the model
    model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
    # load model weights
    model.load_weights('mask_rcnn_train_config_0005.h5', by_name=True)
    # evaluate model on training dataset
    train_mAP = evaluate_model(train_set, model, cfg)

    I get as a message:

    “Re-starting from epoch 5” and the run gets stuck there.

    Should I wait, or is there a bug in my code?



    • Avatar
      Jason Brownlee March 7, 2020 at 7:17 am #

      No, you can ignore the warning I think.

      • Avatar
        Dimitrios Politikos March 7, 2020 at 8:03 am #

        Thank you for your response. Really appreciated!


  103. Avatar
    Steven March 7, 2020 at 12:48 am #

    Hey Jason,

    I successfully ran your project on my CPU.
    Now I want to do this on my GPU.

    I installed the latest versions of all libraries:
    TF-gpu: 2.1.0
    keras-gpu: 2.3.1
    cudnn: 7.6.5
    cudatoolkit: 10.1.243

    The problem is that it throws many errors, such as needing to rename tf.log(x) to tf.math.log(x)…

    My questions are:
    1. Can you publish a project for the latest library versions?
    2. Can you say which library versions I have to install to use your project on a GPU?
    As I said, it works fine in an environment without GPU usage, but with a GPU it has never worked.

    I hope you can help me.

    • Avatar
      Jason Brownlee March 7, 2020 at 7:19 am #

      The example will not work with TensorFlow 2 because the Mask RCNN library has not yet been updated to support it.

      • Avatar
        Steven March 7, 2020 at 12:52 pm #

        Could you please give me an example of the settings?

        Which versions do you use, or which are available?
        I have been stuck on this for 2 days now…

        • Avatar
          Jason Brownlee March 8, 2020 at 6:01 am #

          Yes, I mention this at the top of the page.

          You can use TensorFlow 1.14 or 1.15.
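A minimal sketch of a version check, based only on the note at the top of this tutorial (it does not work with TensorFlow 2.0+ or Keras 2.2.5+). The function names are hypothetical, and passing the check does not guarantee that a matching CUDA/cuDNN setup is in place.

```python
import re


def parse_version(text):
    """Turn '1.15.3' into (1, 15, 3); missing parts count as 0."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_supported(tf_version, keras_version):
    """True when both versions fall below the limits stated in the tutorial note."""
    return parse_version(tf_version) < (2, 0, 0) and parse_version(keras_version) < (2, 2, 5)


if __name__ == "__main__":
    # In a real environment the strings would come from
    # tensorflow.__version__ and keras.__version__.
    print(is_supported("1.15.3", "2.2.4"))
    print(is_supported("2.1.0", "2.3.1"))
```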

          • Avatar
            Steven March 11, 2020 at 2:56 am #

            I tried, but it doesn’t work as it should.

            I now have:
            tf-gpu: 1.14
            keras: 2.2.5
            cuda: 10.0

            The script runs until Epoch 1/20:
            Image 1/100 [………]

            and it doesn’t make progress.

            Can you please give me your versions of these 4 things, so I can get it to work?

          • Avatar
            Jason Brownlee March 11, 2020 at 5:28 am #

            Perhaps there is something going on with your workstation.

            Perhaps try running other code to confirm your libraries can fit a basic model.
            Perhaps try running the code on another machine to confirm you have everything you need?

  104. Avatar
    jackson March 12, 2020 at 2:43 am #

    Hello Jason,

    I have a question!
    Why is it that when training the model, the loss for the classification output on the train set is usually lower than that of the validation datasets (e.g. mrcnn_class_loss and val_mrcnn_class_loss), as well as why is the loss for the bounding box output for the train lower than that of the validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss)?

    Thank you.

  105. Avatar
    Steve March 16, 2020 at 7:46 pm #

    Hey Jason,
    Is it possible that my training on a GPU (8GB) does not work because the net is too big for this problem?
    I tried to use resnet50 but I got allocation problems.

    How much GPU memory do you have?

  106. Avatar
    Steven March 16, 2020 at 10:50 pm #

    Hey Jason,

    I’m not sure if I can train on pictures of size 1024×1024 with resnet50 on my GPU.

    I’ve got a GeForce RTX 2070 and I can’t run it on the GPU.
    I don’t get any exceptions. The command prompt just hangs.
    With a GPU monitor I can see that it wants to use all of the memory, but I think it’s not enough.

    Can you help me?

    Another question: how can I reduce resnet50 to a smaller net (if that is the solution to my problem)?

    • Avatar
      Jason Brownlee March 17, 2020 at 8:16 am #

      Might be too large. Perhaps try smaller images first.

      • Avatar
        Steven March 18, 2020 at 2:39 am #

        I tried to use 64×64 images but it still does not work.
        I also tried just one picture per epoch.

        The problem has occurred for others too.

        Now I’m wondering if it’s a problem with the generator.

        I don’t think it’s a GPU memory problem, because the script gets stuck without any errors.

        Can you please help with some advice?

  107. Avatar
    chiraz March 18, 2020 at 11:52 pm #

    Hello Jason

    I am working with Faster rcnn for defects detection and i would like that you help me how to detect objects from scratch with my own dataset with of course a pretrained cnn like vgg16 or resnet. How to prepare the data and insert it in Jupyter notebook or even in anaconda virtual environment. I will be very thankful


  108. Avatar
    Thanakorn March 20, 2020 at 12:36 am #

    Hi Jason,

    I would like to ask you that how can I add the name of the label with the AP at the top left of the rectangle ??

    • Avatar
      Jason Brownlee March 20, 2020 at 8:46 am #

      You can draw text directly onto the image. Perhaps review the Pillow API or the matplotlib API.

  109. Avatar
    João Vitor Granzotti Machado March 24, 2020 at 2:55 am #

    Hi Jason, I’m trying to make a traffic light detector, I have a very large dataset of images known as DTLD and I would like to use it in this tutorial.
    The images of the dataset have dimensions 2048×1024 and the objects to be detected are very small. When performing the training and validation for the first time, the result obtained was very bad. I imagine it is due to the resizing performed on the images.
    If I change the IMAGE_RESIZE_MODE parameter from “square” to “none” can I continue using transfer learning normally? Or would it be necessary to train the network from scratch?

    In the file the following information is provided, however I don’t know if I can change this parameter according to my will.

    # Input image resizing
    # Generally, use the “square” resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn’t make the long side> IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none: No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    # of size [max_dim, max_dim].
    # pad64: Pads width and height with zeros to make them multiples of 64.
    # If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    # up before padding. IMAGE_MAX_DIM is ignored in this mode.
    # The multiple of 64 is needed to ensure smooth scaling of feature
    # maps up and down the 6 levels of the FPN pyramid (2 ** 6 = 64).
    # crop: Picks random crops from the image. First, scales the image based
    # on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    # size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    # IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024

    • Avatar
      Jason Brownlee March 24, 2020 at 6:08 am #

      I wonder if you can use smaller images.

      It might be worth looking in the literature for models that are appropriate for this specific problem or detecting small objects generally.
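To see how much the default settings shrink a 2048×1024 image, here is a sketch of the "square" resize arithmetic as described in the config comment quoted above. It is my reading of that comment, not the library's code: the function name is hypothetical, and the assumption that the long-side cap is applied even when it means shrinking the image should be checked against the library source.

```python
def square_resize_geometry(height, width, min_dim=800, max_dim=1024):
    """Return (scale, (new_h, new_w), (top, bottom, left, right) zero padding)."""
    # Scale up so the short side reaches min_dim (never below 1.0 at this step).
    scale = max(1.0, min_dim / min(height, width))
    # Assumption: cap the scale so the long side does not exceed max_dim.
    longest = max(height, width)
    if round(longest * scale) > max_dim:
        scale = max_dim / longest
    new_h = round(height * scale)
    new_w = round(width * scale)
    # Pad with zeros to a max_dim x max_dim square, centering the image.
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = (top, max_dim - new_h - top, left, max_dim - new_w - left)
    return scale, (new_h, new_w), padding


if __name__ == "__main__":
    print(square_resize_geometry(1024, 2048))
```

Under these assumptions a 2048×1024 image is halved before padding, so an object that is 20 pixels wide in the original would be roughly 10 pixels wide by the time the model sees it.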

      • Avatar
        João Vitor Granzotti Machado March 24, 2020 at 9:38 am #

        In this case an interesting processing would be to change the size of the 2048×1024 to 2048×512 images, cutting the lower half of the image, as it is a known fact that there are no traffic lights below the horizon line.
        Using the default values for maximum and minimum size of images (IMAGE_MIN_DIM = 800, IMAGE_MAX_DIM = 1024) I didn’t get a good result. I was wondering if it would be possible to increase the values of IMAGE_MIN_DIM and IMAGE_MAX_DIM and continue using transfer learning.

        • Avatar
          Jason Brownlee March 24, 2020 at 1:44 pm #

          Good question, perhaps try it and compare results?

          • Avatar
            João Vitor Granzotti Machado March 25, 2020 at 3:27 am #

            Changing the values of IMAGE_MIN_DIM and IMAGE_MAX_DIM, I get the following error:
            OSError: [Errno 12] Cannot allocate memory
            I’m running the code on Google Colab, as I don’t have the processing power necessary to train the base in a reasonable time on my computer.
            Therefore, there are two possibilities for this error, either it is related to excess size of the images or it is not possible to carry out transfer learning by changing the mentioned parameters.

          • Avatar
            Jason Brownlee March 25, 2020 at 6:39 am #

            Perhaps try an AWS EC2 instance with more memory, say 64GB?

      • Avatar
        João Vitor Granzotti Machado March 24, 2020 at 10:11 am #

        The fact that my images are in BGR (openCV) format and not in RGB format may be sabotaging my training

        • Avatar
          Jason Brownlee March 24, 2020 at 1:44 pm #

          Perhaps you can convert some and see if it makes a difference?

  110. Avatar
    Prashanth Mariappan March 28, 2020 at 2:50 am #

    Hey, this is a great tutorial and very helpful. Could you please tell me what changes are required if we want to train on multiple classes? I tried on my own but I am getting some errors in the load_mask() function.

    • Avatar
      Jason Brownlee March 28, 2020 at 6:26 am #

      Very few changes, just to the definition of the model – e.g. how the dataset is loaded and classes are defined.

      • Avatar
        Prashanth Mariappan March 28, 2020 at 3:14 pm #

        I’ve made the changes, but I am getting the following error in the part where we check the dataset with masks. Please help me out.

        AssertionError Traceback (most recent call last)
        in ()
        105 bbox = extract_bboxes(mask)
        106 # display image with masks and bounding boxes
        --> 107 display_instances(image, bbox, mask, class_ids, train_set.class_names)

        /content/drive/My Drive/masked rcnn/Mask_RCNN/mrcnn/ in display_instances(image, boxes, masks, class_ids, class_names, scores, title, figsize, ax, show_mask, show_bbox, colors, captions)
        103 print("\n*** No instances to display *** \n")
        104 else:
        --> 105 assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
        107 # If no axis is passed, create one and automatically call show()


        In load_dataset I have added the 2nd class using self.add_class, and in the load_mask function I have also added it to class_ids. What else should I do?

        • Avatar
          Jason Brownlee March 29, 2020 at 5:49 am #

          Sorry, I don’t know the cause of your fault. Perhaps try posting your code and issue to Stack Overflow?

  111. Avatar
    Rajesh March 30, 2020 at 6:14 pm #

    Can we use other weights for training. If so where can we download it.

  112. Avatar
    Dmitry Kroytor April 6, 2020 at 12:10 am #

    Thanks for the tutorial!

    I tried to run this on Google Colab and I had this error:
    [ module 'tensorflow' has no attribute 'placeholder' ]

    I finally solved it by calling these lines before everything else:
    %tensorflow_version 1.x
    import tensorflow

  113. Avatar
    kaka April 6, 2020 at 10:07 pm #

    Hi, great tutorial. What about images that contain no kangaroo? It seems that all the images have a kangaroo. How should the model be set up for train and test images that don’t contain one?

    • Avatar
      Jason Brownlee April 7, 2020 at 5:49 am #

      Good question. The model could include some images with no objects during training.

      • Avatar
        kaka April 7, 2020 at 10:49 am #

        Thanks for your answer. How do I input data that has no kangaroo? I mean, how should the XML and image files with no kangaroo be given as model input? Could you give details on how to deal with this? Thanks in advance!

        • Avatar
          Jason Brownlee April 7, 2020 at 1:29 pm #

          Not sure, I have not done it. Perhaps try experimenting.

  114. Avatar
    Teh April 6, 2020 at 11:19 pm #

    Hi, may I know how to save the model, and what the code is for making predictions on new images?

  115. Avatar
    Arsalan April 7, 2020 at 4:16 am #

    Sir, i didn’t understand the ranges of rows and columns you’ve set for creating masks , can you kindly explain it?

  116. Avatar
    kaka April 7, 2020 at 6:08 pm #

    Hi, how do I set the code to use one GPU?

    • Avatar
      Jason Brownlee April 8, 2020 at 7:48 am #

      Configure tensorflow on your workstation to use GPU, then the example will run in the GPU.

      • Avatar
        kaka April 8, 2020 at 5:19 pm #

        thanks for answer, I mean how to revise the code to use one GPU? I mainly use the GCP GPU.

        • Avatar
          Jason Brownlee April 9, 2020 at 7:57 am #

          No change to the code, only a change to your tensorflow library.

  117. Avatar
    kaka April 7, 2020 at 9:27 pm #

    Hi, if my image id is not of int type, how should I change the code of load_image?

    • Avatar
      Jason Brownlee April 8, 2020 at 7:51 am #

      Sorry, I don’t have the capacity to help you customize the code.

  118. Avatar
    zenith April 10, 2020 at 4:10 pm #

    Hi, I am building a model for image recognition.
    The model should be able to identify which type of image is provided by the user.
    I have two (2) sets of images:
    passport images and driving licence images.
    I am building a model using these images.
    I have only 119 passport images for training.
    I am training on the passport images.
    After training, when I test the model, it gives a higher probability on driving licence images than on passport images.
    What could the issue be?
    How do I do this by adding a bounding box to the training images?

    • Avatar
      Jason Brownlee April 11, 2020 at 6:08 am #

      You can prepare the data with bounding boxes defined and the model will learn how to localize the items in new photos.

  119. Avatar
    zenith April 10, 2020 at 4:12 pm #

    I have a model for image recognition.
    I am using passport images for training.
    When I test it using licence images, it gives a higher probability on the licence images.
    What could the issue be?

  120. Avatar
    anish jain April 13, 2020 at 10:39 pm #

    Starting at epoch 0. LR=0.001

    Checkpoint Path: ./content/kangaroo_cfg20200413T1223/mask_rcnn_kangaroo_cfg_{epoch:04d}.h5
    Selecting layers to train
    fpn_c5p5 (Conv2D)
    fpn_c4p4 (Conv2D)
    fpn_c3p3 (Conv2D)
    fpn_c2p2 (Conv2D)
    fpn_p5 (Conv2D)
    fpn_p2 (Conv2D)
    fpn_p3 (Conv2D)
    fpn_p4 (Conv2D)
    In model: rpn_model
    rpn_conv_shared (Conv2D)
    rpn_class_raw (Conv2D)
    rpn_bbox_pred (Conv2D)
    mrcnn_mask_conv1 (TimeDistributed)
    mrcnn_mask_bn1 (TimeDistributed)
    mrcnn_mask_conv2 (TimeDistributed)
    mrcnn_mask_bn2 (TimeDistributed)
    mrcnn_class_conv1 (TimeDistributed)
    mrcnn_class_bn1 (TimeDistributed)
    mrcnn_mask_conv3 (TimeDistributed)
    mrcnn_mask_bn3 (TimeDistributed)
    mrcnn_class_conv2 (TimeDistributed)
    mrcnn_class_bn2 (TimeDistributed)
    mrcnn_mask_conv4 (TimeDistributed)
    mrcnn_mask_bn4 (TimeDistributed)
    mrcnn_bbox_fc (TimeDistributed)
    mrcnn_mask_deconv (TimeDistributed)
    mrcnn_class_logits (TimeDistributed)
    mrcnn_mask (TimeDistributed)
    AttributeError Traceback (most recent call last)
    in ()
    ----> 1 model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

    1 frames
    /content/Mask_RCNN/mrcnn/ in compile(self, learning_rate, momentum)
    2197 tf.reduce_mean(layer.output, keepdims=True)
    2198 * self.config.LOSS_WEIGHTS.get(name, 1.))
    -> 2199 self.keras_model.metrics_tensors.append(loss)
    2201 def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):

    AttributeError: 'Model' object has no attribute 'metrics_tensors'

    this error comes when i train model
    tensorflow 1.15
    in google colab

    • Avatar
      Jason Brownlee April 14, 2020 at 6:18 am #

      Perhaps try running on your workstation or on ec2. Perhaps colab is the issue?

      • Avatar
        Akash Kumar April 24, 2020 at 7:54 am #

        Colab has no issue. I have trained and achieved results using TensorFlow 1.15.0 and Keras 2.2.4. However, I want to run detection on video after training on images. How should I achieve that?

        • Avatar
          Jason Brownlee April 24, 2020 at 8:03 am #

          Perhaps you can extract frames of the video and pass them to your model?
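A minimal sketch of the frame-extraction idea, assuming OpenCV (`cv2`) is installed; it is not part of this tutorial's requirements. The helper names are hypothetical. Frames are converted from OpenCV's BGR order to RGB on the assumption that the model was trained on RGB images, and each yielded frame could then be passed to `model.detect([frame], verbose=0)` as in the prediction section.

```python
def sample_frame_indices(total_frames, stride):
    """Indices of the frames that would be kept when taking every stride-th frame."""
    return list(range(0, total_frames, stride))


def iter_video_frames(path, stride=10):
    """Yield (index, rgb_frame) for every stride-th frame of a video file."""
    import cv2  # assumption: OpenCV is available in the environment

    capture = cv2.VideoCapture(path)
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of video or unreadable file
            if index % stride == 0:
                yield index, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            index += 1
    finally:
        capture.release()
```

Sampling every Nth frame rather than every frame keeps inference time manageable; the right stride depends on how fast the objects move.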

  121. Avatar
    Williana April 16, 2020 at 2:54 am #

    Hi, I’m a Brazilian student!
    I am replicating your tutorial for my own dataset. I’m also using Mask-RCNN for object detection only. During training, only two metrics are presented: loss and val_loss. The metrics you talked about (e.g. mrcnn_class_loss and val_mrcnn_class_loss, mrcnn_bbox_loss and val_mrcnn_bbox_loss), are not displayed during the training, do you know why this happens? I’m using verbose = 1.

    • Avatar
      Jason Brownlee April 16, 2020 at 6:04 am #

      Well done.

      I don’t know why there is a difference.

  122. Avatar
    Ghafour April 17, 2020 at 11:24 am #

    When I execute the above complete code the error “module ‘dask.dataframe’ has no attribute ‘Series'” is taken. and I can not solve the problem
    What happened

  123. Avatar
    Ghafour April 18, 2020 at 9:02 pm #

    dask is true. Sorry about my mistake

  124. Avatar
    saipavankumar April 23, 2020 at 2:38 am #

    Hi Jason, I have searched for text detection and recognition on your blog and haven’t found anything. Can I use R-CNN for text detection, and what about Faster R-CNN?

    • Avatar
      Jason Brownlee April 23, 2020 at 6:11 am #

      I don’t think I have tutorials on that topic, sorry.

  125. Avatar
    Lam Thanh April 23, 2020 at 6:46 am #

    Background class detection

    Hi when I input an image without Kangaroo, the model outputs y_hat as empty arrays, I think it should be 0’s (no kangaroo then it should see as background class).


    >> array([], dtype=int32)

    is it true that I’m supposed to have an image dataset without kangaroo so that the model can learn to detect background class?

    Thank you a lot,
    Look forward to your response,

    • Avatar
      Jason Brownlee April 23, 2020 at 7:45 am #

      You could change the model to operate that way if you wish.

      • Avatar
        Lam Thanh April 23, 2020 at 9:14 pm #

        Hi, I’m thinking of some ways to make it work like that:

        1. Add a background dataset
        2. Change the source code

        I hope you could please tell me which direction is fine.
        Thank you a lot

        • Avatar
          Jason Brownlee April 24, 2020 at 5:41 am #

          Perhaps explore both and see what works/makes sense?

  126. Avatar
    Akash Kumar April 24, 2020 at 7:51 am #

    I have trained the model on images and want to test on videos and label objects in video. Any suggestions and links are appreciated.

    • Avatar
      Jason Brownlee April 24, 2020 at 8:03 am #

      Perhaps you can extract frames of the video and pass them to your model?

  127. Avatar
    Lorenzo Gabrielli April 24, 2020 at 6:19 pm #

    I’m trying your code on Colab, to use GPU, but when the train starts it says that I’m not using GPU.

    Do you know if I have to run something different or it is simply a system problem?

    Thanks a lot.

  128. Avatar
    shradha April 24, 2020 at 8:41 pm #

    Hi, I am trying to use my custom images for training with 1 class. When the object isn’t present in an image, I do not generate the XML. Because of this, training with model.train throws a "No such file or directory" error. How should I handle this situation?
    (In particular, I’m trying to solve the Kaggle competition for table detection in document images.)
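One possible way around the missing-file error, sketched under the assumption of Pascal VOC style XML annotations: treat a missing annotation file as "no objects" and return an empty box list. The helper name is hypothetical, and how the Mask R-CNN library handles images with no instances during training has not been verified here.

```python
import os
import xml.etree.ElementTree as ET


def load_boxes_or_empty(annotation_path):
    """Return [[xmin, ymin, xmax, ymax], ...]; an absent file means no objects."""
    if not os.path.exists(annotation_path):
        return []
    root = ET.parse(annotation_path).getroot()
    boxes = []
    for box in root.findall(".//bndbox"):
        boxes.append([int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes
```

An alternative is to skip such images in `load_dataset` so that every image registered with the dataset has an annotation file.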

  129. Avatar
    Ketil April 28, 2020 at 12:16 am #

    So… did this post copy this one:

    Or did she copy you?

    Not that it matters much, but I can’t find any attribution in either piece, which would be normal courtesy. Credit where it’s due.

    • Avatar
      Jason Brownlee April 28, 2020 at 6:47 am #

      They copied me, check the publication dates.

      I get ripped off every day. It sucks.

  130. Avatar
    hard May 9, 2020 at 2:55 pm #

    Hi Jason, can this code be applied to other objects, like blood cells or another single object, and run from an Anaconda environment?

    • Avatar
      Jason Brownlee May 10, 2020 at 5:55 am #

      Maybe. Perhaps prototype a model on your dataset and see how it goes. Also, perhaps check the literature for other solutions to the type of problem you are working on and see what types of models they use.

  131. Avatar
    Williana May 10, 2020 at 2:16 pm #

    Could you please explain to me how to calculate mAR from the “utils.compute_recall” function, I understand that it returns the AR, but how should I calculate the mAR? Please help me!!

    • Avatar
      Jason Brownlee May 10, 2020 at 4:10 pm #

      Sorry, I don’t have an example. Thanks for the suggestion!
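One simple reading of mAR, sketched here as an assumption rather than a standard definition: collect one recall value per image and take the arithmetic mean, mirroring how per-image AP values are averaged into mAP. The function name is hypothetical, and the per-image recalls are taken as given; other definitions (for example averaging over IoU thresholds) exist.

```python
def mean_average_recall(per_image_recalls):
    """Arithmetic mean of per-image recall values."""
    values = list(per_image_recalls)
    if not values:
        raise ValueError("need at least one recall value")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_average_recall([1.0, 0.5, 0.75]))
```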

  132. Avatar
    tunnn May 17, 2020 at 8:13 pm #

    hi jason
    I got a problem when I try to run this:

    from mrcnn.model import MaskRCNN

    output is

    ModuleNotFoundError: No module named 'keras'

    how to fix it?

  133. Avatar
    Ellie May 19, 2020 at 6:05 pm #

    Hey Jason, thank you for the tutorial! I have two questions:

    1) I’m getting a terrible train and test mAP. Is the only way to improve this via more data, or do you have any other ideas? (Dataset includes 95 photos for training, and 26 photos for testing)
    Train mAP: 0.423
    Test mAP: 0.546

    2) How could I use this for video multi-object recognition/tracking? Should I just run the video frame by frame? Is there a way I can use my segmentation data to show objects moving across the screen, for example?


    • Avatar
      Ellie May 19, 2020 at 6:31 pm #

      Sorry one more question: In relation to the question above about using this for a frame by frame video, I’m wondering if you have any tutorials or ideas on doing a “total object count” for a video. For example, if you were tracking kangaroos across the screen in a video, how could you assign a unique identifier to a newly recognized kangaroo and then report the total number of kangaroos in the video at the end, even if some appeared and left during the video?


      • Avatar
        Jason Brownlee May 20, 2020 at 6:22 am #

        Sorry, I do not have tutorials on video or object counts.

    • Avatar
      Jason Brownlee May 20, 2020 at 6:21 am #

      Perhaps try data augmentation?
      Perhaps try changing the model?
      Perhaps try changing learning parameters?

      For video, perhaps try applying the model to each frame or a subset of frames?

  134. Avatar
    Baran May 22, 2020 at 9:31 am #

    Hi Jason,

    Thanks for the amazing tutorial. I’ve got some brief questions.

    Firstly, just to confirm, the masks are passed to the model in the form of an array of shape (H, W, num_masks), correct? This appears to be what’s going on in the load_masks method.

    Secondly, I can’t quite identify where the sizes of the training images comes into play. For example, you haven’t specified a specific image size that the model should expect. So, does the model expect a particular input size (i.e. H x W x num. channels) – if so, what is it?


    • Avatar
      Jason Brownlee May 22, 2020 at 1:20 pm #

      You can plot the image with the mask to confirm they are as you expect. I show this in the tutorial.

      Good question. From memory, I believe the model expects fixed-size images and the library around it handles image resizing.

  135. Avatar
    Remi May 24, 2020 at 1:33 pm #

    Awesome tutorial ! Thank you.

  136. Avatar
    kevinn May 25, 2020 at 11:38 pm #

    Wonderful Works! Thank You!

  137. Avatar
    utkarsh May 27, 2020 at 6:47 pm #

    hi jason ,
    I have trained my model successfully, but it is producing many more masks than I expect. Can you please tell me how to solve this?

    • Avatar
      Jason Brownlee May 28, 2020 at 6:12 am #

      You will have to debug your code to discover the answer.

  138. Avatar
    Ardhika Nofardiansa June 5, 2020 at 4:29 am #

    hi Jason ,
    I have trained my model successfully based on your tutorial (my model is for motorbike detection). How can I get the output file of this training? Is the .h file the output? I mean, I want to just load the trained model if I want to use it in other source code for motorbike detection, so I don’t need to train it from the beginning again each time I want to detect motorbikes.

    Thank you.

  139. Avatar
    Anand Nataraj June 5, 2020 at 6:36 am #

    Whatever I do, I’m getting the error below while implementing multiclass:

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1

    I would kindly ask the author to help me implement multiclass object detection.

  140. Avatar
    Donald June 5, 2020 at 8:37 am #

    Hi Jason, thank you for the tutorial. I’d like to ask something about anchor boxes. If I have anchor_box_scales = [32, 64, 128], what do these values mean exactly? Are they square pixels (area), or are they scalar values? If I have very small objects that range from 20×20 to 40×40 pixels, what values should I use? Can I use only two? I would love some guidance and insight if possible, of course.
    Thank you again.

    • Avatar
      Jason Brownlee June 5, 2020 at 1:40 pm #

      The average sizes of objects in the dataset used to train the model, I believe.

      You can try smaller boxes and see if it makes a difference for your dataset.
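As a rough illustration of what anchor scales usually mean: in the Matterport library, as far as I know, each scale is the side length in pixels of a square anchor, and each ratio (width/height) stretches that square while keeping its area. The sketch below assumes that convention. The function name anchor_sizes is hypothetical, and the commenter's anchor_box_scales setting may come from a different library with different rules.

```python
import math


def anchor_sizes(scales, ratios):
    """Return (scale, ratio, height, width) for every scale/ratio pair.

    Assumes scale = side length in pixels of a square anchor and
    ratio = width / height, with the anchor area kept at scale ** 2.
    """
    sizes = []
    for scale in scales:
        for ratio in ratios:
            height = scale / math.sqrt(ratio)
            width = scale * math.sqrt(ratio)
            sizes.append((scale, ratio, height, width))
    return sizes


# Candidate scales for objects of roughly 20x20 to 40x40 pixels.
for scale, ratio, height, width in anchor_sizes([16, 32, 64], [0.5, 1, 2]):
    print(f"scale={scale:3d} ratio={ratio:3} -> {height:6.1f} x {width:6.1f} px")
```

Printing the sizes this way makes it easier to judge whether the anchors cover the object sizes in your dataset before committing to a training run.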

  141. Avatar
    Anand June 5, 2020 at 1:48 pm #

    Is Mask-RCNN better than retinaNet? May I know what is the best of all models available for object detection?

    • Avatar
      Jason Brownlee June 6, 2020 at 7:40 am #

      It may depend on the specifics of your dataset.

      Perhaps test a suite of techniques on your dataset and discover which best meets your needs.

  142. Avatar
    Reki Dian June 6, 2020 at 6:00 pm #

    Hi jason,

    In this tutorial, the last 32 files are used for testing, right? What should I do if I want a random test set, so that the test data is not the last 32 files but randomly chosen files? Thank you.

    • Avatar
      Jason Brownlee June 7, 2020 at 6:20 am #

      Sorry, I don’t understand. Can you please rephrase or elaborate your question?
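If the question is how to hold out a random set of files instead of the last ones, a minimal sketch is to shuffle the list of filenames once with a fixed seed and slice it. The function name random_split and the example filenames are hypothetical. You would still need to adapt the tutorial's dataset loading code to use the two lists.

```python
import random


def random_split(filenames, n_test, seed=1):
    """Split filenames into (train, test) lists, choosing the test set at random.

    A fixed seed keeps the split identical between runs, so the train and
    test datasets never overlap when they are loaded separately.
    """
    files = sorted(filenames)  # sort first so the result does not depend on OS listing order
    random.Random(seed).shuffle(files)
    return files[n_test:], files[:n_test]


# Hypothetical filenames; in practice these would come from os.listdir().
all_files = [f"{i:05d}.jpg" for i in range(1, 101)]
train_files, test_files = random_split(all_files, n_test=32)
print(len(train_files), len(test_files))
```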

  143. Avatar
    Anand Nataraj June 7, 2020 at 12:47 am #

    Is there any way to print the Train and Validation accuracy in the callback?

    • Avatar
      Jason Brownlee June 7, 2020 at 6:28 am #

      Probably not a good idea to print from a callback, but perhaps try it directly and see.

  144. Avatar
    Siddhant K. Sancheti June 7, 2020 at 5:35 am #

    Hello Jason,
    Thanks a lot for such a great tutorial.
    Firstly, how long does it take to calculate mAP scores? It’s been half an hour and it’s still processing. I think I am in a loop!
    I also had a question: why do you need to use the scaled image for evaluation during prediction, as per your code?
    from numpy import expand_dims
    from numpy import mean
    from mrcnn.utils import compute_ap
    from mrcnn.model import load_image_gt
    from mrcnn.model import mold_image

    def evaluate_model(dataset, model, cfg):
        APs = list()
        for image_id in dataset.image_ids:
            # load image, bounding boxes and masks for the image id
            image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
            # convert pixel values (e.g. center)
            scaled_image = mold_image(image, cfg)
            # convert image into one sample
            sample = expand_dims(scaled_image, 0)
            # make prediction
            yhat = model.detect(sample, verbose=0)
            # extract results for first sample
            r = yhat[0]
            # calculate statistics, including AP
            AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
            # store
            APs.append(AP)
        # calculate the mean AP across all images
        mAP = mean(APs)
        return mAP

    Also, I didn’t get your following statements:

    1.the pixel values of the loaded image must be scaled in the same way as was performed on the training data, e.g. centered. This can be achieved using the mold_image() convenience function.

    2. The dimensions of the image then need to be expanded one sample in a dataset and used as input to make a prediction with the model.

    Thanks in advance!!!

    • Avatar
      Jason Brownlee June 7, 2020 at 6:30 am #

      That sounds too long. Perhaps try running on a faster machine or double check your code.

      Yes, any data prep applied to the training data must also be applied to new data, like test data. This often means scaling the pixels in the same way.

      Yes, the model expects one or more samples as input, in this case images. We need to ensure the input has appropriate dimension to meet the expectations of the model.
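The two steps discussed here can be sketched with NumPy alone. The snippet assumes that mold_image() centers pixels by subtracting a per-channel mean, and that the default mean is [123.7, 116.8, 103.9]. Both are my reading of the library, so treat them as assumptions. The helper name center_pixels is hypothetical.

```python
import numpy as np

# Assumed default per-channel mean (RGB); check cfg.MEAN_PIXEL in your own setup.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])


def center_pixels(image, mean_pixel=MEAN_PIXEL):
    """Convert to float and subtract the per-channel mean (stand-in for mold_image)."""
    return image.astype(np.float32) - mean_pixel


# A dummy 4x6 RGB image with every pixel value set to 128.
image = np.full((4, 6, 3), 128, dtype=np.uint8)
scaled_image = center_pixels(image)

# The model expects a batch of samples, so add a leading "sample" axis.
sample = np.expand_dims(scaled_image, 0)
print(image.shape, "->", sample.shape)
```

The first step keeps the pixel statistics consistent with training. The second turns a single image of shape (H, W, 3) into a batch of one with shape (1, H, W, 3).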

  145. Avatar
    Siddhant Sancheti June 7, 2020 at 1:33 pm #

    Hello Jason,
    How can I improve my mAP scores? I’ve been getting scores as follows
    Train mAP: 0.760
    Test mAP: 0.657
    I don’t know why the scores are so low, because the model predicts each defined object (gun, knife, and sword in my case) very accurately.

    Thanks in advance!!

    Also, I guess something is wrong with this particular webpage. It loads slowly and lags, while other pages on machinelearningmastery and other websites work perfectly fine.

  146. Avatar
    Ali June 11, 2020 at 1:03 am #

    Hello. I have this error and I don’t know how to solve it:

    Traceback (most recent call last):

    File “”, line 47, in
    train_mAP = evaluate_model(train_set, model, cfg)

    File “”, line 32, in evaluate_model
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r[“rois”], r[“class_ids”], r[“scores”], r[‘masks’])

    File “C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\”, line 739, in compute_ap

    File “C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\”, line 691, in compute_matches
    overlaps = compute_overlaps_masks(pred_masks, gt_masks)

    File “C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\mrcnn\”, line 107, in compute_overlaps_masks
    masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)

    File “C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\numpy\core\”, line 257, in reshape
    return _wrapfunc(a, ‘reshape’, newshape, order=order)

    File “C:\Users\lenovo\Anaconda3\envs\tensorflow\lib\site-packages\numpy\core\”, line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)

    ValueError: cannot reshape array of size 0 into shape (0)

    can you help please

  147. Avatar
    Shrey June 22, 2020 at 11:16 pm #

    Hello Jason,

    First of all, I wanted to thank you for this tutorial. So far, it has been really thorough, especially in helping an ML beginner like me to grasp relatively complex ideas.

    I have been following the tutorial step-by-step so far with code on Google Colab and python v 3.6 installed as well. I am using this to work on a school project that uses object detection to recognize traffic lights in images.

    When I train the model, it gives me an error that others have experienced before, i.e. AttributeError: ‘Model’ object has no attribute ‘metrics_tensors’

    Not exactly sure what is incorrect here, have looked at the file and there only seems to be that single instance of metrics_tensors when we add metrics to the losses. Were you or the others (who might have faced similar errors) able to identify the source of the error?


    • Avatar
      Jason Brownlee June 23, 2020 at 6:26 am #

      Well done on your progress.

      Perhaps there is a library version problem with your environment. Maybe try and run the example locally instead?

  148. Avatar
    Samek July 7, 2020 at 7:16 pm #

    I am getting this error while running

    model = MaskRCNN(mode=’training’, model_dir=’./’, config=config)


    The following Variables were created within a Lambda layer (anchors)
    but are not tracked by said layer:

    The layer cannot safely ensure proper Variable reuse across multiple
    calls, and consquently this behavior is disallowed for safety. Lambda
    layers are not well suited to stateful computation; instead, writing a
    subclassed Layer is the recommend way to define layers with

  149. Avatar
    Anand July 10, 2020 at 2:51 am #

    mAP always comes out as 1.0, and in some cases it exceeds one (1.00000876). Could you please suggest what the likely cause is?

    • Avatar
      Jason Brownlee July 10, 2020 at 6:05 am #

      Not sure off the cuff, perhaps try experimenting with the model and specific inputs.

  150. Avatar
    SaiManikanta Vuppala July 17, 2020 at 7:23 pm #

    I have data that consists of images and their corresponding annotation files. I have to detect two classes. I intend to use my own neural network. Can you explain to me how to load data into the network.

    • Avatar
      Jason Brownlee July 18, 2020 at 6:00 am #

      You may need to write custom code to load your dataset.

  151. Avatar
    SaiManikanta Vuppala July 19, 2020 at 4:27 am #

    could you please suggest me an article to follow

    • Avatar
      Jason Brownlee July 19, 2020 at 6:35 am #

      Yes, the above tutorial shows how to load a custom dataset, perhaps you can adapt it to load your custom dataset.

  152. Avatar
    Rishikesh Pathak July 19, 2020 at 10:43 pm #

    Great tutorial..!
    I used this to make a multi-class model (face with mask and without mask). I trained it and got the output.
    But it is not differentiating between the two classes, as this code is for a single class.
    How can I get it to classify the classes separately to detect faces without

    • Avatar
      Jason Brownlee July 20, 2020 at 6:13 am #

      Well done!

      Good question, you can prepare and load the data to have two classes instead of one. Note the location where we define the classes when loading and defining the dataset.

      • Avatar
        Rishikesh Pathak July 20, 2020 at 5:45 pm #

        Yes, I did that. I updated the code for 2 classes and trained it. But how do I differentiate between the two classes in the predictions (output)? It is showing boxes on all faces (with and without a mask), and I want to know which faces have a mask and which do not,
        so that I can write a script to detect face masks in a photograph.

        • Avatar
          Jason Brownlee July 21, 2020 at 5:55 am #

          Nice work.

          Yes, the model output will indicate the box and the label.
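As a sketch of how the label can be read from a prediction: the result for one image is a dictionary with 'rois', 'class_ids' and 'scores' entries (the same keys used in evaluate_model()), and each class id can be looked up in a list of class names. The class names and values below are made up for illustration. Index 0 is assumed to be the background class, and label_detections is a hypothetical helper.

```python
# Hypothetical class list; index 0 is assumed to be the background class.
class_names = ["BG", "with_mask", "without_mask"]

# Made-up result in the same layout as one entry returned by model.detect().
r = {
    "rois": [[10, 20, 110, 120], [30, 40, 90, 100]],
    "class_ids": [1, 2],
    "scores": [0.98, 0.87],
}


def label_detections(result, names):
    """Pair every detected box with its class name and confidence score."""
    labelled = []
    for box, class_id, score in zip(result["rois"], result["class_ids"], result["scores"]):
        labelled.append((names[int(class_id)], float(score), tuple(int(v) for v in box)))
    return labelled


for name, score, box in label_detections(r, class_names):
    print(f"{name}: {score:.2f} at {box}")
```

The resulting list of (name, score, box) tuples can be saved, filtered by class, or used to draw a text label next to each box.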

          • Avatar
            Rishikesh Pathak July 22, 2020 at 2:52 am #

            Yes, it’s working, thank you so much. Please keep up the good work. this tutorial helped a lot.

          • Avatar
            Jason Brownlee July 22, 2020 at 5:43 am #

            Well done!

    • Avatar
      Jonas June 7, 2022 at 4:12 pm #

      Hi, I am also trying to build a multi-class model (bicycle and car), but I can’t seem to get it to work. What changes did you make to the code? My epochs just run forever without exiting.

  153. Avatar
    Narottam Saini August 3, 2020 at 3:50 pm #

    Hi Jason, I’m facing an issue while trying to run Mask R-CNN in the Google Colab environment: the first epoch never completes. I tried to solve it by following steps suggested by people on various forums, but I am still facing the issue. This included trying various versions of TensorFlow from 1.14 to 1.5.1 and Keras from 2.0.8 to 2.1.0.

    Would it be possible for you to run the code again at your end with multiple epochs and then share the requirement.txt file?

    Waiting for your reply…

  154. Avatar
    Michelangiolo August 6, 2020 at 1:35 am #

    Hi, great tutorial! I am stuck at

    from mrcnn.model import MaskRCNN

    ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow

    The issue is that MaskRCNN seems incompatible with the latest version of tf. I have been installing tensorflow 1.5 to avoid issues with model_dir not recognized.

    Thank you

    • Avatar
      Jason Brownlee August 6, 2020 at 6:15 am #

      You must use TF 1.14 and Keras 2.2.

      • Avatar
        Nils September 8, 2020 at 9:02 pm #

        Hey Jason,

        I have the same problem, but as the error below shows, I’m unable to download an earlier version of TF.

        ERROR: Could not find a version that satisfies the requirement tensorflow==1.15 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
        ERROR: No matching distribution found for tensorflow-gpu==1.15

        Thank you for all your great work

        • Avatar
          Nils September 8, 2020 at 10:54 pm #

          Hey all,

          Python 3.8 does not support older TensorFlow versions. Python 3.7 does.

          The next steps can be used when working in anaconda

          This can be installed via conda with the command conda install -c anaconda python=3.7 as per

          Though not all packages support 3.7 yet, running conda update --all may resolve some dependency failures.

          • Avatar
            Jason Brownlee September 9, 2020 at 6:51 am #

            Thanks for sharing.

            Generally, Python 3.6 is recommended.

        • Avatar
          Jason Brownlee September 9, 2020 at 6:48 am #

          Sorry to hear that.

  155. Avatar
    Peter August 8, 2020 at 11:04 pm #

    Hello Jason!
    Love your articles.
    What if I want to train from scratch? What changes have to be made to the code?
    Thanks Peter.

    • Avatar
      Jason Brownlee August 9, 2020 at 5:44 am #

      Do not load the pre-trained weights.

      • Avatar
        Peter August 9, 2020 at 4:34 pm #

        Ok thanks
        Another question: must all inputs be of the same shape?

        • Avatar
          Jason Brownlee August 10, 2020 at 5:45 am #

          It is common to reshape images to the same size/shape prior to modeling.

          The model will do this for you I believe.

  156. Avatar
    Anand Nataraj August 13, 2020 at 2:04 pm #

    Hi Jason,

    Is this architecture capable to work for multi class like predicting kangaroo, lion, tiger, etc?

    I tried it but I am getting very low accuracy. Your advice would help.


  157. Avatar
    Dan September 14, 2020 at 11:18 pm #

    Hi Jason and thanks a million for the tutorial. I implemented it and it is fully functional.

    It’s been 24 hours since I started to learn ML, bear in mind.

    I want to adapt your code to detect certain photos of items in scanned images, so it is not kangaroos. I need to train the model on a completely new type of object.

    I can create the training set folders with images and annotations as you defined them, no problem.

    But what would I have to change in the code in order to train it on a completely new type of object? I started by eliminating the load_weights instruction.

    • Avatar
      Jason Brownlee September 15, 2020 at 5:26 am #

      Well done!

      Good question. Load the weights as before. The change is focused on how you load your custom dataset – to ensure that the class, image, and mask are represented correctly using the Mask RCNN API – use the existing code as a guide.

      • Avatar
        Dan September 17, 2020 at 8:01 pm #

        thanks for the help Jason. I now understand some more about the topic.

        When I try to train the model with your code it always gives me this error at the end. Have you noticed this before? I changed the number of epochs to run the program faster and trigger the error sooner for debugging. The error appears with the kangaroo dataset as well as with my dataset.

        5/6 [========================>…..] – ETA: 49s – loss: 3.1602 – rpn_class_loss: 0.0097 – rpn_bbox_loss: 0.5310 – mrcnn_class_loss: 0.2470 – mrcnn_bbox_loss: 1.4792 – mrcnn_mask_loss: 0.8933 C:\Python\lib\site-packages\skimage\transform\ FutureWarning: Input image dtype is bool. Interpolation is not defined with bool data type. Please set order to 0 or explicitely cast input image to another data type. Starting from version 0.19 a ValueError will be raised instead of this warning.
        order = _validate_interpolation_order(image.dtype, order)
        C:\Python\lib\site-packages\skimage\transform\ FutureWarning: Input image dtype is bool. Interpolation is not defined with bool data type. Please set order to 0 or explicitely cast input image to another data type. Starting from version 0.19 a ValueError will be raised instead of this warning.
        order = _validate_interpolation_order(image.dtype, order)
        Traceback (most recent call last):
        File “C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\”, line 1692, in data_generator
        ZeroDivisionError: integer division or modulo by zero

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
        File “C:/Users/GABI/PycharmProjects/Object_Recognition/”, line 109, in
        model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers=’heads’)
        File “C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\”, line 2374, in train
        File “C:\Python\lib\site-packages\keras\legacy\”, line 91, in wrapper
        return func(*args, **kwargs)
        File “C:\Python\lib\site-packages\keras\engine\”, line 1418, in fit_generator
        File “C:\Python\lib\site-packages\keras\engine\”, line 234, in fit_generator
        File “C:\Python\lib\site-packages\keras\legacy\”, line 91, in wrapper
        return func(*args, **kwargs)
        File “C:\Python\lib\site-packages\keras\engine\”, line 1472, in evaluate_generator
        File “C:\Python\lib\site-packages\keras\engine\”, line 330, in evaluate_generator
        generator_output = next(output_generator)
        File “C:\Python\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\”, line 1810, in data_generator
        UnboundLocalError: local variable ‘image_id’ referenced before assignment

        • Avatar
          Jason Brownlee September 18, 2020 at 6:44 am #

          Sorry to hear that, I have not seen this error.

          Do your Keras and TF versions match the expected versions listed at the top of the tutorial?
          Did you copy all of the code?
          Are you running from the command line?

        • Avatar
          Fabian January 4, 2021 at 8:14 am #

          @Dan I have the same error, did you manage to get rid of it?

          • Avatar
            Mauricio June 29, 2021 at 6:46 am #

            Same thing, did you ever fix it Fabian?

  158. Avatar
    Laura September 17, 2020 at 3:31 am #

    Hi Jason!

    Similar to the last question, I want to train on a new dataset in which the objects that I want to detect are not even closely related to the objects the pre-trained model has seen. Is this actually possible or do we have to have images that are similar to the pre-trained model? I guess the root of my question is – how flexible is transfer learning? Can I really take a pretrained model trained on kangaroos and get it to learn to detect random shapes in a new image?

    Can a MASK RCNN detect overlapping objects?

    • Avatar
      Jason Brownlee September 17, 2020 at 6:51 am #

      It is critical to train on data close to what you want to make predictions on in the future.

  159. Avatar
    Richie September 22, 2020 at 8:53 am #

    Hello and thank you for such a great tutorial! I am stuck on a certain part though. I can’t seem to find the mask_rcnn_kangaroo_cfg files that are supposed to be generated. It’s supposed to be saved in your working directory right? Is there another place the .h5 files could be saved?

  160. Avatar
    Manikanteswar September 25, 2020 at 8:34 pm #

    Great Tutorial Jason.

    I understood your code, but i have one doubt .

    You didn’t include any background images in training (i mean image without kangaroo).

    So, how to include background images in training because background images don’t have any masks right?

    So please tell me how to do this.


    • Avatar
      Jason Brownlee September 26, 2020 at 6:18 am #

      I don’t think it is needed.

      But perhaps you can provide images without kangaroos and see if the API/model accepts them.

  161. Avatar
    chadi September 30, 2020 at 4:59 pm #

    Hi Jason,

    thanks for this great tutorial. how to have the class_name i.e ‘kangaroo’ displayed on the picture? more importantly, how to extract it and save it in some list ?

    many thanks

  162. Avatar
    Shriram October 6, 2020 at 5:20 am #

    Hey Jason,

    Lets just say you saved my life.

  163. Avatar
    Jan Beneš October 12, 2020 at 8:27 am #

    Hi, is there a way for me to see validation/training accuracy in each epoch? Will model.history.keys() even show something? Is there a way? Thank you.

    • Avatar
      Jason Brownlee October 12, 2020 at 9:16 am #

      Good question, the performance is reported on the command line.

      Perhaps check the API for the train() function to see if it returns a history object.

      • Avatar
        Jan Beneš October 14, 2020 at 11:52 pm #

        Thank you, I found that I can use TensorBoard for the graphical view. I have another question/problem. My custom dataset has around 1000 images of 128x128 pixels, and I somehow manage to run out of memory. Is there a fix for that?

        • Avatar
          Jan Beneš October 15, 2020 at 5:49 am #

          Well, I managed to fix that by reducing the learning rate to 0.00001 and the steps per epoch to 50.

        • Avatar
          Jason Brownlee October 15, 2020 at 6:10 am #

          Use a machine with more memory, like AWS EC2.
          Use a smaller dataset.
          Use smaller images.

  164. Avatar
    Edward October 14, 2020 at 9:37 pm #

    Hi Jason,

    Thank you for this great tutorial. Would you have a similar tutorial using YOLO for Keras instead of R-CNN for Keras?

    Thank you very much.

  165. Avatar
    Anand October 16, 2020 at 10:05 pm #

    Is Mask-RCNN better than yoloV3? I’m trying to build a model which could predict stamps in the given bank forms.

    • Avatar
      Jason Brownlee October 17, 2020 at 6:03 am #

      I believe it is. It might be a good idea to test a suite of models and discover what works best for your specific dataset.

  166. Avatar
    Jan Beneš October 19, 2020 at 11:21 pm #

    Hi, is it possible to predict in real time? Or would it be possible to get each image shown right after it is predicted, so I don’t have to wait for the whole batch to finish?


    • Avatar
      Jan Beneš October 20, 2020 at 12:05 am #

      Ohh, sorry for asking too many questions. What if I wanted to just predict the picture and save it with the bounding box filled?

    • Avatar
      Jason Brownlee October 20, 2020 at 6:25 am #

      Yes, you can call model.detect() with one image at a time, in real time.

  167. Avatar
    Nassif October 21, 2020 at 7:08 am #

    For people facing memory problems when running the code in the training part, add IMAGES_PER_GPU = 1 in the “#define a configuration for the model” section.
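A sketch of where that setting would go is shown below. The class name KangarooConfig, the NAME value and the NUM_CLASSES value are assumptions modeled on the tutorial's configuration. The try/except only provides a stand-in base class so the snippet can run without the mrcnn library installed.

```python
try:
    from mrcnn.config import Config
except ImportError:
    # Stand-in so this sketch runs without the library; not the real base class.
    class Config:
        pass


# define a configuration for the model
class KangarooConfig(Config):
    # name of the configuration (assumed, following the tutorial)
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # process one image per GPU per step to reduce memory use
    IMAGES_PER_GPU = 1


print(KangarooConfig.IMAGES_PER_GPU)
```

With fewer images per step, each batch needs less memory, at the cost of noisier gradient estimates and more steps to cover the same number of images.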

    Thanks for the tutorial really helpful

  168. Avatar
    amin November 1, 2020 at 3:55 pm #

    Hello my good friends
    I want to diagnose a car (car brand) through yolo.
    Thank you for your help.


  169. Avatar
    Nils November 3, 2020 at 1:49 am #

    Hello Jason,

    Based on this link, “mAP (mean Average Precision) for Object Detection, 2018”, I cannot really figure out what kind of method is used to calculate the mAP and where I can find it. Is the Pascal VOC method used, or MS COCO? If MS COCO is used, the 101-point interpolation is what is meant, right? Where could I find this myself next time?

    Thank you for your great work,


  170. Avatar
    Alma November 3, 2020 at 6:10 pm #

    Thanks for the great writeup. Was able to successfully implement this. Question:

    If training on new images, I assume we have to come up with an XML file that specifies where in an image an object is. What is the best way to generate that file?

    Also, I did this in TensorFlow 2.0, so there must have been an update. That said, I used your recommended Keras version. Perhaps you want to add this information to your article.

  171. Avatar
    Alma November 20, 2020 at 9:46 am #

    Is there an easy way to convert your program’s output .h5 file to a .pb file for TensorRT use?

  172. Avatar
    Saad Khan December 6, 2020 at 3:01 am #

    Currently, I’m getting 0.0 train and test mAP accuracy. What could potentially be the issue?

  173. Avatar
    darkrider97 December 27, 2020 at 2:37 am #

    Can you be a bit more clear about why mold_image is required ?
    I can see in source code that mold_image does normalization of the image, but I haven’t seen the same normalization done for Kangaroo Dataset which is used for training the model.
    So, why are we doing normalization while predicting ?

    • Avatar
      Jason Brownlee December 27, 2020 at 5:03 am #

      If I recall, it is because the data prep is performed automatically when training the model, and when predicting/evaluating we are loading new data and must perform data prep manually.

  174. Avatar
    Sohini Mallick February 9, 2021 at 10:41 pm #

    Traceback (most recent call last):
    File “C:\Users\User\anaconda3-38\lib\site-packages\IPython\core\”, line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    File “”, line 8, in
    train_mAP = evaluate_model(train_set, model, cfg)
    File “”, line 60, in evaluate_model
    AP, _, _, _ = mrcnn.utils.compute_ap(gt_bbox, gt_class_id, gt_mask, r[“rois”], r[“class_ids”], r[“scores”], r[‘masks’])
    File “C:\Users\User\PycharmProjects\Test2\mrcnn\”, line 727, in compute_ap
    gt_match, pred_match, overlaps = compute_matches(
    File “C:\Users\User\PycharmProjects\Test2\mrcnn\”, line 682, in compute_matches
    overlaps = compute_overlaps_masks(pred_masks, gt_masks)
    File “C:\Users\User\PycharmProjects\Test2\mrcnn\”, line 115, in compute_overlaps_masks
    intersections =, masks2)
    File “”, line 5, in dot
    ValueError: shapes (2,1048576) and (3136,2) not aligned: 1048576 (dim 1) != 3136 (dim 0)

    Is there a solution to this error?
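The two sizes in this error are consistent with full-resolution predicted masks being compared against mini masks: 1024 × 1024 = 1,048,576 and 56 × 56 = 3,136. As far as I know, those are the library's default image size and mini-mask shape, so this is an assumption about the configuration used. If that is the cause, loading the ground truth with use_mini_mask=False, as in the tutorial's evaluate_model(), should make the shapes agree. A minimal NumPy sketch of the mismatch, with a hypothetical helper that mirrors the reshape shown in the traceback above:

```python
import numpy as np

full_shape = (1024, 1024)  # assumed default padded image size
mini_shape = (56, 56)      # assumed default mini-mask size
print(full_shape[0] * full_shape[1], mini_shape[0] * mini_shape[1])


def flatten_masks(masks):
    """Flatten (H, W, N) masks to (H*W, N), mirroring the reshape in the traceback."""
    return np.reshape(masks > 0.5, (-1, masks.shape[-1])).astype(np.float32)


pred_masks = flatten_masks(np.zeros(full_shape + (2,), dtype=bool))  # (1048576, 2)
gt_mini = flatten_masks(np.zeros(mini_shape + (2,), dtype=bool))     # (3136, 2)
gt_full = flatten_masks(np.zeros(full_shape + (2,), dtype=bool))     # (1048576, 2)

try:
    np.dot(pred_masks.T, gt_mini)  # (2, 1048576) x (3136, 2) cannot be multiplied
    mismatch = False
except ValueError as error:
    mismatch = True
    print("mini masks:", error)

# With full-size ground-truth masks the product is defined.
intersections = np.dot(pred_masks.T, gt_full)
print("full masks:", intersections.shape)
```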

  175. Avatar
    Jan Beneš March 7, 2021 at 5:34 am #

    Hello, thanks for the tutorial, I managed to run realtime recognition through opencv.
    But I have a problem and I am hoping I can get it resolved.
    When I call model.predict(), it takes 0.54 seconds to finish. That is very slow, about 2 frames per second. How can I speed it up?

    Thank you.

  176. Avatar
    Neha March 24, 2021 at 11:52 pm #

    Hi Jason, Did you get a chance to write on image annotation?
    Though I am aware of a couple of tools like but they all work on a single image at a time. This is cumbersome when there are thousands of images.

    Are you aware of any platform where we upload a list of images and their corresponding labels to generate annotations for all in one go?

    • Avatar
      Jason Brownlee March 25, 2021 at 4:45 am #

      No, sorry. I have not taken a close look at image annotation.

  177. Avatar
    Noushin April 1, 2021 at 9:32 pm #

    A question is bothering me for a while and that is about the limitation of Neural Networks in general ( CNNs, RNNs, or other structures) for detecting small objects. I know small object detection is itself a challenging topic. However, is there any limitation in the size of the object that these models can detect as the smallest kernel size can be used is 3 by 3? Please correct me if I am looking at this issue from the wrong point of view (relating kernel size and object size).

    • Avatar
      Jason Brownlee April 2, 2021 at 5:38 am #

      Yes, really small or really large objects can be missed and may require specialized handling of the data or custom models that can operate at multiple scales in parallel.

      • Avatar
        Noushin April 13, 2021 at 11:16 pm #

        thanks a lot, very straight and clear

  178. Avatar
    Neha April 6, 2021 at 6:58 pm #

    Hi Jason,

    In the code above, at the time of model evaluation or when running prediction on a single image, the function mold_image(..) is used to perform pixel centering. This step wasn’t explicit in model training. Is this step taken care of by the MaskRCNN model training behind the scenes?

    Thanks in advance!!!

    • Avatar
      Jason Brownlee April 7, 2021 at 5:08 am #


      • Avatar
        Neha April 8, 2021 at 5:18 pm #

        Thanks Jason for a quick response, I have a follow-up question.
        My dataset consists of images of emergency and non-emergency vehicles. After model training and evaluation, when running it on test set images, the model couldn’t detect the vehicle in one image. When I commented out the mold_image(…) step, it successfully detected the vehicle.

        So, is it right to say that the pre-processing step (centering of the image) should not be done on this dataset? If so, how do I turn that off during model training?

        • Avatar
          Jason Brownlee April 9, 2021 at 5:20 am #

          You’re welcome.

          Interesting. If the pre-processing was used during training, it should be used on new data.

          Perhaps confirm it was applied during training.
          Perhaps confirm any other assumptions.

  179. Avatar
    Amb April 8, 2021 at 11:04 pm #

    Hi Jason,

    Thanks for your guide!

    How would I build a data set of the images that aren’t flagged as difficult?


    • Avatar
      Jason Brownlee April 9, 2021 at 5:25 am #

      Perhaps exclude all images from train and test that cannot be predicted with a simple model?

      But why?

  180. Avatar
    amb April 9, 2021 at 12:33 am #

    actually I’ve just worked it out


  181. Avatar
    James Chang April 23, 2021 at 7:18 pm #

    According to,

    The coordinate of the bounding boxes should be in the form of [ymin, xmin, ymax, xmax], which is different from yours. I am a bit confused.
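The two orderings describe the same rectangle. What matters is that rows are indexed by y and columns by x when the box is turned into a mask. The sketch below assumes, from my reading of the tutorial, that boxes parsed from the XML annotations are stored as [xmin, ymin, xmax, ymax] and only used to fill the mask array. The helper names are hypothetical.

```python
import numpy as np


def xyxy_to_yxyx(box):
    """Reorder [xmin, ymin, xmax, ymax] into [ymin, xmin, ymax, xmax]."""
    xmin, ymin, xmax, ymax = box
    return [ymin, xmin, ymax, xmax]


def box_to_mask(box_xyxy, height, width):
    """Fill a rectangular mask from an [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = box_xyxy
    mask = np.zeros((height, width), dtype="uint8")
    # NumPy arrays are indexed [row, column], i.e. [y, x].
    mask[ymin:ymax, xmin:xmax] = 1
    return mask


box = [2, 1, 6, 4]  # xmin, ymin, xmax, ymax
print(xyxy_to_yxyx(box))
print(box_to_mask(box, height=5, width=8))
```

As long as the mask is filled with rows taken from the y values and columns from the x values, the order in which the four numbers are stored in the annotation list is an internal detail of the loading code.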

  182. Avatar
    Aishwarya G April 28, 2021 at 8:52 pm #

    Hello Jason,

    Greetings for the day!

    While training the model I am receiving the following errors:

    1. File “/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/”, line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
    AttributeError: module ‘tensorflow’ has no attribute ‘name_scope’

    2. File “/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/applications/”, line 22, in
    import keras_applications
    ModuleNotFoundError: No module named ‘keras_applications’

    Could you please help me solve these errors?

    Thank you!!

  183. Avatar
    Johan May 30, 2021 at 5:03 am #

    AttributeError Traceback (most recent call last)
    in ()
    6 from mrcnn.utils import Dataset
    7 from mrcnn.config import Config
    —-> 8 from mrcnn.model import MaskRCNN

    /content/Mask-Rcnn/mrcnn/ in ()
    –> 255 class ProposalLayer(KE.Layer):
    256 “””Receives anchor scores and selects a subset to pass as proposals
    257 to the second stage. Filtering is done based on anchor scores and

    AttributeError: module ‘keras.engine’ has no attribute ‘Layer’

    I am getting this error. Can you help, please?

    • Avatar
      Jason Brownlee May 30, 2021 at 5:52 am #

      Perhaps ensure you are using the versions of TensorFlow and Keras described at the top of the tutorial.

  184. Avatar
    Radhika June 3, 2021 at 10:21 pm #

    Hello Jason ,
    Thank you for this comprehensive tutorial !

    Issues encountered :
    (1) AttributeError: module ‘keras.engine’ has no attribute ‘Layer’: this issue (also reported by somebody else in the comments section) is resolved, as you suggested above in another comment, by installing the TensorFlow and Keras versions that you mentioned.

    However , after having resolved the above issue, there is another error :

    AttributeError: module ‘tensorflow’ has no attribute ‘name_scope’

    Any suggestions on resolving this error please ?

    • Avatar
      Jason Brownlee June 4, 2021 at 6:53 am #

      You’re welcome.

      Perhaps ensure you are using the version of keras and tensorflow listed above.

  185. Avatar
    Radhika June 3, 2021 at 11:28 pm #

    please ignore/delete comment , issue resolved

  186. Avatar
    Steve June 8, 2021 at 6:02 pm #

    Hey Jason,

    very nice work.

    I have a question about the output.
    We get a single class output with a confidence score for this class.

    Is it possible to get a class vector for each box?
    Example: 2 classes (Dog, Cat)
    Box[x, y, width, height, class: [0.3, 0.7]]
    So is it possible to say this box is 30% Dog and 70% Cat, or something like that?

    Even better would be if each class could be scored from 0% to 100% independently, so a box could be 65% a dog and 80% a cat.

    I want to decide myself which class it should take.

    • Avatar
      Jason Brownlee June 9, 2021 at 5:40 am #


      Yes, I believe it provides a box for each item discovered in the image and probabilities for all known classes.

      • Avatar
        Steve June 9, 2021 at 4:57 pm #

        Hey Jason,

        unfortunately it doesn’t.
        I just get one class for each box and not a multiclass vector.

        My problem is that I can’t find any way to customize the code to output a vector…

        If you believe it is possible, how can I get this vector?

        Thanks for your answer. 🙂
        It’s nice that you’re still helping people.

        • Avatar
          Jason Brownlee June 10, 2021 at 5:23 am #

          I believe it does, see the kangaroo example where two “objects” are found in one image.

          • Avatar
            Steve June 18, 2021 at 4:42 pm #

            I understand that I can detect multiple objects in one image.
            But I need a multiclass “vector” for one object.

            I need: “This object is 60% a dog and 40% a cat.” for example.

          • Avatar
            Jason Brownlee June 19, 2021 at 5:48 am #

            Yes, a given prediction gives a vector of probabilities across all known classes, I believe. You can sort by probability and report the top 5. I think I have an example of this for pre-trained image classification models on the blog.

  187. Avatar
    Angel0 June 24, 2021 at 8:04 pm #

    Hi everyone, I am a beginner with deep learning… I trained a model using a CNN with Keras and saved it as model.h5. Starting from this, how can I do object detection to detect the objects in images? Who can help me with this next step? Thanks a lot.

  188. Avatar
    Simeon July 17, 2021 at 5:50 am #

    Hello Jason,
    I have a different dataset from this one. It has a CSV annotation file and more than one class, such as person, car, cat, etc. The bounding box coordinates are in x_min, y_min, x_max, y_max format, where x_min, y_min is the top-left corner and x_max, y_max is the bottom-right corner. The class names are text and I think I need to change them into an integer representation. I have seen some datasets arranged into folders by class, but in this dataset all the images are in one folder. I want to parse the CSV file and preprocess it before loading it into the object detection model. Each image contains more than one object. How can I parse my dataset? I used the pandas read_csv() function, but I ended up with an error saying that the number of input images and the number of images in the annotation file are not equal. This is because the image name is repeated for each bounding box in the annotation file.
    I would really appreciate your suggestions and help.

    • Avatar
      Jason Brownlee July 18, 2021 at 5:19 am #

      You may have to write some custom code to load your dataset.
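A minimal sketch of such custom loading code, using only the standard library. The column names (`filename`, `x_min`, `y_min`, `x_max`, `y_max`, `class`) and the sample rows are assumptions for illustration, not taken from the dataset in question. It groups the repeated rows by image so that each image ends up with one list of boxes, and it maps class names to integers starting at 1, leaving 0 free for the background class.

```python
import csv
import io
from collections import defaultdict


def load_annotations(csv_file):
    """Group bounding boxes by image and map class names to integer ids."""
    boxes_by_image = defaultdict(list)
    class_ids = {}  # class name -> integer id (0 is left for background)
    for row in csv.DictReader(csv_file):
        name = row["class"]  # assumed column name
        if name not in class_ids:
            class_ids[name] = len(class_ids) + 1
        box = [int(row[key]) for key in ("x_min", "y_min", "x_max", "y_max")]
        boxes_by_image[row["filename"]].append((box, class_ids[name]))
    return dict(boxes_by_image), class_ids


# A tiny in-memory file standing in for a real annotation CSV.
sample = io.StringIO(
    "filename,x_min,y_min,x_max,y_max,class\n"
    "a.jpg,10,20,110,220,person\n"
    "a.jpg,30,40,80,90,car\n"
    "b.jpg,5,5,50,60,person\n"
)
annotations, class_ids = load_annotations(sample)
print(len(annotations))      # number of distinct images
print(annotations["a.jpg"])  # every box recorded for one image
```

Each key of `annotations` could then be registered once as an image in a custom dataset class, with its list of boxes kept alongside it.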

  189. Avatar
    Simeon July 17, 2021 at 6:04 am #

    Correction: The class names are categorical data. Sorry, I wrote it as text data.

  190. Avatar
    Nghia Nguyen July 22, 2021 at 12:25 am #

    I reused the code above, but I got the error below. Has anyone had the same issue, and how can it be solved? Thanks.

    Traceback (most recent call last):
    File “.\”, line 124, in
    model = MaskRCNN(mode=’training’, model_dir=’./’, config=config)
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 1849, in __init__
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 1978, in build
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\keras\engine\”, line 457, in __call__
    output =, **kwargs)
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 323, in call
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 820, in batch_slice
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 321, in
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\mask_rcnn-2.1-py3.7.egg\mrcnn\”, line 263, in clip_boxes_graph
    File “C:\Users\HP\Documents\000_DATA\100_TECH\100_CODE\100_TRAIN\300_ComputerVision\envWinComputerVision_python37\lib\site-packages\tensorflow_core\python\framework\”, line 645, in set_shape
    raise ValueError(str(e))
    ValueError: Shapes must be equal rank, but are 3 and 2

  191. Avatar
    Minkle July 26, 2021 at 8:12 am #

    Hi! Firstly, thank you so much for this guide! It has been insanely helpful and I really appreciate it!

    I was wondering if I could get your help on something. I’m currently trying to train the R-CNN to detect insects in my backyard, and the network is picking up other things like chairs, vases, and people and classifying them as insects. I believe this comes from the base network, which was trained on 80 object classes.

    Is there any way I can separate these 80 objects from the network, so that it stops detecting other things and only detects the new insect classes I want?

    Thank you!

    • Avatar
      Jason Brownlee July 27, 2021 at 5:04 am #

      Perhaps you can write some code to interpret the prediction from the model and only report relevant objects to the user.
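A rough sketch of that kind of post-processing. It assumes the prediction is a dictionary with `rois`, `class_ids` and `scores` entries, as in the Matterport `detect()` output; the function name `filter_detections`, the threshold and the sample values are made up for illustration. Masks are left out to keep the example short.

```python
def filter_detections(result, wanted_class_ids, min_score=0.0):
    """Keep only detections of the wanted classes with a high enough score."""
    wanted = set(wanted_class_ids)
    keep = [
        i
        for i, (class_id, score) in enumerate(
            zip(result["class_ids"], result["scores"])
        )
        if int(class_id) in wanted and float(score) >= min_score
    ]
    return {
        "rois": [list(result["rois"][i]) for i in keep],
        "class_ids": [int(result["class_ids"][i]) for i in keep],
        "scores": [float(result["scores"][i]) for i in keep],
    }


# Hypothetical prediction: three boxes, two classes.
example = {
    "rois": [[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]],
    "class_ids": [1, 2, 1],
    "scores": [0.95, 0.99, 0.40],
}
filtered = filter_detections(example, wanted_class_ids=[1], min_score=0.9)
print(filtered)
```

If the model was trained with insects as its only class, filtering by class id will not remove a chair that was labelled as an insect. In that case raising `min_score` is the part of this sketch that may help.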

  192. Avatar
    Fancy August 12, 2021 at 6:53 am #

    Hi Jason,
    I love this tutorial, very detailed explanation.
    I’m using Keras 2.2.4 , TF 1.15, and h5py 3.3.0, My problem is that I’m stuck at ‘model.load_weights’, error message says:

    ImportError: dlopen(/Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/, 2): Symbol not found: _H5Pget_fapl_ros3
    Referenced from: /Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/
    Expected in: /Users/fancy/opt/anaconda3/envs/myenv/lib/libhdf5.103.dylib
    in /Users/fancy/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/h5py/

    I searched a lot but can’t figure out the solution, could you help?

    Thank you so much!

    • Adrian Tam
      Adrian Tam August 12, 2021 at 7:34 am #

      Errors like this are due to library installation. Maybe you have a conflicting mix of libraries? Try uninstalling h5py and reinstalling it. That may help.

      • Avatar
        Fatma Mazen September 9, 2021 at 1:09 am #

        This worked for me
        !pip install 'h5py==2.10.0' --force-reinstall

  193. Avatar
    Fatma Mazen September 9, 2021 at 1:14 am #

    Thank you for this informative tutorial
    I have a question about image size
    I am having this error
    IndexError: index 2048 is out of bounds for axis 0 with size 2048
    because my image height is 2048
    What to do with large-size images?
    Thanks in advance.

    • Adrian Tam
      Adrian Tam September 9, 2021 at 4:43 am #

      There can be a lot of things to do with large images but your error seems to be accessing outside of the array. It sounds to me like some coding mistake more than anything else.

      • Avatar
        Fatma Mazen September 15, 2021 at 5:52 pm #

        Thank you for your reply
        I have already annotated the large images using the VIA tool.
        The largest image is 2322*4096 and the smallest one is 720*1280.
        Should the image resize mode be “square” with max_size=1024 and min_size=800 as default values?
        Or should I modify them according to my dataset’s image sizes?
        Can you kindly tell me what the optimum values for max_size and min_size would be?

        • Adrian Tam
          Adrian Tam September 16, 2021 at 12:50 am #

          I would prefer to rescale the images rather than modify the model. The reason is that modifying the model means retraining, which is very time-consuming.

          • Avatar
            Fatma Mazen September 16, 2021 at 3:05 am #

            I have already annotated the dataset, which was a time-consuming task. I think I will have to re-annotate the images after resizing them.
            I have three questions now:
            1. Should I set image_resize_Mode to “square” or “crop”?
            2. Do you think that setting max_size and min_size to larger values like 2048 or 4096 rather than 1024 will give better results?
            3. Are there any parameters that need to be modified if I change min_size and max_size?
            Thanks in advance

          • Adrian Tam
            Adrian Tam September 16, 2021 at 11:49 pm #

            I don’t think re-annotating is necessary. There should be a tool to resize or crop an image together with its annotations. Between square and crop, I would prefer whichever keeps the aspect ratio. As for the size, I would make it as small as possible while you can still identify the object. You shouldn’t be greedy here; rather, keep the minimum information the model needs, so it does not learn from the noise and converges faster.
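Rescaling the annotations together with the image comes down to multiplying every coordinate by the same factor as the image. Here is a small sketch of that arithmetic, assuming boxes stored as `(x_min, y_min, x_max, y_max)` in pixels and sizes given as `(width, height)`. The helper names and the example numbers are hypothetical, and the image pixels themselves would still need to be resized with an imaging library.

```python
def fit_within(size, max_side):
    """Return a new (width, height) whose longest side is at most max_side."""
    width, height = size
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))


def scale_boxes(boxes, old_size, new_size):
    """Scale (x_min, y_min, x_max, y_max) boxes from old_size to new_size."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return [
        (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))
        for x0, y0, x1, y1 in boxes
    ]


old_size = (4096, 2048)                 # hypothetical large image
new_size = fit_within(old_size, 1024)   # keeps the aspect ratio
boxes = [(400, 200, 1200, 1000)]        # hypothetical annotation
print(new_size, scale_boxes(boxes, old_size, new_size))
```

Because both sides are scaled by the same factor, the aspect ratio of the image and of every box is preserved.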

  194. Avatar
    Fatma Mazen September 17, 2021 at 7:58 am #

    Thank you for your reply
    I would be grateful if you could tell me the name of a tool that crops images together with their annotations.
    I have used the VGG Image Annotator (VIA).
    Thanks in advance

  195. Avatar
    Erfan Hatefi September 25, 2021 at 5:09 am #

    Hi Jason
    Thanks for sharing such a great article!!
    Each part is well described.

    Personally, I faced some errors while debugging.
    I’m using Colab. I installed TensorFlow and Keras with the specific versions mentioned at the beginning of the article.
    First, after running the cell in which the training starts, I got the error
    ModuleNotFoundError: No module named ‘keras_applications’
    which is described in one of the comments above (back on April 28th).

    After a little searching, I supposed that it was all because of the h5py version. So, I tried installing h5py v2.8.0. However, the funny thing is that I got a completely different new error (the last traceback):

    Traceback (most recent call last):
    File “/usr/local/lib/python3.7/dist-packages/IPython/core/”, line 1132, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
    File “/usr/local/lib/python3.7/dist-packages/IPython/core/”, line 313, in wrapped
    return f(*args, **kwargs)
    File “/usr/local/lib/python3.7/dist-packages/IPython/core/”, line 358, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
    File “/usr/lib/python3.7/”, line 1502, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
    File “/usr/lib/python3.7/”, line 1460, in getframeinfo
    filename = getsourcefile(frame) or getfile(frame)
    File “/usr/lib/python3.7/”, line 696, in getsourcefile
    if getattr(getmodule(object, filename), ‘__loader__’, None) is not None:
    File “/usr/lib/python3.7/”, line 733, in getmodule
    if ismodule(module) and hasattr(module, ‘__file__’):
    File “/usr/local/lib/python3.7/dist-packages/tensorflow/”, line 50, in __getattr__
    module = self._load()
    File “/usr/local/lib/python3.7/dist-packages/tensorflow/”, line 44, in _load
    module = _importlib.import_module(self.__name__)
    File “/usr/lib/python3.7/importlib/”, line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File “”, line 1006, in _gcd_import
    File “”, line 983, in _find_and_load
    File “”, line 965, in _find_and_load_unlocked
    ModuleNotFoundError: No module named ‘tensorflow_core.estimator’

    Have you got any ideas for dealing with this?
    I appreciate your response

    • Adrian Tam
      Adrian Tam September 27, 2021 at 10:23 am #

      I believe that is more likely a TensorFlow 1.x vs 2.x issue. There were a lot of changes in that major version upgrade, and they break a lot of old code.

  196. Avatar
    Sultan October 2, 2021 at 5:10 pm #

    Got an error in /Mask_RCNN/mrcnn/”, line 20,

    ImportError: cannot import name ‘get_config’ from ‘tensorflow.python.eager.context’.
    TensorFlow version issues, as always.

  197. Avatar
    Josh Blumer October 25, 2021 at 6:46 pm #

    Hey Jason, thanks so much for all the great tutorials. I’m having an issue with an “AttributeError: ‘str’ object has no attribute ‘decode’” error when trying to execute the “model.load_weights” block. The error line reads “original_keras_version = f.attrs[‘keras_version’].decode(‘utf8’)”. Google suggests dropping the “.decode(‘utf8’)” because it’s no longer necessary in Python 3, but that’s not possible because it is in the library source code. I’m using Python 3.6 and force-installed TensorFlow 1.15.3 and Keras 2.2.4 as you directed at the beginning of the tutorial. Any advice is greatly appreciated, thank you.

    • Avatar
      Sofia December 7, 2021 at 10:45 am #

      Hi Josh,
      I have the same problem as you. Did you manage to fix it?

      thank you

  198. Adrian Tam
    Adrian Tam December 8, 2021 at 7:51 am #

    Try using a newer TensorFlow (e.g., 2.x), which I believe has better support for Python 3.

    • Avatar
      Chiedozie February 7, 2022 at 12:35 am #

      Hi Sofia and Josh,

      Did any of you successfully fix this issue? Please do let me know.
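One possible cause worth checking, offered as an assumption rather than a confirmed diagnosis: h5py 3.0 and later return string attributes as `str` instead of `bytes`, so the `.decode('utf8')` call in Keras 2.2.4 fails. An earlier comment in this thread reports that pinning `h5py==2.10.0` fixed a `load_weights` problem. The sketch below only inspects a version string; the helper name `h5py_returns_str_attrs` is hypothetical.

```python
def h5py_returns_str_attrs(version):
    """Return True for h5py versions (3.0 and later) that hand back
    string attributes as str, which breaks .decode('utf8') calls."""
    major = int(version.split(".")[0])
    return major >= 3


# In a real environment the version string would come from h5py.__version__.
for version in ("2.10.0", "3.3.0"):
    print(version, h5py_returns_str_attrs(version))
```

If the check returns True for the installed version, reinstalling the 2.10.0 release mentioned above is one thing to try.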

  199. Avatar
    Eric Yi December 16, 2021 at 6:43 am #

    Hi James, I am facing an issue when trying out the code.
    It happens when using functions like .image_reference() and .load_image(), which internally look up self.image_info[image_id].

    I will get this error:

    TypeError Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_15376/ in
    1 # load an image
    2 image_id = ’30_days_01.jpg’
    ----> 3 dataset_train.image_reference(image_id)
    5 # image = dataset_train.load_image(image_id)

    ~\AppData\Local\Temp/ipykernel_15376/ in image_reference(self, image_id)
    116 def image_reference(self, image_id):
    117 """Return the path of the image."""
    --> 118 info = self.image_info[image_id]
    119 if info["source"] == "object":
    120 return info["path"]

    TypeError: list indices must be integers or slices, not str

    The Keras version I’m using is 2.2.5 and the TensorFlow version is 1.13.1.

    I also tried the versions mentioned in your article, but I still get the same error.

    Hope to hear from you soon
    Thank you

    • Adrian Tam
      Adrian Tam December 17, 2021 at 7:18 am #

      The error message tells it all – you need image_id to be an integer to make it work.
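To illustrate: `image_info` is a list, so it is indexed by position rather than by file name. A small sketch of looking up the integer index for a given file name, using a hand-built list in place of a real dataset object. The second entry and the helper name `find_image_index` are made up for the example.

```python
# Stand-in for dataset.image_info: a list of dictionaries, one per image.
image_info = [
    {"id": "30_days_01.jpg", "source": "object", "path": "images/30_days_01.jpg"},
    {"id": "30_days_02.jpg", "source": "object", "path": "images/30_days_02.jpg"},
]


def find_image_index(image_info, image_name):
    """Return the integer position of the image whose 'id' matches image_name."""
    for index, info in enumerate(image_info):
        if info["id"] == image_name:
            return index
    raise KeyError(image_name)


image_id = find_image_index(image_info, "30_days_02.jpg")
print(image_id, image_info[image_id]["path"])
```

The integer returned here is the kind of value that functions such as `image_reference()` and `load_image()` expect.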

  200. Avatar
    Eric Yi January 14, 2022 at 1:24 pm #

    Thanks, Adrian,
    It is just as what you mentioned. 😀

  201. Avatar
    John February 2, 2022 at 6:33 am #

    Hello, great tutorial! I ran the 1st part of the code you have at the beginning of the tutorial. How can I modify this line of code: “display_instances(image, bbox, mask, class_ids, train_set.class_names)” in order to print the image in original dimensions? Because I use very big UAV images and the squares seem very small.
    Great work!!!

  202. Avatar
    John February 2, 2022 at 11:27 pm #

    One of the greatest tutorials on the internet!! Very understandable!!! What modifications should I make to the above code to train it on my custom dataset? Once again, THANKS for the great tutorial and the information… You are awesome!

  203. Avatar
    Jeffrey March 29, 2022 at 11:32 am #

    Thanks for providing this article… it might be a start for me. I just got started with object detection and I’m working on a project that detects bank cheques in images. Can I use this procedure to train on my own dataset for my project?
    I would be very grateful for a response.
    Best regards

    • Avatar
      James Carmichael March 30, 2022 at 3:50 am #

      Hi Jeffrey…Yes, but understand that all code and material on my site and in my books was developed and provided for educational purposes only.

      I take no responsibility for the code, what it might do, or how you might use it.

      If you use my code or material in your own project, please reference the source, including:

      The Name of the author, e.g. “Jason Brownlee”.
      The Title of the tutorial or book.
      The Name of the website, e.g. “Machine Learning Mastery”.
      The URL of the tutorial or book.
      The Date you accessed or copied the code.
      For example:

      Jason Brownlee, Machine Learning Algorithms in Python, Machine Learning Mastery, Available from, accessed April 15th, 2018.
      Also, if your work is public, contact me, I’d love to see it out of general interest.

  204. Avatar
    Kiran Agashe April 16, 2022 at 1:31 pm #

    Hi Jason,
    Thanks for the great tutorial, with an excellent explanation.
    While using the example, I am facing the following issue (mentioned above by others too).

    model.load_weights(“mask_rcnn_coco.h5”, by_name=True, exclude=[“mrcnn_class_logits”, “mrcnn_bbox_fc”, “mrcnn_bbox”, “mrcnn_mask”])

    Traceback (most recent call last):
    File “/home/kiran/cds2_cp_team5/mrcnn/COCO_creator/”, line 324, in
    model.load_weights(str, by_name=True, exclude=[“mrcnn_class_logits”, “mrcnn_bbox_fc”, “mrcnn_bbox”, “mrcnn_mask”])
    File “/home/kiran/.pyenv/versions/3.7.13/lib/python3.7/site-packages/mask_rcnn-2.1-py3.7.egg/mrcnn/”, line 2130, in load_weights
    File “/home/kiran/.pyenv/versions/3.7.13/lib/python3.7/site-packages/keras/engine/”, line 1083, in load_weights_from_hdf5_group_by_name
    original_keras_version = f.attrs[‘keras_version’].decode(‘utf8’)

    I am using TF version 1.15.3 and Keras 2.2.4
    Python version: 3.7.13

    Can you please help figure out the issue?

    • Avatar
      James Carmichael April 17, 2022 at 8:00 am #

      Hi Kiran…I would highly recommend that you run your code in Google Colab to determine if there could be versioning issues on your local machine.

  205. Avatar
    rizwan October 26, 2022 at 11:36 pm #

    This was lovely to read.

    • Avatar
      James Carmichael October 27, 2022 at 7:39 am #

      Thank you for your support and feedback!

  206. Avatar
    R K November 28, 2022 at 12:43 am #

    Hi Guys,
    Thanks for the fantastic blog.
    A quick question, please: let’s suppose I’ve trained on another dataset in exactly this way with this model, and let’s assume I’ve done everything right. If, in spite of that, I am not getting good results, how should I deal with this?

    How can I tune this model (this particular Matterport implementation) if need be? Is there an option to tune it?
    Do I have the option of training more than just the top layers? Do I have the option to change hyperparameters?

    If yes, would I need to modify the Matterport source code for all of the above, or is there any way around this?

    • Avatar
      R K November 28, 2022 at 1:03 am #

      Please ignore/delete this comment. I missed the section of the blog where you already mention that we can fine-tune more layers; it looks like this configuration is available. Thanks!

  207. Avatar
    Mahmoud March 16, 2023 at 3:08 am #

    Why does running the model on a GPU return NaN values?


  208. Avatar
    Hassan Said June 14, 2023 at 4:51 am #

    If I have to detect defects in images, but not all images contain defects, do I need to train with images without defects as well?

  209. Avatar
    Anil July 25, 2023 at 1:39 am #

    Thanks for the nice tutorial.
    When I execute the training on my GPU, I see that some of the rpn losses are nan. What could be the reason?

    loss: nan – rpn_class_loss: nan – rpn_bbox_loss: nan – mrcnn_class_loss: 0.3515 – mrcnn_bbox_loss: 0.0023 – mrcnn_mask_loss: 0.0010 – val_loss: nan – val_rpn_class_loss: nan – val_rpn_bbox_loss: nan – val_mrcnn_class_loss: 0.1193 – val_mrcnn_bbox_loss: 0.0000e+00 – val_mrcnn_mask_loss: 0.0000e+00

    I also get no object detections for the image after training on my custom dataset.

  210. Avatar
    Geoffrey Peart September 11, 2023 at 6:30 am #

    Hi, I’m trying to get this up and running with my daughter for a science fair project, I will admit to having spent most of my development career in Java, so not needing to worry as much about hardware.

    We are finding ourselves stuck in a loop: we are trying to find TensorFlow and Keras versions (recommended above) that work with this code, with each other, and with the M1 chip. When I check the TensorFlow GitHub, they only have a Mac x86 build of the older version of TensorFlow. Has anyone tackled this one?

    Also, just a shout out to the commenters and writers, this has been a really great tutorial and community, so thank you all!

  211. Avatar
    Tobi January 21, 2024 at 5:27 am #

    Hey there,
    is there a possibility to re-train/finetune a pretrained MR-CNN on a custom dataset?
    Thanks in advance.

  212. Avatar
    Tobias January 21, 2024 at 9:39 pm #

    Thanks for your answer, but as I see it, this resource only shows fine-tuning classification models, not an object detector.

    • Avatar
      James Carmichael January 22, 2024 at 10:27 am #

      Hi Tobias…Here are some additional thoughts:

      Fine-tuning an object detector involves several key steps, and it’s a process used to adapt a pre-trained model to your specific task, improving its accuracy on your particular dataset. Here’s a step-by-step guide on how to fine-tune an object detection model:

      ### 1. Choose a Pre-trained Model
      Start with a pre-trained object detection model that has been trained on a large and general dataset like COCO, Pascal VOC, or ImageNet. Popular architectures include YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN.

      ### 2. Collect and Prepare Your Dataset
      – **Collect a dataset** that is relevant to your specific task. Your dataset should include images that represent the kind of objects you want to detect.
      – **Annotate your images** by drawing bounding boxes around the objects of interest and labeling them. There are various annotation tools available, such as LabelImg or CVAT.
      – **Split your dataset** into training, validation, and test sets. A common split ratio is 70% for training, 15% for validation, and 15% for testing.
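The 70/15/15 split mentioned above can be sketched with the standard library alone. The file names and the fixed seed are placeholders. Shuffling before slicing keeps any ordering in the file list from leaking into a particular subset.

```python
import random


def split_dataset(items, train=0.70, val=0.15, seed=1):
    """Shuffle items and split them into train, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed makes the split repeatable
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


filenames = ["img_%03d.jpg" % i for i in range(100)]  # placeholder names
train_set, val_set, test_set = split_dataset(filenames)
print(len(train_set), len(val_set), len(test_set))
```

Whatever remains after the training and validation slices becomes the test set, so the three parts always cover the whole list without overlap.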

      ### 3. Configure the Model for Your Dataset
      – **Modify the model’s head** if necessary, to match the number of classes in your dataset. For instance, if you’re detecting three types of objects, the final layer should output three classes (plus one for the background in models, such as Mask R-CNN, that reserve a background class).
      – **Adjust the configuration settings** of the model, such as the learning rate, batch size, and the number of epochs. You might start with the configuration of the pre-trained model and adjust based on your dataset size and complexity.

      ### 4. Augment Your Data (Optional)
      Data augmentation involves artificially increasing the size and diversity of your training dataset by applying various transformations like flipping, scaling, cropping, and color variation. This can help improve the robustness of your model.

      ### 5. Fine-Tune the Model
      – **Load the pre-trained model** and modify it for your dataset.
      – **Freeze the early layers** of the model to retain learned features that are generally applicable to most visual tasks. Only train the latter layers that are more specific to the detection task.
      – **Train the model** on your dataset. Use the training set to train the model and the validation set to tune the hyperparameters and avoid overfitting.

      ### 6. Evaluate the Model
      – **Use the test set** to evaluate the model’s performance. Common metrics for object detection include Precision, Recall, and the mean Average Precision (mAP).
      – **Iterate** on your training process by adjusting model configurations, augmenting your data differently, or even collecting more data based on the performance on the test set.
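As a rough illustration of the metrics named above, the sketch below computes intersection over union (IoU) for two boxes, then precision and recall for one image at a single IoU threshold. It is a simplification under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, predictions are matched greedily in the order given, and the sample boxes are invented. A full mAP calculation would also rank predictions by confidence and average over classes.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def precision_recall(predicted, truth, threshold=0.5):
    """Greedily match each prediction to the best unused ground-truth box."""
    matched = set()
    true_positives = 0
    for box in predicted:
        best, best_iou = None, threshold
        for j, target in enumerate(truth):
            overlap = iou(box, target)
            if j not in matched and overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            matched.add(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(iou(predicted[0], truth[0]))
print(precision_recall(predicted, truth))
```

In this example one of the two predictions overlaps a ground-truth box well enough to count, and one of the two ground-truth boxes is found, so both precision and recall come out at one half.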

      ### 7. Deploy the Model
      Once satisfied with the model’s performance, deploy it for real-world usage or further testing.

      ### Tools and Libraries
      You can use deep learning frameworks like TensorFlow (with its object detection API), PyTorch (with libraries like Detectron2 or Torchvision), or even higher-level APIs like Keras for fine-tuning object detection models.

      Fine-tuning is an iterative process. It might take several rounds of adjustment and training to get the desired accuracy and performance from your object detector.
