How to Perform Object Detection With YOLOv3 in Keras

Last Updated on October 8, 2019

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.

It is a challenging problem that builds upon methods for object recognition (e.g. where are they), object localization (e.g. what is their extent), and object classification (e.g. what are they).

In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. Notable is the “You Only Look Once,” or YOLO, family of Convolutional Neural Networks that achieve near state-of-the-art results with a single end-to-end model that can perform object detection in real-time.

In this tutorial, you will discover how to develop a YOLOv3 model for object detection on new photographs.

After completing this tutorial, you will know:

  • YOLO-based Convolutional Neural Network family of models for object detection and the most recent variation called YOLOv3.
  • The best-of-breed open source library implementation of the YOLOv3 for the Keras deep learning library.
  • How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.

Let’s get started.

  • Update Oct/2019: Updated and tested for Keras 2.3.0 API and TensorFlow 2.0.0.

How to Perform Object Detection With YOLOv3 in Keras
Photo by David Berkowitz, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. YOLO for Object Detection
  2. Experiencor YOLO3 Project
  3. Object Detection With YOLOv3

YOLO for Object Detection

Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.

It is a challenging computer vision task that requires both successful object localization in order to locate and draw a bounding box around each object in an image, and object classification to predict the correct class of object that was localized.

The “You Only Look Once,” or YOLO, family of models are a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon, et al. and first described in the 2015 paper titled “You Only Look Once: Unified, Real-Time Object Detection.”

The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet based on VGG) that splits the input into a grid of cells and each cell directly predicts a bounding box and object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.

There are three main variations of the approach, at the time of writing; they are YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture, whereas the second version refined the design and made use of predefined anchor boxes to improve bounding box proposal, and version three further refined the model architecture and training process.

Although the accuracy of the models is close to, but not as good as, that of Region-Based Convolutional Neural Networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real-time on video or with camera feed input.

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

You Only Look Once: Unified, Real-Time Object Detection, 2015.

In this tutorial, we will focus on using YOLOv3.

Experiencor YOLO3 for Keras Project

Source code for each version of YOLO is available, as well as pre-trained models.

The official DarkNet GitHub repository contains the source code for the YOLO versions mentioned in the papers, written in C. The repository provides a step-by-step tutorial on how to use the code for object detection.

It is a challenging model to implement from scratch, especially for beginners as it requires the development of many customized model elements for training and for prediction. For example, even using a pre-trained model directly requires sophisticated code to distill and interpret the predicted bounding boxes output by the model.

Instead of developing this code from scratch, we can use a third-party implementation. There are many third-party implementations designed for using YOLO with Keras, but none appear to be standardized and designed to be used as a library.

The YAD2K project was a de facto standard for YOLOv2 and provided scripts to convert the pre-trained weights into Keras format and use the pre-trained model to make predictions, as well as the code required to distill and interpret the predicted bounding boxes. Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.

Perhaps the most widely used project for using pre-trained YOLO models is called “keras-yolo3: Training and Detecting Objects with YOLO3” by Huynh Ngoc Anh, or experiencor. The code in the project has been made available under a permissive MIT open source license. Like YAD2K, it provides scripts to both load and use pre-trained YOLO models, as well as to apply transfer learning for developing YOLOv3 models on new datasets.

He also has a keras-yolo2 project that provides similar code for YOLOv2 as well as detailed tutorials on how to use the code in the repository. The keras-yolo3 project appears to be an updated version of that project.

Interestingly, experiencor has used the model as the basis for some experiments and trained versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and others. He has listed model performance, provided the model weights for download, and provided YouTube videos of model behavior.

We will use experiencor’s keras-yolo3 project as the basis for performing object detection with a YOLOv3 model in this tutorial.

In case the repository changes or is removed (which can happen with third-party open source projects), a fork of the code at the time of writing is provided.

Object Detection With YOLOv3

The keras-yolo3 project provides a lot of capability for using YOLOv3 models, including object detection, transfer learning, and training new models from scratch.

In this section, we will use a pre-trained model to perform object detection on an unseen photograph. This capability is available in a single Python file in the repository called “” that has about 435 lines. This script is, in fact, a program that will use pre-trained weights to prepare a model and use that model to perform object detection and output a model. It also depends upon OpenCV.

Instead of using this program directly, we will reuse elements from this program and develop our own scripts to first prepare and save a Keras YOLOv3 model, and then load the model to make a prediction for a new photograph.

Create and Save Model

The first step is to download the pre-trained model weights.

These were trained using the DarkNet code base on the MSCOCO dataset. Download the model weights and place them into your current working directory with the filename “yolov3.weights.” It is a large file and may take a moment to download depending on the speed of your internet connection.

Next, we need to define a Keras model that has the right number and type of layers to match the downloaded model weights. The model architecture is called a “DarkNet” and was originally loosely based on the VGG-16 model.

The “” script provides the make_yolov3_model() function to create the model for us, and the helper function _conv_block() that is used to create blocks of layers. These two functions can be copied directly from the script.

We can now define the Keras model for YOLOv3.

Next, we need to load the model weights. The model weights are stored in the binary format used by DarkNet. Rather than trying to decode the file manually, we can use the WeightReader class provided in the script.

To use the WeightReader, it is instantiated with the path to our weights file (e.g. ‘yolov3.weights‘). This will parse the file and load the model weights into memory in a format that we can set into our Keras model.

We can then call the load_weights() function of the WeightReader instance, passing in our defined Keras model to set the weights into the layers.
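To give a feel for what such a reader has to do, the sketch below parses a small synthetic buffer. It assumes the layout commonly described for DarkNet weight files: three 32-bit integers (major, minor, revision), an "images seen" counter stored in 8 or 4 bytes depending on the version, and then a flat run of 32-bit floats. The function name read_darknet_weights() and the test data are hypothetical and for illustration only; this is not the WeightReader code from the project.

```python
import io
import struct


def read_darknet_weights(stream):
    """Parse a DarkNet-style weights stream (assumed layout, see text)."""
    # Header: three little-endian 32-bit integers.
    major, minor, revision = struct.unpack('<3i', stream.read(12))
    # Assumption: newer versions store the 'seen' counter in 8 bytes,
    # older versions in 4 bytes.
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        (seen,) = struct.unpack('<Q', stream.read(8))
    else:
        (seen,) = struct.unpack('<i', stream.read(4))
    # Everything after the header is treated as a flat list of float32 values.
    payload = stream.read()
    count = len(payload) // 4
    weights = list(struct.unpack('<%df' % count, payload[:count * 4]))
    return (major, minor, revision), seen, weights


# Synthetic stand-in for a real weights file: header, counter, three floats.
fake_file = (struct.pack('<3i', 0, 2, 0)
             + struct.pack('<Q', 1000)
             + struct.pack('<3f', 0.5, -1.0, 2.0))
version, seen, weights = read_darknet_weights(io.BytesIO(fake_file))
print(version, seen, weights)
```

A real reader would then hand out slices of this flat list to each layer in the order the layers were defined, which is why the Keras model must match the DarkNet architecture exactly.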

That’s it; we now have a YOLOv3 model for use.

We can save this model to a Keras compatible .h5 model file ready for later use.

We can tie all of this together; the complete code example including functions copied directly from the “” script is listed below.

Running the example may take a little less than one minute to execute on modern hardware.

As the weight file is loaded, you will see debug information reported about what was loaded, output by the WeightReader class.

At the end of the run, the model.h5 file is saved in your current working directory with approximately the same size as the original weight file (237MB), but ready to be loaded and used directly as a Keras model.

Make a Prediction

We need a new photo for object detection, ideally with objects that we know that the model knows about from the MSCOCO dataset.

We will use a photograph of three zebras taken by Boegh on safari, and released under a permissive license.

Photograph of Three Zebras
Taken by Boegh, some rights reserved.

Download the photograph and place it in your current working directory with the filename ‘zebra.jpg‘.

Making a prediction is straightforward, although interpreting the prediction requires some work.

The first step is to load the Keras model. This might be the slowest part of making a prediction.

Next, we need to load our new photograph and prepare it as suitable input to the model. The model expects inputs to be color images with the square shape of 416×416 pixels.

We can use the load_img() Keras function to load the image and the target_size argument to resize the image after loading. We can also use the img_to_array() function to convert the loaded PIL image object into a NumPy array, and then rescale the pixel values from 0-255 to 0-1 32-bit floating point values.

We will want to show the original photo again later, which means we will need to scale the bounding boxes of all detected objects from the square shape back to the original shape. As such, we can load the image and retrieve the original shape.

We can tie all of this together into a convenience function named load_image_pixels() that takes the filename and target size and returns the scaled pixel data ready to provide as input to the Keras model, as well as the original width and height of the image.

We can then call this function to load our photo of zebras.

We can now feed the photo into the Keras model and make a prediction.

That’s it, at least for making a prediction. The complete example is listed below.

Running the example returns a list of three NumPy arrays, the shape of which is displayed as output.

These arrays predict both the bounding boxes and class labels but are encoded. They must be interpreted.
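The shapes of these arrays can be reasoned about with a little arithmetic. The sketch below assumes the commonly described YOLOv3 layout: three output scales with strides of 32, 16, and 8 pixels, three anchor boxes per grid cell, and, for each anchor, four box coordinates, one objectness score, and 80 MSCOCO class scores. These values are stated as assumptions for illustration rather than read from the model, and the function name is hypothetical.

```python
def yolo_output_shapes(input_size=416, strides=(32, 16, 8),
                       anchors_per_cell=3, num_classes=80):
    """Return the expected output shape at each scale (assumed layout)."""
    # Each anchor carries 4 box values, 1 objectness score and the class scores.
    channels = anchors_per_cell * (4 + 1 + num_classes)
    return [(1, input_size // s, input_size // s, channels) for s in strides]


shapes = yolo_output_shapes()
for shape in shapes:
    print(shape)
```

Under these assumptions, the last dimension is 3 x (4 + 1 + 80) = 255 values per grid cell, and the grids are 13×13, 26×26, and 52×52 cells for a 416×416 input.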

Make a Prediction and Interpret Result

The output of the model is, in fact, encoded candidate bounding boxes from three different grid sizes, and the boxes are defined in the context of anchor boxes, carefully chosen based on an analysis of the size of objects in the MSCOCO dataset.

The script provided by experiencor provides a function called decode_netout() that will take each one of the NumPy arrays, one at a time, and decode the candidate bounding boxes and class predictions. Further, any bounding boxes that don’t confidently describe an object (e.g. all class probabilities are below a threshold) are ignored. We will use a probability of 60% or 0.6. The function returns a list of BoundBox instances that define the corners of each bounding box in the context of the input image shape and class probabilities.
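To make the decoding step more concrete, here is a minimal sketch of how a single raw prediction could be turned into a box. It assumes the usual YOLOv3 parameterization: the box center is a sigmoid offset within its grid cell, the width and height scale an anchor box exponentially, and each class score is multiplied by the objectness score. The function decode_cell() and the sample values are hypothetical; this is not the project's decode_netout() function.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, row, col, grid_size, anchor_w, anchor_h, net_size=416):
    """Decode one raw prediction [tx, ty, tw, th, objectness, classes...].

    Returns the box corners as fractions of the network input size,
    the objectness score, and the per-class probabilities.
    """
    tx, ty, tw, th, tobj = raw[:5]
    # Center: offset inside the cell, then normalized by the grid size.
    cx = (col + sigmoid(tx)) / grid_size
    cy = (row + sigmoid(ty)) / grid_size
    # Size: the anchor (in pixels) scaled exponentially, then normalized.
    w = anchor_w * math.exp(tw) / net_size
    h = anchor_h * math.exp(th) / net_size
    objectness = sigmoid(tobj)
    class_probs = [objectness * sigmoid(c) for c in raw[5:]]
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return box, objectness, class_probs


# One made-up prediction in the middle cell of a 13x13 grid with two classes.
raw = [0.0, 0.0, 0.0, 0.0, 2.0, 3.0, -3.0]
box, objectness, class_probs = decode_cell(
    raw, row=6, col=6, grid_size=13, anchor_w=116, anchor_h=90)
print(box, objectness, class_probs)
```

A box would then be kept only if its objectness or class probabilities exceed the chosen threshold, such as the 0.6 used in this tutorial.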

Next, the bounding boxes can be stretched back into the shape of the original image. This is helpful as it means that later we can plot the original image and draw the bounding boxes, hopefully detecting real objects.

The experiencor script provides the correct_yolo_boxes() function to perform this translation of bounding box coordinates, taking the list of bounding boxes, the original shape of our loaded photograph, and the shape of the input to the network as arguments. The coordinates of the bounding boxes are updated directly.
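As a rough sketch of the idea, suppose each decoded box is expressed as fractions of the network input and the photograph was simply stretched to the square input shape (no padding). Under that assumption, mapping a box back to the original photograph is a matter of multiplying by the original width and height. The function name and the 640×386 image size below are hypothetical.

```python
def scale_box_to_image(box, image_w, image_h):
    """Map a box given as fractions (xmin, ymin, xmax, ymax) to pixel coordinates.

    Assumes the image was stretched, not padded, to the network input shape.
    """
    xmin, ymin, xmax, ymax = box
    return (int(xmin * image_w), int(ymin * image_h),
            int(xmax * image_w), int(ymax * image_h))


# A made-up box covering the lower-middle part of a 640x386 photograph.
pixel_box = scale_box_to_image((0.25, 0.5, 0.75, 1.0), image_w=640, image_h=386)
print(pixel_box)
```

If the image were instead resized with its aspect ratio preserved and then padded, an additional offset and scale would be needed before this multiplication.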

The model has predicted a lot of candidate bounding boxes, and most of the boxes will be referring to the same objects. The list of bounding boxes can be filtered and those boxes that overlap and refer to the same object can be merged. We can define the amount of overlap as a configuration parameter, in this case, 50% or 0.5. This filtering of bounding box regions is generally referred to as non-maximal suppression and is a required post-processing step.

The experiencor script provides this via the do_nms() function that takes the list of bounding boxes and a threshold parameter. Rather than purging the overlapping boxes, their predicted probability for their overlapping class is cleared. This allows the boxes to remain and be used if they also detect another object type.
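The behavior described above can be sketched in plain Python. The example below is an illustration under stated assumptions, not the project's do_nms() code: boxes are (xmin, ymin, xmax, ymax) tuples, scores holds one list of class scores per box, and the helper names iou() and suppress() are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress(boxes, scores, threshold=0.5):
    """Clear (in place) the class score of boxes that overlap a stronger box."""
    if not boxes:
        return scores
    for c in range(len(scores[0])):
        # Visit boxes from the highest to the lowest score for this class.
        order = sorted(range(len(boxes)), key=lambda i: -scores[i][c])
        for pos, i in enumerate(order):
            if scores[i][c] == 0:
                continue
            for j in order[pos + 1:]:
                if iou(boxes[i], boxes[j]) >= threshold:
                    scores[j][c] = 0.0  # the box stays, only this class is cleared
    return scores


# Two heavily overlapping boxes and one separate box, with two classes each.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [[0.9, 0.0], [0.8, 0.6], [0.7, 0.0]]
suppress(boxes, scores, threshold=0.5)
print(scores)
```

In this made-up case, the second box loses its score for the first class because it overlaps the stronger first box, but it keeps its score for the second class, which is the reason for clearing scores rather than deleting boxes.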

This will leave us with the same number of boxes, but only a few of interest. We can retrieve just those boxes that strongly predict the presence of an object, that is, those that are more than 60% confident. This can be achieved by enumerating over all boxes and checking the class prediction values. We can then look up the corresponding class label for the box and add it to the list. Each box must be considered for each class label, just in case the same box strongly predicts more than one object.

We can develop a get_boxes() function that does this and takes the list of boxes, known labels, and our classification threshold as arguments and returns parallel lists of boxes, labels, and scores.
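A minimal sketch of such a function is shown below. It uses a small namedtuple as a stand-in for the project's BoundBox class and a two-label list instead of the full MSCOCO label list; both are assumptions made to keep the example self-contained. Reporting scores as percentages is also a choice made here for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a bounding box with per-class probabilities.
Box = namedtuple('Box', ['xmin', 'ymin', 'xmax', 'ymax', 'classes'])


def get_boxes(boxes, labels, thresh):
    """Return parallel lists of boxes, labels and scores above the threshold."""
    v_boxes, v_labels, v_scores = [], [], []
    for box in boxes:
        # Check every class, in case one box strongly predicts several.
        for i, label in enumerate(labels):
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(label)
                v_scores.append(box.classes[i] * 100)
    return v_boxes, v_labels, v_scores


labels = ['person', 'zebra']
boxes = [
    Box(10, 20, 110, 220, [0.0, 0.875]),    # confident zebra
    Box(15, 25, 115, 225, [0.0, 0.0]),      # cleared by suppression
    Box(300, 40, 400, 200, [0.75, 0.625]),  # strong for both classes
]
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, 0.6)
for label, score in zip(v_labels, v_scores):
    print(label, score)
```

Note that the third box appears twice in the result, once per class that it strongly predicts.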

We can call this function with our list of boxes.

We also need a list of strings containing the class labels known to the model in the correct order used during training, specifically those class labels from the MSCOCO dataset. Thankfully, this is provided in the experiencor script.

Now that we have those few boxes of strongly predicted objects, we can summarize them.

We can also plot our original photograph and draw the bounding box around each detected object. This can be achieved by retrieving the coordinates from each bounding box and creating a Rectangle object.

We can also draw a string with the class label and confidence.

The draw_boxes() function below implements this, taking the filename of the original photograph and the parallel lists of bounding boxes, labels and scores, and creates a plot showing all detected objects.
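The geometry behind the drawing step is simple and can be sketched without any plotting library. The helpers below are hypothetical: rectangle_spec() converts box corners into the anchor point, width, and height, which is the argument order that matplotlib's Rectangle patch takes, and caption() formats the label text. The box values are made up, and scores are assumed to be percentages.

```python
def rectangle_spec(box):
    """Convert (xmin, ymin, xmax, ymax) into ((x, y), width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1), x2 - x1, y2 - y1


def caption(label, score):
    """Format a class label with its confidence as a percentage."""
    return "%s (%.1f%%)" % (label, score)


xy, width, height = rectangle_spec((160, 193, 480, 386))
text = caption('zebra', 87.5)
print(xy, width, height, text)
```

With matplotlib, these values could then be passed to a Rectangle patch and a text call on the current axes, with the caption drawn at the top-left corner of the box.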

We can then call this function to plot our final result.

We now have all of the elements required to make a prediction using the YOLOv3 model, interpret the results, and plot them for review.

The full code listing, including the original and modified functions taken from the experiencor script, is listed below for completeness.

Running the example again prints the shape of the raw output from the model.

This is followed by a summary of the objects detected by the model and their confidence. We can see that the model has detected three zebra, all above 90% likelihood.

A plot of the photograph is created and the three bounding boxes are plotted. We can see that the model has indeed successfully detected the three zebra in the photograph.

Photograph of Three Zebra Each Detected with the YOLOv3 Model and Localized with Bounding Boxes


In this tutorial, you discovered how to develop a YOLOv3 model for object detection on new photographs.

Specifically, you learned:

  • YOLO-based Convolutional Neural Network family of models for object detection and the most recent variation called YOLOv3.
  • The best-of-breed open source library implementation of the YOLOv3 for the Keras deep learning library.
  • How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


387 Responses to How to Perform Object Detection With YOLOv3 in Keras

  1. dong zhan May 28, 2019 at 7:48 am #

    thank you so much, machine learning for object detection! you have broadened my horizon, but, I am a game developer, so, usually bounding boxes are known, I have personal interest in go game, I wish I could understand machine-learning in go game, could you please give me a pointer?

    • Jason Brownlee May 28, 2019 at 8:23 am #


      Sorry, I don’t have any tutorials on games, I cannot give you good advice.

      • usman January 28, 2021 at 9:57 pm #

        which model use for skin wound care?

        • Jason Brownlee January 29, 2021 at 6:04 am #

          Perhaps test a suite of models on your problem and select the one that performs the best.

    • shgele October 2, 2019 at 5:35 am #

      Awesome @Jason Brownlee, I faced an issue like ValueError: If your data is in the form of symbolic tensors, you should specify the steps argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data) when I called Model.predict(image)

      • HEEGLS October 2, 2019 at 6:29 am #

        @Jason Brownlee, It was very interesting and very well narrated. Could you please include how we would add an additional training set and labels?
        Let say, I want this model to train on additional data, to classify Faces, or Hand Scribbled Digits and alphabets ? Suppose I have an additional training & test data set.

      • Masoud March 29, 2020 at 12:48 pm #

        choose steps=2

    • andika February 25, 2020 at 12:43 am #

      Nice article,
      I found also an interesting beginner’s guide of YOLOv3 here:

  2. Anna May 28, 2019 at 6:42 pm #

    Amazing post, thanks for sharing

    • Jason Brownlee May 29, 2019 at 8:37 am #

      Thanks, I’m glad it was helpful.

      • Grinzou February 23, 2020 at 8:48 pm #

        Hi sur, I just want to use the Darknet 53 as a features extractor to my dataset
        How can I get these features using Darknet 53 only?

        • Jason Brownlee February 24, 2020 at 7:41 am #

          Sorry, I don’t have a tutorial on darknet, I cannot give you good off the cuff advice.

    • Hamed December 22, 2019 at 9:54 pm #

      Hi Jason,

      Thanks for this great lesson. Have you tried to run any custom object using experiencor’s codes namely and If so, have you faced any error? threw me some errors even with his own config file for raccoon dataset and unfortunately he’s not being responsive. Thank you!

  3. Salman Sajid May 28, 2019 at 11:19 pm #

    Kindly also make a blog on How we train a custom object detection with YOLO3

  4. Katia Z May 29, 2019 at 1:57 am #

    Jason, thank you very much for introducing computer vision so clearly! Could you specify in what part we saved our trained model to the disk?

    • Jason Brownlee May 29, 2019 at 8:50 am #

      The model was pre-trained, we simply loaded it.

      • Katia Z May 31, 2019 at 1:51 pm #

        Thank you so much for clean working code!) Could you advise some working and understandable GitHub repository for YOLOv3 for detecting objects on video, experiencer case is not clear in steps(

        • Jason Brownlee May 31, 2019 at 2:48 pm #

          I hope to cover this in the future, thanks for the suggestion.

  5. Noman Khan May 29, 2019 at 5:50 am #

    Thank you so much Sir! Very amazing and informative tutorial . I am beginner and following your tutorials for learning deep learning.
    kindly Sir!
    1) use YOLOv3 for camera video or simple video
    2) use LSTM for video sequence
    It will be great helpful for me

  6. Lina Xu June 1, 2019 at 1:13 am #

    Thanks so much for posting such a good tutorial, thanks!!!

  7. Jorge a June 1, 2019 at 4:31 pm #

    Thank you so much sir for your helpful tutorial.
    Could you tell me how can i change the number and titles of labels in this model ? or labels are unchangeable ?

  8. Lai Junhao June 4, 2019 at 12:54 am #

    Amazing guide Jason.

    At the start I experienced some difficulty with the library, since the latest version of Tensorflow did not work. I downgraded it to v1.12 and it worked.

    Thanks for the introduction to CV and YOLOv3. It was fun trying it out with my own images.

    • Jason Brownlee June 4, 2019 at 7:53 am #

      Thanks, and well done.

      I tested it with TensorFlow 1.13 on Ubuntu Linux and MacOS without incident.

      What OS/platform are you on?

      • Lai Junhao June 4, 2019 at 11:14 pm #

        i was using pycharm on windows using anaconda as the python interpreter.

        Irregardless, I will be moving on to the kangaroo tutorial since I managed to run the process successfully.

        Thanks again for creating these tutorials

      • Vikas Verma July 2, 2019 at 1:44 am #

        I am getting “AbortedError: Operation received an exception:Status: 3, message: could not create a dilated convolution forward descriptor, in file tensorflow/core/kernels/
        [[{{node conv_1/convolution}}]]” error in prediction step.

        Using tensorflow 1.13.1

  9. Niall O'Hara June 6, 2019 at 1:18 am #

    Great tutorial Jason. Very well laid out and easy to follow. I’m hoping to extend the tutorial to consider training on a custom data set. Would love to see a follow up blog on this 🙂
    Keep up the great work!

  10. Hans Pfeufer June 14, 2019 at 7:47 pm #

    Great tutorial, thank you so much. How can I access the location of an object in an image? I would like to know, for instance, where the zebras are in the image (coordinates of the center of the bounding boxes) and save this data to a file.

    • Jason Brownlee June 15, 2019 at 6:31 am #

      The model will output bounding boxes (pixel coordinates) that you can use any way that you wish.

      In this tutorial, we simply draw the box.

  11. Ivan June 16, 2019 at 12:41 pm #

    Thanks for an ELI5 guide Jason!

    In interpreting the prediction array with decode_netout(), one of the argument was the anchors that you defined:

    anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]

    Is this supposed to be an initial guess for a bounding box of where the object may be?

    • Jason Brownlee June 17, 2019 at 8:13 am #

      Good question.

      It is the shape of the anchor boxes used during training. From the post:

      …and the boxes are defined the context of anchor boxes, carefully chosen based on an analysis of the size of objects in the MSCOCO dataset.

      • Paulo June 19, 2019 at 6:33 am #

        But what are those anchors? What are they used for? I’m guessing my question is the same as Ivan’s.

        • Jason Brownlee June 19, 2019 at 8:19 am #

          They are chosen to best capture objects in the MSCOCO dataset, based on an analysis of that dataset.

          They are used when making a prediction to help quickly find objects in your new photos – e.g. bounding boxes in the image data.

          • Navin June 23, 2019 at 10:14 pm #

            Hi Jason,
            Question regarding anchors as they are hard coded, is it possible to derive them programmatically from yhat or the three NumPy array values or some other mechanism ?
            Also each array of ‘anchors’ having 6 elements is also puzzling, would be nice to know the process/documentation for defining these values.

            Good tutorial and helpful in getting started.
            A suggestion: could you cover the steps to generalize this code, e.g. if I pass the file path of a photo of a cat or a car, the code should be able to detect that just like it did for the zebra?

            Many thanks.

          • Jason Brownlee June 24, 2019 at 6:31 am #

            They are derived based on the average size of objects in a training dataset.

            You could derive them based on the expected object size in your dataset if you like.

    • Robert Franklin January 18, 2020 at 3:15 am #

      This article helped me understand this.

      From what I can gather, the algorithm uses a set of fixed bounding box sizes. For each “cell” (i.e. a small division of the larger image), the probability that the cell contains a specific object is computed for each anchor box size.

      So if you have an image made up of 10 x 10 cells, and 5 anchor sizes, AND 100 objects to detect you will get an output of size:

      10 x 10 x 5 x 100
      10 x 10 for each cell
      x 5 for each anchor size
      x 100 for each object

      each of these will be a probability and we take probabilities which meet a certain threshold.

      I’m just learning this so PLEASE correct me if i’m mistaken.

  12. sumitra June 17, 2019 at 2:35 pm #

    Hi Jason,

    I have followed through the Experiencor YOLO3 for Keras Project to train with my own data set. If i would like to crop the bounding box from the newly predicted images, how do i go about it?

    Thank you

    • Jason Brownlee June 18, 2019 at 6:32 am #

      The model will output bounding boxes, you can use them directly on the image pixels.

      • sumitra June 18, 2019 at 2:58 pm #

        Thanks a lot for the response Jason. I managed to get good outputs with new images. However, the mAP performance of the model indicates 0.000 which is very strange. Could there be any reasons for it?

        • Jason Brownlee June 19, 2019 at 7:48 am #

          Well done.

          Perhaps test a suite of images to see if that uncovers more information?

  13. Chinmay June 25, 2019 at 3:51 pm #

    awesome tutorial sir !! Please answer this question
    many thanks

  14. Ashish Roopan June 27, 2019 at 2:20 am #

    Great work Jason.
    Can you also tell me how to train in with a new dataset?
    Mainly the format of annotation of the dataset to train with.(The YOLOv2 of this repository used .xml format like that of pascalVOC).

    Is it the same or something different?

    • Ashish Roopan June 27, 2019 at 3:40 am #

      Also I need train the model with transfer learning.
      So which all layers should I train?
      If possible can you please give an idea about the code to use?

      Thank you

    • Jason Brownlee June 27, 2019 at 7:58 am #

      I hope to cover that in the future.

      • xavier July 9, 2019 at 10:58 pm #

        Hi Jason, thx for this example !

        note that I found that there is some vertical down shift of the output boxes (it can be seen on your “zebra” image output above, where the 3 boxes are slightly too low). The shift can be much bigger on some other images I tested, even on 416×416 input images.

        Any chance you fix this soon ?
        Many thanks anyway!

  15. Jorgetheman July 11, 2019 at 3:31 pm #


    This is an awesome tutorial. I went through it but at the end did not get a picture with bounding boxes as shown. I only got the array values and predictions for zebra and percentages.
    A figure with the bounding boxes wasn’t created.

    Any ideas?
    Thank you

  16. Karol Borkowski July 16, 2019 at 4:34 am #

    Is there in the book any additional content in addition to what is in this article?

  17. Karol Borkowski July 21, 2019 at 10:09 pm #

    Isn’t the compilation of the model missing?

  18. Gunners July 22, 2019 at 5:17 pm #

    I want to run this code. But it was too slow. I think it was not run on GPU. How could I test it by GPU? I want to run it in Jupyter Notebook.

  19. Mostafa July 24, 2019 at 6:18 am #

    Thanks so much for posting such a good tutorial. My question is how can we use YoloV3 to detect custom classes (labels)? In other words, how to reuse (retrain) it to detect custom objects?

  20. shreekanth July 24, 2019 at 8:15 pm #

    how can train the model with my data

  21. markos July 25, 2019 at 5:17 am #

    Excellent post, as always. One question. I have some yolo weights in .pt (pytorch) form. Is it possible to load this .pt file in some way or transform it so I can load it in a keras implemented yolo? Many thanks!!!

    • Jason Brownlee July 25, 2019 at 8:00 am #

      It may be possible to convert them.

      Sorry, I don’t have an example of this.

  22. markos July 27, 2019 at 7:09 pm #

    Thanks for your reply, Jason. I’ll look it up.

  23. Karol Borkowski July 28, 2019 at 5:21 am #

    I have one technical question about the model. How does the last layer know which units correspond to which cell, since it is densely connected with the previous one?

    • Jason Brownlee July 28, 2019 at 6:50 am #

      Sorry, what do you mean exactly?

      • Karol Borkowski July 28, 2019 at 8:33 am #

        The output of the YOLO network is S x S x N values, where S is the number of cells in both image directions. Therefore, the network predicts N values for each cell of the image.
        If I understand it correctly, each set on N values delivers a prediction for the corresponding image cell.
        My question is how a given image cell corresponds to the particular set of N values if everything is densely connected with everything? In other words, how the networks “know” which set of the predicted values are for which part of the image?

        I know that the question can be a little bit confusing, but I hope that now I have clarified it better.

  24. Karol Borkowski July 29, 2019 at 6:17 am #

    Let me ask about one more thing 😉 The output is a tensor of size n x S x S x (B*5 + C), where n is the number of anchor sizes, S x S is the number of cells, B is the number of anchors for each size and C is the number of classes. In our model we have 3 anchors in 3 different sizes, and the MSCOCO dataset has 80 classes, so the shape of the output tensor should be: (3,S,S,3*5+80), that is (3,S,S,95). However, our output is (3,S,S,255). Why is that?

    • Jason Brownlee July 29, 2019 at 6:27 am #

      Not quite, the output is encoded.

      You can review the decoding functions in the post to see how the 3d output is interpreted.

      • Karol Borkowski July 29, 2019 at 6:37 am #

        Oh, I see it. Thanks again! 😉

  25. Kumar July 31, 2019 at 2:45 am #

    I am new to machine learning. If we use pretrained weights to train on a custom (new) object, will the model still detect the old objects? For example, the pretrained weights detect 80 objects (classes), and I used these weights to train on my new object (class). Will it detect 81 objects or only one (the new object)?
    If it detects only one (the new object), how can I make it detect all 81 objects?

    Thanks in advance

    • Jason Brownlee July 31, 2019 at 6:56 am #

      You will have to train on a dataset that combines the original training dataset with your new class.

      Starting with pre-trained weights would speed this up a lot!

  26. FT August 2, 2019 at 5:12 pm #

    Thanks a lot for this excellent post. I have a question. In the function correct_yolo_boxes(), what is the purpose of calculating the offsets and scales? I thought the offsets and scales are always 0 and 1 respectively, so they have no effect.

    • Jason Brownlee August 3, 2019 at 7:48 am #

      I believe we are scaling the boxes to the size of the image.

      • Madalin August 22, 2019 at 11:33 pm #

        Hi, is there a typo in the correct_yolo_boxes() implementation? The first line in the function: new_w, new_h = net_w, net_h will make the offsets and scales always 0 and 1.

        Thank you for your post, it helped me a lot

        • Jason Brownlee August 23, 2019 at 6:28 am #


          • Aleksey August 15, 2020 at 5:41 am #

            So, what about this code?
            The offset and scale are always 0 and 1.
            Is this a mistake?
            What is the right code?
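One reading of this thread, offered as an assumption rather than the author's confirmed intent: the tutorial loads each photo with a fixed target size, which stretches it to the network input without padding, so an offset of 0 and a scale of 1 are consistent with that preprocessing. Non-trivial offsets only appear when the image is resized with its aspect ratio preserved and then padded (letterboxed). The helper below, `box_correction`, is a hypothetical name used only to compare the two cases:

```python
def box_correction(image_w, image_h, net_w, net_h, letterbox=False):
    if letterbox:
        # aspect-ratio-preserving resize: the image fits inside the network input
        scale = min(net_w / image_w, net_h / image_h)
        new_w, new_h = image_w * scale, image_h * scale
    else:
        # plain stretch: the resized image fills the whole network input
        new_w, new_h = net_w, net_h
    x_offset, x_scale = (net_w - new_w) / 2.0 / net_w, float(new_w) / net_w
    y_offset, y_scale = (net_h - new_h) / 2.0 / net_h, float(new_h) / net_h
    return x_offset, x_scale, y_offset, y_scale

stretched = box_correction(640, 480, 416, 416)
letterboxed = box_correction(640, 480, 416, 416, letterbox=True)
print(stretched)     # (0.0, 1.0, 0.0, 1.0)
print(letterboxed)   # roughly (0, 1, 0.125, 0.75): padding above and below
```

If the preprocessing is changed to letterbox the image, the second branch is the one that would matter.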

  27. Yashash Gaurav August 6, 2019 at 3:52 pm #

    For anyone trying to understand how the YOLOv3 outputs get the shapes they have, this is an amazing resource.

    Thank you for the code walk through Jason!

  28. Prashanth August 7, 2019 at 4:44 pm #

    Excellent article, thanks.

    I trained my own YOLOv3 model with custom classes (fewer than the default 80) using Darknet. I therefore have a new .cfg file as well as weights file.
    I would like to perform inference using keras on Tensorflow, rather than Darknet.

    Is there a way I can do this using your program above? I don’t see a way to specify my own .cfg file. It only seems to accept the weights file.


    • Jason Brownlee August 8, 2019 at 6:29 am #

      There may be a way, I’m not sure off the cuff sorry. Some experimentation may be required.

      • Prashanth August 8, 2019 at 2:31 pm #

        I tried changing the number of filters from 255 to (num_classes + 5)*3, in each of the 3 YOLO layers. I’m able to get inference results, but they don’t match what I obtained with the original Darknet.

  29. Karthika August 20, 2019 at 1:27 pm #

    Thank you so much for such a simple and elaborate explanation.
    The code is working fine but I have a problem of not being able to save the model.
    I get the following error when I save any model.

    Error! F:\NITD\Project\Code\model.h5 is not UTF-8 encoded.
    Saving disabled.

    • Jason Brownlee August 20, 2019 at 2:13 pm #

      That is very odd.

      Perhaps it is an issue with your h5py library?

      Perhaps try posting/searching on stackoverflow for related issues?

  30. Sachin August 25, 2019 at 4:15 pm #

    Hi, I’m trying to perform this object detection on video, but I’m getting an error. Check this out.

    ERROR: rectangle() missing required argument ‘rec’ (pos 2)

    Here is the sample code. Do you know how to fix this? I tried a lot of things.

    def cv(frame, x, y, w, h, v_labels, v_scores):
        cv2.rectangle(img = frame,
                      pt1 = (x, y),
                      pt2 = (x + w, y + h),
                      color = (0, 0, 255),
                      thickness = 2,
                      lineType = cv2.LINE_AA)
        cv2.putText(frame, "{}-{}".format(v_labels, v_scores), (x + w, y),
                    fontScale = 1, color = (255, 255, 255), thickness = 1, lineType = cv2.LINE_AA)
        return frame

    reader = imageio.get_reader('video.mp4')
    fps = reader.get_meta_data()['fps']
    writer = imageio.get_writer('output1.mp4', fps = fps)

    for i, frame in enumerate(reader):
        image, image_w, image_h = load_image_pixels(frame, (input_w, input_h))
        yhat = model.predict(image)
        for j in range(len(yhat)):
            boxes += decode_netout(yhat[j][0], anchors[j], class_threshold, input_h, input_w)
        correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
        do_nms(boxes, 0.5)
        v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
        for z in range(len(v_boxes)):
            box = v_boxes[z]
            y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
            width = x2 - x1
            height = y2 - y1
            frame = cv(frame, x1, y1, width, height, v_labels[z], v_scores[z])

  31. Sachin August 26, 2019 at 3:22 pm #

    That’s okay, I fixed it. Thank you!

    • Jason Brownlee August 27, 2019 at 6:24 am #

      I’m happy to hear that.

    • Karthika August 28, 2019 at 3:57 pm #

      Can you share the code?

  32. Tedi August 27, 2019 at 1:02 am #

    Hello Jason,

    Thank you for the comprehensive tutorial.

    2 questions:

    1) I tried removing some of the objects from the labels, and the image opened without the algorithm being applied; there were no markings on it. How can we add objects (such as Jar, Speaker, etc.) or modify the current list?

    2) When the algorithm works, it performs well. Could we change the bounding box anchors in the code, or does that have to be done in the training source code? For example, in case we wanted to increase the height of the box.

    I understand these are in-depth questions. If you could provide any brief steps or research hints from your books/posts, that would be appreciated.


  33. Tedi August 28, 2019 at 7:36 am #

    Thank you!

    I will check it out.

  34. Pasin September 2, 2019 at 4:58 pm #

    So I tried to predict an image with the same code as written above but I got this error:

    ValueError: If your data is in the form of symbolic tensors, you should specify the steps argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data).

    Any idea on what’s wrong here? Thanks in advance!

    • Jason Brownlee September 3, 2019 at 6:13 am #

      Perhaps confirm that your libraries are up to date and that you loaded your image as a NumPy array?

  35. Hongbo Ai September 3, 2019 at 8:29 pm #

    The code works for zebra.jpg, but failed on the elephant and carrot pictures, which I downloaded from the internet.

    • Jason Brownlee September 4, 2019 at 5:57 am #

      Perhaps double check that the images were loaded correctly?

      • Hongbo Ai September 4, 2019 at 11:47 am #

        It was the threshold that filtered out the correct result. Anyway, it works now.

        • Jason Brownlee September 4, 2019 at 1:43 pm #

          Happy to hear that.

        • Fetulhak September 14, 2019 at 9:55 pm #

          What exactly do you mean by the threshold filtering out the correct result? The same problem happened to me: when I tried an elephant picture, it did not give me the bounding box. Do I have to change the threshold value every time I provide a picture of a different class?

          • Hongbo Ai September 23, 2019 at 7:42 pm #

            The picture produces a class score of 0.7 while my threshold was 0.8, so nothing was drawn on the plotted picture. When I changed the threshold to 0.6, the plotted picture showed the correct result.
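The effect described here can be sketched with a few made-up scores. The `detections` list and the `filter_detections` helper are hypothetical stand-ins for the decoded boxes and for the thresholding done in get_boxes(); only the comparison against the threshold matters:

```python
# hypothetical detections: (label, score) pairs standing in for decoded boxes
detections = [("elephant", 0.70), ("zebra", 0.95), ("person", 0.40)]

def filter_detections(detections, class_threshold):
    # keep only detections whose score is above the threshold
    return [(label, score) for label, score in detections if score > class_threshold]

strict = filter_detections(detections, 0.8)
relaxed = filter_detections(detections, 0.6)
print(strict)    # [('zebra', 0.95)]
print(relaxed)   # [('elephant', 0.7), ('zebra', 0.95)]
```

The threshold does not need to change per class; it is a trade-off between missing weaker detections and showing more false positives.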

  36. Salman Sajd September 5, 2019 at 3:06 am #

    Thanks for a brilliant blog. I have one request:
    just like the “custom training on Mask R-CNN” blog post, please also write one on YOLO.

  37. Sridharan September 6, 2019 at 2:30 pm #

    Sir, how can I store the objects detected in the bounding boxes as separate images?

  38. Fetulhak September 14, 2019 at 9:33 pm #

    Thank you for your amazing dedication to teaching those of us who are learning through the internet. The internet is my university. Currently I am working on a project to detect malaria parasites in thick blood microscopic images using the YOLOv3 algorithm. What I am afraid of is that YOLOv3 may not detect objects at the parasite level since they are tiny. What is your recommendation for me? I have the microscopic images.

    • Jason Brownlee September 15, 2019 at 6:21 am #

      That sounds like a great project.

      I would encourage you to test a suite of different models and discover what works best for your specific dataset.

  39. Bhaskar Chandra Trivedi September 18, 2019 at 4:54 am #

    Can we have a tutorial on object tracking using a discriminative correlation filter?

  40. Arelis September 30, 2019 at 10:53 am #

    Thank you. You explain very well.
    Only one question: how can I save the image with the detected objects to disk?

    God bless you

  41. Florin Andrei October 2, 2019 at 2:56 pm #

    Very cool example, but decode_netout() is very slow. Any suggestions for making it faster? Aggressively pruning unwanted items?

    E.g., what if I’m only interested in one type of label?

    • Jason Brownlee October 3, 2019 at 6:36 am #

      Good question. You might have to experiment.

      • Florin Andrei October 3, 2019 at 7:19 am #

        Found an excellent pointer here:

        Or replace

        if(objectness.all() <= obj_thresh): continue

        with

        if (objectness <= obj_thresh).all(): continue

        It speeds up that function by orders of magnitude.

        • Jason Brownlee October 3, 2019 at 1:24 pm #

          Thanks for sharing.

        • Hyunho Kim November 10, 2020 at 6:34 pm #

          I think that the line must be changed as below:

          if netout[int(row)][int(col)][b][5:].max() <= obj_thresh: continue

          The corrected line works well.
          I think the original code has a bug. It makes all the boxes valid.
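The difference between the two checks discussed in this thread can be reproduced with NumPy alone. In this sketch, `objectness` is a single low score standing in for one box's objectness value:

```python
import numpy as np

obj_thresh = 0.6
objectness = np.float32(0.01)   # a weak box that should be skipped

# original check: .all() collapses the score to True/False first, and
# True (== 1) is never <= 0.6, so a non-zero score is never skipped
original_skips = bool(objectness.all() <= obj_thresh)

# corrected check: compare against the threshold first, then reduce
corrected_skips = bool((objectness <= obj_thresh).all())

print(original_skips, corrected_skips)  # False True
```

With the original check every box with a non-zero score falls through to the slower per-box processing, which is consistent with the speed-up reported above.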

  42. Sandeco October 7, 2019 at 11:28 pm #

    Hi Jason, thanks for all the code, man. My friend, I updated TensorFlow to version 2.0 and this code doesn’t work. I changed the imports to tensorflow.keras, but the class “Add” has a problem in this line: “return add([skip_connection, x]) if skip else x”. Do you have any tips?

    • Jason Brownlee October 8, 2019 at 8:02 am #

      If you use standalone Keras v2.3.0 on top of TensorFlow 2.0, the code works fine.

  43. Dang Tien Dat October 8, 2019 at 6:34 pm #

    Thank you very much for sharing, I think I understand more about YOLOv3.
    But I still do not understand the meaning of the numbers in:
    [(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]
    I think 13*13, 26*26 and 52*52 are three different sizes of anchor boxes.
    But what is the meaning of the numbers ‘1’ and ‘255’?
    Let me thank you again for helping.

    • Jason Brownlee October 9, 2019 at 8:07 am #

      Good question.

      Yes, the first dimension (1) can be ignored, the middle two are the grid size (13, 26, or 52), and the final dimension holds the encoded box predictions, which need to be decoded and interpreted.
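As a rough illustration of where these shapes come from, assuming a 416x416 input and the three downsampling factors of YOLOv3 (32, 16 and 8): the leading 1 is the batch size (one image), the middle two numbers are the grid height and width, and 255 is 3 boxes per cell times 85 values per box.

```python
input_w = input_h = 416
n_anchors, n_classes = 3, 80

shapes = []
for stride in (32, 16, 8):
    # each output scale divides the input into a coarser or finer grid
    grid_h, grid_w = input_h // stride, input_w // stride
    shapes.append((1, grid_h, grid_w, n_anchors * (5 + n_classes)))

print(shapes)  # [(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]
```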

  44. tyt October 16, 2019 at 8:31 pm #


    How can I make a detection with this code?

    • Jason Brownlee October 17, 2019 at 6:29 am #

      See the section titled “Make a Prediction”.

  45. Peter Mankowski October 28, 2019 at 8:42 am #

    What a great talent you have to compile those tutorials. My students use some of them for the initial training.

    Question: Do you have one for streaming video content/object recognition that I could use when the new batch of interns arrive?
    thanks, peter

    • Jason Brownlee October 28, 2019 at 1:17 pm #


      Sorry, I don’t have tutorials on working with video data, I hope to cover it in the future.

  46. Andrei Tarnakin November 8, 2019 at 10:30 pm #


    I think there is an error in the following line:

    44 netout[..., 4:] = _sigmoid(netout[..., 4:])

    It should be:

    44 netout[..., 4] = _sigmoid(netout[..., 4])

    You can see it by comparing the source code from in experiencor/keras-yolo3 with

  47. Sandra Mateska November 13, 2019 at 4:00 pm #

    I have made a text file for the anchor boxes which looks like this:
    C:\path\00000002.jpg 0.33999999999999997,0.6900000000000001,0.7225,0.26,1 0.7775,0.40249999999999997,0.91,0.265,1 0.68,0.97,0.84,0.8025,1
    (path x1,y1,x2,y2,class)
    I have made it by myself with converting the form from txt file which was like this:
    (class x y width height)
    *Used a different annotator

    Now when I try to run the script and pass the paths to my txt files for the anchors and classes, the path to my folder with the photos and the output folder, it gives me this error:
    Using TensorFlow backend.
    2019-11-13 08:17:06.054804: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    Traceback (most recent call last):
    File “”, line 190, in
    File “”, line 42, in _main
    with open(annotation_path) as f:
    FileNotFoundError: [Errno 2] No such file or directory: ‘train.txt’

    What should I do ?

  48. Denis Candido December 14, 2019 at 8:33 am #

    Hello Jason!

    Any chance of you writing a post on how to retrain this YOLO model with a new class, using labeled data? The data is labeled with the program labelImg.

    Is it possible to still use Keras for this task?

    Thanks in advance,

  49. Ankit December 17, 2019 at 11:48 am #

    Thanks for sharing this tutorial with us.

    I am not able to understand how the anchor size is associated with yhat in decode_netout. The smallest output (13,13,255) is paired with the largest anchor boxes [116,90, 156,198, 373,326].

    When we say an anchor box has dimensions 116,90, how does it map to the original image? What are the units of these dimensions?

  50. mohammed December 21, 2019 at 11:26 pm #

    Thank you very much. Can you give me details on how to use this code with a multi-view dataset (each image has three views)?

    • Jason Brownlee December 22, 2019 at 6:14 am #

      Sorry, I have not worked with a multi-view dataset. I cannot give you good advice.

  51. Hamed December 24, 2019 at 4:05 pm #

    Hi Jason,

    Thanks for this great lesson. Have you tried to run any custom object using experiencor’s codes namely and If so, have you faced any error? threw me some errors even with his own config file for raccoon dataset and unfortunately he’s not being responsive. Thank you!

  52. Vikalp Ravi Jain December 29, 2019 at 10:20 pm #

    Hey Jason, thank you for the tutorial. Can you please explain lines 42 to 46 in the function decode_netout? That manipulation part is vague to me:

    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    • Vikalp Ravi Jain December 29, 2019 at 10:39 pm #

      Especially Line 24
      netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]

    • Mina May 21, 2020 at 2:06 am #

      did you figure this out?
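A small worked example may help with the lines quoted above. This is a sketch on a toy array with one grid cell, one anchor box and two classes (made-up values, not real network output); the four operations are the ones from the quoted code:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

obj_thresh = 0.6
# layout of the last axis: [tx, ty, tw, th, objectness, class0, class1]
netout = np.array([[[[0.0, 0.0, 0.1, 0.2, 3.0, 4.0, -4.0]]]])

# squash the centre offsets into (0, 1) so they stay inside the grid cell
netout[..., :2] = _sigmoid(netout[..., :2])
# squash the objectness and class scores into (0, 1)
netout[..., 4:] = _sigmoid(netout[..., 4:])
# class confidence = objectness * class score; np.newaxis adds a trailing
# axis so the single objectness value broadcasts across all class scores
netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
# zero out class confidences that are not above the threshold
netout[..., 5:] *= netout[..., 5:] > obj_thresh

print(netout[0, 0, 0, 5:])  # class0 stays above 0.9, class1 is zeroed
```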

  53. Nadav December 30, 2019 at 8:21 pm #

    Hi Jason,
    Thank you for the clear demonstration of YOLO. However, how do I use this method/model to train a new data set of my own, not a pre-loaded one?
    Many thanks

  54. Khushit Shah January 2, 2020 at 4:40 pm #

    Hey, thanks for the tutorials, they helped me a lot. I’m implementing yolov3-tiny on Android and I am getting an array of shape [1,2535,85] as output. I am still confused about what to do with it.

    Which values are the class probabilities, which are the x, y position, and which are the width and height? Can you help?


    • Jason Brownlee January 3, 2020 at 7:15 am #

      Sorry, I am not familiar with that model.

    • Rahul November 16, 2020 at 1:11 am #

      Did you figure it out?
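A possible reading of the [1,2535,85] shape, under assumptions that are not confirmed for this particular Android model: a 416x416 input, two output grids (13x13 and 26x26) with 3 boxes per cell, and the common layout of 4 box values, 1 objectness score and 80 class scores per box. The array below is random stand-in data:

```python
import numpy as np

n_anchors, n_classes = 3, 80
grids = (13, 26)   # assumed: two output scales for a 416x416 input

# total boxes across both grids
n_boxes = sum(g * g * n_anchors for g in grids)
print(n_boxes)  # 2535

# stand-in for the model output
output = np.random.rand(1, n_boxes, 5 + n_classes)

# assumed layout of the last axis: [x, y, w, h, objectness, 80 class scores]
xywh = output[0, :, 0:4]
objectness = output[0, :, 4]
class_scores = output[0, :, 5:]
print(xywh.shape, objectness.shape, class_scores.shape)  # (2535, 4) (2535,) (2535, 80)
```

Whether the box values are already decoded or still need the sigmoid and anchor scaling depends on how the model was exported, so that is worth checking against the converter that produced it.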

  55. Sonder January 7, 2020 at 4:35 am #


    Thank you very much for this detailed tutorial. Could I use binary images at training and test time?

  56. Ali January 10, 2020 at 1:25 pm #

    Thanks for sharing this tutorial with us.

    I am a beginner in deep learning and trying to find the accuracy of YOLOv3 for the test dataset. I can run your code to predict for one image but I could not do it for the whole dataset. In addition, I need to print the precision and other parameters that are useful to compare YOLOv3 with other methods like MASK R-CNN. Can you please guide me on how to do it?


  57. Hansy January 13, 2020 at 3:19 pm #


    Thank you very much for this detailed tutorial, I am a beginner in deep learning, Could I use this YOLOv3 for cancer detection in CT scan images?

  58. Saurabh January 14, 2020 at 2:52 am #

    Hi Jason,

    Thanks for sharing the interesting blog!

    I have trained an object detector using SSD (mobilenet-v1) on a custom dataset. The dataset consists of Uno playing card images (skip, reverse, and draw four). On all these cards, the model performs pretty well, as I have trained the model only on these 3 cards (around 278 images with 829 bounding boxes collected using a mobile phone). However, although I haven’t trained the model on any other card, it still detects other cards (inference using a webcam).

    How can I fix this?

    Please share your view!

    Thanking you!

    • Jason Brownlee January 14, 2020 at 7:26 am #

      You’re welcome!

      Maybe create an “other” class and give examples of all kinds of random examples during training that belong to this class, then write code to ignore this class in operation.

  59. Saurabh January 14, 2020 at 6:28 pm #

    Thanks Jason for the quick response.

    Is this also applicable in a real-world scenario? Let’s say at the moment I am looking for only three cards (skip, reverse and draw four) and ignoring the rest of the cards (nearly 10 cards).

    For my area of interest (skip, reverse and draw four cards), I have collected ~278 images with 829 bounding boxes. As you suggested including an “other” class, I would have to collect a lot more images of other cards. But in the real world, it is difficult to get images of the other class.

    Could you please share your views?

    Please feel free to correct me.

    Thanking you!

    • Jason Brownlee January 15, 2020 at 8:22 am #

      Other is just anything that is not the main focus. You are teaching the model to ignore what is not the main focus of the project.

      Or you can impose a limit on what the model “sees” further up in the pipeline – e.g. the environment in which you deploy the model.

  60. Saurabh January 15, 2020 at 6:35 pm #


    Thanks for the explanation. I got your first point. Regarding your second point:

    [Jason]: you can impose a limit on what the model “sees”

    [Saurabh]: It means I shouldn’t present other class images to the model? Is this true?

    Could you please elaborate more on the second point?

    Thanking you!

    • Jason Brownlee January 16, 2020 at 6:12 am #

      I meant that you have control over how the model will be used and what data will be provided to it. You control the environment and in turn you can use this to limit the range/complexity when training the model.

  61. Saurabh January 16, 2020 at 6:33 pm #


    Thanks for the explanation. I appreciate your efforts!

    Thanking you!

  62. Brijesh Rawat January 17, 2020 at 5:01 am #

    Great explanation.
    What if we have a different set of classes? Can we train on our new dataset with the same YOLO model?
    Also, how much time might it take to train if I have 10 different classes of objects?

  63. nusrat January 25, 2020 at 4:20 am #

    Is it possible to do live detection from a webcam, with the score shown as a percentage? Please provide a tutorial for live detection.

  64. nusrat January 26, 2020 at 3:36 am #

    Hi Jason, please provide a tutorial for webcam detection.

    • Jason Brownlee January 26, 2020 at 5:25 am #

      Thanks for the suggestion, maybe in the future.

  65. Mehrdad Jannesar January 26, 2020 at 7:03 pm #

    How can I fit a dataset to the model in this code?
    I want to fit a JSON file or the CIFAR-10 dataset with this code.

  66. vikash January 31, 2020 at 11:55 pm #

    {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
    {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
    ----> {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])

    ValueError: Variable bnorm_0/moving_mean/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

    I am not able to resolve this error. Every time I try to create the model with model = make_yolov3_model(),
    it gives me the same ValueError from the definition of make_yolov3_model().

  67. Rahul February 22, 2020 at 5:45 am #

    Hello Jason,

    I tried implementing the above code. However, I don’t see any objects in my image being marked with a box.

    I get the array and the plot of the zebra image as output, but nothing is detected.

    P.S. I am not using the command prompt


  68. KK February 22, 2020 at 10:57 pm #

    Can you please explain why in

    def make_yolov3_model():
    input_image = Input(shape=(None, None, 3))

    The input shape is set to None,None instead of 416×416, which I guess is the default input size for YOLOv3.

    Secondly, if I generate and train a model with None,None,3 as input, the model summary shows an input of None,None,3. When I try to convert this model to tflite, it throws an error saying it is unable to get weights for the input layer. This has been discussed here

    I have created a new model after training by dropping the loss layer like this:
    inferenceModel = Model(trainingmodel.input,outputs=[trainingmodel.layers[-7].output])

    However, I don’t know the Keras syntax to correctly remove the None,None,3 input layer and replace it with a 416,416,3 input to the inference model.

    Can you help?

    • Jason Brownlee February 23, 2020 at 7:28 am #

      To keep the cols/rows unbounded.

      I don’t know about tflite, sorry.

      • KK February 23, 2020 at 2:24 pm #

        What about the first query: why is the model created with None x None instead of 416×416, do you know?

        • Jason Brownlee February 24, 2020 at 7:36 am #

          Yes, I believe a None,None input is to make the model general – to let input size be defined by a provided image.

  69. KK February 24, 2020 at 3:48 am #

    I got it, thanks. I can just replace it with a fixed size if my training images are going to be of fixed aspect ratio, i.e. 416×416. This will solve the tflite conversion issue as well.

  70. Runist February 27, 2020 at 7:07 pm #

    A great course!!! Could you write a blog post with more detail about YOLO? I want to learn step by step how to build a model like YOLO, train it and make predictions.

  71. Joy Kumar Chakraborty March 3, 2020 at 5:18 am #

    Hello, I downloaded the pretrained weights yolov3.weights and copied this code. Both are in the same folder, but I am getting this file not found error:

    ipython-input-2-3060e4e6db8a> in __init__(self, weight_file)
    1 class WeightReader:
    2 def __init__(self, weight_file):
    —-> 3 with open(weight_file, ‘rb’) as w_f:
    4 major, = struct.unpack(‘i’,
    5 minor, = struct.unpack(‘i’,

    FileNotFoundError: [Errno 2] No such file or directory: ‘C:\\Users\\Joych\\Downloads\\y1\\YOLOV3-master\\yolov3.weights’

    • Jason Brownlee March 3, 2020 at 6:04 am #

      Nevertheless, the error suggests your code and weights are not in the same folder.

  72. pouria March 5, 2020 at 1:24 pm #

    I have a problem with this code:
    yhat = model.predict(image)
    Jupyter error:
    AbortedError: Operation received an exception:Status: 3, message: could not create a dilated convolution forward descriptor, in file tensorflow/core/kernels/
    [[{{node conv_1_2/convolution}}]]
    My Keras version: 2.3.1
    Please help me 🙂

  73. grini March 9, 2020 at 4:50 am #

    I want to get the features extracted by Darknet-53.
    Is that possible?

    • Jason Brownlee March 9, 2020 at 7:18 am #

      Probably. I don’t have an example of using that library, sorry.

  74. pouria March 9, 2020 at 1:59 pm #

    Hi …
    It was really great and helped me a lot …
    I just wanted to ask how I can fine-tune this YOLO network on new data, such as for car detection,
    for example with the UA-DETRAC data?

  75. thanhhien March 9, 2020 at 3:27 pm #

    I need the execution time around 0.05 seconds per frame but this code takes me 12 seconds from reading the image to drawing the boxes. Are there any measures to reduce this time?

    • Jason Brownlee March 10, 2020 at 5:37 am #


      Faster machine?
      Different implementation?
      Smaller data?

    • shubham July 23, 2020 at 6:31 pm #

      I am also facing this issue. If you know how to solve it, please let me know.

  76. akbar March 18, 2020 at 12:43 am #

    Thank you so much for your step-by-step codes ..
    I just wanted to ask what should I do if I want to “transfer learning” for my data? For example for vehicles …

  77. Hamed Suliman March 21, 2020 at 7:08 am #

    Hi Jason,
    I would like to train YOLOv3 on a new object that does not exist in the COCO dataset. What is the approach?

  78. Thomas March 25, 2020 at 4:02 am #

    Great article thanks.
    I would like to know what the output array of YOLO is. Can someone send the whole array and explain how it is decoded?

  79. Ali March 30, 2020 at 5:56 pm #

    Thank you so much for such a simple and elaborate explanation.

    I want to run the trained YOLOv3 model on GPU for the purpose of object detection. Now, I can run your code (the code you explained above) on the CPU, but I do not know what I should do to run an object detection for an image on GPU. Can you please help me with this issue? I emphasize that I want to do it for just one image object detection, NOT for training the model.

    Thank you

    • Jason Brownlee March 31, 2020 at 7:58 am #

      If you configure your tensorflow installation to use GPU, then YOLO will run on the GPU.

  80. Jimmy April 1, 2020 at 6:20 pm #

    Thank you for such a great article.

    I want to export this YOLOv3 model to tensorflow-serving using the predict_signature_def() function. However, I got this for the output item:
    ‘list’ object has no attribute ‘dtype’.

    Do you have a pointer for me to solve this issue? Thank you so much for your time!

    • Jason Brownlee April 2, 2020 at 5:45 am #

      I don’t, sorry. Perhaps try posting your question to stackoverflow.

  81. phil April 12, 2020 at 1:47 pm #

    Jason, great article. Enjoyed reading it.

    A question on the code for the netout function. Experiencor added a _softmax function to the processing of output array. It seems to affect the confidence level in the output. Just want to get your thoughts.

    netout[..., 5:] = netout[..., 4][..., np.newaxis] * _softmax(netout[..., 5:])


    • Jason Brownlee April 13, 2020 at 6:09 am #


      I don’t know much about it, sorry. Perhaps contact him directly?

  82. Ali April 14, 2020 at 5:19 pm #

    Hi Jason,

    I am using an AWS instance with a Tesla K80 GPU. I first configured TensorFlow to use the GPU. Then, I executed the script to predict objects in an image. But it took about 20 seconds to predict the objects in just one image. I am wondering whether I have made a mistake somewhere, since I expected the execution time to be in milliseconds, not 20 seconds. Do you think this execution time is normal for YOLO prediction on a GPU?


    • Jason Brownlee April 15, 2020 at 7:53 am #

      Perhaps the first time is slow as it starts up and subsequent calls are faster.
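One way to check whether the start-up cost dominates is to time several consecutive calls and compare the first with the rest. In this sketch, `time_calls` and `fake_predict` are hypothetical names; `fake_predict` is only a placeholder for the real `model.predict(image)` call:

```python
import time

def time_calls(fn, n_calls=3):
    # return the wall-clock duration of each call, in seconds
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# placeholder workload; swap in a lambda that calls model.predict(image)
def fake_predict():
    return sum(i * i for i in range(100000))

durations = time_calls(fake_predict)
print(["%.4f s" % d for d in durations])
```

If the first duration is much larger than the others when run against the real model, the 20 seconds is mostly one-off initialisation rather than per-image inference time.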

  83. Ankit April 20, 2020 at 3:32 am #

    boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
    ValueError: cannot convert float NaN to integer

    I am getting this error as many values are nan.

    Btw great article, thanks.

    • Jason Brownlee April 20, 2020 at 5:31 am #

      Looks like you have a nan in your data somehow?

      • Ankit April 21, 2020 at 4:24 am #

        I ran the same images with AlexeyAB trained yolov3 repo and the output is correct.
        I am using the same trained weights in your code but in the prediction, there are non-nan array values with many nan arrays also!

        • Jason Brownlee April 21, 2020 at 6:06 am #

          Interesting, thanks for sharing.

          • Vivek Gupta May 23, 2020 at 1:35 am #

            I am also receiving this error in a Kaggle notebook. I have correctly copied your code.
            This is the error:
            ValueError Traceback (most recent call last)
            190 boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
            191 # correct the sizes of the bounding boxes for the shape of the image
            --> 192 correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
            193 # suppress non-maximal boxes
            194 do_nms(boxes, 0.5)

            in correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)
            70 x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
            71 y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
            ---> 72 boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
            73 boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
            74 boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)

            ValueError: cannot convert float NaN to integer

          • Jason Brownlee May 23, 2020 at 6:25 am #

            Sorry to hear that, run on your workstation at the command line instead.

  84. Jayamani S April 20, 2020 at 9:50 pm #

    Hi Jason,

    Great tutorial, with a good explanation and procedure for executing the code. If I change the
    input from the zebra to another image, the result remains the same. Am I making a mistake?

    My work is on lane detection in Indian road scenes. Can YOLOv3 be used to detect lanes? If yes, please tell me what modifications I can make to achieve lane detection.

    • Jason Brownlee April 21, 2020 at 5:55 am #


      No, I believe you will need to train a model on your dataset, or use a pre-trained model for this problem.

  85. Tabby April 23, 2020 at 7:27 pm #

    Hi, I implemented yolov3 model by following your tutorial, now I want to merge my monodepth depth estimation model with yolov3, how I can do that?

  86. Haresh May 17, 2020 at 8:30 pm #

    I’ve come to know that K-means clustering is used in YOLOv3 to decide the shape of the anchor boxes.
    So why should we specify these sizes,

    anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]

    while getting predictions from the above model.

    Thank you.

  87. Sritharan A May 31, 2020 at 10:08 pm #

    Hi jason,

    Awesome article on YOLO. Thank you so much

    I just don’t understand the anchors concept. As you have mentioned, you have gone through the dataset carefully and created these anchors. If you can explain more, we can learn how to create our own anchors for our dataset as well.

    • Jason Brownlee June 1, 2020 at 6:22 am #

      I did not create the anchors, the developers of the yolo model did that.

      Generally, you want some boxes that are the average size of objects in your photos. These are the anchors.

      • Sritharan A June 7, 2020 at 9:53 pm #

        Yes, I read some more interesting articles about anchors. As you have mentioned, we initially give some average size that might cover our object, but as the model gets trained, the anchor boxes are adjusted as per the loss to accommodate the object.
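At prediction time, the tutorial's decode_netout() treats the anchors as fixed widths and heights, and the network output scales them through an exponent. The sketch below isolates that width/height step; `box_size` is a hypothetical helper name, and the anchor values are the ones listed for the 13x13 output:

```python
import numpy as np

net_w, net_h = 416, 416
# anchors for the coarsest (13x13) output, as (width, height) pairs
# measured in pixels of the network input
anchors = [116, 90, 156, 198, 373, 326]

def box_size(tw, th, b):
    # scale anchor b by the exponent of the raw prediction, then
    # normalise by the network input size to get a fraction of the image
    w = anchors[2 * b + 0] * np.exp(tw) / net_w
    h = anchors[2 * b + 1] * np.exp(th) / net_h
    return w, h

# a raw prediction of (0, 0) reproduces the anchor itself
w0, h0 = box_size(0.0, 0.0, 0)
print(w0 * net_w, h0 * net_h)

# a positive raw width makes the box wider than its anchor
w1, h1 = box_size(0.5, 0.0, 0)
print(w1 > w0)
```

So the anchor values are starting sizes expressed in input pixels, and the fractions returned here are later multiplied by the photo's width and height in correct_yolo_boxes().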

  88. Alex Melbourne June 3, 2020 at 2:48 pm #

    Thank you very much, Jason, exceptionally clear post on a highly sophisticated model. Are you by any chance aware of any attempts to do an object detection in 3D data? Is it even viable to train a model like that without a supercomputer?

    • Jason Brownlee June 4, 2020 at 6:10 am #

      You’re welcome.

      Sorry, I’m not up to speed with 3d object detection.

  89. Leila Dabiran June 4, 2020 at 1:59 pm #

    Thanks for the great tutorial.
    Is there any way to use the same implementation but with the YOLO9000 weights and classes? I want YOLOv3 to detect more than 80 classes but I don’t have a custom labelled dataset. I want it to use the YOLO9000 weights and classes. Is this possible?

  90. Sritharan A June 8, 2020 at 3:15 am #

    Hi Jason,

    Can you tell us how to flatten the layer and add an FC layer after that?

    I tried this:

    flat1 = Flatten()(model.outputs)
    class1 = Dense(1024, activation='relu')(flat1)
    output = Dense(2, activation='softmax')(class1)
    # define new model
    model = Model(inputs=model.inputs, outputs=output)
    # summarize

    And I got this error: “Layer flatten_1 expects 1 inputs, but it received 3 input tensors. Input received: [, , ]”

  91. Sritharan A June 8, 2020 at 11:47 pm #

    I understand it now, Jason.

    model = Model(input_image, [yolo_82, yolo_94, yolo_106]). This is the last layer of our model.

    I tried to flatten this layer, and that’s why I got this error.

    I thought I should get the outputs from each of these three layers, stack them, and then flatten the result. Is my idea correct?

    • Jason Brownlee June 9, 2020 at 6:02 am #

      Sorry, I don’t understand what you are trying to do, so I cannot give you good advice.
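The error quoted in this thread arises because model.outputs is a list of three tensors, while a Flatten layer accepts a single tensor. The three output grids also have different sizes, so they cannot be stacked directly. One possible approach, sketched below, is to flatten each output separately and then concatenate the flat vectors. The small three-output model here is a hypothetical stand-in, not the YOLOv3 network, and the layer sizes are arbitrary.

```python
from tensorflow.keras.layers import Input, Conv2D, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model

# Hypothetical stand-in for a multi-output detector: one input and
# three feature maps at different grid sizes.
inp = Input(shape=(32, 32, 3))
out_a = Conv2D(4, 3, strides=8, padding='same')(inp)  # 4x4 grid
out_b = Conv2D(4, 3, strides=4, padding='same')(inp)  # 8x8 grid
out_c = Conv2D(4, 3, strides=2, padding='same')(inp)  # 16x16 grid
base = Model(inputs=inp, outputs=[out_a, out_b, out_c])

# Flatten each output on its own, then join the flat vectors.
flat = [Flatten()(t) for t in base.outputs]
merged = Concatenate()(flat)

# Add the fully connected head on top of the merged vector.
class1 = Dense(16, activation='relu')(merged)
output = Dense(2, activation='softmax')(class1)
model = Model(inputs=base.inputs, outputs=output)
model.summary()
```

With the real network and a 416x416 input, the three outputs are 13x13x255, 26x26x255 and 52x52x255, so the merged vector would hold 904,995 values and a Dense(1024) layer on top of it would be very large. Pooling each output before flattening may be more practical.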

  92. Toshit Varshney June 10, 2020 at 9:11 pm #

    Please tell me how I can customize YOLO to train on my own data.

  93. Jalal Khan June 25, 2020 at 8:43 pm #

    Sir, I have a question about activation functions. Could you please tell me why we use activation functions in YOLOv3?

    • Jason Brownlee June 26, 2020 at 5:33 am #

      Why do we use activation functions in any neural network model?

      Activation functions add non-linearity to the model, allowing it to learn complex representations and relationships.
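The point about non-linearity can be illustrated with a small sketch. Two linear steps applied one after the other collapse into a single linear step, while placing an activation between them does not. The leaky ReLU with a slope of 0.1 is the activation used in the Darknet-53 backbone of YOLOv3; the weights and function names below are made up for illustration.

```python
def leaky_relu(x, alpha=0.1):
    # Pass positive values through; scale negative values by alpha.
    return x if x > 0 else alpha * x

def linear_stack(x):
    # Two linear steps with no activation: 3 * (2x + 1) - 4 is just 6x - 1.
    return 3.0 * (2.0 * x + 1.0) - 4.0

def nonlinear_stack(x):
    # The same two steps with a leaky ReLU in between.
    return 3.0 * leaky_relu(2.0 * x + 1.0) - 4.0

for x in (-2.0, 0.0, 1.0):
    print(x, linear_stack(x), nonlinear_stack(x))
```

For inputs where the inner value is positive the two stacks agree, but for negative inner values they diverge, so the second stack can no longer be written as a single straight line.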

  94. Ouarda June 26, 2020 at 4:33 am #

    First, thank you for this amazing and clear tutorial.
    I want to ask you about the principal idea of the YOLO detector (non-maximum suppression). Where did you apply it in the code?

    • Jason Brownlee June 26, 2020 at 5:41 am #

      It is applied after prediction, in the do_nms() function.
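For readers unfamiliar with the idea, non-maximum suppression can be sketched in a few lines. This is a simplified, single-class illustration and not the do_nms() function from the tutorial, which may differ in detail. Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and the function names and threshold are hypothetical.

```python
def iou(a, b):
    # Intersection over union of two (xmin, ymin, xmax, ymax) boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Visit boxes from highest to lowest score and keep a box only if it
    # does not overlap an already kept box by iou_thresh or more.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.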

  95. Ihsan June 29, 2020 at 9:04 pm #

    Hi Jason, thank you for this tutorial.
    I am trying to train the model with my own dataset, but I had a problem with Y_train: I didn’t understand what I should put in the three NumPy arrays for training.

  96. Alaoui July 16, 2020 at 5:20 am #

    Hi Jason,
    Thanks for this great tutorial.
    I want to know whether the YOLO family of models is used in mobile camera apps to detect faces, and how a model this big (237MB) can be integrated into a small camera app?

    • Jason Brownlee July 16, 2020 at 6:48 am #

      Sorry, I don’t know about the size of the model or mobile apps.

  97. Cong July 29, 2020 at 1:54 am #

    I would like to ask: how are the anchor boxes defined?
    anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]

    • Jason Brownlee July 29, 2020 at 5:54 am #

      These come from the paper or original implementation I believe, based on the size of objects in the dataset.

  98. akbar August 13, 2020 at 7:09 am #

    Thanks a lot. How can I implement YOLOv3 for a custom dataset?

  99. sana August 14, 2020 at 11:21 pm #

    How can I do it on my own dataset?

  100. Deepti Joshi August 20, 2020 at 9:49 pm #

    Hi Jason, when I run this code, I only get the image printed without boxes or probability values. Just the plain original image. Can you please help?

  101. Rohit Gupta September 5, 2020 at 12:22 am #

    Hello Jason, I have downloaded the weights file and saved it on my PC, but when I read it using WeightReader, it gives me a file not found error.

  102. Nicolas September 17, 2020 at 5:34 am #

    Hello Jason, I have a question. Do you know of a network trained to identify the background or scene in an image? For instance: the background in this picture is a forest, the background in this picture is a highway.

  103. Peter September 18, 2020 at 4:30 am #

    Jason, great tutorial, thank you very much! I am interested in retraining the YOLO framework to detect other things, do you cover this in your book or anywhere else?


  104. Gledson October 23, 2020 at 2:39 am #

    Good afternoon Jason. I’m sorry for the inconvenience. Once again congratulations on your explanations. The explanation of Yolo v3 is very good / interesting. However, I have a doubt. The calculated v_boxes provide, in addition to the boxes, the score value for each class of interest. For example, if I have an image with only one object, in my case a car, I can get a single score value, in this case, the corresponding score value only for the car class. How to obtain the corresponding score values for the other classes? That is, obtain the score value for the car class, obtain the score value for the person class and obtain the score value for the bicycle class with respect to the same object.

    • Jason Brownlee October 23, 2020 at 6:15 am #

      Prediction will return a list of detected objects for you to enumerate.

      • Gledson October 23, 2020 at 9:49 pm #

        Okay, I’ll check. Thank you very much.
        Best regards.
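To see the score for every class of a single detection, one option is to pair the label list with the box's per-class scores. The sketch below assumes each box object exposes a classes list holding one score per label, in the same order as the labels. The Box class and scores_by_label() are hypothetical stand-ins for illustration, and the score values are made up.

```python
class Box:
    """Hypothetical stand-in for a detected box."""
    def __init__(self, classes):
        # One score per class label, in the same order as the label list.
        self.classes = classes

def scores_by_label(box, labels):
    # Pair each label with the score the box holds for that class.
    return dict(zip(labels, box.classes))

labels = ["person", "bicycle", "car"]
box = Box([0.02, 0.01, 0.97])
scores = scores_by_label(box, labels)
for label, score in scores.items():
    print(label, score)
```

Some decoding functions zero out class scores that fall below the detection threshold, so low scores may show up as exactly 0 unless that threshold is lowered.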

  105. Gledson October 23, 2020 at 2:56 am #

    Good afternoon Jason. I’m sorry for the inconvenience. Is the “objness” variable inside the BoundBox (v_box) the same thing as the confidence score? If so, is the objness the same as the IOU?

    Best Regards.

    • Jason Brownlee October 23, 2020 at 6:16 am #

      I didn’t write that function; perhaps ask the author directly.

      • Gledson October 23, 2020 at 9:50 pm #

        Okay, I’ll ask. Thank you very much.
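As a general note on the question above, in the common YOLOv3 convention the objectness value is produced by the network itself: a raw output is passed through a sigmoid to give the predicted probability that the box contains an object, and each class score is that objectness multiplied by the class probability. IoU, in contrast, is calculated from the coordinates of two boxes. The sketch below shows only the convention; it assumes, without confirmation from the author, that the decoding code in this tutorial works the same way, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def class_scores(raw_objectness, raw_class_outputs):
    # Objectness: predicted probability that the box contains any object.
    objectness = sigmoid(raw_objectness)
    # Per-class score: objectness times the probability of each class.
    scores = [objectness * sigmoid(z) for z in raw_class_outputs]
    return objectness, scores

objectness, scores = class_scores(0.0, [0.0, 0.0])
print(objectness, scores)
```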

  106. Amin October 29, 2020 at 9:58 pm #

    Hello Mr. Jason,
    Your article was awesome, complete, exciting and informative.
    Here you have used the pre-trained YOLOv3 weights to detect zebras.
    I need to be able to identify the type of car and its color with the YOLOv3 algorithm.
    Thank you for your help.

    • Jason Brownlee October 30, 2020 at 6:51 am #


      Perhaps you can adapt the example for your specific application.

  107. Geleta November 8, 2020 at 1:36 pm #

    You explained it clearly, Jason. Your blog is the first blog that gave me a clear mental picture of machine learning. I have recommended this blog to anyone who asks me where to learn machine learning. After a long time, I read this post on YOLOv3, which is special. Thanks for your efforts. God bless you.

  108. SY Chun November 11, 2020 at 4:01 pm #

    I’m new to ML. This article is the best one on the planet. With a single copy and paste, I was able to get my own result image! Thank you so much, Jason!

    One thing: would you please show me how to make a joblib dump file, so that I could run it on a Flask server? Thanks once more.