9 Applications of Deep Learning for Computer Vision

By Jason Brownlee on July 5, 2019 in Deep Learning for Computer Vision

The field of computer vision is shifting from statistical methods to deep learning neural network methods.

There are still many challenging problems to solve in computer vision. Nevertheless, deep learning methods are achieving state-of-the-art results on some specific problems.

It is not just the performance of deep learning models on benchmark problems that is most interesting; it is the fact that a single model can learn meaning from images and perform vision tasks, obviating the need for a pipeline of specialized and hand-crafted methods.

In this post, you will discover nine interesting computer vision tasks where deep learning methods are making headway.


Let’s get started.

Overview

In this post, we will look at the following computer vision problems where deep learning has been used:

Image Classification
Image Classification With Localization
Object Detection
Object Segmentation
Image Style Transfer
Image Colorization
Image Reconstruction
Image Super-Resolution
Image Synthesis
Other Problems

Note, when it comes to the image classification (recognition) tasks, the naming convention from the ILSVRC has been adopted. Although the tasks focus on images, they can be generalized to the frames of video.

I have tried to focus on the types of end-user problems that you may be interested in, as opposed to more academic sub-problems where deep learning does well.

Each example provides a description of the problem, an example, and references to papers that demonstrate the methods and results.

Do you have a favorite computer vision application for deep learning that is not listed?
Let me know in the comments below.

Image Classification

Image classification involves assigning a label to an entire image or photograph.

This problem is also referred to as “object classification” and perhaps more generally as “image recognition,” although this latter task may apply to a much broader set of tasks related to classifying the content of images.

Some examples of image classification include:

Labeling an x-ray as cancer or not (binary classification).
Classifying a handwritten digit (multiclass classification).
Assigning a name to a photograph of a face (multiclass classification).
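
As a rough sketch of what "assigning a label" looks like in code, the snippet below turns a list of raw class scores into a single label using softmax and argmax. The scores and the helper names (softmax, classify) are made up for illustration; in practice a trained model would produce the scores.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical scores for one handwritten digit image (classes 0 to 9).
digit_names = [str(d) for d in range(10)]
digit_scores = [0.1, 0.3, 0.2, 4.0, 0.5, 0.1, 0.0, 1.2, 0.4, 0.2]

label, prob = classify(digit_scores, digit_names)
print(label, round(prob, 3))
```

Binary classification is the same idea with two classes; multiclass problems simply have a longer list of scores.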

A popular example of image classification used as a benchmark problem is the MNIST dataset.

Example of Handwritten Digits From the MNIST Dataset

A popular real-world version of classifying photos of digits is The Street View House Numbers (SVHN) dataset.

For state-of-the-art results and relevant papers on these and other image classification tasks, see:

What is the class of this image?

There are many image classification tasks that involve photographs of objects. Two popular examples include the CIFAR-10 and CIFAR-100 datasets that have photographs to be classified into 10 and 100 classes respectively.

Example of Photographs of Objects From the CIFAR-10 Dataset

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition in which teams compete for the best performance on a range of computer vision tasks on data drawn from the ImageNet database. Many important advancements in image classification have come from papers published on or about tasks from this challenge, most notably early papers on the image classification task. For example:


Image Classification With Localization

Image classification with localization involves assigning a class label to an image and showing the location of the object in the image by a bounding box (drawing a box around the object).

This is a more challenging version of image classification.

Some examples of image classification with localization include:

Labeling an x-ray as cancer or not and drawing a box around the cancerous region.
Classifying photographs of animals and drawing a box around the animal in each scene.
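
A predicted bounding box is often compared to a hand-labeled box using intersection over union (IoU). That metric is not covered in this post, so treat the following as a minimal sketch under one assumption: boxes are given as (x_min, y_min, x_max, y_max) in pixel coordinates. The box values are invented for illustration.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max); 0 if it is empty."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)

def iou(box_a, box_b):
    """Intersection over union of two boxes, between 0.0 and 1.0."""
    overlap = (
        max(box_a[0], box_b[0]),
        max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]),
    )
    inter = box_area(overlap)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical ground-truth and predicted boxes for one object.
truth = (10, 10, 50, 50)
predicted = (30, 30, 70, 70)
print(round(iou(truth, predicted), 3))
```

A value of 1.0 means the boxes match exactly and 0.0 means they do not overlap at all.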

A classical dataset for image classification with localization is the PASCAL Visual Object Classes dataset, or PASCAL VOC for short (e.g. VOC 2012). The VOC datasets were used in computer vision challenges over many years.

Example of Image Classification With Localization of a Dog from VOC 2012

The task may involve adding bounding boxes around multiple examples of the same object in the image. As such, this task may sometimes be referred to as “object detection.”

Example of Image Classification With Localization of Multiple Chairs From VOC 2012

The ILSVRC2016 Dataset for image classification with localization is a popular dataset comprised of 150,000 photographs with 1,000 categories of objects.

Some examples of papers on image classification with localization include:

Selective Search for Object Recognition, 2013.
Rich feature hierarchies for accurate object detection and semantic segmentation, 2014.
Fast R-CNN, 2015.

Object Detection

Object detection extends image classification with localization to images that may contain multiple objects, each of which must be localized and classified.

This is a more challenging task than simple image classification or image classification with localization, as often there are multiple objects in the image of different types.

Often, techniques developed for image classification with localization are used and demonstrated for object detection.

Some examples of object detection include:

Drawing a bounding box and labeling each object in a street scene.
Drawing a bounding box and labeling each object in an indoor photograph.
Drawing a bounding box and labeling each object in a landscape.
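
To make the output of such a task concrete, here is a small sketch of how detections for one image might be represented and filtered. The dictionary layout, the labels, the scores, and the 0.5 threshold are all assumptions chosen for illustration, not the output format of any particular model or library.

```python
from collections import Counter

# Hypothetical detector output for one street scene:
# one entry per detected object, each with a label, a confidence and a box.
detections = [
    {"label": "car", "score": 0.92, "box": (34, 120, 210, 260)},
    {"label": "person", "score": 0.81, "box": (250, 90, 300, 240)},
    {"label": "car", "score": 0.30, "box": (400, 130, 520, 250)},
    {"label": "bicycle", "score": 0.67, "box": (310, 150, 380, 250)},
]

def keep_confident(dets, threshold=0.5):
    """Drop detections whose confidence is below the threshold."""
    return [d for d in dets if d["score"] >= threshold]

kept = keep_confident(detections)
counts = Counter(d["label"] for d in kept)

for d in kept:
    print(d["label"], d["score"], d["box"])
print(dict(counts))
```

Unlike classification with localization, the number of entries is not fixed in advance; it depends on how many objects appear in the image.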

The PASCAL Visual Object Classes datasets, or PASCAL VOC for short (e.g. VOC 2012), are commonly used for object detection.

Another dataset for multiple computer vision tasks is Microsoft’s Common Objects in Context Dataset, often referred to as MS COCO.

Example of Object Detection With Faster R-CNN on the MS COCO Dataset

Some examples of papers on object detection include:

Object Segmentation

Object segmentation, or semantic segmentation, is the task of object detection where a line is drawn around each object detected in the image. Image segmentation is the more general problem of splitting an image into segments.

Object detection is also sometimes referred to as object segmentation.

Unlike object detection that involves using a bounding box to identify objects, object segmentation identifies the specific pixels in the image that belong to the object. It is like a fine-grained localization.
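
The difference between a pixel mask and a bounding box can be sketched with a tiny example. The 5x5 mask below is invented for illustration, and mask_to_box is a hypothetical helper, not a function from any library; it derives the tightest box around the pixels marked as belonging to the object.

```python
def mask_to_box(mask):
    """Return the tightest (x_min, y_min, x_max, y_max) box around the 1s in a mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # the mask contains no object pixels
    return (min(cols), min(rows), max(cols), max(rows))

# 1 marks a pixel that belongs to the object, 0 marks background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_box(mask)
object_pixels = sum(sum(row) for row in mask)
box_pixels = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
print(box, object_pixels, box_pixels)
```

Here the box covers 9 pixels while the object itself covers only 5, which is the extra precision that segmentation provides over a bounding box.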

More generally, “image segmentation” might refer to segmenting all pixels in an image into different categories of object.

Again, the VOC 2012 and MS COCO datasets can be used for object segmentation.

Example of Object Segmentation on the COCO Dataset
Taken from “Mask R-CNN”.

The KITTI Vision Benchmark Suite is another object segmentation dataset that is popular, providing images of streets intended for training models for autonomous vehicles.

Some example papers on object segmentation include:

Style Transfer

Style transfer or neural style transfer is the task of learning style from one or more images and applying that style to a new image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include applying the style of specific famous artworks (e.g. by Pablo Picasso or Vincent van Gogh) to new photographs.

Datasets often involve using famous artworks that are in the public domain and photographs from standard computer vision datasets.

Example of Neural Style Transfer From Famous Artworks to a Photograph
Taken from “A Neural Algorithm of Artistic Style”

Some papers include:

Image Colorization

Image colorization or neural colorization involves converting a grayscale image to a full color image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include colorizing old black and white photographs and movies.

Datasets often involve using existing photo datasets and creating grayscale versions of photos that models must learn to colorize.
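
Creating such a training pair can be sketched in a few lines. The example below assumes images are nested lists of (r, g, b) tuples with values from 0 to 255 and uses the commonly cited ITU-R BT.601 luma weights; both the representation and the weights are assumptions for illustration, and real pipelines would typically use an image library instead.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) pixels into rows of single gray values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

# A tiny made-up 2x2 color image: red, green, blue and white pixels.
color = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

gray = to_grayscale(color)

# The grayscale version is the model input; the original is the target.
training_pair = (gray, color)
print(gray)
```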

Examples of Photo Colorization
Taken from “Colorful Image Colorization”

Some papers include:

Image Reconstruction

Image reconstruction, or image inpainting, is the task of filling in missing or corrupted parts of an image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include reconstructing old, damaged black and white photographs and movies (e.g. photo restoration).

Datasets often involve using existing photo datasets and creating corrupted versions of photos that models must learn to repair.
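
One simple way to create a corrupted version is to blank out a random square region. The sketch below assumes a grayscale image stored as nested lists; the corrupt helper, the hole size, and the fill value are hypothetical choices for illustration, and real datasets may use irregular holes or other kinds of damage.

```python
import random

def corrupt(image, hole_size, rng, fill=0):
    """Return a damaged copy of the image and a mask marking the hole with 1s."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - hole_size + 1)
    left = rng.randrange(width - hole_size + 1)
    damaged = [row[:] for row in image]  # copy so the original stays intact
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + hole_size):
        for c in range(left, left + hole_size):
            damaged[r][c] = fill
            mask[r][c] = 1
    return damaged, mask

rng = random.Random(0)  # fixed seed so the example is repeatable
original = [[100 + r * 8 + c for c in range(8)] for r in range(8)]
damaged, mask = corrupt(original, hole_size=3, rng=rng)

# The damaged image is the model input; the original is the target.
for row in damaged:
    print(row)
```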

Example of Photo Inpainting.
Taken from “Image Inpainting for Irregular Holes Using Partial Convolutions”

Some papers include:

Image Super-Resolution

Image super-resolution is the task of generating a new version of an image with higher resolution and more detail than the original image.

Often models developed for image super-resolution can be used for image restoration and inpainting as they solve related problems.

Datasets often involve using existing photo datasets and creating down-scaled versions of photos for which models must learn to create super-resolution versions.
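
A down-scaled training input can be sketched by averaging small blocks of pixels. The example assumes a grayscale image stored as nested lists and a scale factor of 2; the downscale helper and the pixel values are made up for illustration, and other resampling methods are equally valid.

```python
def downscale(image, factor):
    """Shrink an image by averaging each factor x factor block of pixels."""
    height, width = len(image), len(image[0])
    result = []
    # Any leftover rows or columns that do not fill a whole block are dropped.
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result

high_res = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]

low_res = downscale(high_res, 2)

# The low-resolution image is the model input; the original is the target.
print(low_res)
```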

Example of the Results From Different Super-Resolution Techniques.
Taken from “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”

Some papers include:

Image Synthesis

Image synthesis is the task of generating targeted modifications of existing images or entirely new images.

This is a very broad area that is rapidly advancing.

It may include small modifications of images and video (e.g. image-to-image translations), such as:

Changing the style of an object in a scene.
Adding an object to a scene.
Adding a face to a scene.

Example of Styling Zebras and Horses.
Taken from “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

It may also include generating entirely new images, such as:

Generating faces.
Generating bathrooms.
Generating clothes.

Example of Generated Bathrooms.
Taken from “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”

Some papers include:

Summary

In this post, you discovered nine applications of deep learning to computer vision tasks.

Was your favorite example of deep learning for computer vision missed?
Let me know in the comments.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

39 Responses to 9 Applications of Deep Learning for Computer Vision

Elie Kawerk March 14, 2019 at 12:07 am #

Hi Jason,

Thanks for this nice post! Are you planning on releasing a book on CV? Will it also include the foundations of CV with openCV?

Best,
Elie

- Jason Brownlee March 14, 2019 at 9:24 am #
  
  Thanks.
  
  I don’t plan to cover OpenCV, but I do plan to cover deep learning for computer vision. I hope to release a book on the topic soon.
  
Bart March 14, 2019 at 7:57 am #

Hey Jason,

Great read!!

but, but….

comp vision is easy (relatively) and covered everywhere.

Please, please cover sound recognition with TIMIT dataset .

I would appreciate that immensely!

cheers!
Bart

- Bart March 14, 2019 at 8:31 am #
  
  PS: by TIMIT dataset, I mean specifically phoneme classification.
  
  isnt that exciting:
  https://github.com/llSourcell/Neural_Network_Voices
  
  sound/speech recognition is more challenging, hence little coverage….
  
- Jason Brownlee March 14, 2019 at 9:29 am #
  
  Thanks for the suggestion.
  
Ab March 15, 2019 at 6:18 am #

Great stuff as always! when is your new book/books coming out? I will be glad to get it 🙂 thank you for the great work 🙂

- Jason Brownlee March 15, 2019 at 6:34 am #
  
  Perhaps a few weeks.
  
Temo March 15, 2019 at 7:28 am #

Hello Jason,
thanks for the nice post.
could you please, tell something about extracting other information from images such as depth and motion.

- Jason Brownlee March 15, 2019 at 2:26 pm #
  
  Interesting question.
  
  I’m not sure off hand, sorry.
  
ohoud Aziz March 15, 2019 at 7:36 am #

I really enjoy reading your articles.

- Jason Brownlee March 15, 2019 at 2:26 pm #
  
  Thanks.
  
SHAHEEN ALHIRMIZY March 15, 2019 at 5:41 pm #

Hi Jason, how are you doing? May God bless you.
You didn’t talk about satellite image analysis, the most important field.

- Jason Brownlee March 16, 2019 at 7:48 am #
  
  Thanks, wonderful suggestion!
  
P.ananth raj March 15, 2019 at 7:15 pm #

Thanks for nice article

- Jason Brownlee March 16, 2019 at 7:50 am #
  
  You’re welcome, I’m glad it helped.
  
abkul March 15, 2019 at 7:37 pm #

Hi Mr. Jason,
Thanks for your excellent blog.

I am an avid follower of your blog and also purchased some of your e-books.
Please cover topics on combination of CNN + LSTM in future

- Jason Brownlee March 16, 2019 at 7:51 am #
  
  Great suggestion, thanks.
  
Guy Koren March 23, 2019 at 6:08 pm #

Hi, Jason.
Great post! Very informative! (as always 🙂)
Although provides a good coverage of computer vision for image analysis, I still lack similar information on using deep learning for image sequence (video) – like action recognition, video captioning, video “super resolution” (in time axis) etc.
Also , I join Abkul’s suggestion for writing such a post on speech and other sequential datasets / problems.

Thanks a lot for the post !
Guy

- Jason Brownlee March 24, 2019 at 7:05 am #
  
  Thanks Guy!
  
Bashir March 28, 2019 at 8:32 pm #

Nice post !

- Jason Brownlee March 29, 2019 at 8:31 am #
  
  Thanks.
  
Dee Dee May 27, 2019 at 7:38 am #

So after studying this book, which Ph.D. topics do you think it could help with most?

- Jason Brownlee May 27, 2019 at 2:37 pm #
  
  My book is intended for practitioners; nevertheless, academics may also find it useful for defining base models for comparison and for learning how to use the Keras library effectively for computer vision applications.
  
Manohar July 11, 2019 at 8:08 pm #

There are lot of things to learn and apply in Computer vision. Great article. Thanks so much Jason for giving the insights.

- Jason Brownlee July 12, 2019 at 8:34 am #
  
  Thanks, I’m glad it helped.
  
Jonny Braidy July 29, 2019 at 8:14 pm #

Hi Jason, thanks you for your insight in Computer Vision…

My question regarding Computer Vision Face ID Identifying Face A from Face B from Face C etc… just like Microsoft Face Recognition Engine, or Detecting a set of similar types of objects with different/varying sizes & different usage related, markings tears, cuts, deformations caused by usage or like detecting banknotes or metal coins with each one of them identifiable by the engine.

What are the Learning Materials, Technologies & Tools needed to build a similar Engine, albeit not that accurate?

What materials in your publication(s) can cover the above mentioned topics?

Can you give an estimate on the cost in time & money you might charge for developing such an engine, or an MVP version? & are available for such a task?

Thanks You for your precious time.

Regards.

- Jason Brownlee July 30, 2019 at 6:10 am #
  
  This might be a good starting point:
  https://machinelearningmastery.com/introduction-to-deep-learning-for-face-recognition/
  
Sandeep Panchal September 6, 2019 at 10:49 pm #

Very very well written. Thanks for this blog, sir. I always love reading your blog.

- Jason Brownlee September 7, 2019 at 5:30 am #
  
  Thanks, I’m glad it helped!
  
Stephen Mc January 17, 2020 at 10:51 pm #

Hi Jason, thanks for this piece.

I’m an investment analyst and wondering what companies are leading in this space? There seems to be a lot to explode within computer vision–hardware, software… and then the industries that benefit. But i’m struggling to see what companies are making money from this currently.

Thanks again!

- Jason Brownlee January 18, 2020 at 8:47 am #
  
  Not my area, I don’t know sorry.
  
  I just help developers get results with the techniques.
  
Sanchit February 9, 2020 at 2:51 am #

Hi Jason, This is a very nice article.
I am further interested to know more about ways to implement ‘Quality Based Image Classification’ – Can you help me with some content on the same. I know BRISK and BIQA are few such methods but would be great to know from you if there are better and proven methods.

- Jason Brownlee February 9, 2020 at 6:24 am #
  
  Thanks.
  
  Sorry, I’m not aware of that problem, what is it exactly?
  
  - Sanchit February 9, 2020 at 11:00 pm #
    
    let’s say that there are huge number of pre-scanned images and you know that the images are not scanned properly. Is it possible to run classification on these images and label them basis quality : good, bad, worse…the quality characteristics could be noise, blur, skew, contrast etc. – can there be a method to give quality metadata in output and suggest what needs to be improved and how so that the image becomes machine readable further for OCR and text conversion etc.
    
    - Jason Brownlee February 10, 2020 at 6:31 am #
      
      Yes, you can classify images based on quality.
      
      I’m not aware of existing models that provide meta data on image quality.
      
apache January 7, 2021 at 10:52 am #

hello, excuse because my comment it not really about article. i am new in computer vision, i need some scientific paper about computer vision problem, i don’t know how and where to begin find. Please can i have help?

- Jason Brownlee January 7, 2021 at 2:01 pm #
  
  If you want to get started with computer vision with deep learning, you can start here:
  https://machinelearningmastery.com/start-here/#dlfcv
  
  If you have questions about a paper, perhaps contact the author directly.
  
Vidya March 1, 2021 at 1:54 pm #

Hi Jason .

This question is for application 2: Image Classification With Localization

If for given labeled xray images , we were to extract the contour for a bone graft/implant and then classify the implant , would we use the same techniques as for image classification with localization ?
Any references for the same ?

Thanks !

- Jason Brownlee March 2, 2021 at 5:39 am #
  
  I recommend checking for literature on the topic on scholar.google.com
  