9 Applications of Deep Learning for Computer Vision

The field of computer vision is shifting from statistical methods to deep learning neural network methods.

There are still many challenging problems to solve in computer vision. Nevertheless, deep learning methods are achieving state-of-the-art results on some specific problems.

It is not just the performance of deep learning models on benchmark problems that is most interesting; it is the fact that a single model can learn meaning from images and perform vision tasks, obviating the need for a pipeline of specialized and hand-crafted methods.

In this post, you will discover nine interesting computer vision tasks where deep learning methods are making headway.

Let’s get started.

Overview

In this post, we will look at the following computer vision problems where deep learning has been used:

  1. Image Classification
  2. Image Classification With Localization
  3. Object Detection
  4. Object Segmentation
  5. Image Style Transfer
  6. Image Colorization
  7. Image Reconstruction
  8. Image Super-Resolution
  9. Image Synthesis
  10. Other Problems

Note, when it comes to the image classification (recognition) tasks, the naming convention from the ILSVRC has been adopted. Although the tasks focus on images, they can be generalized to the frames of video.

I have tried to focus on the types of end-user problems that you may be interested in, as opposed to more academic sub-problems where deep learning does well.

Each example provides a description of the problem, an example, and references to papers that demonstrate the methods and results.

Do you have a favorite computer vision application for deep learning that is not listed?
Let me know in the comments below.

Image Classification

Image classification involves assigning a label to an entire image or photograph.

This problem is also referred to as “object classification” and perhaps more generally as “image recognition,” although this latter term may apply to a much broader set of tasks related to classifying the content of images.

Some examples of image classification include:

  • Labeling an x-ray as cancer or not (binary classification).
  • Classifying a handwritten digit (multiclass classification).
  • Assigning a name to a photograph of a face (multiclass classification).
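To make the idea of "assigning a label" concrete, here is a minimal sketch of the final step of a classifier: turning one score per class into a single predicted label. The scores below are made up for illustration; in practice they would come from a trained model, which is not shown here. The function names `softmax` and `classify` are my own and not from any particular library.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical scores for one handwritten digit (multiclass, 10 classes).
digit_scores = [0.1, 0.3, 0.2, 4.0, 0.5, 0.1, 0.0, 1.2, 0.4, 0.3]
label, prob = classify(digit_scores, [str(d) for d in range(10)])
print(label, round(prob, 3))
```

The same function covers the binary case (e.g. cancer or not) by passing two scores and two class names.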

A popular example of image classification used as a benchmark problem is the MNIST dataset.

Example of Handwritten Digits From the MNIST Dataset

A popular real-world version of classifying photos of digits is The Street View House Numbers (SVHN) dataset.

For state-of-the-art results and relevant papers on these and other image classification tasks, see:

There are many image classification tasks that involve photographs of objects. Two popular examples are the CIFAR-10 and CIFAR-100 datasets, whose photographs are classified into 10 and 100 classes, respectively.

Example of Photographs of Objects From the CIFAR-10 Dataset

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition in which teams compete for the best performance on a range of computer vision tasks on data drawn from the ImageNet database. Many important advancements in image classification have come from papers published on or about tasks from this challenge, most notably early papers on the image classification task. For example:

Image Classification With Localization

Image classification with localization involves assigning a class label to an image and showing the location of the object in the image by a bounding box (drawing a box around the object).

This is a more challenging version of image classification.

Some examples of image classification with localization include:

  • Labeling an x-ray as cancer or not and drawing a box around the cancerous region.
  • Classifying photographs of animals and drawing a box around the animal in each scene.
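A prediction for this task is a class label plus a bounding box, so evaluating it means checking both. A common way to score the box is intersection over union (IoU) against a hand-labeled box; a frequently used convention, for example in the PASCAL VOC challenges, is to count a box as correct when the IoU exceeds 0.5. The sketch below assumes boxes are stored as `(x_min, y_min, x_max, y_max)` tuples, and the example coordinates are invented.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give no intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical prediction and ground truth for a photograph of a dog.
prediction = {"label": "dog", "box": (50, 40, 200, 220)}
ground_truth = {"label": "dog", "box": (60, 50, 210, 230)}

overlap = iou(prediction["box"], ground_truth["box"])
correct = prediction["label"] == ground_truth["label"] and overlap > 0.5
print(round(overlap, 3), correct)
```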

A classical family of datasets for image classification with localization is PASCAL Visual Object Classes, or PASCAL VOC for short (e.g. VOC 2012). These datasets were used in computer vision challenges over many years.

Example of Image Classification With Localization of a Dog from VOC 2012

The task may involve adding bounding boxes around multiple examples of the same object in the image. As such, this task may sometimes be referred to as “object detection.”

Example of Image Classification With Localization of Multiple Chairs From VOC 2012

The ILSVRC2016 dataset for image classification with localization is a popular dataset with 1,000 categories of objects; its validation and test sets together comprise 150,000 photographs.

Some examples of papers on image classification with localization include:

Object Detection

Object detection extends image classification with localization to images that may contain multiple objects, each of which must be localized and classified.

This is a more challenging task than simple image classification or image classification with localization, as often there are multiple objects in the image of different types.

Often, techniques developed for image classification with localization are used and demonstrated for object detection.

Some examples of object detection include:

  • Drawing a bounding box and labeling each object in a street scene.
  • Drawing a bounding box and labeling each object in an indoor photograph.
  • Drawing a bounding box and labeling each object in a landscape.
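The practical difference from the previous task is the shape of the output: a detector returns a variable-length list of labeled boxes rather than a single label and box. The sketch below shows one plausible way to represent that output and to keep only confident detections. The `Detection` class, the `keep_confident` helper, the 0.5 threshold, and all the values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # predicted class of the object
    score: float  # model confidence between 0 and 1
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels

def keep_confident(detections, min_score=0.5):
    """Drop low-confidence detections and sort the rest by descending score."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)

# Hypothetical raw detector output for a street scene.
street_scene = [
    Detection("car", 0.92, (30, 120, 210, 240)),
    Detection("person", 0.81, (250, 90, 300, 230)),
    Detection("car", 0.35, (400, 130, 470, 180)),
    Detection("traffic light", 0.67, (320, 10, 340, 60)),
]

for d in keep_confident(street_scene):
    print(f"{d.label}: {d.score:.2f} at {d.box}")
```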

The PASCAL Visual Object Classes datasets, or PASCAL VOC for short (e.g. VOC 2012), are commonly used for object detection.

Another dataset for multiple computer vision tasks is Microsoft’s Common Objects in Context Dataset, often referred to as MS COCO.

Example of Object Detection With Faster R-CNN on the MS COCO Dataset

Some examples of papers on object detection include:

Object Segmentation

Object segmentation, or semantic segmentation, is the task of object detection where a line is drawn around each object detected in the image. Image segmentation is the more general problem of splitting an image into segments.

Object detection is also sometimes referred to as object segmentation.

Unlike object detection, which uses a bounding box to identify objects, object segmentation identifies the specific pixels in the image that belong to the object. It is like a fine-grained localization.
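The difference between a bounding box and a pixel mask can be shown with a toy example. Below, a small binary mask (1 means the pixel belongs to the object) is converted to its tight bounding box, and the two areas are compared. The mask values and the helper names `mask_to_box` and `mask_area` are invented for illustration, and nested lists stand in for image arrays.

```python
def mask_to_box(mask):
    """Tight box (x_min, y_min, x_max, y_max), inclusive, around nonzero pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains no object pixels
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])

def mask_area(mask):
    """Number of pixels that belong to the object."""
    return sum(1 for row in mask for value in row if value)

# Hypothetical 5x6 mask for a single object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
]

x_min, y_min, x_max, y_max = mask_to_box(mask)
box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
print(mask_to_box(mask), box_area, mask_area(mask))
```

The box covers more pixels than the mask because it also includes background around the object, which is the sense in which segmentation is a finer-grained localization.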

More generally, “image segmentation” might refer to segmenting all pixels in an image into different categories of object.

Again, the VOC 2012 and MS COCO datasets can be used for object segmentation.

Example of Object Segmentation on the COCO Dataset
Taken from “Mask R-CNN”.

The KITTI Vision Benchmark Suite is another popular object segmentation dataset, providing images of streets intended for training models for autonomous vehicles.

Some example papers on object segmentation include:

Image Style Transfer

Style transfer or neural style transfer is the task of learning style from one or more images and applying that style to a new image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include applying the style of specific famous artworks (e.g. by Pablo Picasso or Vincent van Gogh) to new photographs.

Datasets often involve using famous artworks that are in the public domain and photographs from standard computer vision datasets.
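One way to make "style" measurable, used in “A Neural Algorithm of Artistic Style”, is to compare correlations between the feature maps of a convolutional network, summarized as a Gram matrix. The sketch below computes a Gram matrix and a squared-difference style distance on toy numbers. It is a simplification: the feature values are made up rather than taken from a real network, normalization and layer weighting are omitted, and the function names are my own.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlations of flattened feature maps.

    feature_maps is a list of channels, each a flat list of activations.
    """
    n = len(feature_maps)
    return [
        [sum(a * b for a, b in zip(feature_maps[i], feature_maps[j])) for j in range(n)]
        for i in range(n)
    ]

def style_distance(maps_a, maps_b):
    """Sum of squared differences between two Gram matrices."""
    gram_a, gram_b = gram_matrix(maps_a), gram_matrix(maps_b)
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(gram_a, gram_b)
        for x, y in zip(row_a, row_b)
    )

# Toy activations: 2 channels, 3 spatial positions each.
features = [[1, 0, 2], [0, 1, 1]]
# The same activations with the spatial positions rearranged.
shuffled = [[2, 1, 0], [1, 0, 1]]

print(gram_matrix(features))
print(style_distance(features, shuffled))
```

Rearranging the spatial positions leaves the Gram matrix unchanged, which hints at why it describes texture and style rather than the layout of the content.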

Example of Neural Style Transfer From Famous Artworks to a Photograph
Taken from “A Neural Algorithm of Artistic Style”

Some papers include:

Image Colorization

Image colorization or neural colorization involves converting a grayscale image to a full color image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include colorizing old black and white photographs and movies.

Datasets often involve using existing photo datasets and creating grayscale versions of photos that models must learn to colorize.
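Creating such a dataset can be sketched in a few lines: each color photo becomes the target, and a grayscale copy becomes the input. The example below assumes pixels are `(r, g, b)` tuples with values from 0 to 255 and uses the common ITU-R BT.601 luma weights for the conversion. That choice of weights and the helper names are assumptions for illustration; other conversions are possible.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of single gray values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def make_training_pair(rgb_image):
    """Return (model input, target): the gray version and the original photo."""
    return to_grayscale(rgb_image), rgb_image

# A tiny hypothetical 2x2 "photo".
photo = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

gray, target = make_training_pair(photo)
print(gray)
```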

Examples of Photo Colorization
Taken from “Colorful Image Colorization”

Some papers include:

Image Reconstruction

Image reconstruction, or image inpainting, is the task of filling in missing or corrupt parts of an image.

This task can be thought of as a type of photo filter or transform that may not have an objective evaluation.

Examples include reconstructing old, damaged black and white photographs and movies (e.g. photo restoration).

Datasets often involve using existing photo datasets and creating corrupted versions of photos that models must learn to repair.
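As a rough sketch of how a corrupted training example might be produced, the code below blanks out a rectangular region of a grayscale image and returns a mask marking which pixels are missing. A rectangular hole is a simplification I chose for brevity; real corruption can be irregular. The function name `mask_region` and the toy image are hypothetical.

```python
def mask_region(image, top, left, height, width, fill=0):
    """Return (corrupted copy, hole mask) with a rectangular region removed.

    In the hole mask, 1 marks a missing pixel and 0 marks an intact one.
    """
    corrupted = [row[:] for row in image]  # copy so the original stays intact
    hole = [[0] * len(row) for row in image]
    for y in range(top, min(top + height, len(image))):
        for x in range(left, min(left + width, len(image[y]))):
            corrupted[y][x] = fill
            hole[y][x] = 1
    return corrupted, hole

# A toy 4x4 grayscale image with distinct values in every pixel.
image = [[10 * (y + 1) + x for x in range(4)] for y in range(4)]

corrupted, hole = mask_region(image, top=1, left=1, height=2, width=2)
for row in corrupted:
    print(row)
```

A model would then be trained to map `corrupted` (optionally together with `hole`) back to `image`.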

Example of Photo Inpainting.
Taken from “Image Inpainting for Irregular Holes Using Partial Convolutions”

Some papers include:

Image Super-Resolution

Image super-resolution is the task of generating a new version of an image with a higher resolution and detail than the original image.

Often models developed for image super-resolution can be used for image restoration and inpainting as they solve related problems.

Datasets often involve using existing photo datasets and creating down-scaled versions of photos for which models must learn to create super-resolution versions.
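The down-scaling step can be sketched as follows, assuming a grayscale image stored as nested lists and simple block averaging. Averaging is only one possible choice (bicubic down-sampling is another common one), and the `downscale` helper and toy image are my own illustration.

```python
def downscale(image, factor=2):
    """Average each non-overlapping factor x factor block into one pixel."""
    height = len(image) // factor
    width = len(image[0]) // factor
    return [
        [
            sum(
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ) / factor ** 2
            for x in range(width)
        ]
        for y in range(height)
    ]

# A toy 4x4 high-resolution image.
high_res = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 8, 0, 0],
    [8, 8, 0, 0],
]

low_res = downscale(high_res)
print(low_res)
```

The pair (`low_res`, `high_res`) then serves as one training example: the model sees the small image and must learn to produce the large one.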

Example of the Results From Different Super-Resolution Techniques.
Taken from “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”

Some papers include:

Image Synthesis

Image synthesis is the task of generating targeted modifications of existing images or entirely new images.

This is a very broad area that is rapidly advancing.

It may include small modifications of image and video (e.g. image-to-image translations), such as:

  • Changing the style of an object in a scene.
  • Adding an object to a scene.
  • Adding a face to a scene.
Example of Styling Zebras and Horses.
Taken from “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

It may also include generating entirely new images, such as:

  • Generating faces.
  • Generating bathrooms.
  • Generating clothes.
Example of Generated Bathrooms.
Taken from “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”

Some papers include:

Other Problems

There are other important and interesting problems that I did not cover because they are not purely computer vision tasks.

Notable examples include image-to-text and text-to-image:

Presumably, one could also learn to map between images and other modalities, such as audio.

Summary

In this post, you discovered nine applications of deep learning to computer vision tasks.

Was your favorite example of deep learning for computer vision missed?
Let me know in the comments.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

29 Responses to 9 Applications of Deep Learning for Computer Vision

  1. Elie Kawerk March 14, 2019 at 12:07 am #

    Hi Jason,

    Thanks for this nice post! Are you planning on releasing a book on CV? Will it also include the foundations of CV with openCV?

    Best,
    Elie

    • Jason Brownlee March 14, 2019 at 9:24 am #

      Thanks.

      I don’t plan to cover OpenCV, but I do plan to cover deep learning for computer vision. I hope to release a book on the topic soon.

  2. Bart March 14, 2019 at 7:57 am #

    Hey Jason,

    Great read!!

    but, but….

    comp vision is easy (relatively) and covered everywhere.

    Please, please cover sound recognition with TIMIT dataset .

    I would appreciate that immensely!

    cheers!
    Bart

  3. Ab March 15, 2019 at 6:18 am #

    Great stuff as always! When is your new book/books coming out? I will be glad to get it 🙂 Thank you for the great work 🙂

  4. Temo March 15, 2019 at 7:28 am #

    Hello Jason,
    thanks for the nice post.
    Could you please tell us something about extracting other information from images, such as depth and motion?

    • Jason Brownlee March 15, 2019 at 2:26 pm #

      Interesting question.

      I’m not sure off hand, sorry.

  5. ohoud Aziz March 15, 2019 at 7:36 am #

    I really enjoy reading your articles.

  6. SHAHEEN ALHIRMIZY March 15, 2019 at 5:41 pm #

    Hi Jason, how are you doing? May God bless you.
    You didn’t talk about satellite image analysis, the most important field.

  7. P.ananth raj March 15, 2019 at 7:15 pm #

    Thanks for nice article

  8. abkul March 15, 2019 at 7:37 pm #

    Hi Mr. Jason,
    Thanks for your excellent blog.

    I am an avid follower of your blog and also purchased some of your e-books.
    Please cover topics on combination of CNN + LSTM in future

  9. Guy Koren March 23, 2019 at 6:08 pm #

    Hi, Jason.
    Great post! Very informative! (as always 🙂)
    Although it provides good coverage of computer vision for image analysis, I still lack similar information on using deep learning for image sequences (video), like action recognition, video captioning, video “super resolution” (in time axis), etc.
    Also, I join Abkul’s suggestion for writing such a post on speech and other sequential datasets / problems.

    Thanks a lot for the post !
    Guy

  10. Bashir March 28, 2019 at 8:32 pm #

    Nice post !

  11. Dee Dee May 27, 2019 at 7:38 am #

    So after studying this book, which PhD topics do you suggest it could help with greatly?

    • Jason Brownlee May 27, 2019 at 2:37 pm #

      My book is intended for practitioners, nevertheless, academics may also find it useful in terms of defining base models for comparison and on learning how to use the Keras library effectively for computer vision applications.

  12. Manohar July 11, 2019 at 8:08 pm #

    There are lot of things to learn and apply in Computer vision. Great article. Thanks so much Jason for giving the insights.

  13. Jonny Braidy July 29, 2019 at 8:14 pm #

    Hi Jason, thank you for your insight into Computer Vision…

    My question is about face identification (identifying Face A from Face B from Face C, etc.), just like the Microsoft Face Recognition Engine, or detecting a set of similar types of objects with varying sizes and different usage-related markings, tears, cuts, and deformations, such as banknotes or metal coins, with each one of them identifiable by the engine.

    What are the Learning Materials, Technologies & Tools needed to build a similar Engine, albeit not that accurate?

    What materials in your publication(s) can cover the above mentioned topics?

    Can you give an estimate of the cost in time & money you might charge for developing such an engine, or an MVP version? & are you available for such a task?

    Thank you for your precious time.

    Regards.

  14. Sandeep Panchal September 6, 2019 at 10:49 pm #

    Very very well written. Thanks for this blog, sir. I always love reading your blog.
