It is hyperbole to say deep learning is achieving state-of-the-art results across a range of difficult problem domains. A fact, but also hyperbole.
There is a lot of excitement around artificial intelligence, machine learning and deep learning at the moment. It is also an amazing opportunity to get in on the ground floor of some really powerful tech.
I try hard to convince friends, colleagues and students to get started in deep learning and bold statements like the above are not enough. It requires stories, pictures and research papers.
In this post you will discover amazing and recent applications of deep learning that will inspire you to get started in deep learning.
Getting started in deep learning does not have to mean studying the equations for the next 2-3 years. It could mean downloading Keras and running your first model in 5 minutes flat. Start with applied deep learning. Build things. Get excited and turn it into code and systems.
I have been wanting to write this post for a while. Let’s get started.
Below is the list of the specific examples we are going to look at in this post.
Not all of the examples are technology that is ready for prime time, but guaranteed, they are all examples that will get you excited.
Some are examples that seem ho hum if you have been around the field for a while. In the broader context, they are not ho hum. Not at all.
Frankly, to an old AI hacker like me, some of these examples are a slap in the face. Problems that I simply did not think we could tackle for decades, if at all.
I’ve focused on visual examples because we can look at screenshots and videos to immediately get an idea of what the algorithm is doing, but there are just as many if not more examples in natural language with text and audio data that are not listed.
Here’s the list:
- Colorization of Black and White Images.
- Adding Sounds To Silent Movies.
- Automatic Machine Translation.
- Object Classification in Photographs.
- Automatic Handwriting Generation.
- Character Text Generation.
- Image Caption Generation.
- Automatic Game Playing.
1. Automatic Colorization of Black and White Images
Image colorization is the problem of adding color to black and white photographs.
Traditionally this was done by hand with human effort because it is such a difficult task.
Deep learning can use the objects in a photograph and their context to color the image, much like a human operator might approach the problem.
A visual and highly impressive feat.
This capability leverages the high quality, very large convolutional neural networks that were trained for ImageNet and co-opts them for the problem of image colorization.
Generally the approach involves the use of very large convolutional neural networks and supervised layers that recreate the image with the addition of color.
Impressively, the same approach can be used to colorize still frames of black and white movies.
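One way to see why colorization fits supervised learning: any color photograph gives you a training pair for free, with a grayscale copy as the input and the original colors as the target. The sketch below illustrates that idea in plain Python. The function names are hypothetical, the grayscale conversion assumes the common luma weights (0.299, 0.587, 0.114), and the papers listed below use their own color spaces and pipelines.

```python
# A sketch of building (input, target) pairs for colorization.
# Function names are hypothetical; this is not the pipeline of any listed paper.

def to_gray(pixel):
    """Convert one (r, g, b) pixel with 0-255 channels to a single gray level."""
    r, g, b = pixel
    # Weighted sum of the channels (luma weights are an assumption here).
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_training_pair(color_image):
    """Return (gray_input, color_target) for an image stored as rows of RGB tuples."""
    gray_input = [[to_gray(p) for p in row] for row in color_image]
    return gray_input, color_image


# A tiny 1x3 "photograph": one red, one green and one white pixel.
image = [[(255, 0, 0), (0, 255, 0), (255, 255, 255)]]
gray, target = make_training_pair(image)
print(gray)
```

A network trained on many such pairs is asked to predict the target from the gray input, which is why no hand labeling is needed.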
- Deep Colorization [pdf], 2015
- Colorful Image Colorization [pdf] (website), 2016
- Learning Representations for Automatic Colorization [pdf] (website), 2016
- Image Colorization with Deep Convolutional Neural Networks [pdf], 2016
2. Automatically Adding Sounds To Silent Movies
In this task the system must synthesize sounds to match a silent video.
The system is trained using 1000 examples of video with sound of a drum stick striking different surfaces and creating different sounds. A deep learning model associates the video frames with a database of pre-recorded sounds in order to select a sound to play that best matches what is happening in the scene.
The system was then evaluated using a Turing-test-like setup where humans had to determine which video had the real sounds and which had the fake (synthesized) sounds.
A very cool application of both convolutional neural networks and LSTM recurrent neural networks.
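The selection step can be pictured as a nearest-neighbour lookup: a feature vector predicted from the video frames is compared against the features of each stored sound, and the closest one is played. The sketch below assumes cosine similarity as the matching rule and uses made-up sound names and three-number feature vectors; the real system's features and matching may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sound(predicted, sound_bank):
    """Return the name of the stored sound whose features best match the prediction."""
    return max(sound_bank, key=lambda name: cosine_similarity(predicted, sound_bank[name]))


# Hypothetical database of pre-recorded sounds and their feature vectors.
sound_bank = {
    "wood_tap": [0.9, 0.1, 0.0],
    "metal_clang": [0.1, 0.9, 0.2],
    "leaf_rustle": [0.0, 0.2, 0.9],
}

# Hypothetical features a model might predict from the video frames.
predicted = [0.2, 0.8, 0.1]
best = select_sound(predicted, sound_bank)
print(best)
```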
- Artificial intelligence produces realistic sounds that fool humans
- Machines can generate sound effects that fool humans
3. Automatic Machine Translation
This is the task of automatically translating a given word, phrase or sentence from one language into another.
Automatic machine translation has been around for a long time, but deep learning is achieving top results in two specific areas:
- Automatic Translation of Text.
- Automatic Translation of Images.
Text translation can be performed without any preprocessing of the sequence, allowing the algorithm to learn the dependencies between words and their mapping to a new language. Stacked networks of large LSTM recurrent neural networks are used to perform this translation.
As you would expect, convolutional neural networks are used to identify images that have letters and where the letters are in the scene. Once identified, they can be turned into text, translated and the image recreated with the translated text. This is often called instant visual translation.
It’s hard to find good resources for this example. If you know of any, please leave a comment.
- Sequence to Sequence Learning with Neural Networks [pdf], 2014
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation [pdf], 2014
- Deep Neural Networks in Machine Translation: An Overview [pdf], 2015
4. Object Classification and Detection in Photographs
This task requires the classification of objects within a photograph as one of a set of previously known objects.
State-of-the-art results have been achieved on benchmark examples of this problem using very large convolutional neural networks. A breakthrough on this problem came from Alex Krizhevsky et al., whose results on the ImageNet classification problem used a network that became known as AlexNet.
A more complex variation of this task called object detection involves specifically identifying one or more objects within the scene of the photograph and drawing a box around them.
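To make the "box" concrete: a detection is usually stored as corner coordinates, and a common way to score a predicted box against a hand-labeled one is intersection over union (IoU). This measure is not covered in the post itself. The sketch below is a plain-Python illustration that assumes boxes are given as (x_min, y_min, x_max, y_max).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and hand-labeled boxes that partly overlap.
predicted_box = (0, 0, 2, 2)
labeled_box = (1, 1, 3, 3)
print(iou(predicted_box, labeled_box))
```

A score of 1.0 means the boxes match exactly and 0.0 means they do not overlap at all.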
- ImageNet Classification with Deep Convolutional Neural Networks [pdf], 2012
- Some Improvements on Deep Convolutional Neural Network Based Image Classification [pdf], 2013
- Scalable Object Detection using Deep Neural Networks [pdf], 2013
- Deep Neural Networks for Object Detection [pdf], 2013
5. Automatic Handwriting Generation
In this task, given a corpus of handwriting examples, the system generates new handwriting for a given word or phrase.
The handwriting is provided as a sequence of coordinates used by a pen when the handwriting samples were created. From this corpus the relationship between the pen movement and the letters is learned and new examples can be generated ad hoc.
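As a rough illustration of this data format, the sketch below turns a recorded pen trace into the kind of sequence a recurrent network can be trained on: one step per sample, holding the pen's movement since the previous sample and a flag for whether the pen was lifted. The (x, y, pen_up) layout and the function name are assumptions for illustration, not the exact encoding of any particular dataset.

```python
def to_offsets(points):
    """Convert absolute pen samples (x, y, pen_up) into per-step offsets (dx, dy, pen_up)."""
    if not points:
        return []
    offsets = []
    prev_x, prev_y = points[0][0], points[0][1]
    for x, y, pen_up in points[1:]:
        # Store how far the pen moved since the previous sample.
        offsets.append((x - prev_x, y - prev_y, pen_up))
        prev_x, prev_y = x, y
    return offsets


# Hypothetical trace: pen_up is 1 on the last sample of a stroke, otherwise 0.
trace = [(10, 10, 0), (12, 11, 0), (15, 11, 1), (20, 10, 0)]
steps = to_offsets(trace)
print(steps)
```

Working with offsets rather than absolute positions means the same letter looks the same to the model wherever it was written on the page.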
What is fascinating is that different styles can be learned and then mimicked. I would love to see this work combined with some forensic handwriting analysis expertise.
6. Automatic Text Generation
This is an interesting task, where a corpus of text is learned and from this model new text is generated, word-by-word or character-by-character.
The model is capable of learning how to spell, punctuate, form sentences and even capture the style of the text in the corpus.
Large recurrent neural networks are used to learn the relationship between items in the sequences of input strings and then generate text. More recently LSTM recurrent neural networks are demonstrating great success on this problem using a character-based model, generating one character at a time.
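The character-based setup can be sketched without any neural network at all. The code below slices a corpus into (context, next character) training pairs, which is the form of data a character-level model learns from, and then generates text one character at a time. A simple lookup table of observed next characters stands in for the LSTM here; that substitution, the window length and the function names are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_pairs(text, window):
    """Slice text into (context, next_char) pairs using a sliding window."""
    return [(text[i:i + window], text[i + window]) for i in range(len(text) - window)]


def fit_table(pairs):
    """Record every character seen after each context (a stand-in for a trained LSTM)."""
    table = defaultdict(list)
    for context, next_char in pairs:
        table[context].append(next_char)
    return table


def generate(table, seed, length, rng):
    """Extend the seed one character at a time, sampling from what followed the context."""
    out = seed
    window = len(seed)
    for _ in range(length):
        choices = table.get(out[-window:])
        if not choices:
            break  # context never seen in the corpus, so stop early
        out += rng.choice(choices)
    return out


corpus = "the cat sat on the mat. the cat ate the rat."
pairs = make_pairs(corpus, window=3)
table = fit_table(pairs)
sample = generate(table, seed="the", length=20, rng=random.Random(0))
print(sample)
```

An LSTM plays the role of the table but can generalize to contexts it has never seen, which is what lets it pick up spelling, punctuation and style.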
Andrej Karpathy provides many examples in his popular blog post on the topic including:
- Paul Graham essays
- Wikipedia articles (including the markup)
- Algebraic Geometry (with LaTeX markup)
- Linux Source Code
- Baby Names
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Auto-Generating Clickbait With Recurrent Neural Networks
- Generating Text with Recurrent Neural Networks [pdf], 2011
- Generating Sequences With Recurrent Neural Networks [pdf], 2013
7. Automatic Image Caption Generation
Automatic image captioning is the task where given an image the system must generate a caption that describes the contents of the image.
In 2014, there was an explosion of deep learning algorithms achieving very impressive results on this problem, leveraging the work from top models for object classification and object detection in photographs.
Once you can detect objects in photographs and generate labels for those objects, you can see that the next step is to turn those labels into a coherent sentence description.
This is one of those results that knocked my socks off and still does. Very impressive indeed.
Generally, the systems involve the use of very large convolutional neural networks for the object detection in the photographs and then a recurrent neural network like an LSTM to turn the labels into a coherent sentence.
These techniques have also been expanded to automatically caption video.
- A picture is worth a thousand (coherent) words: building a natural description of images
- Rapid Progress in Automatic Image Captioning
- Deep Visual-Semantic Alignments for Generating Image Descriptions [pdf] (and website), 2015
- Explain Images with Multimodal Recurrent Neural Networks [pdf], 2014
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description [pdf], 2014
- Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [pdf], 2014
- Sequence to Sequence — Video to Text [pdf], 2015
8. Automatic Game Playing
This is a task where a model learns how to play a computer game based only on the pixels on the screen.
This very difficult task is the domain of deep reinforcement learning models and is the breakthrough that DeepMind (now part of Google) is renowned for achieving.
This work was expanded and culminated in Google DeepMind’s AlphaGo, which beat a world champion at the game of Go.
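The learning rule at the heart of this line of work fits in a few lines. The sketch below is tabular Q-learning on a made-up five-cell corridor, where the agent earns a reward for reaching the right-hand end. Deep Q-learning, as in the Atari papers listed below, replaces the table with a convolutional neural network that reads screen pixels. The environment, hyperparameters and names here are illustrative assumptions.

```python
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Move along the corridor; reward 1.0 only when the last cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng):
    """Epsilon-greedy choice with random tie-breaking among the best actions."""
    if rng.random() < EPSILON:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, value in enumerate(q_row) if value == best])


def train(episodes, rng):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q[state], rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train(episodes=200, rng=random.Random(0))
# Best action in each non-terminal cell after training (1 means "move right").
greedy = [max(range(2), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
print(greedy)
```

The agent is never told that moving right is correct; it works that out from the reward alone, which is the same principle that lets a network learn a game from pixels and score.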
- Deep Reinforcement Learning
- DeepMind YouTube Channel
- Deep Q Learning Demo
- DeepMind’s AI is an Atari gaming pro now
- Playing Atari with Deep Reinforcement Learning [pdf], 2013
- Human-level control through deep reinforcement learning, 2015
- Mastering the game of Go with deep neural networks and tree search, 2016
Below are some additional examples to those listed above.
- Automatic speech recognition.
- Automatic speech understanding.
- Automatically focus attention on objects in images.
- Recurrent Models of Visual Attention [pdf], 2014
- Automatically answer questions about objects in a photograph.
- Automatically turning sketches into photos.
- Convolutional Sketch Inversion [pdf], 2016
- Automatically create stylized images from rough sketches.
There are a lot of great resources, talks and more to help you get excited about the capabilities and potential for deep learning.
Below are a few additional resources to help get you excited.
- The Unreasonable Effectiveness of Deep Learning, talk by Yann LeCun in 2014
- Awesome Deep Vision List of top deep learning computer vision papers
- The wonderful and terrifying implications of computers that can learn, TED talk by Jeremy Howard
- Which algorithm has achieved the best results, list of top results on computer vision datasets
- How Neural Networks Really Work, Geoffrey Hinton 2016
In this post you have discovered 8 applications of deep learning that are intended to inspire you.
This show-rather-than-tell approach is expected to cut through the hyperbole and give you a clearer idea of the current and future capabilities of deep learning technology.
Do you know of any inspirational examples of deep learning not listed here? Let me know in the comments.