A Gentle Introduction to Transfer Learning for Deep Learning

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.

It is a popular approach in deep learning, where pre-trained models are used as the starting point on computer vision and natural language processing tasks, both because of the vast compute and time resources required to develop neural network models for these problems and because of the huge jumps in skill they provide on related problems.

In this post, you will discover how you can use transfer learning to speed up training and improve the performance of your deep learning model.

After reading this post, you will know:

  • What transfer learning is and how to use it.
  • Common examples of transfer learning in deep learning.
  • When to use transfer learning on your own predictive modeling problems.

Let’s get started.

A Gentle Introduction to Transfer Learning with Deep Learning

Photo by Mike’s Birds, some rights reserved.

What is Transfer Learning?

Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task.

Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting

— Page 526, Deep Learning, 2016.

Transfer learning is an optimization that allows rapid progress or improved performance when modeling the second task.

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

— Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.

Transfer learning is related to problems such as multi-task learning and concept drift and is not exclusively an area of study for deep learning.

Nevertheless, transfer learning is popular in deep learning given the enormous resources required to train deep learning models or the large and challenging datasets on which deep learning models are trained.

Transfer learning only works in deep learning if the model features learned from the first task are general.

In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the learned features, or transfer them, to a second target network to be trained on a target dataset and task. This process will tend to work if the features are general, meaning suitable to both base and target tasks, instead of specific to the base task.

— How Transferable Are Features in Deep Neural Networks?, 2014.

This form of transfer learning used in deep learning is called inductive transfer. This is where the scope of possible models (model bias) is narrowed in a beneficial way by using a model fit on a different but related task.

Depiction of Inductive Transfer

Taken from “Transfer Learning”

How to Use Transfer Learning?

You can use transfer learning on your own predictive modeling problems.

Two common approaches are as follows:

  1. Develop Model Approach
  2. Pre-trained Model Approach

Develop Model Approach

  1. Select Source Task. You must select a related predictive modeling problem with an abundance of data where there is some relationship in the input data, output data, and/or concepts learned during the mapping from input to output data.
  2. Develop Source Model. Next, you must develop a skillful model for this first task. The model must be better than a naive model to ensure that some feature learning has been performed.
  3. Reuse Model. The model fit on the source task can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  4. Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
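The four steps above can be sketched with a tiny two-layer network in plain numpy. Everything here is hypothetical: the dict layout, the layer sizes, and the `freeze` set are illustrative, and the actual training of the source model is elided.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical tiny network: one hidden layer, weights stored in a dict.
def init_model(n_in, n_hidden, n_out):
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),  # feature layer
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),  # output layer
    }

# Steps 1-2. Select a related source task with abundant data and develop a
# skillful source model on it (training elided; imagine W1 now holds
# learned, general-purpose features).
source_model = init_model(n_in=100, n_hidden=32, n_out=10)

# Step 3. Reuse the model: start the target model from the source weights.
target_model = init_model(n_in=100, n_hidden=32, n_out=2)  # new output size
target_model["W1"] = source_model["W1"].copy()             # transfer features

# Step 4. Tune: optionally refine the transferred layer on the target data,
# or freeze it and train only the new output layer.
freeze = {"W1"}  # layers excluded from gradient updates during tuning
```

The key design choice is step 4: with little target data, freezing the transferred layer guards against overfitting; with more data, fine-tuning everything usually helps.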

Pre-trained Model Approach

  1. Select Source Model. A pre-trained source model is chosen from available models. Many research institutions release models trained on large and challenging datasets; these can be included in the pool of candidate models from which to choose.
  2. Reuse Model. The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  3. Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
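A minimal sketch of these three steps, faking the "published" weights with a local `.npz` file. All names and shapes are invented; in practice you would download real released weights (for example from a framework's model zoo) rather than fabricating them.

```python
import numpy as np

# A research group would publish trained weights; here we fake a "release"
# by saving arrays to a .npz file, purely for illustration.
published = {"conv1": np.ones((3, 3, 8)), "dense": np.ones((8, 10))}
np.savez("pretrained_model.npz", **published)

# Step 1. Select source model: download/load the published weights.
weights = dict(np.load("pretrained_model.npz"))

# Step 2. Reuse model: keep the feature-extraction layers and drop the
# old task-specific output head.
model = {"conv1": weights["conv1"]}

# Step 3. Tune model: attach a fresh output layer sized for the new task
# (3 classes here) and optionally fine-tune on the target data.
rng = np.random.default_rng(1)
model["dense_new"] = rng.normal(0, 0.1, (8, 3))
```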

This second type of transfer learning is common in the field of deep learning.

Examples of Transfer Learning with Deep Learning

Let’s make this concrete with two common examples of transfer learning with deep learning models.

Transfer Learning with Image Data

It is common to perform transfer learning with predictive modeling problems that use image data as input.

This may be a prediction task that takes photographs or video data as input.

For these types of problems, it is common to use a deep learning model pre-trained for a large and challenging image classification task such as the ImageNet 1000-class photograph classification competition.

The research organizations that develop models for this competition and do well often release their final model under a permissive license for reuse. These models can take days or weeks to train on modern hardware.

These models can be downloaded and incorporated directly into new models that expect image data as input.

Three examples of models of this type include:

  • Oxford VGG Model
  • Google Inception Model
  • Microsoft ResNet Model

For more examples, see the Caffe Model Zoo where more pre-trained models are shared.

This approach is effective because the models were trained on a large corpus of photographs and had to make predictions over a relatively large number of classes, which in turn required them to learn to efficiently extract features from photographs in order to perform well on the problem.

In their Stanford course on Convolutional Neural Networks for Visual Recognition, the authors caution that you should carefully choose how much of the pre-trained model to use in your new model.

[Convolutional Neural Networks] features are more generic in early layers and more original-dataset-specific in later layers

— Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition
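One way to act on that caution is to reuse only the first few (most generic) layers and keep freshly initialized later layers for the new task. A hypothetical numpy sketch, treating each layer as a single weight matrix:

```python
import numpy as np

def transfer_early_layers(base_layers, target_layers, n_reuse):
    """Copy the first n_reuse layers from the base model (generic features)
    and keep the target model's own later, task-specific layers."""
    reused = [w.copy() for w in base_layers[:n_reuse]]
    kept = [w.copy() for w in target_layers[n_reuse:]]
    return reused + kept

rng = np.random.default_rng(0)
base = [rng.normal(size=(4, 4)) for _ in range(5)]    # pretend trained layers
target = [rng.normal(size=(4, 4)) for _ in range(5)]  # freshly initialized

# Reuse only the two earliest (most generic) layers, e.g. when the new
# dataset differs a lot from the original one.
new_layers = transfer_early_layers(base, target, n_reuse=2)
```

The rule of thumb from the quote: the more the target dataset differs from the original, the fewer of the later, original-dataset-specific layers you should carry over.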

Transfer Learning with Language Data

It is common to perform transfer learning with natural language processing problems that use text as input or output.

For these types of problems, a word embedding is used: a mapping of words to a high-dimensional continuous vector space where different words with a similar meaning have a similar vector representation.
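The "similar meaning, similar vector" property is usually measured with cosine similarity. A toy sketch with invented 3-dimensional vectors (real embeddings have hundreds of dimensions, and these numbers are made up purely to illustrate the geometry):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented toy "embeddings"; related words point in similar directions.
emb = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

king_queen = cosine_similarity(emb["king"], emb["queen"])
king_apple = cosine_similarity(emb["king"], emb["apple"])
```

With these toy numbers, `king_queen` is close to 1 while `king_apple` is much smaller, which is the behavior a well-trained embedding shows for genuinely related words.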

Efficient algorithms exist to learn these distributed word representations, and it is common for research organizations to release pre-trained models trained on very large corpora of text documents under a permissive license.

Two examples of models of this type include:

  • Google’s word2vec Model
  • Stanford’s GloVe Model

These distributed word representation models can be downloaded and incorporated into deep learning language models in either the interpretation of words as input or the generation of words as output from the model.
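A common way to incorporate such vectors on the input side is to build an embedding matrix for your own vocabulary from the pre-trained file, falling back to a small random vector for out-of-vocabulary words. A small numpy sketch with made-up 2-dimensional vectors (real pre-trained files map each word to a much longer vector):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained vectors, as loaded from a word2vec/GloVe file.
pretrained = {"cat": np.array([0.1, 0.2]), "dog": np.array([0.3, 0.4])}
dim = 2

# Build the embedding matrix for our model's own vocabulary; words missing
# from the pre-trained file get a small random vector instead.
vocab = ["cat", "dog", "platypus"]
embedding_matrix = np.stack([
    pretrained.get(w, rng.normal(0, 0.01, dim)) for w in vocab
])

# A sentence enters the model as rows of this matrix, indexed by word id.
sentence = [vocab.index(w) for w in ["dog", "cat"]]
inputs = embedding_matrix[sentence]
```

This matrix can then initialize the embedding layer of a deep learning language model, either frozen or fine-tuned during training.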

In his book on Deep Learning for Natural Language Processing, Yoav Goldberg cautions:

… one can download pre-trained word vectors that were trained on very large quantities of text […] differences in training regimes and underlying corpora have a strong influence on the resulting representations, and that the available pre-trained representations may not be the best choice for [your] particular use case.

— Page 135, Neural Network Methods in Natural Language Processing, 2017.

When to Use Transfer Learning?

Transfer learning is an optimization, a shortcut to saving time or getting better performance.

In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.

Lisa Torrey and Jude Shavlik in their chapter on transfer learning describe three possible benefits to look for when using transfer learning:

  1. Higher start. The initial skill (before refining the model) of the reused model is higher than it otherwise would be.
  2. Higher slope. The rate of improvement of skill during training of the reused model is steeper than it otherwise would be.
  3. Higher asymptote. The converged skill of the trained model is better than it otherwise would be.

Three ways in which transfer might improve learning.
Taken from “Transfer Learning”.

Ideally, you would see all three benefits from a successful application of transfer learning.

It is an approach to try if you can identify a related task with abundant data and you have the resources to develop a model for that task and reuse it on your own problem, or there is a pre-trained model available that you can use as a starting point for your own model.

On some problems where you may not have very much data, transfer learning can enable you to develop skillful models that you simply could not develop in the absence of transfer learning.

The choice of source data or source model is an open problem and may require domain expertise and/or intuition developed via experience.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

  • Deep Learning, 2016.
  • Neural Network Methods in Natural Language Processing, 2017.

Papers

  • How Transferable Are Features in Deep Neural Networks?, 2014.
  • Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.

Pre-trained Models

  • Caffe Model Zoo

Articles

  • Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition

Summary

In this post, you discovered how you can use transfer learning to speed up training and improve the performance of your deep learning model.

Specifically, you learned:

  • What transfer learning is and how it is used in deep learning.
  • When to use transfer learning.
  • Examples of transfer learning used on computer vision and natural language processing tasks.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


38 Responses to A Gentle Introduction to Transfer Learning for Deep Learning

  1. HandsOnML December 20, 2017 at 2:42 pm #

    As always, well written and insightful article. The additional resources you provide on this topic are also very helpful! Thanks.

  2. Eric H December 22, 2017 at 5:32 am #

    You repeated a sentence early in the article.
    “Transfer learning only works in deep learning if the model features learned from the first task are general.”

  3. Norma A. January 5, 2018 at 5:39 pm #

    Hello Dr. Jason. Thank you so much for sharing clearly and in a succinct way your knowledge and insights on ML and Deep Learning.

    Reviewing the pre-trained model references, it appears that Google Inception Model is not longer available. I also checked at https://github.com/tensorflow/models/tree/master/inception but is not available either.
    Do you have idea if this is now placed at some other location? Thank you!

  4. F. Ameen January 29, 2018 at 4:12 pm #

    Hello Jason,
    I am a big fan of your work. I wanted to ask that how to choose a pre-trained model for my specific problem. for example what would be a good pre-trained model that would work well on handwritten arabic alphabet classification?

    • Jason Brownlee January 30, 2018 at 9:47 am #

      Perhaps a model trained on other character data?

      • F. Ameen January 31, 2018 at 2:33 am #

        yes. because all keras pre-trained models require input size (244, 244, 3) while my data has (64, 64, 1) dimensions. also imagenet and character classification are very different problems. Most of the imagenet features are redundant for my problem.

        • Jason Brownlee January 31, 2018 at 9:47 am #

          I believe the models could be adapted and may still add value.

  5. Reemo A. February 22, 2018 at 7:06 am #

    Many thanks for the good explanation..Can i ask you about your suggestion for the most suitable pre-trained model for image super resolution..how can i select it ?

    • Jason Brownlee February 22, 2018 at 11:22 am #

      Sorry, I have not worked on super resolution. I don’t have a good suggestion at the moment. Perhaps try a google search?

    • Cat Chenal May 28, 2018 at 6:49 am #

      Here is an implementation of image deblurring using GAN [https://blog.sicara.com/keras-generative-adversarial-networks-image-deblurring-45e3ab6977b5].
      Asking your question to its author might be fruitful.
      Good luck.

  6. Ahmad Raza April 17, 2018 at 1:32 pm #

    Thanks! I have understand this concept. I want to ask a question that, for example there is a pre-trained (OCR) model for English language, and I want to transfer this model into (OCR) for my local language which has completely different alphabets than English. It would help me or not ?

  7. Nastaran April 27, 2018 at 1:44 pm #

    Just wondering if applying transfer learning is a solution to dealing with small data-set size?

  8. Lena April 27, 2018 at 7:51 pm #

    Hi Jason, thank you for this article! I wonder if one could fine-tune a model like resnet, trained for classification, for the purpose of say image filtering? Like local contrast enhancement or denoise etc. What you think? Thanks

    • Jason Brownlee April 28, 2018 at 5:27 am #

      Sounds like a good idea, try it. Let me know how you go.

      • Lena April 28, 2018 at 2:45 pm #

        Ok, thanks!

  9. Vijay Ravi May 5, 2018 at 11:37 am #

    Thanks! Helped a lot 🙂

  10. Ahmed May 25, 2018 at 10:14 am #

    Many thanks
    I want to ask if I have PDF content fir example drawings and I need model to train I will give the model what is that geometry which pre trained model you can recommend?

    • Jason Brownlee May 25, 2018 at 2:52 pm #

      Perhaps you might be able to use an existing computer vision model. It would be cheap to try and evaluate the results.

  11. Nastaran June 21, 2018 at 11:29 am #

    Just wondering what is the difference between transfer learning and unsupervised pre-training?

    • Jason Brownlee June 21, 2018 at 4:57 pm #

      They are orthogonal ideas. In theory, could use transfer learning with a supervised or an unsupervised learning problem.

      • afan June 23, 2018 at 6:41 pm #

        Hello sir i need your Help Please if you can .

  12. JG July 24, 2018 at 6:49 am #

    I got an idea for transfer learning, I want to share for your opinion but, in the physical-mathematical domain. This is my anology or equivalence (lets call it JGs approach). For example, what about if in order to solve a physical problem (in our case e.g. an image classification), I approach it with some mathematical description such as partial derivatives, Laplace operator, etc. (in our case this could be the neural model architecture, with convolutions, fully connected , data_augmentation, …that we are intend to apply, etc). But because this math are very complexes with only know some few particular solutions under some initial o boundary values (and this could be the pre-trained weights in our ML or DL issue)…son under OTHER boundary or initial or complexes conditions we do not have any (of these analytical) solutions (this could be the case with other image classification or other datasets inputs) …so we can use the pretty well known solution an try to apply for our problem case, where we intend to obtain similars solutions (this is now the transfer learning repurpose in order to search for similar or analogous solutions)…what do you think Mr. Brownlee about the analogy? Of course, as in real life, probably, from time to time, the solutions to our problems are radically different to the ones we try to reused (model or weight for radically different images sets or classification), ..but at least, the transfer learning essay serves as initial inspiration:-))

    • Jason Brownlee July 24, 2018 at 2:29 pm #

      Not sure I follow, sorry.

      Are you able to simplify the explanation?

  13. PG August 7, 2018 at 9:39 pm #

    Actually, Very vague explanation.. I read it twice. but still I don’t know how to use it in Keras..
    Do you have any short simple example with Keras?

  14. C August 12, 2018 at 3:30 am #

    Thanks for this. Lovely introduction to the area.

    Do you know of any papers using the “Develop Model Approach’? It seems so simple yet I cannot find any work on this specifically.

    • Jason Brownlee August 12, 2018 at 6:35 am #

      Perhaps search for transfer learning on scholar.google.com

      • C August 12, 2018 at 7:00 pm #

        I have done so but approaches seem to be either about the pre-trained model method or some method which leverages the knowledge that two domains are related and so adds an extra step to the process, such as dimensionality reduction.

        The two survey papers also do not mention a simple Develop Model Approach.

  15. Juma Ally September 1, 2018 at 11:34 am #

    Thank you for introduction about transfer learning.
    Is it possible to use this model for the calling records data with the attributes likes(source call, dist call, start time, duration and locations) for prediction such as user mobility prediction or characterize traffic on base station etc.?
