A Gentle Introduction to Transfer Learning for Deep Learning

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.

It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks. Developing neural network models for these problems requires vast compute and time resources, and pre-trained models provide large jumps in skill on related problems.

In this post, you will discover how you can use transfer learning to speed up training and improve the performance of your deep learning model.

After reading this post, you will know:

  • What transfer learning is and how to use it.
  • Common examples of transfer learning in deep learning.
  • When to use transfer learning on your own predictive modeling problems.

Let’s get started.

For an example of how to use transfer learning in computer vision, see the post:

A Gentle Introduction to Transfer Learning with Deep Learning
Photo by Mike’s Birds, some rights reserved.

What is Transfer Learning?

Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task.

Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting

— Page 526, Deep Learning, 2016.

Transfer learning is an optimization that allows rapid progress or improved performance when modeling the second task.

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.

Transfer learning is related to problems such as multi-task learning and concept drift and is not exclusively an area of study for deep learning.

Nevertheless, transfer learning is popular in deep learning given the enormous resources required to train deep learning models or the large and challenging datasets on which deep learning models are trained.

Transfer learning only works in deep learning if the model features learned from the first task are general.

In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the learned features, or transfer them, to a second target network to be trained on a target dataset and task. This process will tend to work if the features are general, meaning suitable to both base and target tasks, instead of specific to the base task.

How transferable are features in deep neural networks?

This form of transfer learning used in deep learning is called inductive transfer. This is where the scope of possible models (model bias) is narrowed in a beneficial way by using a model fit on a different but related task.

Depiction of Inductive Transfer
Taken from “Transfer Learning”

How to Use Transfer Learning?

You can use transfer learning on your own predictive modeling problems.

Two common approaches are as follows:

  1. Develop Model Approach
  2. Pre-trained Model Approach

Develop Model Approach

  1. Select Source Task. You must select a related predictive modeling problem with an abundance of data where there is some relationship in the input data, output data, and/or concepts learned during the mapping from input to output data.
  2. Develop Source Model. Next, you must develop a skillful model for this first task. The model must be better than a naive model to ensure that some feature learning has been performed.
  3. Reuse Model. The model fit on the source task can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  4. Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
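The four steps above can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not a recommended recipe: the two tasks are synthetic, the model is a tiny one-hidden-layer network, and the names make_task, train, and accuracy are hypothetical helpers written for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n, w_true):
    """Synthetic binary task: the label is 1 when X @ w_true is positive."""
    X = rng.normal(size=(n, len(w_true)))
    return X, (X @ w_true > 0).astype(float)

def forward(X, W1, w2):
    H = np.tanh(X @ W1)                        # hidden features
    return H, 1.0 / (1.0 + np.exp(-(H @ w2)))  # predicted probability

def train(X, y, W1, w2, lr=0.5, epochs=300, train_hidden=True):
    for _ in range(epochs):
        H, p = forward(X, W1, w2)
        err = (p - y) / len(y)                 # gradient of mean log loss w.r.t. the logits
        grad_w2 = H.T @ err
        if train_hidden:
            W1 = W1 - lr * (X.T @ (np.outer(err, w2) * (1.0 - H ** 2)))
        w2 = w2 - lr * grad_w2
    return W1, w2

def accuracy(X, y, W1, w2):
    return float(((forward(X, W1, w2)[1] > 0.5) == y).mean())

# 1. Select Source Task: a related problem with abundant data.
X_src, y_src = make_task(2000, np.array([1.0, 1.0, 1.0, 0.0, 0.0]))
# The task of interest: a similar concept, but far fewer labeled examples.
X_tgt, y_tgt = make_task(50, np.array([1.0, 1.0, 0.8, 0.0, 0.2]))
X_test, y_test = make_task(500, np.array([1.0, 1.0, 0.8, 0.0, 0.2]))

# 2. Develop Source Model and check that it beats a naive majority-class model.
W1 = rng.normal(scale=0.5, size=(5, 8))
w2 = rng.normal(scale=0.5, size=8)
W1_source, w2_source = train(X_src, y_src, W1, w2)
naive_acc = max(y_src.mean(), 1.0 - y_src.mean())
source_acc = accuracy(X_src, y_src, W1_source, w2_source)

# 3. Reuse Model: start from the source weights.
# 4. Tune Model: keep the hidden layer fixed and refine only the output layer.
W1_target, w2_target = train(X_tgt, y_tgt, W1_source, w2_source.copy(),
                             epochs=100, train_hidden=False)
target_acc = accuracy(X_test, y_test, W1_target, w2_target)

print(f"naive={naive_acc:.2f} source={source_acc:.2f} target={target_acc:.2f}")
```

Whether to freeze the hidden layer or keep training it on the new data is a judgment call; with only 50 target examples, updating fewer weights is one way to limit overfitting.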

Pre-trained Model Approach

  1. Select Source Model. A pre-trained source model is chosen from available models. Many research institutions release models trained on large and challenging datasets, and these may be included in the pool of candidate models from which to choose.
  2. Reuse Model. The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  3. Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.

This second type of transfer learning is common in the field of deep learning.
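As a rough sketch of the pre-trained model approach, the example below treats the pre-trained weights as a frozen feature extractor and fits only a new output layer for the task of interest. It makes several assumptions for the sake of a self-contained example: W_pre is a random stand-in for weights that you would normally load from a released model, the dataset is synthetic, and the new output layer is fit with ridge-regularized least squares rather than gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for weights loaded from a released pre-trained model (assumption:
# in practice these would come from a downloaded file, not a random generator).
W_pre = rng.normal(size=(10, 32))

def extract_features(X):
    """Frozen feature extractor: reads W_pre but never updates it."""
    return np.maximum(X @ W_pre, 0.0)  # ReLU features

# Small labeled dataset for the task of interest (synthetic, 3 classes).
X = rng.normal(size=(60, 10))
y = np.argmax(X[:, :3], axis=1)
Y = np.eye(3)[y]                       # one-hot targets

# Tune Model: fit only a new linear output layer on top of the frozen features.
F = extract_features(X)
lam = 1e-2                             # ridge penalty, chosen arbitrarily here
head = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)

pred = np.argmax(extract_features(X) @ head, axis=1)
train_acc = float((pred == y).mean())
print(f"training accuracy of the new output layer: {train_acc:.2f}")
```

Only head is learned from the new data. With a real pre-trained model the same pattern applies: the reused layers provide features and a small new layer maps them to your labels.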

Examples of Transfer Learning with Deep Learning

Let’s make this concrete with two common examples of transfer learning with deep learning models.

Transfer Learning with Image Data

It is common to perform transfer learning with predictive modeling problems that use image data as input.

This may be a prediction task that takes photographs or video data as input.

For these types of problems, it is common to use a deep learning model pre-trained for a large and challenging image classification task such as the ImageNet 1000-class photograph classification competition.

The research organizations that develop models for this competition and do well often release their final model under a permissive license for reuse. These models can take days or weeks to train on modern hardware.

These models can be downloaded and incorporated directly into new models that expect image data as input.

Three examples of models of this type include:

For more examples, see the Caffe Model Zoo where more pre-trained models are shared.

This approach is effective because the models were trained on a large corpus of photographs and were required to make predictions on a relatively large number of classes. This, in turn, required the models to learn to extract features from photographs efficiently in order to perform well on the problem.

In their Stanford course on Convolutional Neural Networks for Visual Recognition, the authors caution you to choose carefully how much of the pre-trained model to use in your new model.

[Convolutional Neural Networks] features are more generic in early layers and more original-dataset-specific in later layers

— Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition
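One way to picture this choice is as a cut point in the stack of layers: everything before the cut is kept fixed, and everything after it is trained on the new data. The sketch below is framework-agnostic and purely illustrative; the Layer class, the freeze_early_layers helper, and the five layer names are hypothetical and do not come from any library.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = True

def freeze_early_layers(layers, n_frozen):
    """Return a copy of the stack with the first n_frozen layers frozen."""
    if not 0 <= n_frozen <= len(layers):
        raise ValueError("n_frozen must be between 0 and len(layers)")
    return [Layer(layer.name, trainable=(i >= n_frozen))
            for i, layer in enumerate(layers)]

# Hypothetical five-layer convolutional model, ordered from input to output.
model = [Layer("conv1"), Layer("conv2"), Layer("conv3"),
         Layer("conv4"), Layer("classifier")]

# Reuse almost everything: only the final layer is refit on the new labels.
mostly_frozen = freeze_early_layers(model, n_frozen=4)
# Reuse only the generic early layers and fine-tune the more specific later ones.
partly_frozen = freeze_early_layers(model, n_frozen=2)

print([layer.name for layer in mostly_frozen if layer.trainable])
print([layer.name for layer in partly_frozen if layer.trainable])
```

Where to place the cut is something to evaluate empirically; a reasonable starting assumption is to freeze more layers when the new dataset is small or similar to the original one.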

Transfer Learning with Language Data

It is common to perform transfer learning with natural language processing problems that use text as input or output.

For these types of problems, a word embedding is used: a mapping of words to a high-dimensional continuous vector space where different words with a similar meaning have a similar vector representation.

Efficient algorithms exist to learn these distributed word representations, and it is common for research organizations to release pre-trained models, trained on very large corpora of text documents, under a permissive license.

Two examples of models of this type include:

These distributed word representation models can be downloaded and incorporated into deep learning language models in either the interpretation of words as input or the generation of words as output from the model.
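To make the input side concrete, the sketch below builds an embedding matrix for a new task's vocabulary from a set of pre-trained vectors. The three-dimensional vectors and the tiny vocabulary are invented for illustration; real pre-trained files contain far more words and dimensions, and how you handle words that are missing from them is a design choice (here they simply keep a zero vector).

```python
import numpy as np

# Toy stand-in for downloaded pre-trained word vectors (hypothetical values).
pretrained = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "bad":   np.array([-0.9, 0.1, 0.0]),
}

vocab = ["<unk>", "good", "great", "bad", "meh"]  # the new task's vocabulary
dim = 3

# One row per vocabulary word; words without a pre-trained vector
# ("<unk>" and "meh") keep a zero vector.
embedding_matrix = np.zeros((len(vocab), dim))
for i, word in enumerate(vocab):
    if word in pretrained:
        embedding_matrix[i] = pretrained[word]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_close = cosine(embedding_matrix[1], embedding_matrix[2])  # good vs great
sim_far = cosine(embedding_matrix[1], embedding_matrix[3])    # good vs bad
print(f"good/great: {sim_close:.2f}, good/bad: {sim_far:.2f}")
```

A matrix like this would typically initialize the first layer of a language model, which can then be kept fixed or refined on the new task.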

In his book on Deep Learning for Natural Language Processing, Yoav Goldberg cautions:

… one can download pre-trained word vectors that were trained on very large quantities of text […] differences in training regimes and underlying corpora have a strong influence on the resulting representations, and that the available pre-trained representations may not be the best choice for [your] particular use case.

— Page 135, Neural Network Methods in Natural Language Processing, 2017.

When to Use Transfer Learning?

Transfer learning is an optimization, a shortcut to saving time or getting better performance.

In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.

Lisa Torrey and Jude Shavlik in their chapter on transfer learning describe three possible benefits to look for when using transfer learning:

  1. Higher start. The initial skill on the target task (before refining the transferred model) is higher than it otherwise would be.
  2. Higher slope. The rate of improvement of skill during training on the target task is steeper than it otherwise would be.
  3. Higher asymptote. The converged skill of the trained model is better than it otherwise would be.

Three ways in which transfer might improve learning.
Taken from “Transfer Learning”.

Ideally, you would see all three benefits from a successful application of transfer learning.
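If you record a learning curve with and without transfer, you can check for the three benefits directly. The sketch below is one simple way to do so, under a few assumptions: both curves hold one skill score per epoch and have the same length, the slope is measured over the first half of training, and the accuracy values are made up for illustration.

```python
def transfer_benefits(with_transfer, without_transfer):
    """Compare two learning curves (skill per epoch, same length, at least 2 points)."""
    half = max(1, len(with_transfer) // 2)
    gain_with = with_transfer[half] - with_transfer[0]
    gain_without = without_transfer[half] - without_transfer[0]
    return {
        # Difference in skill before any training on the target task.
        "higher_start": with_transfer[0] - without_transfer[0],
        # Difference in average per-epoch improvement over the first half.
        "higher_slope": (gain_with - gain_without) / half,
        # Difference in skill at the end of training.
        "higher_asymptote": with_transfer[-1] - without_transfer[-1],
    }

# Hypothetical validation accuracies per epoch, for illustration only.
scratch = [0.10, 0.20, 0.30, 0.40, 0.48, 0.55]
transfer = [0.30, 0.50, 0.68, 0.80, 0.86, 0.90]

benefits = transfer_benefits(transfer, scratch)
for name, value in benefits.items():
    print(f"{name}: {value:+.3f}")
```

A positive value suggests that the corresponding benefit is present. In practice you would average over several training runs before drawing conclusions, since single curves can be noisy.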

It is an approach to try if you can identify a related task with abundant data and you have the resources to develop a model for that task and reuse it on your own problem, or there is a pre-trained model available that you can use as a starting point for your own model.

On some problems where you may not have very much data, transfer learning can enable you to develop skillful models that you simply could not develop in the absence of transfer learning.

The choice of source data or source model is an open problem and may require domain expertise and/or intuition developed via experience.

In this post, you discovered how you can use transfer learning to speed up training and improve the performance of your deep learning model.

Specifically, you learned:

  • What transfer learning is and how it is used in deep learning.
  • When to use transfer learning.
  • Examples of transfer learning used on computer vision and natural language processing tasks.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

164 Responses to A Gentle Introduction to Transfer Learning for Deep Learning

  1. Avatar
    HandsOnML December 20, 2017 at 2:42 pm #

    As always, well written and insightful article. The additional resources you provide on this topic are also very helpful! Thanks.

    • Avatar
      Jason Brownlee December 20, 2017 at 3:50 pm #


      • Avatar
        A.G.Ansari May 12, 2019 at 9:10 pm #

        sir i want to know that how to create pretrained network for IDS. will you help me in this regard. will be thankul to you.

        • Avatar
          Jason Brownlee May 13, 2019 at 6:47 am #

          Sorry, I’m not aware of pre-trained models for that class of problem.

  2. Avatar
    Eric H December 22, 2017 at 5:32 am #

    You repeated a sentence early in the article.
    “Transfer learning only works in deep learning if the model features learned from the first task are general.”

  3. Avatar
    Norma A. January 5, 2018 at 5:39 pm #

    Hello Dr. Jason. Thank you so much for sharing clearly and in a succinct way your knowledge and insights on ML and Deep Learning.

    Reviewing the pre-trained model references, it appears that Google Inception Model is not longer available. I also checked at but is not available either.
    Do you have idea if this is now placed at some other location? Thank you!

  4. Avatar
    F. Ameen January 29, 2018 at 4:12 pm #

    Hello Jason,
    I am a big fan of your work. I wanted to ask that how to choose a pre-trained model for my specific problem. for example what would be a good pre-trained model that would work well on handwritten arabic alphabet classification?

    • Avatar
      Jason Brownlee January 30, 2018 at 9:47 am #

      Perhaps a model trained on other character data?

      • Avatar
        F. Ameen January 31, 2018 at 2:33 am #

        yes. because all keras pre-trained models require input size (244, 244, 3) while my data has (64, 64, 1) dimensions. also imagenet and character classification are very different problems. Most of the imagenet features are redundant for my problem.

        • Avatar
          Jason Brownlee January 31, 2018 at 9:47 am #

          I believe the models could be adapted and may still add value.

  5. Avatar
    Reemo A. February 22, 2018 at 7:06 am #

    Many thanks for the good explanation..Can i ask you about your suggestion for the most suitable pre-trained model for image super can i select it ?

    • Avatar
      Jason Brownlee February 22, 2018 at 11:22 am #

      Sorry, I have not worked on super resolution. I don’t have a good suggestion at the moment. Perhaps try a google search?

    • Avatar
      Cat Chenal May 28, 2018 at 6:49 am #

      Here is an implementation of image deblurring using GAN [].
      Asking your question to its author might be fruitful.
      Good luck.

  6. Avatar
    Ahmad Raza April 17, 2018 at 1:32 pm #

    Thanks! I have understand this concept. I want to ask a question that, for example there is a pre-trained (OCR) model for English language, and I want to transfer this model into (OCR) for my local language which has completely different alphabets than English. It would help me or not ?

  7. Avatar
    Nastaran April 27, 2018 at 1:44 pm #

    Just wondering if applying transfer learning is a solution to dealing with small data-set size?

  8. Avatar
    Lena April 27, 2018 at 7:51 pm #

    Hi Jason, thank you for this article! I wonder if one could fine-tune a model like resnet, trained for classification, for the purpose of say image filtering? Like local contrast enhancement or denoise etc. What you think? Thanks

    • Avatar
      Jason Brownlee April 28, 2018 at 5:27 am #

      Sounds like a good idea, try it. Let me know how you go.

      • Avatar
        Lena April 28, 2018 at 2:45 pm #

        Ok, thanks!

  9. Avatar
    Vijay Ravi May 5, 2018 at 11:37 am #

    Thanks! Helped a lot 🙂

  10. Avatar
    Ahmed May 25, 2018 at 10:14 am #

    Many thanks
    I want to ask if I have PDF content fir example drawings and I need model to train I will give the model what is that geometry which pre trained model you can recommend?

    • Avatar
      Jason Brownlee May 25, 2018 at 2:52 pm #

      Perhaps you might be able to use an existing computer vision model. It would be cheap to try and evaluate the results.

  11. Avatar
    Nastaran June 21, 2018 at 11:29 am #

    Just wondering what is the difference between transfer learning and unsupervised pre-training?

    • Avatar
      Jason Brownlee June 21, 2018 at 4:57 pm #

      They are orthogonal ideas. In theory, you could use transfer learning with a supervised or an unsupervised learning problem.

      • Avatar
        afan June 23, 2018 at 6:41 pm #

        Hello sir i need your Help Please if you can .

  12. Avatar
    JG July 24, 2018 at 6:49 am #

    I got an idea for transfer learning, I want to share for your opinion but, in the physical-mathematical domain. This is my anology or equivalence (lets call it JGs approach). For example, what about if in order to solve a physical problem (in our case e.g. an image classification), I approach it with some mathematical description such as partial derivatives, Laplace operator, etc. (in our case this could be the neural model architecture, with convolutions, fully connected , data_augmentation, …that we are intend to apply, etc). But because this math are very complexes with only know some few particular solutions under some initial o boundary values (and this could be the pre-trained weights in our ML or DL issue)…son under OTHER boundary or initial or complexes conditions we do not have any (of these analytical) solutions (this could be the case with other image classification or other datasets inputs) …so we can use the pretty well known solution an try to apply for our problem case, where we intend to obtain similars solutions (this is now the transfer learning repurpose in order to search for similar or analogous solutions)…what do you think Mr. Brownlee about the analogy? Of course, as in real life, probably, from time to time, the solutions to our problems are radically different to the ones we try to reused (model or weight for radically different images sets or classification), ..but at least, the transfer learning essay serves as initial inspiration:-))

    • Avatar
      Jason Brownlee July 24, 2018 at 2:29 pm #

      Not sure I follow, sorry.

      Are you able to simplify the explanation?

  13. Avatar
    PG August 7, 2018 at 9:39 pm #

    Actually, Very vague explanation.. I read it twice. but still I don’t know how to use it in Keras..
    Do you have any short simple example with Keras?

  14. Avatar
    C August 12, 2018 at 3:30 am #

    Thanks for this. Lovely introduction to the area.

    Do you know of any papers using the “Develop Model Approach’? It seems so simple yet I cannot find any work on this specifically.

    • Avatar
      Jason Brownlee August 12, 2018 at 6:35 am #

      Perhaps search for transfer learning on

      • Avatar
        C August 12, 2018 at 7:00 pm #

        I have done so but approaches seem to be either about the pre-trained model method or some method which leverages the knowledge that two domains are related and so adds an extra step to the process, such as dimensionality reduction.

        The two survey papers also do not mention a simple Develop Model Approach.

  15. Avatar
    Juma Ally September 1, 2018 at 11:34 am #

    Thank you for introduction about transfer learning.
    Is it possible to use this model for the calling records data with the attributes likes(source call, dist call, start time, duration and locations) for prediction such as user mobility prediction or characterize traffic on base station etc.?

  16. Avatar
    sana khan September 24, 2018 at 8:55 pm #

    Hi, It’s me Mr. Sana Ullah Khan, I want to work in detection of malignant cells in breast cytology images, I am confused that how I can use a CNN with transfer learning for detection and classification of malignant cells. I scatch one diagram, as well as i, want to share with you that how I can work. guide me, please

    • Avatar
      Jason Brownlee September 25, 2018 at 6:21 am #

      You can cut the top/bottom off a pre-trained model, add new layers then train the new layers of the model on your new dataset.

  17. Avatar
    SAM September 28, 2018 at 12:43 pm #

    Hi Sir I am seeking for a help if you do not mind

    I am using transfer-learning to classify a dataset of images with google inception V3. I got the code from the tutorial on tensorflow website
    I run the code on tensorflow virtual environment on my machine and it works like they want. What I am looking for is what should I add or change in the code ”” to show the confusion matrix with python and Thanks.

  18. Avatar
    Nikronic November 22, 2018 at 11:07 pm #

    Very good explanation. Thank you

    There is a little problem, link to google Inception model is broken. Please update the link.

  19. Avatar
    nandini December 17, 2018 at 11:18 pm #

    can we apply transfer learning on RNN model using keras.

  20. Avatar
    Cathy January 29, 2019 at 9:23 am #

    Thanks Jason for the nice work, as always!

    If I understand correctly, transfer learning is largely focused in using pretrained model on a large data set to make predictions on new data. I am wondering how transferrable is pretrained model on a not so large data set, i.e., 10k entries, to similar problems? For example, I have trained a good neural network to do sentiment analysis on ~10k tweets. How confident should I be to use this model to do sentiment analysis on other problems?

    Thanks in advance!

    • Avatar
      Jason Brownlee January 29, 2019 at 11:40 am #

      Typically a model fit on a very large dataset is a good starting point for use on small datasets.

      Going from small to small datasets – it really depends on the tasks involved. I recommend prototyping some examples and evaluating performance.

  21. Avatar
    Ark Mishara February 11, 2019 at 10:31 pm #

    Hi Jason,

    Cant explain your good work in words.
    Could you please share a working code where we are using Transfer learning to train a naive model from a trained existing model


    • Avatar
      Jason Brownlee February 12, 2019 at 8:03 am #

      Thanks for the suggestion, I will work on some examples.

  22. Avatar
    Nandini February 12, 2019 at 5:17 pm #

    can we increment the model along with existing model ,each time when ever new data comes without starting from scartch for RNN model

    Please suggest me any methods are available for this requirement .

    • Avatar
      Jason Brownlee February 13, 2019 at 7:53 am #

      What do you mean by increment the model exactly?
      What are you trying to achieve?

      • Avatar
        Nandini February 14, 2019 at 5:31 pm #

        increamental model , i would to train the existing model when ever new data comes whether it is small or big it has train not from scartch .

        i have existing model ,but frequently i will get new data to my model , so that i would to train the existing model without starting from scartch for time saving purpose

        this is main motto to achieve .

        Please suggest any ideas or methods are available for this requirement.

        • Avatar
          Jason Brownlee February 15, 2019 at 7:59 am #

          Yes, you can continue the training of an existing model with just new data and a small learning rate.

          • Avatar
            nandini February 19, 2019 at 7:46 pm #

            please can you explain what is small learning rate,i mean which parameter i need to tune,why it is useful for increamental model .

            Thanks in advance

          • Avatar
            Jason Brownlee February 20, 2019 at 7:59 am #

            The “learning rate” hyperparameter of the stochastic gradient descent (SGD) optimization algorithm.

        • Avatar
          Ganesha H S February 15, 2022 at 6:46 am #

          Hi, I want to use incremental learning concert for human activities classification using imu sensor accelerometer dataset. Please suggest resources and ideas to implement

  23. Avatar
    Ammar February 17, 2019 at 1:10 pm #

    Could you suggest good models for time series forecasting? I’m doing a research on zakat (tax for Muslims) and I only have 1 year data to make monthly prediction.

  24. Avatar
    nandini February 20, 2019 at 5:50 pm #

    I would like to implement a chat bot applications, model has to learn from each conversation from starting onwards,intially we don’t have any conservation,based on user convsersation only model has to learn.

    could you suggest me which is apt for this requirement.

    thanks in advance.

    • Avatar
      Jason Brownlee February 21, 2019 at 7:50 am #

      Sorry, I don’t have any tutorials on chat bots, cannot give you good off the cuff advice.

    • Avatar
      Jana January 6, 2021 at 7:12 am #

      kindly, do you find any suggestion for your question?

  25. Avatar
    Saman April 23, 2019 at 5:06 am #


    Thanks for the information. Are there pre-trained regression models that can be used for transfer learning? I can only find classification models online.


  26. Avatar
    Tony Pham April 24, 2019 at 3:49 pm #

    Dear Jason;
    Can we apply transfer learning for a regression task? If so, how many training set is minimum to apply transfer learning techniques?
    For instance, we have 60 training samples used for training set in one subset image and 40 testing samples used for validation set in other subset image?

  27. Avatar
    venkatesh May 20, 2019 at 9:19 pm #

    Hi! Thanks for the explanation. I am wondering if there are any pre-trained models for regression problems. Are you aware of any such models?

  28. Avatar
    satyajit maitra May 21, 2019 at 2:13 pm #

    Can you please explain the scope of transfer learning

    • Avatar
      Jason Brownlee May 21, 2019 at 2:45 pm #

      What do you mean by scope? Problem domains?

      • Avatar
        satyajit maitra May 22, 2019 at 2:41 pm #

        I mean to say the domain in particular suppose I am predicting one time series problem using deep learning for electricity estimation can i use the same via transfer learning for stock market prediction or spreading of Dengue .

        • Avatar
          Jason Brownlee May 23, 2019 at 5:52 am #

          Perhaps. I’d recommend trying with some experiments.

          • Avatar
            satyajit maitra June 4, 2019 at 4:52 am #

            Hi sir thanks for sharing your thoughts I am trying to implement the concept you told here ….sir besides this i am also trying to write a blog on ML with practical approach I also messaged you on facebook could you please review my site and give me some feedback how can i improve my writing. Many many thanks for your comment by the way .

          • Avatar
            Jason Brownlee June 4, 2019 at 7:58 am #

            Well done.

            Sorry, I don’t have the capacity to review your blog.

  29. Avatar
    adi May 25, 2019 at 4:00 pm #

    Hi, if my understanding is correct, in case of image classification and NLP, if I have a pre-trained model, to train on new data, I can reshape the data according to the pre-trained model. I am trying to use transfer learning for a regression problem. Consider I train a base model with 15 parameters and 1 million rows. I train a model. Now if I want to use this model for a similar problem where I have only 14 parameters, one parameter is missing. Will the pre-trained model be of any use. Is there a way I can use transfer learning in such cases?

    Thank you.

    • Avatar
      Jason Brownlee May 26, 2019 at 6:42 am #

      I’m not sure about this, I suspect some careful work is required to use transfer learning on univariate data.

      • Avatar
        adi May 26, 2019 at 6:38 pm #

        it is not a univariate data. if one among the 15 parameters is missing, I believe I can impute/fill the missing column and then can use the pre-trained model.

  30. Avatar
    Venkanna August 22, 2019 at 7:41 am #

    What is the advantage of using transfer learning, if we talk about the class distribution of training data?

    • Avatar
      Jason Brownlee August 22, 2019 at 1:57 pm #

      What do you mean exactly?

      In general, transfer learning can shorten the time/effort to develop a new model by leveraging an existing model.

  31. Avatar
    Hammad October 3, 2019 at 10:27 am #

    Hi.. could you please guide me that if we train our model from scratch, but not from transfer learning, then what could be the advantages of doing that? Also please guide that in the future, I would propose a modified/cascaded version of DL model, so will it be possible to train it from transfer learning?

    • Avatar
      Jason Brownlee October 3, 2019 at 1:26 pm #

      Yes, you can discover many examples of training CNNs from scratch here:

      • Avatar
        Hammad October 3, 2019 at 1:49 pm #

        Thanks for the reply….i need to ask that can I still train a deep learning model by transfer learning which has not been trained previously on a larger dataset like imagenet? like AlexNet has already been trained in imagenet dataset but if I use any other model which has not been trained yet by using imagenet dataset, then can that model be trained by transfer learning?

        • Avatar
          Jason Brownlee October 4, 2019 at 5:37 am #

          Yes. You can train on your own problem, then use that model on a different problem of your own.

  32. Avatar
    Afsana October 15, 2019 at 8:06 pm #

    Would you please give me a complete transfer learning dataset, not imagenet dataset.
    In the dataset there are multiple entries and attribute, dataset that includes different domains such as videos, music, games (common users among the domains).
    It will be very helpful for me.

  33. Avatar
    Ulf Aslak October 17, 2019 at 6:15 am #

    Hi Jason,

    Thanks for the intro, good job linking together different sources so I know where to go from here! F. Ameen brought up a similar question, but did not get the answer I was looking for.

    My question is: How do you deal with input dimensionality?

    Say your data has different dimensionality than that of the pretrained model. For image data, reusing the filters would result in different sized activation maps, and computationally I can reason this is not a problem if you just change the final dense layer. BUT the learned filters will look for patterns of a certain size (e.g. within 3×3 or 5×5 fields). If your images is twice the resolution of those used for pretraining, the filters are too small (and vice versa).

    What are good ways to deal with this? Should one take account of this and use initial pooling or up-convolution to size match before sending the image into the pretrained filters?

    And what about dense feed forward networks? Should one then just switch out the first (and last) layers? Say your new data has some of the same features as the ones in your pretraining data, do you then just add and/or remove connections between the input layer and first hidden layers? Or switch out the first layer entirely?

    Thanks a lot!


    • Avatar
      Jason Brownlee October 17, 2019 at 6:45 am #

      You cut off the input layer or layers, define a new input layer and away you go. Filters will still work just fine on a larger image.

      In fact, if you use the keras applications api, you can define new input shapes with arbitrary sizes without changing the model. The filters don’t care.

      Yes, you cut off the output layers and refit on the new class labels.

  34. Avatar
    HASHEM ALNABHI October 31, 2019 at 10:43 pm #

    Hello, Dear Jason Brownlee, thank you for sharing this beneficial blog and undoubtedly everyone is getting a lot of benefits.

    my question is …Do the pre-trained datasets which relates to ENGLISH characters (A-Z . a-z . 0-9) consist of defective characters in it?

    waiting for your reply.
    thank you

  35. Avatar
    pavan November 20, 2019 at 4:49 am #

    where to write those codes

  36. Avatar
    Tizita December 9, 2019 at 9:38 pm #

    I want to done my thesis on prediction by using deep learning my data is not image and also it is not on NIP but i want to use transfer learning. Can I use transfer learning on this issue?
    transfer learning is only applied on image and NIP or not?

    • Avatar
      Jason Brownlee December 10, 2019 at 7:31 am #

      Yes, if you can fit or find a model trained on a related dataset.

  37. Avatar
    Onoja December 13, 2019 at 10:18 pm #

    I’m working on malware detection,.Can i find a pre trained model i can use for web-based malware detection?

    • Avatar
      Jason Brownlee December 14, 2019 at 6:19 am #

      I’m not aware of one, sorry. Perhaps you can train one?

      • Avatar
        Onoja December 16, 2019 at 8:11 am #

        Really? Possible to train one like that?

        • Avatar
          Jason Brownlee December 16, 2019 at 1:34 pm #

          Sure, get a related dataset with a ton more data, fit a model on it, then transfer it to your related problem.

  38. Avatar
    tirualem zeleke March 12, 2020 at 8:35 pm #

    Hello, I am working on HIV status prediction using the DHS dataset. How can I use transfer learning? Is it possible to perform prediction using transfer learning?

  39. Avatar
    Kiran April 7, 2020 at 12:41 am #

    I did not get the point where we define transfer learning programmatically. What I understood was that we first train the model on one dataset and then re-train the same model on a different dataset to achieve transfer learning. Is this assumption correct?
    If yes, once the model is trained on a particular dataset, how do we retrain it on another dataset?

    • Avatar
      Jason Brownlee April 7, 2020 at 5:51 am #

      No, we freeze the trained model, add a new output layer and just train that.
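
      A minimal sketch of that idea in Keras follows (an illustration, not code from the original reply). The layer sizes, the 20 input features, the 3 source classes, the 5 target classes, and the random target data are all assumptions made up for the example; the source model here is untrained and merely stands in for one that was already fit on the first dataset.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for a model that was already trained on the source dataset.
source_model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),  # old output layer (source task)
])

# Keep every layer except the old output layer, and freeze them.
reused_layers = source_model.layers[:-1]
for layer in reused_layers:
    layer.trainable = False

# Add a new output layer for the target task (assumed to have 5 classes).
target_model = keras.Sequential(
    [keras.Input(shape=(20,)), *reused_layers, layers.Dense(5, activation="softmax")]
)
target_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Made-up target data, only so the sketch runs end to end.
rng = np.random.default_rng(0)
X_target = rng.random((64, 20)).astype("float32")
y_target = rng.integers(0, 5, size=64)

# Only the new output layer is updated; the frozen weights stay unchanged.
frozen_before = [w.copy() for w in reused_layers[0].get_weights()]
target_model.fit(X_target, y_target, epochs=2, verbose=0)
frozen_after = reused_layers[0].get_weights()
```

      Comparing frozen_before with frozen_after is one way to confirm that training touched only the new layer.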

  40. Avatar
    Mahshid April 7, 2020 at 6:48 am #

    Hi. Thanks for this great tutorial. Always helpful. I have a question on the “Develop Model Approach”. I have two related tasks: I want to learn a model on my emotion detection task and transfer the knowledge to my second task, which is empathy detection. Both tasks have labeled datasets. I do not know how I should develop a model on the first task and reuse it on the second task. Do you have any kind of tutorial/GitHub repository in mind that specifically uses transfer learning with this approach (not using the pre-trained models)?

  41. Avatar
    Sam April 8, 2020 at 4:13 am #

    Hello Jason. Please, can I transfer temporal information with transfer learning? For example, train an LSTM on one dataset and then use it for another?

  42. Avatar
    Sam April 9, 2020 at 7:08 am #

    Thanks. One more question please. I am having an argument with someone about something.

    Please, do LSTM models accept variable-length sequences? My argument is that they don’t. You need to pad them to use them. However, I can’t really defend myself with a scholarly publication.

    If I am correct, can you give me a better explanation (Or a link to one you have done in the past) and a link to a paper on this? Thanks.

  43. Avatar
    AYS May 26, 2020 at 3:32 am #

    Hi, I hope you are doing well. I am using the AlexNet pre-trained model for feature extraction only. The input shape of AlexNet is 227x227x3, but my image dataset contains images of dimension 100x100x3. Do I need to resize my dataset images to 227x227x3 in order to use the AlexNet pre-trained model as a feature extractor?
    Or can I input these images to AlexNet directly?

  44. Avatar
    Michelle August 26, 2020 at 3:00 am #

    Hi Jason,

    thanks for the article.
    Does transfer learning usually apply to complex models like VGG model or ResNet model?
    I would think it makes no sense to transfer a two layer LSTM model trained from one dataset to a new dataset. We can simply train a new model with the new dataset in this case. What do you think please?


    • Avatar
      Jason Brownlee August 26, 2020 at 6:53 am #


      Yes, try it and see if it helps.

      Here’s an example for tabular data:

      • Avatar
        Anugrah January 1, 2021 at 5:42 pm #

        Dear Jason,

        Are there any proven methods to perform cross-modal learning? I couldn’t find much on the internet.

        I have 2 datasets (a labelled dataset ‘A’ of images annotated with 5 emotions and an unlabelled dataset ‘B’ of text containing the same 5 emotions).

        Is there any method out there which can use the learning from labelled image data to classify unlabelled text data?

        • Avatar
          Jason Brownlee January 2, 2021 at 6:23 am #

          All predictive models learn from labeled data to make predictions on unlabelled data.

          This is called predictive modeling.

          • Avatar
            Anugrah January 2, 2021 at 3:21 pm #

            But my 2 datasets are of different modalities. One is labeled images and the other is unlabeled text data. Both of them share the same categories of emotions.

          • Avatar
            Jason Brownlee January 3, 2021 at 5:52 am #

            Perhaps you can use a pre-trained language model and a pre-trained object detection model and integrate both models into a new model that accepts both data types. This is called a multi-input model and is common for problems like image captioning.
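
            A rough sketch of a multi-input model with the Keras functional API follows (an illustration, not from the original reply). The two small branches are hypothetical stand-ins for the pre-trained image and language models; the input sizes, vocabulary size, and the 5 output classes are assumptions chosen only to make the example concrete.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Image branch: a tiny stand-in for a pre-trained vision model.
image_in = keras.Input(shape=(64, 64, 3), name="image")
x = layers.Conv2D(8, 3, activation="relu")(image_in)
x = layers.GlobalAveragePooling2D()(x)

# Text branch: a tiny stand-in for a pre-trained language model.
# Inputs are assumed to be sequences of 50 integer token ids.
text_in = keras.Input(shape=(50,), dtype="int32", name="tokens")
t = layers.Embedding(input_dim=1000, output_dim=16)(text_in)
t = layers.GlobalAveragePooling1D()(t)

# Merge both feature vectors and predict one of 5 assumed emotion classes.
merged = layers.concatenate([x, t])
out = layers.Dense(5, activation="softmax")(merged)

model = keras.Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

            Note that a model like this expects both an image and a text input for every example, so it illustrates the multi-input structure rather than a complete answer to classifying unlabeled text from labeled images.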

  45. Avatar
    oglee September 23, 2020 at 2:47 pm #

    It’s an amazing post. I love it.

  46. Avatar
    shobi February 18, 2021 at 6:54 am #

    Hi Jason,

    Thank you so much for your article. I have a question regarding the concept of transfer learning.
    For example, I am using a pretrained VGG16 model trained on the Fashion-MNIST dataset. I know it is obviously transfer learning, and it is inductive transfer learning as per your article, if I am not wrong. But my question is, would it be homogeneous or heterogeneous transfer learning? And whichever it is, would it be a symmetric or asymmetric transfer learning problem?

    I am struggling to find the answer to this question. I have read the definitions but I am still confused.


    • Avatar
      Jason Brownlee February 18, 2021 at 8:10 am #

      You’re welcome!

      What does “homo/hetero” mean when it comes to transfer learning, and why does it matter? My advice is use whatever works best for your dataset.

  47. Avatar
    shobi February 18, 2021 at 8:18 am #

    Hi Jason,

    Thank you for your response. Let me rephrase my question. Would classifying fashion images from the Fashion-MNIST dataset using a VGG16 pretrained model be homogeneous or heterogeneous transfer learning?

    Symmetric and asymmetric are further types of homogeneous/heterogeneous transfer learning.

    So, in what category will my problem-solving approach lie?

    Thank you!

    • Avatar
      Jason Brownlee February 18, 2021 at 1:19 pm #

      What is “homogeneous or heterogeneous transfer learning”?

  48. Avatar
    shobi February 19, 2021 at 1:27 am #

    Hi Jason,

    In heterogeneous transfer learning, the source and target domains have different feature spaces, which are generally non-overlapping. Homogeneous is the opposite. So according to my understanding, our approach will be a heterogeneous approach.


    Thank you!

    • Avatar
      Jason Brownlee February 19, 2021 at 6:00 am #

      Thanks for sharing. It is not something I know about.

      I would recommend skipping the taxonomy and focusing on discovering what works best for your problem.

  49. Avatar
    Gabe Gibler February 28, 2021 at 1:49 pm #

    I tremendously appreciate the last sentence in your article: “The choice of source data or source model is an open problem and may require domain expertise and/or intuition developed via experience.”

    I don’t entirely know how to classify everything about it that I appreciate. Somehow, you phrased it very well, and somehow that phrasing does a very good job of setting expectations. It strikes me as the kind of thing I mostly want to know beyond basic, primary definitions, but that ends up taking forever to get “experts” to acknowledge about a subject, or usually a lot of my own reading, learning, and experimentation time to finally realize. Nobody has the perfect answer; it will likely take a lot of learning and experimentation time to arrive at the skill of making selections well.

    It sounds obvious in some ways. However, I think skills are too often sold as “just take this class, do a few exercises, and you are then supposed to be an expert that can be expected to do said skill flawlessly in minimized time”.

    • Avatar
      Jason Brownlee February 28, 2021 at 1:56 pm #

      Thanks for sharing! I’m happy to hear it struck a chord.

  50. Avatar
    Shobi May 1, 2021 at 9:44 am #

    Hi Jason,

    I am a bit confused; please guide me on this. Would transferring knowledge from VGG16 to Fashion-MNIST be inductive transfer learning or transductive transfer learning?

    Thank you!

    • Avatar
      Jason Brownlee May 2, 2021 at 5:27 am #

      It would be “transfer learning”.

      Inductive vs deductive vs transductive is not relevant.

  51. Avatar
    Abida Danish May 12, 2021 at 6:53 pm #

    Hi Dr., thanks for this useful tutorial. I have a question: I want to use transfer learning in 5G communication. Can you please help me with how I should start? Do you know of any model for communication where the inputs are data rate, latency, packet loss, distance, and duration?

    • Avatar
      Jason Brownlee May 13, 2021 at 6:02 am #

      You would likely need to train your model on one dataset, then adapt the model for a second related dataset.

  52. Avatar
    Shobi June 10, 2021 at 5:15 am #

    Hi Jason,

    Thank you so much for your nice article!

    I am using transfer learning and I see some weird behavior with the Adam optimizer, even with exponential learning rate decay! My validation accuracy and validation loss both increase simultaneously, although training accuracy improves gradually and training loss decreases. Should I consider this behavior to be overfitting? If I stop training early, test accuracy drops.

    Looking forward to your kind response!

  53. Avatar
    Obsa Gilo Wakuma June 28, 2021 at 10:54 pm #

    Dear Sir,
    It is an interesting topic! Thank you for your contribution!

    Please, do you have a tutorial about domain adaptation in visual recognition?
    If you have a link to a simple coding example in Python, please share it with me!

    • Avatar
      Jason Brownlee June 29, 2021 at 4:48 am #

      You’re welcome.

      Sorry, I don’t think I have tutorials on “domain adaptation”.

  54. Avatar
    Shefali July 6, 2021 at 8:45 pm #

    Can we do multi-class text classification with transfer learning?

  55. Avatar
    Manohar July 24, 2021 at 1:02 am #

    Hi Jason,
    Can you tell me the logic behind object detection, for example for faces or any other object?

  56. Avatar
    Felipe August 19, 2021 at 12:09 am #

    Hi Jason, do you have a tutorial on how to pre-train your own model from scratch, or maybe some material or a book to recommend?

  57. Avatar
    Felipe August 19, 2021 at 5:17 am #

    Hi Jason, I meant how to pre-train a model from scratch first and then fine-tune the trained model parameters on a given task.

  58. Avatar
    Felipe August 19, 2021 at 6:24 am #

    This is what I was looking for:

    Have you ever seen transfer learning on regression tasks in the literature or in practice?

    • Avatar
      Adrian Tam August 20, 2021 at 12:56 am #

      I get what you mean here. I can find a few papers on regression, such as this one:
      But transfer learning is used less often on regression tasks than on classification. Probably because of the nature of regression problems, the improvement is less compelling.

      • Avatar
        Felipe August 20, 2021 at 1:28 am #

        Thank you so much for sharing this paper.

  59. Avatar
    Amy January 13, 2022 at 10:14 am #

    Thank you! Very clear article.

    • Avatar
      James Carmichael January 15, 2022 at 11:35 am #

      Thank you for the feedback and kind words, Amy!

  60. Avatar
    basma January 16, 2022 at 8:08 am #

    Thank you so much!
    I used a pre-trained model from timm to train a vision transformer model on just my small dataset. I observed that it takes a lot more time than training my ViT model from scratch on this small dataset. I don’t know what the reason is!
    It took 1h for 1 epoch! Is that normal? As we know, using a pre-trained model should reduce training time, but in my case it is the opposite!

    Can you help me, please?
