What is transfer learning?

Published: 12 February 2024
Contributors: Jacob Murel Ph.D., Eda Kavlakoglu

 

Transfer learning uses pre-trained models from one machine learning task or dataset to improve performance and generalizability on a related task or dataset.

Transfer learning is a machine learning technique in which knowledge gained through one task or dataset is used to improve model performance on another related task and/or different dataset.1 In other words, transfer learning uses what has been learned in one setting to improve generalization in another setting.2 Transfer learning has many applications, from solving regression problems in data science to training deep learning models. Indeed, it is particularly appealing for the latter given the large amount of data needed to create deep neural networks.

Traditional learning processes build a new model for each new task, based on the available labeled data. This is because traditional machine learning algorithms assume that training and test data come from the same feature space. If the data distribution changes, or if the trained model is applied to a new dataset, users must retrain a new model from scratch, even when attempting a task similar to the original one (e.g. a sentiment analysis classifier for movie reviews versus one for song reviews). Transfer learning algorithms, however, take already-trained models or networks as a starting point. They then apply the knowledge gained on an initial source task or dataset (e.g. classifying movie reviews) to a new yet related target task or dataset (e.g. classifying song reviews).3
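The movie-to-song-reviews handoff can be sketched in a few lines of Keras (the library referenced later in this article). The architecture, layer names, and datasets below are illustrative assumptions rather than anything specified here: the feature layer learned on the source task is frozen and reused, and only a new output head is trained on the target data.

```python
import tensorflow as tf

# Source model: assume it has already been trained to classify movie reviews
# from (say) 500-dimensional bag-of-words vectors.
source_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(500,)),
    tf.keras.layers.Dense(128, activation="relu", name="shared_features"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="movie_head"),
])
# source_model.fit(movie_review_vectors, movie_review_labels, ...)  # source task

# Transfer: reuse the learned feature layer, freeze it, and attach a new
# output head for the related target task (song reviews).
feature_layer = source_model.get_layer("shared_features")
feature_layer.trainable = False

target_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(500,)),
    feature_layer,                                   # knowledge carried over
    tf.keras.layers.Dense(1, activation="sigmoid", name="song_head"),
])
target_model.compile(optimizer="adam", loss="binary_crossentropy")
# target_model.fit(song_review_vectors, song_review_labels, ...)  # target task
```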


Advantages and disadvantages of transfer learning
Advantages

- Computational costs. Transfer learning reduces the computational costs of building models for new problems. By repurposing pretrained models or pretrained networks to tackle a different task, users can reduce model training time, training data, processor units, and other computational resources. For instance, fewer epochs (that is, passes through a training dataset) may be needed to reach a desired level of performance, as illustrated in the sketch following this list. In this way, transfer learning can accelerate and simplify model training processes.

- Dataset size. Transfer learning particularly helps alleviate the difficulties involved in acquiring large datasets. For instance, large language models (LLMs) require large amounts of training data to reach optimal performance. High-quality publicly available datasets can be limited, and producing sufficient manually labeled data can be time-consuming and expensive.

- Generalizability. Beyond aiding model optimization, transfer learning can also increase a model’s generalizability. Because transfer learning involves retraining an existing model with a new dataset, the retrained model draws on knowledge gained from multiple datasets. It will potentially perform better on a wider variety of data than the initial base model trained on only one type of dataset. Transfer learning can thus inhibit overfitting.4
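As a rough sketch of the computational-cost point above: once a pretrained backbone is frozen, only a small output head remains to be trained. The backbone choice (MobileNetV2) and the ten-class target task below are assumptions made purely for illustration.

```python
import tensorflow as tf

# Reuse a backbone pretrained on ImageNet and freeze its weights.
backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False

# Only the new classification head is trainable.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),  # hypothetical 10-class target task
])

trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
reused = sum(int(tf.size(w)) for w in model.non_trainable_weights)
print(f"trainable parameters: {trainable:,}; reused (frozen) parameters: {reused:,}")
# Only the ~13,000-parameter head is updated; the millions of backbone
# parameters are reused as-is, so fewer epochs and less compute are required
# than training an equivalent network from scratch.
```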

Of course, the transfer of knowledge from one domain to another cannot offset the negative impact of poor-quality data. Preprocessing techniques and feature engineering, such as data augmentation and feature extraction, are still necessary when using transfer learning.

Disadvantages

There are few disadvantages inherent to transfer learning itself; the potential negative consequences arise mainly from its misapplication. Transfer learning works best when three conditions are met:

  • both learning tasks are similar
  • the data distributions of the source and target datasets do not differ too greatly
  • a comparable model can be applied to both tasks

When these conditions are not met, transfer learning can negatively affect model performance. The literature refers to this as negative transfer. Ongoing research proposes a variety of tests for determining whether datasets and tasks meet the above conditions, and so will not result in negative transfer.5 Distant transfer is one method developed to correct for negative transfer that results from too great a dissimilarity between the data distributions of the source and target datasets.6

Note that there is no widespread, standard metric for determining the similarity between tasks for transfer learning. A handful of studies, however, propose different evaluation methods to predict similarities between datasets and machine learning tasks, and thus their viability for transfer learning.7

Types of transfer learning

There are three adjacent practices or sub-settings of transfer learning. Their distinctions from one another, and from transfer learning more broadly, largely result from changes in the relationship between the source domain, the target domain, and the tasks to be completed.8

- Inductive transfer. This is when the source and target tasks are different, regardless of any difference or similarity between the source and target domains (i.e. datasets). This can manifest in computer vision models when architectures pretrained for feature extraction on large datasets are then adopted for further training on a specific task, such as object detection. Multitask learning, which consists of simultaneously learning two different tasks (such as image classification and object detection) on the same dataset, can be considered a form of inductive transfer.9

- Unsupervised transfer. This is similar to inductive transfer in that the source and target tasks are different. In inductive transfer, however, the source and/or target data is often labeled; as its name suggests, unsupervised transfer learning uses no manually labeled data.10 By comparison, inductive transfer can be considered a form of supervised learning. One common application of unsupervised transfer learning is fraud detection: by identifying common patterns across an unlabeled dataset of transactions, a model can learn to flag deviating behaviors as possible fraud.

- Transductive transfer. This occurs when the source and target tasks are the same but the datasets (or domains) are different. More specifically, the source data is typically labeled while the target data is unlabeled. Domain adaptation is a form of transductive transfer, as it applies knowledge gained from performing a task on one data distribution to the same task on another data distribution.11 An example of transductive transfer learning is applying a text classification model trained and tested on restaurant reviews to classify movie reviews.

Transfer learning versus finetuning

Transfer learning is distinct from finetuning. Both, admittedly, reuse preexisting machine learning models rather than training new ones, but the similarities largely end there. Finetuning refers to the process of further training a model on a task-specific dataset to improve performance on the initial, specific task for which the model was built. For instance, one may create a general-purpose object detection model using massive image datasets such as COCO or ImageNet and then further train the resulting model on a smaller, labeled dataset specific to car detection. In this way, a user finetunes an object detection model for car detection. By contrast, transfer learning refers to adapting a model to a new, related problem rather than the same problem.
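A hedged sketch of the finetuning side of that distinction follows, using Keras. The backbone (MobileNetV2), the data, and the simplified car versus no-car framing are illustrative assumptions; production car detection would use a dedicated object detection architecture.

```python
import tensorflow as tf

# Backbone pretrained on a large dataset (ImageNet).
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3), pooling="avg")

# Finetuning: unfreeze only the last few layers and keep training on the
# smaller task-specific dataset with a low learning rate, still targeting
# the same general problem the model was built for.
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # car vs. no car
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(car_images, car_labels, epochs=3)

# Transfer learning, by contrast, would freeze the backbone entirely and attach
# a new head for a different but related problem; see the computer vision
# sketch later in this article.
```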

Transfer learning use cases

There are many applications of transfer learning in real-world machine learning and artificial intelligence settings. Developers and data scientists can use transfer learning to aid in a myriad of tasks and combine it with other learning approaches, such as reinforcement learning.

Natural language processing

One salient issue affecting transfer learning in NLP is feature mismatch. Features in different domains can have different meanings, and therefore different connotations (e.g. light signifying weight versus optics). This disparity in feature representations affects sentiment classification tasks, language models, and more. Deep learning-based models, in particular word embeddings, show promise in correcting for this, as they can adequately capture semantic relations and orientations for domain adaptation tasks.12
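One common way this idea is applied, sketched below under stated assumptions, is to initialize an embedding layer from pretrained word vectors (for example GloVe) so that a classifier for a new domain starts from general semantic knowledge. The vocabulary size, embedding dimension, and the randomly generated placeholder matrix are assumptions chosen so the sketch runs as-is.

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 20000, 100
# In practice this matrix would be filled from a pretrained embedding file;
# random values stand in here so the example is self-contained.
pretrained_vectors = np.random.normal(size=(vocab_size, embed_dim)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(pretrained_vectors),
        trainable=False,  # keep the transferred semantic knowledge fixed
    ),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # sentiment in the target domain
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(target_domain_token_ids, target_domain_labels, ...)
```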

Computer vision

Because of the difficulties in acquiring sufficient manually labeled data for diverse computer vision tasks, a wealth of research examines transfer learning applications with convolutional neural networks (CNNs). One notable example is ResNet, a pretrained model architecture that demonstrates improved performance in image classification and object detection tasks.13 Recent research investigates the renowned ImageNet dataset for transfer learning, arguing that (contra computer vision folk wisdom) only small subsets of this dataset are needed to train reliably generalizable models.14 Many transfer learning tutorials for computer vision use ResNet, ImageNet, or both with TensorFlow’s Keras library.
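The typical pattern in those tutorials looks roughly like the sketch below: ResNet50 weights pretrained on ImageNet are reused as a frozen feature extractor, and only a new classification head is trained. The five-class target task and the training data are assumptions for illustration.

```python
import tensorflow as tf

# Load ResNet50 pretrained on ImageNet, without its original 1,000-class head.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional layers

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)              # run the frozen backbone in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # hypothetical 5-class task
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_images, target_labels, epochs=5)  # typically only a few epochs
```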

Related resources

CodeFlare quickens transfer learning tasks

IBM researchers discuss how CodeFlare reduces the time needed to train transfer learning tasks for foundation models.

Efficient Equivariant Transfer Learning from Pretrained Models

IBM researchers present an equivariant transfer learning algorithm that averages feature weights for greater simplicity and generality.

Transfer learning enables carb-reaction predictions

IBM researchers propose TL method to improve model predictions of molecular carbohydrate reactions.

Footnotes

1 Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, and Antonio Jose Serrano Lopez, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, Information Science Reference, 2009.

2 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.

3 Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Elsevier, 2012.

4 Jindong Wang and Yiqiang Chen, Introduction to Transfer Learning: Applications and Methods, Springer, 2023.

5 Wen Zhang, Lingfei Deng, Lei Zhang, Dongrui Wu, "A Survey on Negative Transfer," IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 2, 2023, pp. 305-329, https://arxiv.org/abs/2009.00909 (link resides outside ibm.com).

6 Ben Tan, Yangqiu Song, Erheng Zhong, and Qiang Yang, "Transitive Transfer Learning," Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1155-1164, https://dl.acm.org/doi/10.1145/2783258.2783295 (link resides outside ibm.com). Ben Tan, Yu Zhang, Sinno Jialin Pan, and Qiang Yang, "Distant Domain Transfer Learning," Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 2604-2610, https://dl.acm.org/doi/10.5555/3298483.3298614 (link resides outside ibm.com).

7 Changjian Shui, Mahdieh Abbasi, Louis-Émile Robitaille, Boyu Wang, and Christian Gagné, "A Principled Approach for Learning Task Similarity in Multitask Learning," Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 3446-3452, https://www.ijcai.org/proceedings/2019/0478.pdf (link resides outside ibm.com). Kshitij Dwivedi and Gemma Roig, "Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning," Proceedings of the Conference on Computer Vision and Pattern Recognition, 2019, pp. 12387-12396, https://openaccess.thecvf.com/content_CVPR_2019/papers/Dwivedi_Representation_Similarity_Analysis_for_Efficient_Task_Taxonomy__Transfer_Learning_CVPR_2019_paper.pdf (link resides outside ibm.com). Javier García, Álvaro Visús, and Fernando Fernández, "A taxonomy for similarity metrics between Markov decision processes," Machine Learning, vol. 111, 2022, pp. 4217-4247, https://link.springer.com/article/10.1007/s10994-022-06242-4 (link resides outside ibm.com).

8 Asmaul Hosna, Ethel Merry, Jigmey Gyalmo, Zulfikar Alom, Zeyar Aung, and Mohammad Abdul Azim, “Transfer learning: a friendly introduction” Journal of Big Data, vol. 9, 2022, https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00652-w (link resides outside ibm.com). Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526 (link resides outside ibm.com).

9 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526 (link resides outside ibm.com). Ricardo Vilalta, "Inductive Transfer," Encyclopedia of Machine Learning and Data Mining, Springer, 2017.

10 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526 (link resides outside ibm.com).

11 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526 (link resides outside ibm.com).
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.

12 Qiang Yang, Transfer Learning, Cambridge University Press, 2020. Eyal Ben-David, Carmel Rabinovitz, and Roi Reichart, "PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models," Transactions of the Association for Computational Linguistics, vol. 8, 2020, pp. 504–521, https://aclanthology.org/2020.tacl-1.33.pdf (link resides outside ibm.com).

13 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, https://ieeexplore.ieee.org/document/7780459 (link resides outside ibm.com).

14 Minyoung Huh, Pulkit Agrawal, and Alexei Efros, "What makes ImageNet good for transfer learning?" Berkeley Artificial Intelligence Research Laboratory (BAIR), 2017, https://people.csail.mit.edu/minhuh/papers/analysis/ (link resides outside ibm.com).