Transfer learning uses pre-trained models from one machine learning task or dataset to improve performance and generalizability on a related task or dataset.
Transfer learning is a machine learning technique in which knowledge gained through one task or dataset is used to improve model performance on another related task and/or different dataset.1 In other words, transfer learning uses what has been learned in one setting to improve generalization in another setting.2 Transfer learning has many applications, from solving regression problems in data science to training deep learning models. Indeed, it is particularly appealing for the latter given the large amount of data needed to create deep neural networks.
Traditional learning processes build a new model for each new task, based on the available labeled data. This is because traditional machine learning algorithms assume that training and test data come from the same feature space. If the data distribution changes, or the trained model is applied to a new dataset, users must train a new model from scratch, even when attempting a task similar to the first model's (e.g. a sentiment analysis classifier for movie reviews versus song reviews). Transfer learning algorithms, however, take already-trained models or networks as a starting point. They then apply the knowledge gained on an initial source task or dataset (e.g. classifying movie reviews) to a new, yet related, target task or dataset (e.g. classifying song reviews).3
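The core pattern can be sketched in a few lines of Keras. This is a minimal, illustrative example only: the layer sizes and the randomly generated "movie review" and "song review" feature vectors are placeholders standing in for real data. The point is to show a source model's learned layers being frozen and reused under a new output head for a related target task.

```python
import numpy as np
from tensorflow import keras

# --- Source task: sentiment classifier for movie reviews ---
# Random vectors stand in for real, featurized review text.
X_source = np.random.rand(1000, 300).astype("float32")
y_source = np.random.randint(0, 2, size=(1000,))

src_inputs = keras.Input(shape=(300,))
h = keras.layers.Dense(128, activation="relu", name="encoder_1")(src_inputs)
h = keras.layers.Dense(64, activation="relu", name="encoder_2")(h)
src_outputs = keras.layers.Dense(1, activation="sigmoid", name="source_head")(h)
source_model = keras.Model(src_inputs, src_outputs)
source_model.compile(optimizer="adam", loss="binary_crossentropy")
source_model.fit(X_source, y_source, epochs=3, verbose=0)

# --- Target task: reuse the trained encoder for song reviews ---
encoder = keras.Model(src_inputs, h)   # the shared, already-trained layers
encoder.trainable = False              # keep the source knowledge fixed

tgt_inputs = keras.Input(shape=(300,))
x = encoder(tgt_inputs, training=False)
tgt_outputs = keras.layers.Dense(1, activation="sigmoid", name="target_head")(x)
target_model = keras.Model(tgt_inputs, tgt_outputs)
target_model.compile(optimizer="adam", loss="binary_crossentropy")

# The target dataset is typically much smaller than the source dataset.
X_target = np.random.rand(200, 300).astype("float32")
y_target = np.random.randint(0, 2, size=(200,))
target_model.fit(X_target, y_target, epochs=3, verbose=0)
```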
- Computational costs. Transfer learning reduces the requisite computational costs to build models for new problems. By repurposing pretrained models or pretrained networks to tackle a different task, users can reduce the amount of model training time, training data, processor units, and other computational resources. For instance, fewer epochs (i.e. passes through a dataset) may be needed to reach a desired level of performance. In this way, transfer learning can accelerate and simplify model training processes.
- Dataset size. Transfer learning particularly helps alleviate difficulties involved in acquiring large datasets. For instance, large language models (LLMs) require large amounts of training data to obtain optimal performance. Quality publicly available datasets can be limited, and producing sufficient manually labeled data can be time-consuming and expensive.
- Generalizability. While transfer learning aids model optimization, it can further increase a model’s generalizability. Because transfer learning involves retraining an existing model with a new dataset, the retrained model will consist of knowledge gained from multiple datasets. It will potentially display better performance on a wider variety of data than the initial base model trained on only one type of dataset. Transfer learning can thus inhibit overfitting.4
Of course, the transfer of knowledge from one domain to another cannot offset the negative impact of poor-quality data. Preprocessing techniques and feature engineering, such as data augmentation and feature extraction, are still necessary when using transfer learning.
It is less the case that there are disadvantages inherent to transfer learning than that there are potential negative consequences that result from its misapplication. Broadly, transfer learning works best when the source and target tasks are related, the source and target data share a similar feature space, and their data distributions do not differ too greatly.
When these conditions are not met, transfer learning can negatively affect model performance. The literature refers to this as negative transfer. Ongoing research proposes a variety of tests for determining whether datasets and tasks meet the above conditions and so will not result in negative transfer.5 Distant transfer is one method developed to correct for negative transfer that results from too great a dissimilarity between the data distributions of source and target datasets.6
Note that there is no widespread, standard metric for determining similarity between tasks for transfer learning. A handful of studies, however, propose different evaluation methods to predict similarities between datasets and machine learning tasks, and thus their viability for transfer learning.7
There are three adjacent practices or sub-settings of transfer learning. Their distinctions from one another, as well as from transfer learning more broadly, largely result from changes in the relationship between the source domain, the target domain, and the tasks to be completed.8
- Inductive transfer. This is when the source and target tasks are different, regardless of any difference or similarity between the target and source domains (i.e. datasets). This can manifest in computer vision models when architectures pretrained for feature extraction on large datasets are then adopted for further training on a specific task, such as object detection. Multitask learning, which consists of simultaneously learning two different tasks (such as image classification and object detection) on the same dataset, can be considered a form of inductive transfer (see the sketch after this list).9
- Unsupervised transfer learning. This is similar to inductive transfer, as the target and source tasks are different. But in inductive transfer, the source and/or target data is often labeled. Per its name, unsupervised transfer learning is unsupervised, meaning there is no manually labeled data.10 By comparison, inductive transfer can be considered supervised learning. One common application of unsupervised learning is fraud detection. By identifying common patterns across an unlabeled dataset of transactions, a model can further learn to identify deviating behaviors as possible fraud.
- Transductive transfer. This occurs when the source and target tasks are the same, but the datasets (or domains) are different. More specifically, the source data is typically labeled while the target data is unlabeled. Domain adaptation is a form of transductive transfer, as it applies knowledge gained from performing a task on one data distribution towards the same task on another data distribution.11 An example of transductive transfer learning is the application of a text classification model trained and tested on restaurant reviews to classify movie reviews.
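As a concrete illustration of multitask learning as a form of inductive transfer, the sketch below builds one Keras backbone shared by two heads, an image classifier and a simple bounding-box regressor, trained together on the same data. The architecture, data shapes, and loss choices are assumptions made purely for illustration, with random arrays standing in for a real dataset.

```python
import numpy as np
from tensorflow import keras

# One shared backbone, two task-specific heads trained together on the
# same (synthetic) images: a class label and a bounding box per image.
inputs = keras.Input(shape=(64, 64, 3))
x = keras.layers.Conv2D(16, 3, activation="relu")(inputs)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Conv2D(32, 3, activation="relu")(x)
x = keras.layers.GlobalAveragePooling2D()(x)               # shared representation

class_head = keras.layers.Dense(10, activation="softmax", name="class_label")(x)
box_head = keras.layers.Dense(4, name="bounding_box")(x)   # simple box regression

model = keras.Model(inputs, [class_head, box_head])
model.compile(
    optimizer="adam",
    loss={"class_label": "sparse_categorical_crossentropy", "bounding_box": "mse"},
)

# Placeholder data: 100 random images, each with one label and one box.
images = np.random.rand(100, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 10, size=(100,))
boxes = np.random.rand(100, 4).astype("float32")
model.fit(images, {"class_label": labels, "bounding_box": boxes}, epochs=2, verbose=0)
```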
Transfer learning is distinct from finetuning. Both, admittedly, reuse preexisting machine learning models as opposed to training new models. But the similarities largely end there. Finetuning refers to the process of further training a model on a task-specific dataset to improve performance on the initial, specific task for which the model was built. For instance, one may create a general-purpose object detection model using massive image datasets such as COCO or ImageNet and then further train the resulting model on a smaller, labeled dataset specific to car detection. In this way, a user finetunes an object detection model for car detection. By contrast, transfer learning refers to adapting a model to a new, related problem rather than the same problem.
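The finetuning pattern itself typically looks like the following sketch. For brevity it uses an image classifier (a MobileNetV2 backbone with ImageNet weights) rather than a full object detector, and the "car vs. not car" head, the number of unfrozen layers, and the learning rate are illustrative assumptions, not a prescribed recipe.

```python
from tensorflow import keras

# Start from ImageNet weights, unfreeze only the top of the backbone, and
# continue training with a small learning rate so the pretrained knowledge
# is adjusted rather than erased.
base = keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = True
for layer in base.layers[:-20]:      # how many layers to unfreeze is a tuning choice
    layer.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
x = keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1)(inputs)  # MobileNetV2 expects [-1, 1]
x = base(x, training=False)          # keep batch-norm statistics frozen
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)  # e.g. "car" vs. "not car"
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # low learning rate is the key
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(car_dataset, epochs=5)   # car_dataset: a hypothetical labeled tf.data.Dataset
```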
There are many applications of transfer learning in real-world machine learning and artificial intelligence settings. Developers and data scientists can use transfer learning to aid in a myriad of tasks and combine it with other learning approaches, such as reinforcement learning.
One salient issue affecting transfer learning in NLP is feature mismatch. Features in different domains can have different meanings, and thus different connotations (e.g. "light" can signify weight or optics). This disparity in feature representations affects sentiment classification tasks, language models, and more. Deep learning-based models—in particular, word embeddings—show promise in correcting for this, as they can adequately capture semantic relations and orientations for domain adaptation tasks.12
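One common way this plays out in code is to initialize an embedding layer with pretrained word vectors and keep it frozen while a small domain-specific classifier is trained on top. In the sketch below, `embedding_matrix` is a hypothetical array of pretrained vectors (for example, loaded from a general-purpose corpus); random values are used only so the snippet runs on its own.

```python
import numpy as np
from tensorflow import keras

# `embedding_matrix` is a hypothetical array of pretrained word vectors
# (one row per vocabulary word); random values here are placeholders.
vocab_size, embed_dim, seq_len = 20000, 100, 200
embedding_matrix = np.random.rand(vocab_size, embed_dim).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(seq_len,), dtype="int32"),   # sequences of word indices
    keras.layers.Embedding(
        vocab_size,
        embed_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False,   # keep the transferred word meanings fixed
    ),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),     # e.g. positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```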
Because of difficulties in acquiring sufficient manually labeled data for diverse computer vision tasks, a wealth of research examines transfer learning applications with convolutional neural networks (CNNs). One notable example is ResNet, a pretrained model architecture that demonstrates improved performance in image classification and object detection tasks.13 Recent research investigates the renowned ImageNet dataset for transfer learning, arguing that (contra computer vision folk wisdom) only small subsets of this dataset are needed to train reliably generalizable models.14 Many transfer learning tutorials for computer vision use ResNet, ImageNet or both with TensorFlow's Keras library.
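Such tutorials generally follow a recipe along these lines: load ResNet50 with ImageNet weights as a frozen feature extractor and train only a new classification head. This is a sketch of the common pattern rather than any specific tutorial's code; the five-class output and the commented-out `target_dataset` are placeholders.

```python
from tensorflow import keras

# ResNet50 pretrained on ImageNet as a frozen feature extractor; only the
# new classification head is trained. The 5-class output is a placeholder.
base = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False   # reuse the ImageNet features as-is

inputs = keras.Input(shape=(224, 224, 3))
x = keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(5, activation="softmax")(x)
model = keras.Model(inputs, outputs)

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(target_dataset, epochs=3)  # target_dataset: a hypothetical labeled tf.data.Dataset
```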
1 Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, and Antonio Jose Serrano Lopez, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, Information Science Reference, 2009.
2 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
3 Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Elsevier, 2012.
4 Jindong Wang and Yiqiang Chen, Introduction to Transfer Learning: Applications and Methods, Springer, 2023.
5 Wen Zhang, Lingfei Deng, Lei Zhang, Dongrui Wu, "A Survey on Negative Transfer," IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 2, 2023, pp. 305-329, https://arxiv.org/abs/2009.00909.
6 Ben Tan, Yangqiu Song, Erheng Zhong, Qiang Yang, "Transitive Transfer Learning," Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1155-1164, https://dl.acm.org/doi/10.1145/2783258.2783295. Ben Tan, Yu Zhang, Sinno Jialin Pan, Qiang Yang, "Distant Domain Transfer Learning," Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 2604-2610, https://dl.acm.org/doi/10.5555/3298483.3298614.
7 Changjian Shui, Mahdieh Abbasi, Louis-Émile Robitaille, Boyu Wang, Christian Gagné, "A Principled Approach for Learning Task Similarity in Multitask Learning," Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 3446-3452, https://www.ijcai.org/proceedings/2019/0478.pdf. Kshitij Dwivedi and Gemma Roig, "Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning," Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12387-12396, https://openaccess.thecvf.com/content_CVPR_2019/papers/Dwivedi_Representation_Similarity_Analysis_for_Efficient_Task_Taxonomy__Transfer_Learning_CVPR_2019_paper.pdf. Javier García, Álvaro Visús, and Fernando Fernández, "A taxonomy for similarity metrics between Markov decision processes," Machine Learning, vol. 111, 2022, pp. 4217-4247, https://link.springer.com/article/10.1007/s10994-022-06242-4.
8 Asmaul Hosna, Ethel Merry, Jigmey Gyalmo, Zulfikar Alom, Zeyar Aung, and Mohammad Abdul Azim, "Transfer learning: a friendly introduction," Journal of Big Data, vol. 9, 2022, https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00652-w. Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526.
9 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526. Ricardo Vilalta, "Inductive Transfer," Encyclopedia of Machine Learning and Data Mining, Springer, 2017.
10 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526.
11 Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010, pp. 1345-1359, https://ieeexplore.ieee.org/document/5288526. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
12 Qiang Yang, Transfer Learning, Cambridge University Press, 2020. Eyal Ben-David, Carmel Rabinovitz, and Roi Reichart, "PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models," Transactions of the Association for Computational Linguistics, vol. 8, 2020, pp. 504–521, https://aclanthology.org/2020.tacl-1.33.pdf.
13 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, https://ieeexplore.ieee.org/document/7780459.
14 Minyoung Huh, Pulkit Agrawal, and Alexei Efros, "What makes ImageNet good for transfer learning?" Berkeley Artificial Intelligence Research Laboratory (BAIR), 2017, https://people.csail.mit.edu/minhuh/papers/analysis/.