A framework for effective intermediate task selection in transfer learning

Pugantsov, Alexander (2024) A framework for effective intermediate task selection in transfer learning. PhD thesis, University of Glasgow.

Abstract

This thesis investigates strategies to improve the performance of natural language processing (NLP) models across diverse tasks, particularly in settings with limited training data. Central to this investigation is transfer learning, a method in which a model developed for one task is repurposed as the starting point for a model on another task. Determining which model will yield improved performance on a specific task is a non-trivial challenge. This complexity arises from the varying nature of tasks, the intricacies of model architectures, and the unpredictability of their interactions. Accurately estimating which models will be most effective before committing to extensive training can provide substantial benefits, including significant reductions in runtime, environmental impact, and other associated costs.
To address this challenge, we propose a framework designed to determine, from a pool of candidate models, which one will provide the greatest performance improvement for a given task. This framework consists of five components, each addressing a particular concern in selecting tasks for transfer. Running continuously alongside these components is the Cost Estimation background process. This module evaluates the resource efficiency of all other components, ensuring that the model development and adaptation processes are both effective and sustainable. The Domain Adapter Generation component develops resource-efficient models using training documents from various text-based tasks. The Domain Transfer Analysis component evaluates the models created in the previous stage on documents other than those they were originally trained on, providing an understanding of how these models perform on different types of textual data. The Representation Construction component develops profiles, or “representations”, of each task based on, for example, terms or linguistic characteristics. These representations are intended to express the features of the underlying data, and are used in subsequent stages of our analysis. The Divergence Estimation component systematically quantifies the degree of variation between different representations through the use of statistical methods. By assessing the divergence between task-specific representations, this component helps identify which intermediate task models exhibit the most promising alignment for a specific target task. Finally, the Intermediate Task Selection component uses the divergence data to rank tasks by their potential to improve model performance on a given target task. This ranking provides guidance on which intermediate task models, when used to transfer to the target task, are most likely to yield the best performance.
In addressing the challenge of identifying tasks that are conducive to effective transfer learning, this thesis places significant emphasis on evaluating representations against the performance scores derived from the Domain Adapter Generation stage. The core of this evaluation lies in assessing the “effectiveness” of these representations. Effectiveness, in this context, is defined as the capacity of representations to accurately estimate the most beneficial ordering of task combinations. This estimation is based on comparing the outputs of the divergence measures with the inherent ordering of tasks according to their relative transfer gain. Here, transfer gain is measured by the performance ranking of models that have been adapted from intermediate tasks to target tasks. Intermediate tasks are typically tasks with abundant training data, which are used to transfer knowledge to resource-scarce target tasks whose performance we would like to improve.
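As an illustration only, the Representation Construction, Divergence Estimation and Intermediate Task Selection steps described above might be sketched as follows. The abstract does not name a specific representation or statistical measure, so the unigram term distribution and the Jensen-Shannon divergence used here are assumptions, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def term_distribution(documents):
    """Build a unigram probability distribution from whitespace-tokenised text."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), bounded between 0 and 1."""
    vocab = set(p) | set(q)
    # Mixture distribution: the average of p and q over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # Terms with zero probability in `a` contribute nothing to the sum.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in vocab if a.get(t, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def rank_intermediate_tasks(target_docs, candidates):
    """Rank candidate tasks by divergence from the target, most similar first."""
    target = term_distribution(target_docs)
    scores = {
        name: js_divergence(term_distribution(docs), target)
        for name, docs in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data, not taken from the thesis.
target_docs = ["the film was great", "a dull film"]
candidates = {
    "movie_reviews": ["the film was dull", "a great cast"],
    "legal_contracts": ["the party shall indemnify the licensor"],
}
ranking = rank_intermediate_tasks(target_docs, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Under this assumption, the candidate with the lowest divergence is the one whose vocabulary most resembles the target task, and it would be tried first as the intermediate task.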
The theoretical basis of this approach is rooted in a fundamental principle of transfer learning: tasks with higher similarity in their representations are expected to offer greater improvements in model performance when transferred to a target task. Consequently, the thesis investigates the relationship between task similarity, as quantified by our divergence measures, and the actual performance gains observed in transferred models. By analysing the correlation between divergence scores and model performance rankings across tasks, we aim to validate the hypothesis that task similarity, in terms of representational divergence, is a key predictor of transfer success. This correlation not only provides a practical method to predict the effectiveness of task combinations but also offers insights into the nature of transfer learning itself, shedding light on which task characteristics most significantly impact model adaptability and performance enhancement.
To evaluate the effectiveness of our framework, we simulate a scenario akin to a user employing the framework to “search” for the most suitable model to transfer to their specific target task. Traditionally, this process would involve exhaustive training and evaluation of all candidate tasks, which is often time-consuming and resource-intensive. We posit that, by predicting which tasks are likely to yield the largest performance gains ahead of time, through the analysis of task similarity, we can substantially improve the accuracy of task selection and also significantly reduce the time and resources required to find effective task pairs. Our framework allows users to bypass the labour-intensive cycle of trial and error and focus directly on task combinations that are most likely to enhance their model’s performance on a target task.
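The comparison between divergence scores and observed transfer gain could, for example, be expressed as a rank correlation. The sketch below assumes Kendall's tau as the correlation measure, which the abstract does not specify, and uses invented scores for four hypothetical tasks rather than any result from the thesis.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four candidate intermediate tasks.
divergence = {"task_a": 0.12, "task_b": 0.35, "task_c": 0.50, "task_d": 0.81}
transfer_gain = {"task_a": 2.4, "task_b": 1.9, "task_c": 0.3, "task_d": 0.6}

tasks = sorted(divergence)
# Lower divergence is expected to mean higher gain, so negate the divergence
# to turn it into a similarity score before correlating.
similarity = [-divergence[t] for t in tasks]
gains = [transfer_gain[t] for t in tasks]
tau = kendall_tau(similarity, gains)

# "Search" view: does the most similar task match the best-performing one?
predicted_best = min(divergence, key=divergence.get)
actual_best = max(transfer_gain, key=transfer_gain.get)
top1_hit = predicted_best == actual_best

print(f"tau = {tau:.3f}, top-1 hit = {top1_hit}")
```

A tau close to 1 would indicate that ordering tasks by similarity largely reproduces the ordering by transfer gain, while the top-1 check corresponds to the case where a user trains only the single highest-ranked candidate.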
The central contribution of this thesis is the introduction of an effective and efficient intermediate task selection framework for transfer learning in natural language processing. To validate and refine the framework, this thesis draws on a diverse set of experiments covering a broad range of NLP domains and experimental settings. The experiments presented in this thesis demonstrate the potential of task selection approaches to provide more efficient, sustainable, and impactful practices in the field of transfer learning.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: McCreadie, Dr. Richard
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84330
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 21 May 2024 08:00
Last Modified: 21 May 2024 08:42
Thesis DOI: 10.5525/gla.thesis.84330
URI: https://theses.gla.ac.uk/id/eprint/84330