Long, Qianyu (2025) Collaborative Distributed Machine Learning: from knowledge reuse to sparsification in federated learning. PhD thesis, University of Glasgow.
Full text available as: PDF (Download, 6MB)
Abstract
Distributed Machine Learning (DML) leverages distributed computing resources to train models and perform inference on decentralized datasets efficiently. A high-quality distributed system ensures optimal Quality of Service (QoS) by delivering low latency, high reliability, efficient resource utilization, and robust security. However, DML frameworks face significant challenges as devices proliferate, generating vast volumes of data, and as tasks grow in complexity. For instance, heterogeneous feature spaces arising from data generated by different users or locations can undermine model reliability. Additionally, the growing size and complexity of models impose substantial burdens on resource-constrained devices, particularly for inference and storage. Latency becomes a critical concern in distributed online systems such as intelligent transport systems for autonomous vehicles.
This work explores leveraging knowledge reuse, a key meta-learning technique, combined with sparsification methods to build efficient and effective distributed learning systems. To enhance efficiency, we aim to reduce redundant computation and communication, assuming that distributed data exhibit similarities despite not being identical. Specifically, we propose identifying reusable models by examining statistical patterns and meta-features derived from trained models. These reusable models are selected and adapted to local environments without requiring full retraining. Multi-task learning further improves the effectiveness of these adaptations, ensuring comparable performance to locally trained models while significantly reducing the number of models that need to be trained. By clustering models with shared characteristics, the system reduces the computational and communication overhead in a network of M devices, where only K ≪ M models need to be trained, maintaining strong overall performance.
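To make the model-reuse idea above concrete, the following minimal Python sketch clusters per-device model meta-features and keeps one representative model per cluster, so that only K ≪ M models need to be trained. The use of k-means and all names (`select_reusable_models`, `n_reusable`, the 16-dimensional meta-feature vectors) are illustrative assumptions for this page, not the thesis's actual algorithm or API.

```python
# Hypothetical sketch: selecting K << M reusable models by clustering
# meta-features derived from locally trained models. Not the thesis's
# actual method; k-means and all names here are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def select_reusable_models(meta_features: np.ndarray, n_reusable: int):
    """Cluster M models by their meta-feature vectors and keep one
    representative (the member closest to each centroid) per cluster."""
    km = KMeans(n_clusters=n_reusable, n_init=10, random_state=0)
    labels = km.fit_predict(meta_features)
    representatives = []
    for k in range(n_reusable):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(
            meta_features[members] - km.cluster_centers_[k], axis=1)
        representatives.append(members[np.argmin(dists)])
    return representatives, labels

# Usage: M = 100 devices, each summarized by a 16-dim meta-feature
# vector; only K = 5 representative models are trained, and the
# remaining devices adapt one of them instead of retraining from scratch.
M, K = 100, 5
meta = np.random.default_rng(0).normal(size=(M, 16))
reps, assignment = select_reusable_models(meta, K)
```

A device assigned to cluster k would then download and locally adapt representative model `reps[k]` rather than training its own model, which is where the computation and communication savings come from.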
To tackle real-world challenges where data is often non-independent and identically distributed (non-i.i.d.), we incorporate pruning techniques to enhance both system efficiency and effectiveness. This approach reduces communication and computation costs while simultaneously improving model performance and accuracy. Centralized federated learning (CFL) relies on a central server for model aggregation, while decentralized federated learning (DFL) operates without central server coordination, enabling direct communication between clients. In CFL, dynamic pruning strategies with error feedback and adaptive regularization achieve extremely sparse models, reducing computation and communication costs while accelerating inference. These models retain high sparsity with minimal accuracy loss. In DFL, the absence of a central node enhances robustness against adversarial attacks. Efficiency is further improved through dynamic pruning, allowing progressively sparser training, and a hybrid approach combining sequential and parallel training to reuse updates within the same round. Personalized pruning masks address data heterogeneity across clients, promoting both system efficiency and local model performance.
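As a rough illustration of the pruning-with-error-feedback idea described above, the sketch below applies magnitude pruning to a weight tensor while carrying the pruned-away residual into the next round rather than discarding it; the surviving binary mask plays the role of a per-client (personalized) mask. This is a minimal sketch under assumed names (`prune_with_error_feedback`, the toy local-update step), not the thesis's exact dynamic-pruning algorithm.

```python
# Hypothetical sketch of magnitude pruning with error feedback:
# the mass removed by pruning is accumulated in a residual and
# re-injected in the next round, so wrongly pruned weights can recover.
import numpy as np

def prune_with_error_feedback(weights, residual, sparsity):
    """Keep the largest-magnitude entries of (weights + residual);
    accumulate everything pruned away into the new residual."""
    corrected = weights + residual                # re-inject past error
    k = int(corrected.size * (1.0 - sparsity))    # number of weights kept
    thresh = np.sort(np.abs(corrected).ravel())[-k] if k > 0 else np.inf
    mask = np.abs(corrected) >= thresh            # binary pruning mask
    sparse = corrected * mask
    new_residual = corrected - sparse             # fed back next round
    return sparse, mask, new_residual

# One client, three rounds at 90% sparsity; the mask can change between
# rounds as error feedback restores weights that were wrongly pruned.
rng = np.random.default_rng(1)
w, r = rng.normal(size=(64, 64)), np.zeros((64, 64))
for _ in range(3):
    w, mask, r = prune_with_error_feedback(w, r, sparsity=0.9)
    w += 0.01 * rng.normal(size=w.shape) * mask   # stand-in local update
```

In a federated setting, only the sparse tensor (and mask) would be communicated each round, which is the source of the communication savings the abstract refers to.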
The proposed framework is validated experimentally across diverse datasets and models, including air pollution data from weather stations, temperature data from unmanned surface vehicles, and image classification tasks. The tested models span traditional approaches such as regression and support vector machines to modern deep learning architectures like convolutional neural networks. Theoretically, we conduct hypothesis testing and complexity analysis, including the development of convergence theorems for federated learning scenarios.
Overall, this thesis presents comprehensive frameworks and algorithms backed by robust experimental and theoretical results. It addresses key challenges in federated learning and edge computing through knowledge reuse and sparsification, enhancing the efficiency, effectiveness, and robustness of modern AI applications.
| Item Type: | Thesis (PhD) |
| --- | --- |
| Qualification Level: | Doctoral |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Colleges/Schools: | College of Science and Engineering > School of Computing Science |
| Supervisor's Name: | Anagnostopoulos, Dr. Christos and Deligianni, Dr. Fani |
| Date of Award: | 2025 |
| Depositing User: | Theses Team |
| Unique ID: | glathesis:2025-84846 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 29 Jan 2025 13:59 |
| Last Modified: | 30 Jan 2025 11:35 |
| Thesis DOI: | 10.5525/gla.thesis.84846 |
| URI: | https://theses.gla.ac.uk/id/eprint/84846 |