Enlighten Theses

Beyond labels and centralisation: representation learning through data curation

Göksu, Özgü (2026) Beyond labels and centralisation: representation learning through data curation. PhD thesis, University of Glasgow.

Full text available as:

PDF
Download (21MB)

Abstract

Deep learning has achieved remarkable progress in vision, language, and multimodal tasks; however, its success remains heavily dependent on centralised, large-scale, and fully labelled datasets such as ImageNet and Open Images V7. In the real world, data is frequently limited, unlabelled, privately owned, and distributed across many devices, making traditional supervised learning challenging to scale. These limitations motivate the development of robust, novel representation learning methods capable of operating under unlabelled, limited, and heterogeneous data constraints.
This thesis addresses these challenges through four main contributions. First, we propose a self-supervised batch curation strategy that scores batches of unlabelled data as good or bad using the Fréchet ResNet Distance (FRD), allowing semantically related and informative batches to improve feature quality under limited and unlabelled data regimes. Second, we introduce FedMPR, a federated parameter-selection framework that adaptively prunes irrelevant weights during each client's local training, improving representation robustness and generalisation under highly non-i.i.d. data settings. To further analyse distributional data heterogeneity, the thesis also presents CelebA-Gender, a novel gender classification dataset designed to evaluate complex attribute-based data shifts and to compare them with real-world cases. Third, we present FedQuad, a framework that incorporates a reformulated quadruplet loss to minimise intra-class distance and maximise inter-class distance while mitigating representational collapse in the global representation space. Finally, the thesis investigates partial federated model training combined with self-supervised learning, leveraging a frozen DINOv3 backbone and a lightweight multilayer projection head to enable robust and computation-efficient representation learning under extreme client heterogeneity and limited participation.
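As a rough illustration of the batch scoring idea in the first contribution, the sketch below computes the standard Fréchet distance between Gaussians fitted to two sets of feature vectors and labels each batch as good or bad against a threshold. It is a minimal sketch under stated assumptions, not the thesis's FRD implementation: the function names `frechet_distance` and `score_batches`, the reference feature set, and the threshold are all hypothetical, and random arrays stand in for the ResNet features that the thesis would use.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in
    # exact arithmetic; clip small negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def score_batches(batch_feats, reference_feats, threshold):
    """Label each batch 'good' or 'bad' (hypothetical rule: distance <= threshold)."""
    results = []
    for feats in batch_feats:
        dist = frechet_distance(feats, reference_feats)
        results.append((dist, "good" if dist <= threshold else "bad"))
    return results


# Usage with random stand-in features (assumption: real features would
# come from a ResNet encoder).
demo_rng = np.random.default_rng(42)
demo_reference = demo_rng.normal(size=(256, 16))
demo_batches = [demo_rng.normal(size=(64, 16)), demo_rng.normal(loc=3.0, size=(64, 16))]
demo_scores = score_batches(demo_batches, demo_reference, threshold=10.0)
```

The trace term is computed from eigenvalues rather than an explicit matrix square root, which keeps the sketch dependent on NumPy only.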
Our experiments on benchmarks including CIFAR10, CIFAR100, TinyImageNet, and CelebA-Gender demonstrate that the proposed algorithms consistently outperform existing baselines in terms of accuracy, representation robustness, and feature consistency across many federated scenarios. Together, these contributions advance representation learning by enabling more generalisable, efficient, and data-centric learning systems that do not rely on large, labelled, or centrally collected datasets.
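For readers unfamiliar with the quadruplet loss mentioned in the third contribution, the sketch below shows the commonly used two-margin formulation, which pulls an anchor towards a same-class sample and pushes it away from other-class samples. The abstract does not describe how FedQuad reformulates this loss, so this is only the standard version under assumed margins; the function name `quadruplet_loss` and the margin values are hypothetical.

```python
import numpy as np


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Standard quadruplet loss over a batch of embeddings.

    All inputs have shape (batch, dim). `positive` shares the anchor's
    class; `negative1` and `negative2` come from two other classes.
    """
    # Squared Euclidean distances, one value per quadruplet.
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative1) ** 2, axis=1)
    d_nn = np.sum((negative1 - negative2) ** 2, axis=1)
    # Term 1: anchor-positive should be closer than anchor-negative.
    term1 = np.maximum(0.0, d_ap - d_an + margin1)
    # Term 2: anchor-positive should also be closer than the distance
    # between the two negatives, which widens inter-class gaps.
    term2 = np.maximum(0.0, d_ap - d_nn + margin2)
    return float(np.mean(term1 + term2))


# Usage with small hand-written embeddings.
demo_anchor = np.array([[0.0, 0.0]])
demo_positive = np.array([[0.1, 0.0]])
demo_negative1 = np.array([[2.0, 0.0]])
demo_negative2 = np.array([[0.0, 2.0]])
demo_loss = quadruplet_loss(demo_anchor, demo_positive, demo_negative1, demo_negative2)
```

The loss reaches zero once same-class pairs are closer than different-class pairs by at least the chosen margins.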

Item Type:	Thesis (PhD)
Qualification Level:	Doctoral
Additional Information:	Research funded by the Republic of Türkiye, Ministry of National Education (MEB) under the YLSY Scholarship Program.
Subjects:	T Technology > T Technology (General)
Colleges/Schools:	College of Science and Engineering > School of Computing Science
Funder's Name:	Republic of Türkiye Ministry of National Education
Supervisor's Name:	Pugeault, Dr. Nicolas
Date of Award:	2026
Depositing User:	Theses Team
Unique ID:	glathesis:2026-85878
Copyright:	Copyright of this thesis is held by the author.
Date Deposited:	20 Apr 2026 08:07
Last Modified:	27 Apr 2026 13:13
Thesis DOI:	10.5525/gla.thesis.85878
URI:	https://theses.gla.ac.uk/id/eprint/85878
Related URLs:	Conference proceeding Conference proceeding Conference proceeding
