Wang, Jie (2026) Multi-objective personalization for recommender systems. PhD thesis, University of Glasgow.
Full text available as: PDF (38MB)
Abstract
Modern recommender systems increasingly rely on deep neural architectures to learn user-item relationships from interaction logs. Sequential recommendation has become a prominent paradigm, where RNN-based models such as GRU4Rec and Transformer-based models such as SASRec/BERT4Rec achieve strong performance on accuracy-oriented metrics (e.g., Recall and NDCG). However, real-world deployments expose fundamental limitations that accuracy-centric formulations do not address: (i) ID-based representations are platform-specific and difficult to transfer across domains; (ii) optimizing only for relevance often produces homogeneous recommendation lists and fails to satisfy users’ multifaceted needs for diversity, novelty, and serendipity; (iii) offline-learned policies degrade under distribution shift and face exploration risks in dynamic online environments; and (iv) black-box pipelines provide limited interpretability and offer little actionable value to stakeholders beyond end-users. This thesis studies these challenges under a unified theme of multi-objective personalization for sequential recommendation, and develops methods that improve transferability, controllability, deployability, and stakeholder-facing value.
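The accuracy metrics named above, and the homogeneity problem in (ii), can be made concrete with a small sketch. It uses the common binary-relevance definitions of Recall@K and NDCG@K together with a simple category-based intra-list diversity score. The function names, the toy items and the category labels are hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of `ranked`."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def intra_list_diversity(ranked, categories, k):
    """Share of item pairs in the top-k whose (hypothetical) category labels differ."""
    pairs = list(combinations(ranked[:k], 2))
    if not pairs:
        return 0.0
    return sum(categories[a] != categories[b] for a, b in pairs) / len(pairs)


# Toy example (hypothetical data).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
categories = {"a": "shoes", "b": "shoes", "c": "shoes", "d": "bags"}
print(recall_at_k(ranked, relevant, 3))
print(round(ndcg_at_k(ranked, relevant, 3), 4))
print(intra_list_diversity(ranked, categories, 3))
```

On this toy list the top three recommendations recover every relevant item (Recall@3 = 1.0) yet all share one category (diversity 0.0). This is the kind of accuracy-only outcome that point (ii) describes.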
First, to address the transferability bottleneck, we propose TransRec, which learns from mixture-of-modality (MoM) feedback by encoding items with content encoders (e.g., text and images) rather than categorical IDs. By learning directly from raw MoM features in an end-to-end manner, TransRec enables effective cross-domain transfer without requiring overlapped users or items, and yields significant gains in cold-start and cross-domain settings.
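The following minimal sketch shows why content-based encoding removes the dependence on platform-specific IDs: an item's vector is computed only from its text. A fixed hashing encoder stands in for the learned text and image encoders that TransRec trains end-to-end. `encode_item`, `token_vector`, `cosine` and `DIM` are hypothetical names, and the sketch makes no claim about TransRec's actual architecture.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size; must not exceed the 32-byte SHA-256 digest


def token_vector(token, dim=DIM):
    """Deterministic pseudo-random vector for a token, derived from its SHA-256 digest.

    This fixed hashing trick is a stand-in for a learned content encoder.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    if dim > len(digest):
        raise ValueError("dim must be at most 32 for a SHA-256 digest")
    # Map each byte from [0, 255] to [-1.0, 1.0].
    return [(digest[i] / 255.0) * 2.0 - 1.0 for i in range(dim)]


def encode_item(text, dim=DIM):
    """Mean of token vectors: the representation depends only on item content, not an ID."""
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * dim
    vecs = [token_vector(t, dim) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# An item never seen during training (or from another platform) still gets a vector.
new_item = encode_item("Red Running Shoes")
print(len(new_item), round(cosine(new_item, encode_item("red running sandals")), 3))
```

No ID lookup table is involved, so identical content always maps to the identical vector regardless of which platform the item comes from. In a learned setting, the encoder's parameters would be shared and transferable in the same way.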
Second, to move beyond accuracy-centric optimization, we introduce two frameworks that reformulate recommendation as multi-objective sequential decision-making. MODT4R leverages return-conditioned Decision Transformers to integrate multiple objectives within a stable supervised learning pipeline, allowing flexible objective trade-offs via inference-time adjustment. Building on this, HDT employs a hierarchical architecture to capture long-term preferences across sessions and short-term intent within sessions, and uses hierarchical (expected and unexpected) returns to balance accuracy with diversity, novelty, and serendipity. Across multiple datasets, MODT4R and HDT achieve up to 16% improvement in diversity-related metrics while maintaining competitive accuracy.
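Return-conditioned Decision Transformers are trained on sequences of (return-to-go, state, action) tokens. The return-to-go at step t is the sum of rewards from t onward, and with several objectives it becomes a vector with one entry per objective. The sketch below assumes that per-step reward vectors such as (accuracy, diversity) are already available and shows only the return bookkeeping, not the Transformer itself. The function names and the two-objective example are hypothetical.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-step, per-objective returns-to-go: R_t[j] = sum over t' >= t of r_{t'}[j]."""
    n_obj = len(rewards[0]) if rewards else 0
    rtg: List[List[float]] = []
    running = [0.0] * n_obj
    for step in reversed(rewards):
        # Build a new list each step so the stored vectors are independent.
        running = [acc + r for acc, r in zip(running, step)]
        rtg.append(running)
    rtg.reverse()
    return rtg


def next_target(target: Sequence[float], reward: Sequence[float]) -> List[float]:
    """Decision-Transformer-style inference update: subtract the observed reward vector."""
    return [g - r for g, r in zip(target, reward)]


# Hypothetical episode with rewards per step as (accuracy, diversity).
episode = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(returns_to_go(episode))
```

At inference, a user-chosen target vector (for example, one with a higher diversity entry) conditions the model, and after each recommendation the observed reward is subtracted from it. This is one way in which objective trade-offs can be adjusted without retraining.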
Third, to bridge the offline-to-online gap for RL-based recommenders, we leverage Large Language Models (LLMs) as auxiliary components. We introduce LE/LEA to adapt LLMs as state and reward models and to augment offline learning signals via action synthesis. Furthermore, iALP and its adaptive variant A-iALP use LLM-distilled preferences to warm-start policies offline and adapt them online through fine-tuning and exploration strategies, achieving up to 20% improvement in long-horizon cumulative rewards in online simulation and reducing convergence time.
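The warm-start-then-adapt pattern can be illustrated with a deliberately simplified, bandit-style stand-in rather than the full RL setting used for iALP. Prior scores, hypothetically obtained from an LLM, initialise the value estimates. Online feedback then updates these estimates, while epsilon-greedy exploration occasionally tries other items. The class name, `prior_scores`, `prior_weight` and `epsilon` are all hypothetical.

```python
import random
from typing import Dict, List


class WarmStartedEpsilonGreedy:
    """Epsilon-greedy recommender whose value estimates start from prior scores.

    `prior_scores` stands in for LLM-distilled preferences (hypothetical input);
    `prior_weight` is how many pseudo-observations the prior is worth.
    """

    def __init__(self, prior_scores: Dict[str, float], epsilon: float = 0.1,
                 prior_weight: float = 5.0, seed: int = 0):
        self.values = dict(prior_scores)                         # offline warm start
        self.counts = {item: prior_weight for item in prior_scores}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self) -> str:
        items: List[str] = list(self.values)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)                        # explore
        return max(items, key=self.values.__getitem__)          # exploit current estimate

    def update(self, item: str, reward: float) -> None:
        # Incremental mean: the prior behaves like `prior_weight` earlier observations.
        self.counts[item] += 1.0
        self.values[item] += (reward - self.values[item]) / self.counts[item]


policy = WarmStartedEpsilonGreedy({"a": 0.9, "b": 0.2}, epsilon=0.1)
item = policy.select()
policy.update(item, reward=1.0)
```

Treating the prior as a fixed number of pseudo-observations means it dominates early decisions, which avoids a cold random start, while enough contrary online feedback eventually overrides it.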
Finally, to support multiple stakeholders, we propose PDiT-GIM, a two-stage diffusion framework that generates semantically meaningful preference representations and decodes them into interpretable, attribute-constrained textual and visual content, enabling actionable insights for retailers and designers in addition to end-user recommendation. Case studies report improved preference-aligned content generation and downstream engagement compared to generic baselines.
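For readers unfamiliar with diffusion models, the standard DDPM-style forward process and denoising objective are shown below. They are written for a preference representation $\mathbf{x}_0$ and an optional conditioning signal $\mathbf{c}$ (for example, a user's interaction history). This is an illustrative assumption only: the abstract does not specify PDiT-GIM's noise schedule $\beta_s$, its parameterisation, or its conditioning.

```latex
\begin{aligned}
q(\mathbf{x}_t \mid \mathbf{x}_0) &= \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s), \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})}
\left[\left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\ t,\ \mathbf{c}\right) \right\|^2\right]
\end{aligned}
```

In a two-stage design of the kind described above, samples of $\mathbf{x}_0$ drawn from the first stage would then be passed to a separate decoder that produces the textual or visual content.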
Overall, through extensive experiments spanning e-commerce, multimedia recommendation, and simulated online environments, this thesis demonstrates that multi-objective personalization can simultaneously improve beyond-accuracy objectives and long-term policy performance while maintaining strong accuracy. The thesis is presented in a thesis-by-publication format, with chapters organized around the above tasks and objectives.
| Item Type: | Thesis (PhD) |
|---|---|
| Qualification Level: | Doctoral |
| Subjects: | T Technology > T Technology (General) |
| Colleges/Schools: | College of Science and Engineering > School of Computing Science |
| Supervisor's Name: | Jose, Professor Joemon M. |
| Date of Award: | 2026 |
| Depositing User: | Theses Team |
| Unique ID: | glathesis:2026-85840 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 25 Mar 2026 16:32 |
| Last Modified: | 27 Mar 2026 10:48 |
| Thesis DOI: | 10.5525/gla.thesis.85840 |
| URI: | https://theses.gla.ac.uk/id/eprint/85840 |