Multi-objective personalization for recommender systems

Wang, Jie (2026) Multi-objective personalization for recommender systems. PhD thesis, University of Glasgow.

Full text available as:
PDF (2025WangPhD.pdf)
Download (38MB)

Abstract

Modern recommender systems increasingly rely on deep neural architectures to learn user-item relationships from interaction logs. Sequential recommendation has become a prominent paradigm, where RNN-based models such as GRU4Rec and Transformer-based models such as SASRec/BERT4Rec achieve strong performance on accuracy-oriented metrics (e.g., Recall and NDCG). However, real-world deployments expose fundamental limitations that accuracy-centric formulations do not address: (i) ID-based representations are platform-specific and difficult to transfer across domains; (ii) optimizing only for relevance often produces homogeneous recommendation lists and fails to satisfy users’ multifaceted needs for diversity, novelty, and serendipity; (iii) offline-learned policies degrade under distribution shift and face exploration risks in dynamic online environments; and (iv) black-box pipelines provide limited interpretability and offer little actionable value to stakeholders beyond end-users. This thesis studies these challenges under a unified theme of multi-objective personalization for sequential recommendation, and develops methods that improve transferability, controllability, deployability, and stakeholder-facing value.
First, to address the transferability bottleneck, we propose TransRec, which learns from mixture-of-modality (MoM) feedback by encoding items with content encoders (e.g., text and images) rather than categorical IDs. By learning directly from raw MoM features in an end-to-end manner, TransRec enables effective cross-domain transfer without requiring overlapping users or items, and yields significant gains in cold-start and cross-domain settings.
Second, to move beyond accuracy-centric optimization, we introduce two frameworks that reformulate recommendation as multi-objective sequential decision-making. MODT4R leverages return-conditioned Decision Transformers to integrate multiple objectives within a stable supervised learning pipeline, allowing flexible objective trade-offs via inference-time adjustment. Building on this, HDT employs a hierarchical architecture to capture long-term preferences across sessions and short-term intent within sessions, and uses hierarchical (expected and unexpected) returns to balance accuracy with diversity, novelty, and serendipity. Across multiple datasets, MODT4R and HDT achieve up to 16% improvement in diversity-related metrics while maintaining competitive accuracy.
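As an illustration only, and not the implementation used in the thesis, the quantity that a return-conditioned Decision Transformer is conditioned on can be sketched as follows. The return-to-go at a step is the sum of rewards from that step to the end of the sequence; with several objectives, one plausible layout keeps a separate return-to-go per objective. The function names, the two-objective reward layout (accuracy, diversity), and the per-objective scaling used to express a trade-off at inference time are all assumptions made for this sketch.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-objective suffix sums of a reward sequence.

    rewards[t][k] is the (hypothetical) reward for objective k at step t.
    The result has the same shape: result[t][k] = sum of rewards[t:][k].
    """
    if not rewards:
        return []
    running = [0.0] * len(rewards[0])
    out: List[List[float]] = []
    # Walk backwards so each step adds its own reward to the running total.
    for step in reversed(rewards):
        running = [total + r for total, r in zip(running, step)]
        out.append(list(running))
    out.reverse()
    return out


def scale_targets(initial_rtg: Sequence[float],
                  scales: Sequence[float]) -> List[float]:
    """Hypothetical inference-time knob: rescale each objective's target return."""
    return [g * s for g, s in zip(initial_rtg, scales)]


# Toy session with two assumed objectives per step: (accuracy, diversity).
session_rewards = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rtg = returns_to_go(session_rewards)

# Ask for 1.5x the diversity return observed in the logged session.
target = scale_targets(rtg[0], [1.0, 1.5])
```

In this sketch, changing the `scales` argument is what stands in for adjusting the objective trade-off at inference time: the model itself is left unchanged and only the conditioning target differs.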
Third, to bridge the offline-to-online gap for RL-based recommenders, we leverage Large Language Models (LLMs) as auxiliary components. We introduce LE/LEA to adapt LLMs as state and reward models and to augment offline learning signals via action synthesis. Furthermore, iALP and its adaptive variant A-iALP use LLM-distilled preferences to warm-start policies offline and adapt them online through fine-tuning and exploration strategies, achieving up to 20% improvement in long-horizon cumulative rewards in online simulation and reducing convergence time.
Finally, to support multiple stakeholders, we propose PDiT-GIM, a two-stage diffusion framework that generates semantically meaningful preference representations and decodes them into interpretable, attribute-constrained textual and visual content, enabling actionable insights for retailers and designers in addition to end-user recommendation. Case studies report improved preference-aligned content generation and downstream engagement compared to generic baselines.
Overall, through extensive experiments spanning e-commerce, multimedia recommendation, and simulated online environments, this thesis demonstrates that multi-objective personalization can simultaneously improve beyond-accuracy objectives and long-term policy performance while maintaining strong accuracy. The thesis is presented in a thesis-by-publication format, with chapters organized around the above tasks and objectives.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: T Technology > T Technology (General)
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Jose, Professor Joemon M.
Date of Award: 2026
Depositing User: Theses Team
Unique ID: glathesis:2026-85840
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 25 Mar 2026 16:32
Last Modified: 27 Mar 2026 10:48
Thesis DOI: 10.5525/gla.thesis.85840
URI: https://theses.gla.ac.uk/id/eprint/85840