Efficiency modelling in collaborative filtering-based recommendation systems

Paun, Iulia (2022) Efficiency modelling in collaborative filtering-based recommendation systems. PhD thesis, University of Glasgow.

Abstract

In the past decade, Machine Learning (ML) models have become a critical part of large-scale analytics frameworks, where they are used to solve a range of problems, such as identifying trends and patterns in data, manipulating images, classifying text, and producing recommendations. For the latter (i.e., producing recommendations), ML frameworks have been extended to incorporate both specific recommendation algorithms (e.g., SlopeOne [1]) and more generalised models (e.g., K-Nearest Neighbours (KNN) [2]) that can be applied not only to recommendation tasks, such as rating prediction or item ranking, but also to other classes of ML problems. This thesis examines an important and popular area of the Recommendation Systems (RS) design space, focusing both on algorithms that are specifically designed for producing recommendations and on other types of algorithms that are also found in the wider ML field. However, the latter are showcased only in RS-based use cases, to allow comparison with specific RS models.

Over the past years, there has been increased interest in RS from both academia and industry, which has led to the development of numerous recommendation algorithms [3]. While there are different families of recommendation models (e.g., Matrix Factorisation (MF)-based, K-Nearest Neighbours (KNN)-based), they can be grouped into three classes: Collaborative Filtering (CF), Content-based Filtering (CBF), and Hybrid Approaches (HA). This thesis investigates the most popular class of RS, namely Collaborative Filtering-based (CF) recommendation algorithms, which recommend items to a user based on similar users’ preferences. One of the current challenges in building CF engines is the selection of the algorithms to be used for producing recommendations. A one-CF-model-fits-all solution is often unfeasible due to the dynamic relationship between users and items, and the rate at which new algorithms are proposed in the literature. This challenge is exacerbated by the constant growth of the input data, which in turn impacts the efficiency of these models, as more computational resources are required to train the algorithms on large collections to attain a predefined/desired quality of recommendations. In CF, these challenges have also impacted the way providers deliver content to users, as they need to strike a balance between revenue maximisation (i.e., how many resources are spent on training the CF models) and user satisfaction (i.e., producing relevant recommendations for the users). In addition, CF models need to be periodically retrained to capture the latest user preferences and interactions with the items. Content providers therefore have to decide whether and when to retrain their CF algorithms, such that the high training times and resource utilisation costs are kept within the operational and monetary budget. As a result, estimating resource consumption for CF becomes critically important.
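
As a rough illustration of the CF idea described above (recommending items to a user based on similar users' preferences), the following minimal Python sketch predicts a missing rating from the ratings of the most similar users. It is not taken from the thesis: the function names, the choice of cosine similarity, the neighbourhood size `k`, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # no usable neighbours for this item
    return sum(sim * rating for sim, rating in neighbours) / total_similarity


# Toy ratings (hypothetical): alice agrees with bob far more than with carol.
ratings = {
    "alice": {"A": 5.0, "B": 3.0},
    "bob": {"A": 5.0, "B": 3.0, "C": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "C": 2.0},
}
prediction = predict_rating(ratings, "alice", "C")
```

In this toy example the predicted rating for item "C" lies closer to bob's rating than to carol's, because bob's past ratings match alice's more closely.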

In this thesis, we address the pressing challenge of predicting the efficiency (i.e., computational resources spent during training) of traditional and neural CF for a number of popular representatives, including algorithms based on Matrix Factorisation (MF), K-Nearest Neighbours (KNN), Co-clustering, and Slope One schemes, as well as well-known types of Deep Learning (DL) architectures, such as Variational Autoencoder (VAE), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN). To this end, we first study the computational complexity of the training phase of these CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time, memory overhead, and GPU utilisation of the CF training phase. Our contributions further include an adaptive sampling strategy that addresses the trade-off between the computational cost of sampling the dataset and training the CF models on these samples, on the one hand, and the accuracy of the estimated resource consumption of the CF trained on the full collection, on the other. Furthermore, we provide a framework which quantifies both the training efficiency (i.e., resource consumption) of CF and the quality of the recommendations produced by the CF once it has been trained. Finally, systematic experimental evaluations demonstrate that our methodology outperforms state-of-the-art regression schemes (i.e., BB/GBM) by a considerable margin (e.g., for predicting the processing time of CF, the accuracy of WB/LR is 160% higher than that of BB/GBM), with an overhead that is a small fraction (e.g., 3-4 times smaller) of the overall requirements of CF training.
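
The general idea of combining a complexity equation with measurements on samples can be sketched as follows. This is a minimal illustration only, not the thesis's methodology or its equations: it assumes the textbook cost of SGD-based matrix factorisation (proportional to ratings × latent factors × epochs), fits an ordinary least-squares line to training times observed on small samples, and extrapolates to the full collection. The function names and the sample timings are hypothetical.

```python
def mf_sgd_cost(n_ratings, n_factors, n_epochs):
    """Assumed complexity term for SGD-based MF: one update per rating,
    per latent factor, per epoch."""
    return n_ratings * n_factors * n_epochs


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training times (seconds) measured on three samples of the data.
# These numbers are made up for illustration, not results from the thesis.
sample_sizes = [10_000, 20_000, 40_000]
sample_times = [0.6, 1.1, 2.1]

n_factors, n_epochs = 50, 20
costs = [mf_sgd_cost(n, n_factors, n_epochs) for n in sample_sizes]
slope, intercept = fit_line(costs, sample_times)

# Extrapolate to a full collection of one million ratings.
full_cost = mf_sgd_cost(1_000_000, n_factors, n_epochs)
estimated_seconds = slope * full_cost + intercept
```

Regressing on the complexity term rather than on raw dataset size means that a change in the number of factors or epochs is reflected in the estimate without collecting new measurements, provided the assumed cost model holds.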

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Ntarmos, Dr Nikos
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-82911
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 31 May 2022 10:55
Last Modified: 31 May 2022 10:57
Thesis DOI: 10.5525/gla.thesis.82911
URI: https://theses.gla.ac.uk/id/eprint/82911