Enlighten Theses

Fair group-based exposure of documents in ranked search results

Jänich, Thomas (2025) Fair group-based exposure of documents in ranked search results. PhD thesis, University of Glasgow.

Abstract

The main objective in the development and deployment of Information Retrieval (IR) systems has traditionally been to ensure that users receive relevant information, for example by ranking documents with respect to the documents’ predicted relevance to the users’ information needs, as represented by the users’ queries. However, retrieval systems that only rank documents by predicted relevance can unintentionally introduce unfairness in the search results. In particular, search results can be unfair when groups of documents that are linked to specific entities such as individuals, organisations, or demographics are systematically ranked lower due to their associated attributes. Groups of documents that are ranked lower in the search results are less exposed to the user and therefore receive less attention. This lack of exposure can amplify biases and create unequal opportunities, thereby raising numerous ethical concerns and challenges.

Therefore, in this thesis, we address the task of ensuring fairness in the search results. In particular, we aim to ensure fairness of exposure over groups of documents. Such groups are formed based on shared characteristics, such as a common associated geographic location, language, or other given attributes.
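As a rough illustration of what group exposure means here, the following sketch computes each group's share of the exposure in a single ranking. It assumes a logarithmic position discount of 1 / log2(rank + 1), as used in DCG; the abstract does not state which browsing model the thesis adopts, and the function name `group_exposure` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def group_exposure(ranking, group_of):
    """Return each group's share of the total exposure in one ranking.

    Assumption: the exposure of the document at 1-based rank r is
    1 / log2(r + 1). Any other position-based browsing model could be
    substituted for this discount.
    """
    totals = defaultdict(float)
    for rank, doc_id in enumerate(ranking, start=1):
        totals[group_of[doc_id]] += 1.0 / math.log2(rank + 1)
    total = sum(totals.values())
    if total == 0:
        return {}  # empty ranking: no exposure to distribute
    return {group: value / total for group, value in totals.items()}


# Toy ranking: both documents of group "A" are ranked above group "B".
ranking = ["d1", "d2", "d3", "d4"]
group_of = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
shares = group_exposure(ranking, group_of)
```

In this toy ranking both groups contribute two documents, yet group "A" receives roughly 64% of the exposure because its documents occupy the top two positions.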

Since maintaining relevance in the search results remains the main objective of every IR system, we aim to ensure fairness over the groups of documents without compromising the relevance or quality of the rankings. Indeed, search results that are perfectly fair according to a specific fairness definition but lack relevant documents would render an IR system ineffective and unusable. In this thesis, we provide insights into assessing exposure and demonstrate how IR systems using standard retrieval methods can be adapted to distribute exposure more fairly across groups of documents in search results while maintaining the quality of their results. We start our investigation by providing an overview of how exposure is distributed across document groups in search results. Based on our analysis, we argue that when adjustments to exposure distributions are necessary, modifying the ranked search results by adding or removing documents from certain groups is more effective than relying solely on re-ranking strategies. Moreover, we show that assessing the expected exposure distribution before implementing fairness-enhancing modifications to IR systems is a critical step. By evaluating how exposure is likely to be allocated among document groups prior to retrieval, potential disparities can be detected and the necessity and type of required fairness interventions can be determined.

Specifically, we introduce a novel approach that predicts the exposure that groups will receive prior to executing the retrieval process. Similar to Query Performance Prediction methods, which estimate the relevance of search results, our approach predicts how exposure will be distributed across groups of documents. Through experiments conducted over different document groups, we demonstrate that our proposed Group Exposure Predictor can assess the exposure distributions before traditional retrieval is performed.

To assess or predict the exposure received by document groups, it is necessary to know the labels that define these groups. However, in practice, such labels are often incomplete or entirely unavailable. To address this issue, we propose a robust method for assessing exposure in search results when document labels are missing, through the use of Quantification techniques. Unlike traditional classification methods that infer individual labels, Quantification focuses on estimating the distribution of labels across groups. We argue that this approach is more suitable for group exposure assessment, as it directly provides the aggregate information needed to evaluate group-level disparities, rather than relying on potentially inaccurate individual predictions.
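The difference between classifying individual documents and quantifying a group's prevalence can be sketched with two textbook quantifiers: classify-and-count and adjusted classify-and-count. This is only an illustration of the general idea; the abstract does not say which quantifiers the thesis uses, and the function names, error rates, and predictions below are assumptions.

```python
def classify_and_count(predictions):
    """Naive quantifier: fraction of items predicted to be in the group."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the naive estimate using the classifier's error rates.

    The observed positive rate satisfies
        observed = tpr * p + fpr * (1 - p),
    so the true prevalence is p = (observed - fpr) / (tpr - fpr).
    Assumption: tpr and fpr were estimated on held-out labelled data.
    """
    observed = classify_and_count(predictions)
    if tpr == fpr:
        return observed  # uninformative classifier: no correction possible
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


# Hypothetical classifier output for ten retrieved documents (1 = in group).
predictions = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
naive = classify_and_count(predictions)
estimate = adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.2)
```

With these assumed error rates, the naive count of 40% is corrected to about 33%: the aggregate estimate improves even though no individual label is changed.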

To address unfair exposure distributions in search results, we propose adapting existing retrieval methods. Specifically, we argue that methods designed to increase the recall of search results are the most promising for adaptation, as adding more documents from underrepresented groups has a material impact on improving exposure distribution. To this end, we introduce a novel adaptation of the Graph-Based Adaptive Re-ranking (GAR) method to achieve a fairer exposure allocation over groups of documents in the search results. We propose several policies to adapt the GAR process and demonstrate through our experiments that each policy effectively improves the allocation of exposure across document groups in the search results, without compromising relevance.
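The sketch below shows, in a heavily simplified form, how a GAR-style loop might be steered towards an underrepresented group. It alternates between the initial candidate pool and a frontier of graph neighbours, as GAR does, and applies one hypothetical policy: neighbours from a target group are pulled from the frontier first. This is not one of the thesis's policies, which the abstract does not describe; every name, parameter, and data value here is an assumption.

```python
import heapq


def gar_with_group_policy(initial, neighbours, score, group_of,
                          target_group, budget=4, batch_size=2):
    """Simplified GAR loop with a hypothetical group-aware frontier.

    initial:    list of (doc_id, first_stage_score) pairs
    neighbours: dict mapping doc_id -> list of neighbouring doc_ids
    score:      re-ranker, a function doc_id -> relevance score
    """
    pool = [(-s, d) for d, s in initial]  # max-heap via negated scores
    heapq.heapify(pool)
    frontier = []
    scored = {}
    use_frontier = False
    while len(scored) < budget and (pool or frontier):
        # Alternate between sources, falling back if one is empty.
        source = frontier if (use_frontier and frontier) else pool
        if not source:
            source = frontier
        batch = []
        while (source and len(batch) < batch_size
               and len(scored) + len(batch) < budget):
            doc = heapq.heappop(source)[-1]
            if doc not in scored and doc not in batch:
                batch.append(doc)
        for doc in batch:
            scored[doc] = score(doc)
        for doc in batch:
            for n in neighbours.get(doc, []):
                if n not in scored:
                    # Hypothetical policy: target-group neighbours first,
                    # then neighbours of the highest-scoring documents.
                    boost = 1 if group_of.get(n) == target_group else 0
                    heapq.heappush(frontier, (-boost, -scored[doc], n))
        use_frontier = not use_frontier
    # The final order is still determined by the re-ranker's scores.
    return sorted(scored, key=scored.get, reverse=True)


# Toy data: the first-stage pool contains only group "A" documents.
initial = [("d1", 3.0), ("d2", 2.0), ("d3", 1.0)]
neighbours = {"d1": ["d4", "d5"], "d2": ["d6"]}
group_of = {"d1": "A", "d2": "A", "d3": "A", "d4": "A", "d5": "B", "d6": "B"}
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.3, "d4": 0.5, "d5": 0.7, "d6": 0.6}

fair = gar_with_group_policy(initial, neighbours, relevance.get, group_of, "B")
plain = gar_with_group_policy(initial, neighbours, relevance.get, group_of, None)
```

With the policy enabled, both group "B" neighbours are scored within the budget, whereas the unmodified loop admits only one. In both cases the final order follows the re-ranker's scores, so the policy changes which documents are considered rather than how they are ranked.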

Another effective approach to increasing the recall of relevant documents in search results is through query modification. In this thesis, we propose several query modification techniques that adapt and extend existing retrieval methods to improve the representation of underrepresented groups in the search results. These include techniques such as query expansion, where additional terms are added to the original query to retrieve more diverse and relevant documents, and pseudo-relevance feedback, where the system analyses an initial set of retrieved documents to refine the query. By incorporating these approaches, we aim to retrieve more relevant documents for groups with a low representation in an initial retrieval, thereby enhancing recall and promoting a more balanced allocation of exposure in final rankings. To successfully modify a user’s query, we propose using pre-trained large language models to generate query terms and full queries that help retrieve more documents from underrepresented groups. Through our empirical evaluation, we show that all of our techniques are suitable for improving the exposure distribution over groups of documents while maintaining the relevance and quality of the search results.
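A minimal sketch of group-aware pseudo-relevance feedback follows. It substitutes a simple term-frequency heuristic for the large language models that the thesis proposes: expansion terms are taken only from initially retrieved documents that belong to the underrepresented group. The function name `expand_query`, the whitespace tokenisation, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def expand_query(query, feedback_docs, group_of, target_group, num_terms=2):
    """Append the most frequent new terms from target-group feedback docs.

    feedback_docs maps doc_id -> text of an initially retrieved document.
    Assumption: whitespace tokenisation and raw term counts; a real system
    would use proper tokenisation, stopword removal, and term weighting.
    """
    query_terms = query.lower().split()
    counts = Counter()
    for doc_id, text in feedback_docs.items():
        if group_of.get(doc_id) == target_group:
            counts.update(t for t in text.lower().split()
                          if t not in query_terms)
    # Sort by descending count, then alphabetically, for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:num_terms]]


feedback_docs = {
    "d1": "solar power plants in europe",
    "d2": "solar microgrid power kenya microgrid",
    "d3": "kenya solar cooperative",
}
doc_groups = {"d1": "A", "d2": "B", "d3": "B"}
expanded = expand_query("solar power", feedback_docs, doc_groups, "B")
```

Here the expanded query gains the terms "kenya" and "microgrid", which occur only in group "B" documents, so a second retrieval pass is more likely to surface further documents from that group.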

Overall, this thesis contributes to a growing body of research on fairness in IR systems. By addressing key challenges in balancing relevance and fairness, this work provides novel insights and practical methods for improving the equitable distribution of exposure in search results. Through the development of predictive models, robust measurement techniques, and adaptations of effective retrieval methods, including recent generative AI approaches, this thesis advances both the theoretical understanding and practical implementation of fairness-aware IR systems. These contributions not only enhance the design of more inclusive and socially responsible retrieval systems but also open new avenues for future research in this critical and evolving field.

Item Type:	Thesis (PhD)
Qualification Level:	Doctoral
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools:	College of Science and Engineering > School of Computing Science
Supervisor's Name:	McDonald, Dr. Graham and Ounis, Professor Iadh
Date of Award:	2025
Depositing User:	Theses Team
Unique ID:	glathesis:2025-85226
Copyright:	Copyright of this thesis is held by the author.
Date Deposited:	19 Jun 2025 12:09
Last Modified:	19 Jun 2025 12:13
Thesis DOI:	10.5525/gla.thesis.85226
URI:	https://theses.gla.ac.uk/id/eprint/85226
