Zhou, Ke (2014) On the evaluation of aggregated web search. PhD thesis, University of Glasgow.
Abstract
Aggregating search results from a variety of heterogeneous sources, or so-called verticals such as news, image and video, into a single interface is a popular paradigm in web search. This search paradigm is commonly referred to as aggregated search. The heterogeneity of the information, the richer user interaction, and the more complex presentation strategy make the evaluation of the aggregated search paradigm quite challenging. The Cranfield paradigm, the use of test collections and evaluation measures to assess the effectiveness of information retrieval (IR) systems, is the de facto standard evaluation strategy in the IR research community, with its origins in work dating to the early 1960s. This thesis focuses on applying this evaluation paradigm to the context of aggregated web search, contributing to the long-term goal of a complete, reproducible and reliable evaluation methodology for aggregated search in the research community.
The Cranfield paradigm for aggregated search consists of building a test collection and developing a set of evaluation metrics. In the context of aggregated search, a test collection should contain results from a set of verticals, a set of information needs relating to this task, and a set of relevance assessments. The proposed metrics should utilize the information in the test collection in order to measure the performance of any aggregated search page. The more complex user behavior of aggregated search should be reflected in the test collection through assessments and modeled in the metrics.
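The three components described above can be illustrated with a minimal sketch. All identifiers and field names here are hypothetical, chosen only to show the shape of such a collection, not taken from the thesis itself:

```python
# Illustrative structure of an aggregated-search test collection.
# Three components: per-vertical results, information needs (topics),
# and relevance assessments at two levels.

# Results returned by each vertical for each topic (hypothetical IDs).
vertical_results = {
    ("topic-1", "news"): ["news-doc-3", "news-doc-7"],
    ("topic-1", "image"): ["img-doc-2"],
    ("topic-1", "web"): ["web-doc-1", "web-doc-4"],
}

# Information needs relating to the aggregated search task.
topics = {"topic-1": "latest developments in renewable energy"}

# Vertical-level assessments capture user orientation toward each
# vertical; document-level assessments grade individual results.
vertical_relevance = {
    ("topic-1", "news"): 2,   # highly oriented
    ("topic-1", "image"): 0,  # not oriented
    ("topic-1", "web"): 1,    # somewhat oriented
}
document_relevance = {
    ("topic-1", "news-doc-3"): 2,
    ("topic-1", "web-doc-1"): 1,
}

# A metric consumes these assessments to score an aggregated page.
print(sorted(v for (t, v) in vertical_results if t == "topic-1"))
```

The key point mirrored here is that the collection carries both assessment levels side by side, which is what makes the correlation analysis between them possible.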
Therefore, firstly, we aim to better understand the factors involved in determining relevance for aggregated search and subsequently build a reliable and reusable test collection for this task. By conducting several user studies to assess vertical relevance and creating a test collection that reuses existing test collections, we create a testbed with both vertical-level (user orientation) and document-level relevance assessments. In addition, we analyze the relationship between the two types of assessments and find that they are correlated when used to measure system performance for the user.
Secondly, utilizing the created test collection, we aim to investigate how to model the aggregated search user in a principled way in order to propose reliable, intuitive and trustworthy evaluation metrics that measure the user experience. We begin our investigation by evaluating one key component of aggregated search in isolation: vertical selection, i.e. selecting the relevant verticals. We then propose a general utility-effort framework to evaluate the final aggregated search pages. We demonstrate the fidelity (predictive power) of the proposed metrics by correlating them with user preferences over aggregated search pages. Furthermore, we meta-evaluate the reliability and intuitiveness of a variety of metrics and show that our proposed aggregated search metrics are the most reliable and intuitive, compared with adapted diversity-based and traditional IR metrics.
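A fidelity check of the kind described, correlating metric scores with user preferences, can be sketched with a rank correlation such as Kendall's tau. This is one common choice for such meta-evaluation; the thesis's exact procedure may differ, and the scores below are invented for illustration:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation between two equal-length score lists.

    +1 means the two lists rank items identically; -1 means they rank
    them in exactly opposite order (no tie correction, for simplicity).
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical metric scores and user-preference judgments for five
# aggregated search pages; a high tau suggests the metric has fidelity.
metric_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
user_prefs = [5, 2, 4, 1, 3]

print(kendall_tau(metric_scores, user_prefs))  # identical ranking -> 1.0
```

In practice one would compute this over many topics and pages (and use a tie-aware variant such as tau-b); the sketch only shows the shape of the comparison between a metric's ranking and the users' ranking.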
To summarize, this thesis demonstrates the feasibility of applying the Cranfield paradigm to aggregated search, enabling reproducible, inexpensive, reliable and trustworthy evaluation.
| Item Type: | Thesis (PhD) |
| --- | --- |
| Qualification Level: | Doctoral |
| Keywords: | Aggregated search, search federation, universal search, evaluation, information retrieval |
| Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Computer software |
| Colleges/Schools: | College of Science and Engineering > School of Computing Science |
| Supervisor's Name: | Jose, Prof. Joemon |
| Date of Award: | 2014 |
| Depositing User: | Dr Ke Zhou |
| Unique ID: | glathesis:2014-7104 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 08 Mar 2016 11:47 |
| Last Modified: | 25 Mar 2016 10:53 |
| URI: | https://theses.gla.ac.uk/id/eprint/7104 |