Foundations research in information retrieval inspired by quantum theory.
PhD thesis, University of Glasgow.
In the information age, information is useless unless it can be found and used; search engines therefore form a crucial component of research. For something so crucial, information retrieval (IR), the formal discipline investigating search, can be a confusing area of study. There is an underlying difficulty with the very definition of information retrieval, and there are weaknesses in its operational method, which together prevent it from being called a 'science'. The work in this thesis aims to create a formal definition of search, scientific methods for the evaluation and comparison of different search strategies, and methods for dealing with the uncertainty associated with user interactions, so that one has the necessary formal foundation to perceive IR as "search science".
The key problems restricting a science of search pertain to the ambiguity in the way search scenarios and concepts are currently specified. This especially affects the evaluation of search systems, since under the traditional retrieval approach evaluations are not repeatable, and thus not collectively verifiable. This is mainly due to the dependence on user studies, which currently dominate evaluation methodology. The evaluation problem is related to the problem of not being able to formally define the users in user studies. The problem of defining users relates in turn to one of the main retrieval-specific motivations of the thesis: the uncertainties associated with the interpretation of user interactions are collectively inscribed in a relevance concept, the representation and use of which defines the overall character of a retrieval model. Current research is limited in its understanding of how best to model relevance, a key factor restricting extensive formalization of the IR discipline as a whole. Thus, the problems of defining search systems and search scenarios are the principal issues preventing formal comparisons of systems and scenarios, in turn limiting the strength of experimental evaluation. Alternative models of search are proposed that remove the need for ambiguous relevance concepts. Instead, by arguing for the use of simulation as a normative evaluation strategy for retrieval, the thesis introduces new concepts that can be employed in judging the effectiveness of search systems. Included are techniques for simulating search, techniques for formal user modelling, and techniques for generating measures of effectiveness for search models.
The problems of evaluation and of defining users are generalized by proposing that they are related to the need for a unified framework for defining arbitrary search concepts, search systems, user models, and evaluation strategies. It is argued that this framework depends on a re-interpretation of the concept of search that accommodates the increasingly embedded and implicit nature of search on modern operating systems, the internet, and networks. This re-interpretation is approached by considering a generalization of the concept of ostensive retrieval, producing definitions of search, information need, user, and system that (formally) accommodate the perception of search as an abstract process that can be physical and/or computational.
The feasibility of both the mathematical formalism and the physical conceptualizations of quantum theory (QT) is investigated for the purpose of modelling this abstract search process as a physical process. Techniques for representing a search process in the Hilbert space formalism of QT are presented, from which techniques are proposed for generating measures of effectiveness that combine static information, such as term weights, with dynamically changing information, such as probabilities of relevance. These techniques are used for deducing methods for modelling information need change. In mapping the 'macro level search' process to 'micro level physics', some generalizations were made to the use and interpretation of basic QT concepts, such as the wave function description of state and the reversible evolution of states, corresponding to the first and second postulates of quantum theory respectively. Several ways of expressing relevance (and other retrieval concepts) within the derived framework are proposed, arguing that the increase in modelling power gained by the use of QT provides effective ways to characterize this complex concept.
Mapping the mathematical formalism of search to that of quantum theory yielded insightful perspectives on the nature of search. However, differences between the operational semantics of quantum theory and those of search restricted the usefulness of the mapping. In trying to resolve these semantic differences, a semi-formal framework was developed that is midway between a programmatic language, a state-based language resembling the way QT models states, and a process description language. Using this framework, this thesis attempts to link closely the theory and practice of information retrieval and the evaluation of the retrieval process. The result is a novel and useful way of formally discussing, modelling, and evaluating search concepts, search systems, and search processes.