Theories of information and uncertainty for the modelling of information retrieval : an application of situation theory and Dempster-Shafer's theory of evidence

Lalmas, Mounia (1996) Theories of information and uncertainty for the modelling of information retrieval : an application of situation theory and Dempster-Shafer's theory of evidence. PhD thesis, University of Glasgow.

Full text available as:
[thumbnail of 1996lalmasphd.pdf] PDF
Download (13MB)
Printed Thesis Information: http://eleanor.lib.gla.ac.uk/record=b1594544

Abstract

Current information retrieval models only offer simplistic and specific representations of information. Therefore, there is a need for the development of a new formalism able to model information retrieval systems in a more generic manner. In 1986, Van Rijsbergen suggested that such formalisms can be both appropriately and powerfully defined within a logic. The resulting formalism should capture information as it appears in an information retrieval system, and also in any of its inherent forms. The aim of this thesis is to understand the nature of information in information retrieval, and to propose a logic-based model of an information retrieval system that reflects this nature. The first objective of this thesis is to identify essential features of information in an information retrieval system. These are: 0 flow, 0 intensionality, 0 partiality, 0 structure, 0 significance, and o uncertainty. It is shown that the first four features are qualitative, whereas the last two are quantitative, and that their modelling requires different frameworks: a theory of information, and a theory of uncertainty, respectively. The second objective of this thesis is to determine the appropriate framework for each type of feature, and to develop a method to combine them in a consistent fashion. The combination is based on the Transformation Principle. Many specific attempts have been made to derive an adequate definition of information. The one adopted in this thesis is based on that of Dretske, Barwise, and Devlin who claimed that there is a primitive notion of information in terms of which a logic can be defined, and subsequently developed a theory of information, namely Situation Theory. Their approach was in accordance with Van Rijsbergen' s suggestion of a logic-based formalism for modelling an information retrieval system. This thesis shows that Situation Theory is best at representing all the qualitative features. Regarding the modelling of the quantitative features of information, this thesis shows that the framework that models them best is the Dempster-Shafer Theory of Evidence, together with the notion of refinement, later introduced by Shafer. The third objective of this thesis is to develop a model of an information retrieval system based on Situation Theory and the Dempster-Shafer Theory of Evidence. This is done in two steps. First, the unstructured model is defined in which the structure and the significance of information are not accounted for. Second, the unstructured model is extended into the structured model, which incorporates the structure and the significance of information. This strategy is adopted because it enables the careful representation of the flow of information to be performed first. The final objective of the thesis is to implement the model and to perform empirical evaluation to assess its validity. The unstructured and the structured models are implemented based on an existing on-line thesaurus, known as WordNet. The experiments performed to evaluate the two models use the National Physical Laboratory standard test collection. The experimental performance obtained was poor, because it was difficult to extract the flow of information from the document set. This was mainly due to the data used in the experimentation which was inappropriate for the test collection. However, this thesis shows that if more appropriate data, for example, indexing tools and thesauri, were available, better performances would be obtained. The conclusion of this work was that Situation Theory, combined with the Dempster-Shafer Theory of Evidence, allows the appropriate and powerful representation of several essential features of information in an information retrieval system. Although its implementation presents some difficulties, the model is the first of its kind to capture, in a general manner, these features within a uniform framework. As a result, it can be easily generalized to many types of information retrieval systems (e.g., interactive, multimedia systems), or many aspects of the retrieval process (e.g., user modelling).

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: van Rijsbergen, Professor Keith
Date of Award: 1996
Depositing User: Adam Swann
Unique ID: glathesis:1996-8385
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 11 Sep 2017 08:59
Last Modified: 11 Sep 2017 08:59
URI: https://theses.gla.ac.uk/id/eprint/8385

Actions (login required)

View Item View Item

Downloads

Downloads per month over past year