Word sense disambiguation and information retrieval

Sanderson, Mark (1996) Word sense disambiguation and information retrieval. PhD thesis, University of Glasgow.

Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b1640128


This thesis begins with a review of previous research that attempted to improve the representation of documents in IR systems, and reassesses that research in the light of word sense ambiguity. It is shown that the success or failure of a number of these attempts can be attributed to whether ambiguity was noticed or ignored.

In the review of disambiguation research, many varied techniques for performing automatic disambiguation are introduced. Research on the disambiguating abilities of people is also presented. People are found to be inconsistent when asked to disambiguate words, which causes problems when testing the output of an automatic disambiguator.

The first of two sets of experiments investigating the relationship between ambiguity, disambiguation, and IR uses a technique in which ambiguity and disambiguation can be simulated in a document collection. The results of these experiments lead to the conclusion that query size plays an important role in the relationship between ambiguity and IR. Retrievals based on very small queries suffer particularly from ambiguity and benefit most from disambiguation. Other queries, however, contain a sufficient number of words to provide a form of context that implicitly resolves the query words' ambiguities. In general, ambiguity is found to be less of a problem to IR systems than might have been thought, and the errors made by a disambiguator can be more of a problem than the ambiguity it is trying to resolve.
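The abstract does not spell out how ambiguity and disambiguation are simulated, so the following Python fragment is only an illustrative sketch of one way such a simulation could be set up, not the procedure used in the thesis. It assumes that ambiguity is introduced by conflating groups of unrelated words into a single artificial token, and that a disambiguator of a chosen accuracy is imitated by restoring the original word with that probability and otherwise substituting a wrong member of the group. The function names, the "/" joining convention, and the toy documents are all hypothetical.

```python
import random


def add_ambiguity(docs, groups):
    """Conflate each group of words into one artificial ambiguous token.

    docs   -- list of documents, each a list of word strings
    groups -- list of word tuples to merge, e.g. [("bank", "river")]
    """
    # Assumption: words do not themselves contain "/".
    mapping = {word: "/".join(group) for group in groups for word in group}
    return [[mapping.get(word, word) for word in doc] for doc in docs]


def simulate_disambiguation(original_docs, ambiguous_docs, accuracy, rng):
    """Imitate a disambiguator that is correct with probability `accuracy`.

    Each ambiguous token is replaced by the original word when the
    simulated disambiguator is right, and by another member of the
    same group when it is wrong.
    """
    resolved = []
    for original, ambiguous in zip(original_docs, ambiguous_docs):
        new_doc = []
        for true_word, token in zip(original, ambiguous):
            if token == true_word:
                # Word was never made ambiguous; keep it unchanged.
                new_doc.append(true_word)
            elif rng.random() < accuracy:
                new_doc.append(true_word)
            else:
                wrong = [w for w in token.split("/") if w != true_word]
                new_doc.append(rng.choice(wrong))
        resolved.append(new_doc)
    return resolved


docs = [["bank", "river", "money"], ["river", "bank"]]
groups = [("bank", "river")]
ambiguous = add_ambiguity(docs, groups)
perfect = simulate_disambiguation(docs, ambiguous, 1.0, random.Random(0))
always_wrong = simulate_disambiguation(docs, ambiguous, 0.0, random.Random(0))
```

Under these assumptions, indexing the original, ambiguous, and resolved versions of a collection and running the same queries against each would let retrieval effectiveness be compared as the amount of ambiguity and the disambiguator's accuracy are varied.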

In the complementary second set of experiments, a disambiguator is built and tested, then applied to a document test collection, and an IR system is adjusted to accommodate the sense information in the collection. The conclusions of these experiments broadly confirm those of the previous set.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Rijsbergen, Prof. Keith van and Cawsey, Dr. Alison
Date of Award: 1996
Depositing User: Angi Shields
Unique ID: glathesis:1996-4463
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 15 Jul 2013 11:16
Last Modified: 17 Jul 2013 15:53
URI: http://theses.gla.ac.uk/id/eprint/4463
