Semantic models as metrics for kernel-based interaction identification

Polajnar, Tamara (2010) Semantic models as metrics for kernel-based interaction identification. PhD thesis, University of Glasgow.


Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which are addressed by the research in this thesis. Proteins are the principal means of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating sub-cellular functions of organisms, diseases, and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in reviewed biomedical journals. The substantial number of interactions detected by automated large-scale PPI experiments, combined with the ease of access to digitised publications, has increased the number of results made available each day. The ultimate aim of this research is to provide tools and mechanisms to aid biologists and database curators in locating relevant information. As part of this objective, this thesis proposes, studies, and develops new methodologies that go some way towards meeting this grand challenge.

Pattern recognition methodologies are one approach that can be used to locate PPI sentences; however, most accurate pattern recognition methods require a set of labelled examples to train on. For this particular task, the collection and labelling of training data is highly expensive. On the other hand, digital publications provide a plentiful source of unlabelled data. In this thesis, the unlabelled data is used, along with word co-occurrence models, to improve classification using Gaussian processes, a probabilistic alternative to the state-of-the-art support vector machines. The thesis presents and systematically assesses novel methods of using the knowledge implicitly encoded in biomedical texts and shows an improvement on current approaches to PPI sentence detection.
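The general idea of letting unlabelled text inform a kernel can be illustrated with a minimal sketch. It is not the thesis's actual model: the toy sentences, the sentence-level co-occurrence counts, the choice of S = C Cᵀ as the word-similarity matrix, and all function names are assumptions made for illustration. The sketch shows that two inputs with no words in common have zero similarity under a plain bag-of-words kernel, but positive similarity once co-occurrence statistics from unlabelled sentences are folded in.

```python
import numpy as np

def build_vocab(sentences):
    """Map each distinct lower-cased token to a column index."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bag_of_words(sentences, vocab):
    """Sentence-by-word count matrix; out-of-vocabulary tokens are ignored."""
    X = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for w in s.lower().split():
            if w in vocab:
                X[row, vocab[w]] += 1.0
    return X

def cooccurrence(X):
    """Word-by-word matrix counting sentences in which two words both appear."""
    B = (X > 0).astype(float)
    return B.T @ B

def semantic_kernel(XA, XB, C):
    """k(a, b) = a^T (C C^T) b. C C^T is positive semi-definite, so this is a valid kernel."""
    S = C @ C.T
    return XA @ S @ XB.T

# Hypothetical unlabelled sentences standing in for a large unlabelled corpus.
unlabelled = [
    "raf1 binds mek1",
    "raf1 phosphorylates mek1",
    "mek1 binds erk2",
    "the cell was stained",
]

vocab = build_vocab(unlabelled)
C = cooccurrence(bag_of_words(unlabelled, vocab))

# Two single-word inputs that share no tokens and never co-occur directly.
a = bag_of_words(["binds"], vocab)
b = bag_of_words(["phosphorylates"], vocab)

linear = (a @ b.T)[0, 0]                 # plain bag-of-words similarity
semantic = semantic_kernel(a, b, C)[0, 0]  # similarity via shared contexts
print("linear:", linear, "semantic:", semantic)
```

A kernel matrix built this way over labelled sentences could then be supplied to any kernel-based classifier, such as a Gaussian process or a support vector machine; that classification step is left out of the sketch.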

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Bioinformatics, Text Mining, Text Classification, Kernel Methods, Semantic Models, Biomedical Text Semantics
Subjects: Q Science > Q Science (General)
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Girolami, Prof. Mark
Date of Award: 2010
Depositing User: Ms Tamara Polajnar
Unique ID: glathesis:2010-2260
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 26 Nov 2010
Last Modified: 10 Dec 2012 13:53
