Polajnar, Tamara (2010) Semantic models as metrics for kernel-based interaction identification. PhD thesis, University of Glasgow.
Abstract
Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which are addressed by the research in this thesis. Proteins are the principal means of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating the sub-cellular functions of organisms, diseases, and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in peer-reviewed biomedical journals. The large number of interactions detected by automated large-scale PPI experiments, combined with easy access to digitised publications, has increased the volume of results made available each day. The ultimate aim of this research is to provide tools and mechanisms that help biologists and database curators locate relevant information. Towards this objective, this thesis proposes, studies, and develops new methodologies that go some way towards meeting this grand challenge.
Pattern recognition methodologies are one approach to locating PPI sentences; however, most high-accuracy pattern recognition methods require a set of labelled examples to train on. For this particular task, collecting and labelling training data is highly expensive. Digital publications, on the other hand, provide a plentiful source of unlabelled data. This thesis uses that unlabelled data, together with word co-occurrence models, to improve classification with Gaussian processes, a probabilistic alternative to state-of-the-art support vector machines. It presents and systematically assesses novel methods for exploiting the knowledge implicitly encoded in biomedical texts, and shows an improvement over current approaches to PPI sentence detection.
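The abstract does not specify how the co-occurrence models enter the classifier. As a rough illustration only, and not the thesis's own method, one common way to let unlabelled text shape a kernel is a co-occurrence ("semantic smoothing") kernel k(x, y) = xᵀ C y, where C = AᵀA and row d of A marks which terms appear in unlabelled sentence d. Two sentences then look similar when their words co-occur in the unlabelled corpus, even if they share no words. The sketch assumes binary sentence-level co-occurrence and a few invented toy sentences, and it substitutes Gaussian-process regression on ±1 labels for full GP classification (which would need a Laplace or EP approximation). All function names are hypothetical.

```python
# Hypothetical sketch: a co-occurrence kernel from unlabelled sentences plugged
# into GP regression on +/-1 labels. Not the thesis's implementation.
import math
from collections import Counter


def bag_of_words(tokens, vocab):
    """Term-count vector for one tokenised sentence over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def cooccurrence_features(x, unlabelled_bows):
    """Map a count vector x to phi(x) = A x.

    Row d of A marks (0/1) which terms occur in unlabelled sentence d, so
    phi(x) . phi(y) = x^T (A^T A) y, where A^T A counts how often two terms
    co-occur in the unlabelled corpus (an assumed, sentence-level window).
    """
    return [sum(xi for xi, ai in zip(x, row) if ai > 0) for row in unlabelled_bows]


def semantic_kernel(phi_x, phi_y):
    """k(x, y) = x^T C y with C = A^T A; positive semi-definite by construction."""
    return sum(a * b for a, b in zip(phi_x, phi_y))


def cholesky(m):
    """Lower-triangular L with L L^T = m (m must be symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(train_phis, labels, test_phis, noise_var=0.1):
    """GP regression mean k_*^T (K + noise_var I)^{-1} y on +/-1 labels."""
    gram = [[semantic_kernel(p, q) for q in train_phis] for p in train_phis]
    for i in range(len(gram)):
        gram[i][i] += noise_var
    alpha = cho_solve(cholesky(gram), labels)
    return [sum(semantic_kernel(t, p) * a for p, a in zip(train_phis, alpha))
            for t in test_phis]


# Toy data (invented for illustration): unlabelled sentences supply co-occurrences.
unlabelled = [s.split() for s in [
    "protein a binds protein b",
    "kinase phosphorylates substrate and binds",
    "protein interacts with kinase",
    "cells were grown in medium",
    "medium was replaced daily",
]]
train = [("protein a binds kinase".split(), 1.0),    # interaction sentence
         ("cells grown in medium".split(), -1.0)]    # non-interaction sentence
test = ["substrate interacts with".split(), "replaced daily".split()]

vocab = sorted({w for s in unlabelled + [t for t, _ in train] + test for w in s})
u_bows = [bag_of_words(s, vocab) for s in unlabelled]
train_phis = [cooccurrence_features(bag_of_words(t, vocab), u_bows) for t, _ in train]
test_phis = [cooccurrence_features(bag_of_words(t, vocab), u_bows) for t in test]
scores = gp_predict(train_phis, [y for _, y in train], test_phis)
print([1 if s > 0 else -1 for s in scores])  # → [1, -1]
```

Neither test sentence shares a word with the training sentence it is matched to, so a plain bag-of-words linear kernel would score both at zero; the unlabelled sentences supply the bridge ("substrate" and "interacts" co-occur with "kinase" and "protein"). Because the kernel factors as an explicit feature map, it stays positive semi-definite and can be swapped into any kernel method, including an SVM.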
Item Type: | Thesis (PhD) |
---|---|
Qualification Level: | Doctoral |
Keywords: | Bioinformatics, Text Mining, Text Classification, Kernel Methods, Semantic Models, Biomedical Text Semantics |
Subjects: | Q Science > Q Science (General) |
Colleges/Schools: | College of Science and Engineering > School of Computing Science |
Supervisor's Name: | Girolami, Prof. Mark |
Date of Award: | 2010 |
Depositing User: | Ms Tamara Polajnar |
Unique ID: | glathesis:2010-2260 |
Copyright: | Copyright of this thesis is held by the author. |
Date Deposited: | 26 Nov 2010 |
Last Modified: | 10 Dec 2012 13:53 |
URI: | https://theses.gla.ac.uk/id/eprint/2260 |