Mining virus genomes for host predictive signals

Young, Francesca (2022) Mining virus genomes for host predictive signals. PhD thesis, University of Glasgow.

Abstract

The total dependence of a virus on its host for survival leads to a fundamental entanglement with the host's cellular machinery. This drives a coevolutionary relationship that leaves an imprint of the host in viral genomes. The aim of this thesis was to develop machine learning approaches to identify and exploit these host predictive signals. We present methods that use these signals both to build classifiers that can assign putative host information to virus genomes and to locate the discriminative features on viral proteins, thereby identifying regions that are important in the host relationship. The first step was to identify discriminative features that capture the different aspects of the virus-host relationship. We generated a range of feature sets from alternative representations of the viral genomes, each designed to exploit a different level of the biological information present. We used a supervised machine learning approach to compare these feature sets for their ability to predict host taxonomic information. Next, we opened these "black box" classifiers to extract the discriminative information learnt by the model and so identify regions of a viral protein that are associated with the host relationship. We used the "local" nature of some of the predictive feature sets to transform an amino acid sequence into host signals. Finally, we developed a multi-view generative mixture model, MVC, to tease apart the complex signals that are embedded in viral genomes via different evolutionary processes. This Bayesian approach uses the clustering of the data defined by labels of interest to guide the features associated with those labels into the "relevant view". The MVC model is able to identify features associated with weak effects in the data.
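
As a rough illustration of the supervised step described above, the following minimal sketch turns genome sequences into feature vectors and trains a classifier to predict a host label. It is not the thesis's method: the abstract does not name the feature sets or the classifier, so normalised k-mer frequencies and a nearest-centroid rule are assumed here purely for illustration, and the sequences, the labels `host_A` and `host_B`, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumed nucleotide alphabet; ambiguous bases are ignored


def kmer_features(seq, k=2):
    """Return normalised k-mer frequencies of seq in a fixed k-mer order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(labelled, k=2):
    """Build one mean feature vector (centroid) per host label."""
    by_label = {}
    for seq, label in labelled:
        by_label.setdefault(label, []).append(kmer_features(seq, k))
    return {label: centroid(vs) for label, vs in by_label.items()}


def predict(model, seq, k=2):
    """Assign the host label whose centroid is closest in Euclidean distance."""
    x = kmer_features(seq, k)
    return min(model, key=lambda label: math.dist(x, model[label]))


# Hypothetical toy data: AT-rich sequences for one host, GC-rich for another.
training = [
    ("ATATATATATTTAATA", "host_A"),
    ("TTATAATATAAATTAT", "host_A"),
    ("GCGCGGCCGCGGGCCG", "host_B"),
    ("CCGGCGCGCCGCGGGC", "host_B"),
]
model = train(training)
prediction = predict(model, "ATTATATAATAT")
```

Comparing feature sets, as the abstract describes, would amount to swapping `kmer_features` for another representation and measuring predictive accuracy on held-out genomes.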

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QR Microbiology
Q Science > QR Microbiology > QR355 Virology
Colleges/Schools: College of Medical, Veterinary and Life Sciences > School of Infection & Immunity
Supervisor's Name: Robertson, Professor David and Rogers, Dr. Simon
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-82842
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 29 Apr 2022 12:49
Last Modified: 29 Apr 2022 12:51
Thesis DOI: 10.5525/gla.thesis.82842
URI: https://theses.gla.ac.uk/id/eprint/82842
