Understanding virus-host interactions using artificial intelligence

Liu, Dan (2025) Understanding virus-host interactions using artificial intelligence. PhD thesis, University of Glasgow.

Due to Embargo and/or Third Party Copyright restrictions, this thesis is not available in this service.

Abstract

Viruses are associated with a wide range of hosts, encompassing all cellular life: humans, animals, plants and bacteria. Virus-host interactions are complex and diverse across species, and are mostly mediated by protein-protein interactions (PPIs). PPIs play important roles in essential biological activities, including the formation of protein complexes and more transient interactions in the context of signalling pathways, regulatory networks, etc., some of which are important for virus infection, such as viral entry and replication. Understanding PPI mechanisms is helpful for revealing virus-host interactions, identifying PPIs associated with diseases and discovering potential therapeutic targets. However, the host specificity of most viruses remains unknown, and PPI networks remain sparse except for a few well-studied host species such as humans. In this thesis, we developed computational approaches to predict the host species of viruses, and PPIs, from genomes and corresponding protein sequences alone. Firstly, we introduce EvoMIL, a deep learning framework that leverages a protein language model (PLM) for viral protein representations and trains a multiple instance learning (MIL) model to predict prokaryotic and eukaryotic host species. We show that EvoMIL improves the accuracy of host species prediction and can identify key viral proteins that contribute to host specificity. Next, we introduce a deep learning model, PLM-interact, which jointly encodes protein pairs to learn protein interactions, analogous to the next-sentence prediction task in natural language processing (NLP). We show that PLM-interact improves PPI prediction in the intra-species benchmarking task and can identify the mutational impacts on human PPIs. We further show that PLM-interact can be applied to predict virus-host PPIs. To enhance the training data, we construct a dataset by integrating seven public virus-human PPI databases.
We introduce three data-splitting strategies to create training, validation and test datasets in which the training and test sets have varying protein similarities, enabling comprehensive model evaluation. We find that fine-tuning the human model on virus-human PPIs improves virus-human PPI prediction, offering the potential for developing a generalizable PPI model. In summary, this thesis uses deep learning techniques to predict the host specificity of viruses and to identify PPIs within and between species using protein sequences. This broadens our view of virus-host interactions and provides insights for developing vaccines, drugs and therapies for human diseases.
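
As an illustration of the multiple instance learning idea mentioned in the abstract, the following is a minimal sketch of attention-based MIL pooling, in which a virus is treated as a "bag" of protein embeddings and each protein receives a weight. The abstract does not state which MIL formulation EvoMIL uses, so attention pooling is an assumption here, and the function name, parameters and toy embeddings are all hypothetical rather than taken from the thesis.

```python
import math


def attention_mil_pool(instances, v, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: list of embedding vectors, one per viral protein (toy values here;
               in practice these would come from a protein language model).
    v:         hidden x dim projection matrix (hypothetical learned parameters).
    w:         scoring vector of length hidden (hypothetical learned parameters).
    Returns (bag_embedding, attention_weights).
    """
    # Score each instance: w . tanh(V x)
    scores = []
    for x in instances:
        hidden = [math.tanh(sum(vj * xj for vj, xj in zip(row, x))) for row in v]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))

    # Softmax over instances (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding is the attention-weighted sum of instance embeddings
    dim = len(instances[0])
    bag = [sum(a * x[d] for a, x in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag of three 2-dimensional "protein embeddings" (made-up numbers)
proteins = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
w = [1.0, 1.0]

bag_embedding, weights = attention_mil_pool(proteins, v, w)
top_protein = max(range(len(weights)), key=weights.__getitem__)
```

In a sketch like this, the bag embedding would feed a host classifier, and the per-protein attention weights give one possible way to rank which proteins contribute most to a prediction.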

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Supported by funding from the Marie Sklodowska-Curie Actions Innovative Training Networks VIROINF.
Subjects: Q Science > QR Microbiology > QR355 Virology
Colleges/Schools: College of Medical Veterinary and Life Sciences
Funder's Name: Marie Sklodowska-Curie Actions Innovative Training Networks VIROINF
Supervisor's Name: Robertson, Professor David
Date of Award: 2025
Embargo Date: 10 October 2025
Depositing User: Theses Team
Unique ID: glathesis:2025-85092
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 25 Apr 2025 12:44
Last Modified: 25 Apr 2025 12:45
Thesis DOI: 10.5525/gla.thesis.85092
URI: https://theses.gla.ac.uk/id/eprint/85092
