Decomposing scRNA-seq data using topic modelling

Pancheva, Alexandrina (2022) Decomposing scRNA-seq data using topic modelling. PhD thesis, University of Glasgow.

Full text available as: PDF (2022panchevaphd.pdf, 22MB)


In recent years, the development of single-cell RNA-sequencing technologies has allowed scientists to study the heterogeneity of cell populations, compare cells across conditions, analyse biological processes in development and disease, and infer cellular interactions. While single-cell studies provide an invaluable perspective for understanding disease and identifying therapeutic targets, such datasets are high-dimensional and pose unique challenges compared to earlier technologies. Machine learning techniques have become one of the most popular ways of overcoming those challenges. The work described here develops and applies interpretable models to single-cell data. All methods described here are based on topic modelling, a popular technique within natural language processing. In this context, cells correspond to documents and genes to words. Firstly, we investigate the problem of doublet detection and assess the limitations of currently available methods. We propose an alternative approach based on topic modelling. While the proposed approach does not outperform state-of-the-art methods, potential avenues for further exploration are highlighted. Next, a topic modelling-based approach is used to detect genes whose expression in single cells changes as a result of cell-cell interactions. Experiments using synthetic and real datasets show that our approach is able to detect genes that change as a result of interaction, while also uncovering meaningful biological groups of genes that correspond to the latent topics, which aids interpretation. The described approach also reduces the need for some of the prior information required by previous methods, in particular ligand-receptor databases, clustering, and the generation of synthetic doublets. Finally, the topic model formulation is extended to single-cell data ordered in pseudotime. The resulting dynamic topic model is able to capture groups of genes that change over time.
This dynamic approach outperforms non-temporal topic models and standard differential expression analysis, as it detects more biologically relevant groups of genes. The final section outlines potential directions for future research.
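As a rough illustration of the mapping in which cells correspond to documents and genes to words, the sketch below fits a standard latent Dirichlet allocation (LDA) topic model by collapsed Gibbs sampling to a toy cell-by-gene count matrix. It is a generic textbook LDA, not the doublet, interaction or dynamic models developed in the thesis; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the choice to treat every UMI count as one "word" token, and the toy counts are all illustrative assumptions.

```python
import random


def lda_gibbs(counts, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a cell-by-gene count matrix.

    counts[c][g] is the (assumed integer) count of gene g in cell c.
    Each cell is a "document"; each count is one token of "word" g.
    Returns (theta, phi): per-cell topic proportions and per-topic
    gene distributions.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    # Expand counts into token lists: one entry per observed count.
    docs = [[g for g in range(n_genes) for _ in range(counts[c][g])]
            for c in range(n_cells)]
    n_ck = [[0] * n_topics for _ in range(n_cells)]  # tokens per (cell, topic)
    n_kg = [[0] * n_genes for _ in range(n_topics)]  # tokens per (topic, gene)
    n_k = [0] * n_topics                              # tokens per topic
    z = []                                            # topic of each token
    # Random initial topic assignments.
    for c, doc in enumerate(docs):
        zc = []
        for g in doc:
            k = rng.randrange(n_topics)
            zc.append(k)
            n_ck[c][k] += 1
            n_kg[k][g] += 1
            n_k[k] += 1
        z.append(zc)
    for _ in range(n_iter):
        for c, doc in enumerate(docs):
            for i, g in enumerate(doc):
                k = z[c][i]
                # Remove the current token from all counts.
                n_ck[c][k] -= 1
                n_kg[k][g] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [(n_ck[c][t] + alpha) * (n_kg[t][g] + beta)
                           / (n_k[t] + n_genes * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[c][i] = k
                n_ck[c][k] += 1
                n_kg[k][g] += 1
                n_k[k] += 1
    # Posterior mean estimates of the cell-topic and topic-gene distributions.
    theta = [[(n_ck[c][k] + alpha) / (len(docs[c]) + n_topics * alpha)
              for k in range(n_topics)] for c in range(n_cells)]
    phi = [[(n_kg[k][g] + beta) / (n_k[k] + n_genes * beta)
            for g in range(n_genes)] for k in range(n_topics)]
    return theta, phi


# Hypothetical toy data: cells 0-2 express genes 0-2, cells 3-5 express genes 3-5.
counts = [
    [5, 4, 6, 0, 0, 0],
    [6, 5, 4, 0, 0, 0],
    [4, 6, 5, 0, 0, 0],
    [0, 0, 0, 5, 6, 4],
    [0, 0, 0, 4, 5, 6],
    [0, 0, 0, 6, 4, 5],
]
theta, phi = lda_gibbs(counts, n_topics=2, n_iter=100)
for c, props in enumerate(theta):
    print(c, [round(p, 2) for p in props])
```

Each row of `theta` describes a cell as a mixture of topics, and each row of `phi` describes a topic as a distribution over genes; the gene groups referred to above would be read off from the high-probability genes in each row of `phi`. On real data, a library implementation and more careful preprocessing would normally be preferred over this toy sampler.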

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: scRNA-seq, gene expression, computational biology, machine learning, topic modelling.
Subjects: Q Science > QH Natural history > QH426 Genetics
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Infection & Immunity
Funder's Name: Medical Research Council (MRC)
Supervisor's Name: Otto, Professor Thomas, Rogers, Dr. Simon and Wheadon, Professor Helen
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-83165
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 06 Oct 2022 15:09
Last Modified: 06 Oct 2022 15:12
Thesis DOI: 10.5525/gla.thesis.83165