Convergence, connectivity, and continuity: topological perspectives for mining novel biological information from ‘omics data

Chen, Mel (2020) Convergence, connectivity, and continuity: topological perspectives for mining novel biological information from ‘omics data. PhD thesis, University of Glasgow.

Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b3378033

Abstract

In this thesis, we will explore possible applications of topological data analysis to ‘omics data. More specifically, we apply the topologically-based data visualisation technique, Mapper, to gene expression data coming from the fish, Arctic charr (Salvelinus alpinus). The fish samples come from the wild, from lakes in Scotland and Russia. Furthermore, the Arctic charr is an interesting study species, since it commonly occurs in two morphs, a bottom/bank-dwelling benthic morph, and an open-water pelagic morph. In general, these morphs share features which are common across lakes, and so provide an opportunity to study a subspecies-level split which is replicated across different populations. This gives an example of parallelism in evolution, and the fact that the split is replicated allows us to test if there are common underlying changes leading to this split, at the level of identical genes, or sets of genes, or genes involved in the same pathways.

We provide an overview of the Mapper algorithm, and also show its application to a breast cancer gene expression dataset, which was the inspiration for our PhD project. When applying Mapper to the Arctic charr, we also investigate the effect of sample size by subsampling the breast cancer data.

In addition to applying Mapper, we use a more mathematical view of the gene expression data to provide a new perspective on two gene analysis techniques commonly used in evolutionary biology, namely differential gene expression analysis and gene co-expression analysis.

Finally, we provide an experiment which could be done in the future, assuming the cost of sequencing continues to fall. This experiment incorporates ideas of optimal transport in trying to reconstruct the developmental landscape of Arctic charr. We also discuss other avenues for future work, and current difficulties with applying topological data analysis to gene expression data from wild samples.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Topological data analysis, mapper, evolutionary biology, salmonids, arctic charr.
Subjects: Q Science > QA Mathematics
Q Science > QL Zoology
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Life Sciences > Life Sciences Animal Biology
College of Science and Engineering > School of Mathematics and Statistics > Mathematics
Supervisor's Name: Watson, Dr. Liam and Elmer, Dr. Kathryn
Date of Award: 2020
Depositing User: Dr Mel Chen
Unique ID: glathesis:2020-78978
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 30 Jan 2020 09:58
Last Modified: 01 Sep 2022 15:15
Thesis DOI: 10.5525/gla.thesis.78978
URI: https://theses.gla.ac.uk/id/eprint/78978
