The integration of large biological and clinical datasets towards the understanding of human disease

Robinson, Scott W. (2020) The integration of large biological and clinical datasets towards the understanding of human disease. PhD thesis, University of Glasgow.

As high-throughput techniques become cheaper and more powerful equipment is developed, ever more high-dimensional biological data will become available, and a great deal of such data is already in the public domain. This thesis investigates three case studies that offer opportunities to integrate large molecular datasets with corresponding clinical data and publicly available data.

Genome-wide DNA methylation was studied in relation to hypertension. Genomic location data were used both to group individual methylation sites into meaningful functional groups, such as promoter regions, and to report the results in a genomic context. Genome-wide SNP data were used to help rule out potential false positives where SNPs interfere with the detection of DNA methylation.
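
The two uses of genomic location described above can be illustrated with a minimal sketch. It assumes a hypothetical promoter window of 1,500 bp upstream to 200 bp downstream of each transcription start site (TSS), and a hypothetical rule that flags any CpG probe with a known SNP within 10 bp. The names `Probe`, `Gene`, `group_by_promoter` and `snp_flagged`, and both thresholds, are illustrative assumptions, not the thesis's actual annotation or filtering pipeline.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple

# Hypothetical thresholds, chosen only for illustration.
UPSTREAM_BP = 1500     # promoter extends this far upstream of the TSS
DOWNSTREAM_BP = 200    # ... and this far downstream
SNP_DISTANCE_BP = 10   # flag probes with a SNP this close to the CpG


class Probe(NamedTuple):
    probe_id: str
    chrom: str
    pos: int


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


def promoter_bounds(gene: Gene) -> Tuple[int, int]:
    """Return (start, end) of the promoter window, respecting strand."""
    if gene.strand == "+":
        return gene.tss - UPSTREAM_BP, gene.tss + DOWNSTREAM_BP
    # On the minus strand "upstream" means higher coordinates.
    return gene.tss - DOWNSTREAM_BP, gene.tss + UPSTREAM_BP


def group_by_promoter(probes: List[Probe], genes: List[Gene]) -> Dict[str, List[str]]:
    """Map each gene name to the IDs of probes lying inside its promoter window."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene in genes:
        start, end = promoter_bounds(gene)
        for probe in probes:
            if probe.chrom == gene.chrom and start <= probe.pos <= end:
                groups[gene.name].append(probe.probe_id)
    return dict(groups)


def snp_flagged(probes: List[Probe], snps: List[Tuple[str, int]]) -> Set[str]:
    """Return IDs of probes with a known SNP within SNP_DISTANCE_BP of the CpG."""
    flagged: Set[str] = set()
    for probe in probes:
        for chrom, pos in snps:
            if chrom == probe.chrom and abs(pos - probe.pos) <= SNP_DISTANCE_BP:
                flagged.add(probe.probe_id)
                break
    return flagged
```

Flagged probes would then be treated with caution or excluded before promoter-level results are reported. A real analysis would use interval trees or sorted coordinates rather than this quadratic scan.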

Left ventricular hypertrophy is an intermediate cardiovascular phenotype associated with the development of heart failure. This phenotype was studied as a continuous variable (left ventricular mass index, LVMI) in a large cohort, using multiple sample types and datasets that cover different classes of biomolecules with varying genomic coverage. Two alternative analysis approaches were compared, and a linear model was generated showing that a signature combining molecular and clinical markers best describes LVMI.
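
One way to judge whether a combined molecular and clinical signature describes a continuous outcome such as LVMI better than clinical markers alone is to fit ordinary least squares models and compare their R². The sketch below is a dependency-free illustration of that idea under stated assumptions: it solves the normal equations directly, and the function names `ols_fit` and `r_squared` are hypothetical. It is not the model-selection procedure used in the thesis, and in practice adjusted R² or cross-validation would be needed to avoid rewarding extra predictors.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0, *row] for row in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float], beta: Sequence[float]) -> float:
    """Proportion of variance in y explained by the fitted linear model."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

In use, one model would be fitted on clinical covariates only and a second on clinical covariates plus selected molecular markers, and their fit would be compared on held-out samples.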

A multi-omics respiratory dataset was investigated, which includes high-throughput data for mRNA, miRNA, proteins, and metabolites, with measurements in two relevant sample types. Statistical tests were performed on all datasets, identifying molecules dysregulated in asthma, COPD, and smoking. An asthma molecular interaction network was created from the significant molecules, with the links between them drawn from a variety of public data sources. Comparisons were made between asthma and COPD, and between asthma in smokers and non-smokers. Correlations with cell type counts may indicate the cell type of origin in samples that contain multiple cell types, such as induced sputum.
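
The network construction step described above can be sketched as follows: keep only molecules that pass a significance threshold, then add an undirected edge wherever a public interaction resource links two of them. The threshold (`alpha = 0.05`), the use of already-adjusted p-values, the molecule names and the interaction pairs are all hypothetical placeholders; the thesis's actual data sources and criteria are not reproduced here.

```python
from typing import Dict, Iterable, Set, Tuple


def significant_molecules(p_values: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Molecules whose (assumed multiple-testing-adjusted) p-value is below alpha."""
    return {molecule for molecule, p in p_values.items() if p < alpha}


def build_network(
    significant: Set[str], interactions: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Undirected adjacency map restricted to significant molecules.

    Every significant molecule appears as a node, even if it has no links;
    an interaction is kept only when both endpoints are significant.
    """
    network: Dict[str, Set[str]] = {m: set() for m in significant}
    for a, b in interactions:
        if a != b and a in significant and b in significant:
            network[a].add(b)
            network[b].add(a)
    return network


# Hypothetical example: placeholder names, not real results.
p_values = {"mRNA_1": 0.001, "miRNA_1": 0.01, "protein_1": 0.02, "metabolite_1": 0.4}
public_pairs = [("miRNA_1", "mRNA_1"), ("mRNA_1", "protein_1"), ("protein_1", "metabolite_1")]
asthma_network = build_network(significant_molecules(p_values), public_pairs)
```

Here `metabolite_1` does not pass the threshold, so its link to `protein_1` is dropped, while `mRNA_1` ends up connected to both `miRNA_1` and `protein_1`.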

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > R Medicine (General)
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Cardiovascular & Metabolic Health > Cardiovascular & Metabolic Health
Supervisor's Name: Delles, Prof. Christian, Husi, Dr. Holger and Padmanabhan, Prof. Sandosh
Date of Award: 2020
Depositing User: Scott Robinson
Unique ID: glathesis:2020-82049
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 09 Mar 2021 17:04
Last Modified: 09 Mar 2021 17:20
Thesis DOI: 10.5525/gla.thesis.82049
