Development of software framework for the integration of metagenomics with clinical and metadata

Koci, Orges (2020) Development of software framework for the integration of metagenomics with clinical and metadata. PhD thesis, University of Glasgow.

Due to Embargo and/or Third Party Copyright restrictions, this thesis is not available in this service.

Abstract

The past few years have seen shotgun metagenomics increasingly used for microbial community surveys in place of traditional amplicon sequencing. This has been made possible by methodological advances that now allow short sequence reads to be assembled into longer contiguous regions (contigs), which can be binned together to identify the species they belong to (e.g., with the CONCOCT software), and whose coding regions can then be annotated against public databases to assess functional diversity. At the same time, integrated solutions that combine complementary meta’omics technologies are gaining importance. To consolidate all these realisations on the same sample space, and to fully delineate how microbial activity responds to environmental factors, it is necessary to include and integrate all levels of gene products (mRNA, proteins and metabolites), as well as their interactions, in a single platform. Hence, in this thesis, we explore a set of statistical analyses and introduce CViewer, a Java-based software tool that integrates with output data from CONCOCT as well as from major third-party taxonomic and annotation software. The software provides a comprehensive set of multivariate statistical algorithms, grounded in the theory of numerical ecology, to support exploratory as well as hypothesis-driven analyses, with an emphasis on functional traits of microbial communities and phylogenetic-based approaches to community assembly, particularly abiotic filtering. The end result is a highly interactive toolkit with a multiple document interface that makes it easier to unravel useful patterns through point-and-click tools, whether by looking at annotated tracks of metagenomic contigs or by exploring enrichments of metabolic pathways and microbial species.

As a proof of concept, we have used CViewer to explore two independent data sets: a longitudinal gut microbiome profile of children who have Crohn’s disease, to unravel its aetiology through dietary intervention targeting the gut microbes; and a gut microbiome profile from an obesity dataset comparing subjects who are naturally and/or pathologically obese against those who are lean. In addition to analysing the sequencing data, we have developed pyTag, a text-mining tool to investigate literature related to Inflammatory Bowel Diseases (IBDs), with the aim of supplementing genomic exploration with associated textual data available in public repositories such as PubMed. For example, if metagenomics data are available for studying obesity, pyTag can retrieve temporal profiles of ontology terms (dictionaries of specific terms related to environment, disease, chemical compounds, tissue information, etc.) across all the papers (PubMed abstracts) that were published and categorised under “obesity”, which provides additional context for data analysis. In this thesis, however, we have tested pyTag in the context of common IBDs, including Crohn’s disease, Ulcerative Colitis, Coeliac disease and Irritable Bowel Syndrome, to provide spatial and temporal trends.
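
The idea of a temporal profile of dictionary terms can be illustrated with a minimal Python sketch. This is not pyTag's implementation, which is not described in this record; the function name `temporal_term_profile`, the toy abstracts and the three-term dictionary are all hypothetical and serve only to show one simple way of counting, per publication year, how many abstracts mention each term.

```python
import re
from collections import Counter, defaultdict


def temporal_term_profile(records, terms):
    """Count, per publication year, how many abstracts mention each term.

    records: iterable of (year, abstract_text) pairs (hypothetical input shape).
    terms:   iterable of dictionary terms to look for (case-insensitive).
    """
    # Pre-compile one whole-word, case-insensitive pattern per term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    profile = defaultdict(Counter)
    for year, abstract in records:
        for term, pattern in patterns.items():
            # Each abstract contributes at most once per term.
            if pattern.search(abstract):
                profile[year][term] += 1
    # Return plain dicts, ordered by year, for easy inspection.
    return {year: dict(counts) for year, counts in sorted(profile.items())}


# Toy data, invented for illustration only.
records = [
    (2016, "Exclusive enteral nutrition in paediatric Crohn's disease."),
    (2017, "Gut microbiota and butyrate in ulcerative colitis."),
    (2017, "Butyrate production falls during enteral nutrition in Crohn's disease."),
]
terms = ["Crohn's disease", "butyrate", "enteral nutrition"]

profile = temporal_term_profile(records, terms)
for year, counts in profile.items():
    print(year, counts)
```

A real analysis would draw its records from PubMed and its terms from curated ontologies; the sketch only shows the counting step.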

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: A version of Chapter 4 of the thesis has been previously published in the following peer-reviewed publication: Koci O, Logan M, Svolos V, Russell RK, Gerasimidis K, Ijaz UZ (2018). An automated identification and analysis of ontological terms in gastrointestinal diseases and nutrition-related literature provides useful insights. PeerJ 6:e5047 https://doi.org/10.7717/peerj.5047
Keywords: software engineering, metagenomics, integrative analysis, metadata, bioinformatics, data mining and machine learning, visual analytics.
Subjects: Q Science > QR Microbiology
R Medicine > R Medicine (General)
T Technology > T Technology (General)
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Medicine, Dentistry & Nursing
Supervisor's Name: Gerasimidis, Dr. Konstantinos
Date of Award: 2020
Embargo Date: 1 May 2023
Depositing User: Mr Orges Koci
Unique ID: glathesis:2020-81341
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 05 May 2020 09:30
Last Modified: 24 May 2021 12:57
URI: https://theses.gla.ac.uk/id/eprint/81341