MSDeconvolve: A new metabolomics fragmentation spectra resolver using statistics and machine learning

Terzis, Nikolaos (2025) MSDeconvolve: A new metabolomics fragmentation spectra resolver using statistics and machine learning. MSc(R) thesis, University of Glasgow.

Abstract

In research fields such as drug discovery and disease biomarker discovery, it is often important to identify the metabolites present in samples. Metabolites are the end products of metabolic processes in cells, and metabolomics is the study of metabolites. A widely used approach to identifying metabolites in samples is liquid chromatography tandem mass spectrometry (LC-MS/MS). In these experiments, metabolites are fragmented: energy is applied to break the chemical bonds of the metabolite, producing fragment ions with different masses. The pattern in which a metabolite fragments provides valuable information that can lead to its identification. However, a compromise must be made between fragmenting a small number of metabolites and obtaining good-quality fragmentation patterns (Data Dependent Acquisition, or DDA), and fragmenting all metabolites but obtaining fragmentation patterns in which multiple metabolites are combined (Data Independent Acquisition, or DIA). In DIA, a deconvolution algorithm is required to determine which metabolite each fragment comes from, so that the deconvoluted fragmentation patterns can be used for identification. Current state-of-the-art deconvolution algorithms do not perform at the level required to produce reliable fragmentation spectra for identification. MSDeconvolve is a framework that attempts to improve the deconvolution of fragmentation patterns using statistical methods and models. More specifically, it constructs a design matrix from features extracted from experimental results, where each covariate corresponds to the signal intensity of a metabolite. The response variables are constructed from the signal intensities of fragment ions. Then, by fitting a model for each response variable against the same design matrix, we obtain coefficient estimates for each fragment ion and each metabolite.
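The regression set-up described above can be written compactly. The notation below is an illustrative assumption and is not taken from the thesis: $X$ is the $n \times p$ design matrix whose $k$-th column holds the signal intensity of metabolite $k$ over $n$ observations (for example scans), $\mathbf{y}_j$ is the intensity of fragment ion $j$ over the same observations, and $\beta_{jk}$ is the coefficient linking fragment $j$ to metabolite $k$.

```latex
\[
  \mathbf{y}_j = X \boldsymbol{\beta}_j + \boldsymbol{\varepsilon}_j,
  \qquad j = 1, \dots, m,
  \qquad \boldsymbol{\beta}_j = (\beta_{j1}, \dots, \beta_{jp})^{\top}
\]
```

Every fragment ion shares the same $X$, so the $m$ fits differ only in their response vector.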
By interpreting these coefficients as the proportion of the metabolite that results in a fragment, we can reconstruct the fragmentation patterns of the individual metabolites. Lasso regression was used initially, because its variable selection property suits this problem. Multiple linear regression, Ridge and Elastic Net regression were also evaluated. Further extensions to this modelling approach were also explored. One extension applies a penalty so that the coefficients of each metabolite sum to approximately one, reflecting the fact that the signal intensities of a metabolite's fragments should approximately add up to the signal intensity of the original metabolite. Another combines data from both a DDA and a DIA experiment in order to improve the deconvolution of metabolites that were not fragmented in the DDA experiment. Although MSDeconvolve performs very well on simulated metabolomics data, on real data it performed similarly to, if not marginally better than, the state-of-the-art algorithm. Future work could potentially improve its performance further by overcoming the limitations of metabolomics data.
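As a rough illustration of the per-fragment Lasso idea, the following toy sketch fits one Lasso model per fragment ion against a shared design matrix and then reads the coefficients metabolite by metabolite. It is not the MSDeconvolve implementation: the intensities, the fragment shares, the penalty value and all function and variable names are made up for illustration, the data are noise-free, and the solver is a plain cyclic coordinate descent written out so the example has no dependencies.

```python
"""Toy sketch: per-fragment Lasso deconvolution on noise-free, made-up data."""


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, response, penalty, n_sweeps=200):
    """Minimise 0.5 * ||y - X b||^2 + penalty * ||b||_1 by cyclic coordinate descent.

    `columns` holds the design matrix column by column (one list per covariate).
    """
    coefs = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for k, column in enumerate(columns):
            squared_norm = sum(x * x for x in column)
            if squared_norm == 0.0:
                continue  # covariate never observed; keep its coefficient at zero
            # Partial residual: remove the fitted contribution of every other covariate.
            partial = list(response)
            for other, other_column in enumerate(columns):
                if other != k:
                    partial = [r - coefs[other] * x for r, x in zip(partial, other_column)]
            correlation = sum(x * r for x, r in zip(column, partial))
            coefs[k] = soft_threshold(correlation, penalty) / squared_norm
    return coefs


# Hypothetical precursor intensities over six scans; the two metabolites co-elute.
precursors = {
    "metabolite_A": [0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    "metabolite_B": [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
}

# Made-up ground truth: share of each metabolite's signal that ends up in each fragment.
true_shares = {
    "fragment_1": {"metabolite_A": 0.6, "metabolite_B": 0.0},
    "fragment_2": {"metabolite_A": 0.4, "metabolite_B": 0.3},
    "fragment_3": {"metabolite_A": 0.0, "metabolite_B": 0.7},
}

names = list(precursors)
design = [precursors[name] for name in names]  # shared design matrix, stored by column


def simulate_fragment(shares):
    """Build a fragment intensity trace as a weighted sum of precursor traces."""
    n_scans = len(design[0])
    return [sum(shares[name] * precursors[name][t] for name in names) for t in range(n_scans)]


# One Lasso fit per fragment ion, always against the same design matrix.
estimates = {}
for fragment, shares in true_shares.items():
    trace = simulate_fragment(shares)
    coefs = lasso_coordinate_descent(design, trace, penalty=0.01)
    estimates[fragment] = dict(zip(names, coefs))

# Reading the coefficients per metabolite gives its reconstructed fragmentation pattern.
patterns = {name: {frag: estimates[frag][name] for frag in estimates} for name in names}
for name, pattern in patterns.items():
    print(name, {frag: round(value, 3) for frag, value in pattern.items()})
```

In this toy case the coefficients of each metabolite sum to slightly less than one, because the Lasso penalty shrinks every non-zero estimate. Real data would add noise, missing precursors and many more co-eluting metabolites, none of which are modelled here.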

Item Type: Thesis (MSc(R))
Qualification Level: Masters
Subjects: H Social Sciences > HA Statistics; Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Supervisor's Name: Davies, Dr. Vinny, Elliott, Dr. Andrew and Daly, Dr. Ronan
Date of Award: 2025
Depositing User: Theses Team
Unique ID: glathesis:2025-85148
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 02 Jun 2025 09:55
Last Modified: 02 Jun 2025 09:58
Thesis DOI: 10.5525/gla.thesis.85148
URI: https://theses.gla.ac.uk/id/eprint/85148