Bayesian hierarchical modelling for biomarkers with applications to doping detection and prostate cancer prediction

Eleftheriou, Dimitra (2022) Bayesian hierarchical modelling for biomarkers with applications to doping detection and prostate cancer prediction. PhD thesis, University of Glasgow.


Anabolic androgenic steroids (AAS) are frequently detected doping substances in competitive sports. To detect AAS doping with pseudo-endogenous steroids, i.e. steroids that are produced in the human body, such as testosterone (T), urinary concentrations of the athlete’s steroid profile are measured over time in the steroidal module of the Athlete Biological Passport (ABP). Monitoring the urinary levels of anabolic steroids can be highly challenging, since it is difficult to distinguish natural production from exogenous administration. Current methods for monitoring AAS are based on a univariate Bayesian model applied to a single biomarker at a time. The first part of this research extends the current univariate Bayesian model to a multivariate adaptive model that can accommodate repeated measurements from various sensitive biomarkers and their concentration ratios. The developed methodology was applied to data from urine samples obtained from professional athletes. Among these samples, normal, atypical, and abnormal values were identified. An anomaly detection technique based on a one-class classification (OCC) algorithm was used to detect abnormal values within the athletes’ steroid profiles, whether due to AAS misuse, sample exchange or other confounding factors. In a Bayesian context, the main idea is to construct adaptive decision boundaries around normal concentration values as new data arrive, and to differentiate these values from the abnormal ones (also called outliers or anomalies). When applied to the same data, the proposed model achieved better prediction performance than standard methodologies. Higher values of the evaluation metrics suggest that the proposed approach can be used to improve the accuracy of standard techniques for doping detection. The proposed model was implemented in an R Shiny app for doping testing purposes. The BioScan App is a user-friendly web application that anti-doping laboratories can use to evaluate athletes in real-life casework.

AAS also have the potential to identify metabolic imbalance and pathological conditions such as benign prostatic hyperplasia and prostatic carcinoma. The second part of this research develops novel statistical modelling methodology to improve prostate cancer diagnosis by analysing a variety of urinary steroids. The proposed approach constitutes a non-invasive, low-cost and improved screening method compared to the widely used prostate-specific antigen (PSA) test. The thesis uses Dirichlet process (DP) models for a mixture of Gaussian distributions in a Bayesian framework as an improved classification tool. This nonparametric model can be applied to both univariate and multivariate data sets, and offers the flexibility of an unknown and possibly infinite number of components. The models introduced by Görür and Rasmussen (2010) have been extended to models with covariates, which account for possible patterns within them. The main features of the DP mixture models with and without covariate information are highlighted in this dissertation. Emphasis is given to the model structure when covariates are included, using a technique that reduces the number of model parameters. This technique is also an elegant way to deal with high-dimensional predictors, and so contributes to dimensionality reduction. The main goal is to compare the models’ predictive performance against their complexity and computational effort. For mathematical and practical convenience, the DP models are defined by specifying conditionally conjugate priors for their base distributions.

Markov chain Monte Carlo (MCMC) methods based on Gibbs sampling and adaptive rejection sampling (ARS) are used to draw each variable from its conditional distribution given the remaining variables in the system. The clustering and classification performance of the models is examined on simulated and real data. We focus on applications to real clinical prostate cancer data, where the aim is to classify prostate cancer conditions. Implementing the DP Gaussian mixture model (DP-GMM) using only biomarkers, with age as a covariate, increases prediction accuracy compared with the corresponding covariate-free model. Finally, the proposed classification model outperformed the standard methods of support vector machines (SVM) and linear discriminant analysis (LDA) in three out of four applications to different data sets, including prostate cancer data.
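
As a rough illustration of the Gibbs sampling idea mentioned in the abstract (each variable is drawn in turn from its conditional distribution given the others), the following minimal Python sketch samples from a toy target: a standard bivariate normal with correlation rho. This is not the thesis's DP mixture sampler, and it does not involve adaptive rejection sampling; the function names, the toy target and the parameter values are assumptions chosen only to show the mechanism.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a toy target: standard bivariate normal, correlation rho.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # standard deviation of each full conditional
    x, y = 0.0, 0.0                    # arbitrary starting point
    draws = []
    for step in range(burn_in + n_samples):
        # Full conditionals of the bivariate normal:
        # x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2)
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if step >= burn_in:            # discard the burn-in iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in draws) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in draws) / n
    var_y = sum((y - mean_y) ** 2 for _, y in draws) / n
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    print("sample correlation:", round(sample_correlation(draws), 3))
```

With enough iterations the sample correlation of the draws should be close to the chosen rho. In a DP mixture model the same alternating scheme would be applied to component indicators, component parameters and hyperparameters, with ARS available for conditionals that are log-concave but not of a standard form.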

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Adaptive rejection sampling, anomaly detection, Bayesian nonparametrics, biomarkers, Dirichlet processes, doping, Gaussian mixtures, Markov chain Monte Carlo, multivariate Bayesian multilevel model, one-class classifier, predictive models, prostate cancer.
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Supervisor's Name: Neocleous, Dr. Tereza
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-83094
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 24 Aug 2022 10:17
Last Modified: 24 Aug 2022 10:18
Thesis DOI: 10.5525/gla.thesis.83094
