Bayesian methods for inference in biostatistical longitudinal studies and modelling of missing data

Alzahrani, Hanadi Mohammed (2024) Bayesian methods for inference in biostatistical longitudinal studies and modelling of missing data. PhD thesis, University of Glasgow.

Full text available as:
2024AlzahraniPhD.pdf (PDF)
Download (18MB)

Abstract

Longitudinal studies repeatedly collect data from the same individuals over time to study long-term factors. A commonly used model in longitudinal studies is the linear mixed effects model, which accounts for the correlation between observations within individuals. The model can be fitted using either the Frequentist or the Bayesian approach. The Frequentist approach is widely used, while the Bayesian approach has become more common with computational advancements. The work in this thesis comprises a comparison study between the Frequentist linear mixed effects model and the Bayesian hierarchical model, using simulated longitudinal data and data from a heart failure study (BIOSTAT-CHF). Inferences from the two approaches were similar. However, the Bayesian approach offers the advantage of providing a probability distribution for the parameter estimates, which gives the probability that a parameter falls within a certain range and allows prior information from previous studies to be incorporated into the inference.
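
For reference, a standard textbook form of the linear mixed effects model is sketched below. The notation is an assumption made here for illustration and is not taken from the thesis: y_i is the vector of n_i repeated measurements for individual i, X_i and Z_i are the fixed-effect and random-effect design matrices, beta holds the fixed effects, and b_i holds the individual-specific random effects.

```latex
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim \mathcal{N}(0, D), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i})
```

Observations from the same individual are correlated because they share b_i. In a Bayesian hierarchical formulation, prior distributions are additionally placed on beta, D and sigma^2, and inference is based on the resulting posterior distribution.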

In longitudinal studies, missing data is a common problem that can bias the estimates from the statistical analysis. A method that deals with non-ignorable missingness in the response using Correlated Random Effects (CRE), based on latent variables and Gibbs sampling, has been proposed in the literature and has performed well in scenarios assuming semi-parametric modelling. However, when applied to linear mixed-effect modelling, the covariance matrix parameters had difficulty converging. To address this issue, the work in this thesis considers a weakly informative prior using the Inverse Wishart distribution. Additionally, the CRE method is unable to accommodate incomplete data in the explanatory variables of the analysis model. To address this problem, the work in this thesis proposes three methods that deal with missingness in both the response and the explanatory variables by adapting the CRE method.
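
The bias that non-ignorable missingness can introduce is easy to see in a toy simulation. The sketch below is not the CRE method and uses none of the thesis data; the random-intercept model, the logistic missingness rule and all numeric settings are assumptions chosen only for illustration. It compares the mean of the observed responses when values are missing completely at random with the case where the chance of being missing depends on the unobserved value itself.

```python
import math
import random
import statistics


def simulate(n_subjects=500, n_visits=4, seed=1):
    """Simulate y_ij = 10 + b_i + e_ij with a subject-level random intercept b_i."""
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        b_i = rng.gauss(0.0, 2.0)  # shared by all visits of subject i
        for j in range(n_visits):
            y = 10.0 + b_i + rng.gauss(0.0, 1.0)
            data.append((i, j, y))
    return data


def observe(data, mechanism, seed=2):
    """Return the responses that remain observed under the given mechanism."""
    rng = random.Random(seed)
    kept = []
    for _, _, y in data:
        if mechanism == "MCAR":
            p_missing = 0.3  # unrelated to any data
        elif mechanism == "MNAR":
            # the chance of being missing rises with the unobserved value itself
            p_missing = 1.0 / (1.0 + math.exp(-(y - 11.0)))
        else:
            raise ValueError(mechanism)
        if rng.random() >= p_missing:
            kept.append(y)
    return kept


data = simulate()
true_mean = statistics.fmean(y for _, _, y in data)
bias_mcar = statistics.fmean(observe(data, "MCAR")) - true_mean
bias_mnar = statistics.fmean(observe(data, "MNAR")) - true_mean
print(f"MCAR bias of the observed mean: {bias_mcar:+.3f}")
print(f"MNAR bias of the observed mean: {bias_mnar:+.3f}")
```

Under the MNAR rule the large responses are preferentially lost, so an analysis of the available data alone underestimates the mean. Methods for non-ignorable missingness model the missingness process jointly with the response for this reason.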

Two proposed methods, the Two-Step and the GCRE-MAR methods, were designed to address non-ignorable missingness in the model response and ignorable missingness in the model explanatory variables. The GCRE-MNAR method was designed for non-ignorable missingness in both the model response and explanatory variables. In the Two-Step method, the CRE method was adapted by incorporating an additional step using the MICE algorithm, a common approach for handling missing at random (MAR) data that produces imputed datasets. The CRE method is then applied to the imputed MICE datasets.

The GCRE-MAR and GCRE-MNAR methods are generalised versions of the CRE method. The GCRE-MAR method incorporates a model for the incomplete explanatory variable. The GCRE-MNAR method incorporates both a model for the incomplete explanatory variable and a model for its missingness process, and it considers correlated random effects between the incomplete explanatory variable model and the missingness process.
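
The structure of the Two-Step method (impute first, then analyse each imputed dataset) can be sketched in a few lines. Everything in the sketch below is an assumption made for illustration: a single incomplete covariate x filled by stochastic regression on a fully observed variable z stands in for the MICE step, the hypothetical `analyse` function stands in for fitting the CRE model, and averaging the per-dataset estimates is one common way of combining them that the abstract does not specify.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_once(rows, rng):
    """Step 1 (toy): fill each missing x from a regression on z plus random noise."""
    obs = [(z, x) for z, x in rows if x is not None]
    a, b, sd = fit_line([z for z, _ in obs], [x for _, x in obs])
    return [(z, x if x is not None else a + b * z + rng.gauss(0.0, sd))
            for z, x in rows]


def analyse(rows):
    """Step 2 placeholder: stands in for fitting the analysis model to one dataset."""
    return statistics.fmean(x for _, x in rows)


rng = random.Random(0)
# z is fully observed; x depends on z and has true mean 2.0.
full = []
for _ in range(1000):
    z = rng.gauss(0.0, 1.0)
    full.append((z, 2.0 + 1.5 * z + rng.gauss(0.0, 1.0)))
# MAR: x is missing more often when the observed z is large.
rows = [(z, None if rng.random() < (0.6 if z > 0 else 0.1) else x)
        for z, x in full]

m = 5  # number of imputed datasets
estimates = [analyse(impute_once(rows, rng)) for _ in range(m)]
pooled = statistics.fmean(estimates)
complete_case = statistics.fmean(x for _, x in rows if x is not None)
print(f"pooled over {m} imputed datasets: {pooled:.3f}")
print(f"complete cases only:             {complete_case:.3f}")
```

Because the missingness depends only on the observed z, the imputation model can recover the information lost in x, and the pooled estimate lies closer to the true mean of 2.0 than the complete-case estimate. A full MICE implementation would also draw the imputation model parameters and cycle over several incomplete variables, which this toy version omits.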

The proposed methods were compared with the CRE method and some baseline models using simulated longitudinal data with different numbers of repeated measures and different proportions of missing data. The proposed methods perform similarly to the CRE method, bearing in mind that the proposed methods handle missing data in both the response and the explanatory variables, whereas the CRE method handles missing data in the response only (with no missing values in the explanatory variables). Furthermore, the proposed methods outperform the available data method in out-of-sample predictive performance, and their parameter estimates closely match the parameters that generated the data.

Additionally, the proposed methods were applied to the BIOSTAT-CHF data, and the results were consistent regardless of the applied method. The correlated random effects indicated that the NT-proBNP missingness was MAR and the eGFR missingness was MNAR. Finally, the sensitivity analysis showed that misspecifying the missingness mechanism had a small impact on the overall results of the proposed methods, whereas misspecifying the response missingness model resulted in biased parameter estimates for some of the analysis model coefficients.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Supported by funding from the Ministry of Higher Education and King Saud bin Abdulaziz University for Health Sciences in Jeddah.
Subjects: H Social Sciences > HA Statistics; Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Funder's Name: Ministry of Higher Education, Saudi Arabia, King Saud bin Abdulaziz University for Health Sciences
Supervisor's Name: MacDonald, Dr. Benn, Haig, Dr. Caroline and Cleland, Professor John
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84782
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 06 Jan 2025 14:25
Last Modified: 06 Jan 2025 14:25
Thesis DOI: 10.5525/gla.thesis.84782
URI: https://theses.gla.ac.uk/id/eprint/84782
