Stewart, Catherine Helen (2010) Multilevel modelling of event history data: comparing methods appropriate for large datasets. PhD thesis, University of Glasgow.
Abstract
When analysing medical or public health datasets, it is often of interest to measure the time until a particular predefined event occurs, such as death from some disease. Since the health status of individuals living within the same area tends to be more similar than that of individuals from different areas, the event times of individuals from the same area may be correlated. As a result, multilevel models must be used to account for the clustering of individuals within the same geographical location. When the outcome is the time until some event, multilevel event history models must be used.
Although software for fitting multilevel event history models does exist, such as MLwiN, computational requirements limit the use of these models with large datasets. For example, to fit the proportional hazards model (PHM) in MLwiN, a Poisson model is fitted to a person-period dataset. The PHM is the most commonly used event history model for modelling the effect of risk factors on event times. The person-period dataset is created by rearranging the original dataset so that each individual has a line of data for every risk set they survive until either censoring or the event of interest occurs. When time is treated as a continuous variable, so that each risk set corresponds to a distinct event time (as is the case for the PHM), the person-period dataset can be very large. This is a problem for those working in public health, because the datasets used for measuring and monitoring public health are typically large. Furthermore, individuals may be followed up for a long period of time, which also contributes to a large person-period dataset. A further complication is that interest may lie in modelling a rare event, which results in a high proportion of censored observations. This can also make multilevel event history models difficult to estimate.
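The construction of a continuous-time person-period dataset can be illustrated with a short Python sketch. It assumes a simple record layout: the `Subject` fields, the function name and the toy data are hypothetical and do not come from the thesis or from MLwiN. Every distinct observed event time defines a risk set. Each individual receives one row for every risk set they survive, and the response is 1 only in the row where their own event occurs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    pid: int      # individual identifier (hypothetical layout)
    area: int     # higher-level unit, e.g. geographical area
    time: float   # follow-up time until the event or censoring
    event: int    # 1 = event of interest observed, 0 = censored


def person_period_continuous(subjects):
    """Expand to one row per individual per risk set survived.

    Every distinct observed event time defines a risk set. An individual
    contributes a row to each risk set whose time is <= their own follow-up
    time; y = 1 only in the row for the risk set at which their event occurred.
    """
    event_times = sorted({s.time for s in subjects if s.event == 1})
    rows = []
    for s in subjects:
        for k, t in enumerate(event_times):
            if t > s.time:
                break  # individual is no longer at risk
            y = int(s.event == 1 and t == s.time)
            rows.append({"pid": s.pid, "area": s.area, "risk_set": k, "y": y})
    return rows


# Hypothetical toy data: four individuals in two areas.
subjects = [
    Subject(1, 1, 2.0, 1),
    Subject(2, 1, 5.0, 0),
    Subject(3, 2, 3.5, 1),
    Subject(4, 2, 7.0, 1),
]
rows = person_period_continuous(subjects)
print(len(subjects), "individuals ->", len(rows), "person-period rows")
```

Here the four individuals expand to eight rows. With n individuals and m distinct event times, the expanded dataset can approach n × m rows, so long follow-up (more distinct event times) and large n both inflate it quickly. The rows would then be analysed with a Poisson model containing a term for each risk set; that fitting step is not shown.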
Since multilevel event history models are important in public health, the aim of this thesis is to develop these models so that they can be fitted to large datasets, in particular datasets with long periods of follow-up and rare events. Two datasets are used throughout the thesis to investigate three possible alternatives to fitting the multilevel proportional hazards model in MLwiN, in order to overcome the problems discussed. The first is a moderately sized Scottish dataset, which is the main focus of the thesis. It is used as a ‘training dataset’ both to explore the limitations of existing software packages for fitting multilevel event history models and to investigate alternative methods. The second dataset, from Sweden, is used to test the effectiveness of each alternative method when fitted to a much larger dataset. The adequacy of the alternative methods is assessed on three criteria: how effectively they reduce the size of the person-period dataset, how similar their parameter estimates are to those from the PHM, and how easy they are to implement.
The first alternative method involves defining discrete-time risk sets and then estimating discrete-time hazard models via multilevel logistic regression models fitted to a person-period dataset. The second alternative method involves aggregating the data of individuals within the same higher-level unit who have the same values for the covariates in a particular model. Because such individuals are at risk of experiencing the event of interest at the same times, one line of data can represent all of them. This method is termed ‘grouping according to covariates’. Both continuous-time and discrete-time event history models can be fitted to the aggregated person-period dataset. The ‘grouping according to covariates’ method and the first method, which defines discrete-time risk sets, are both implemented in MLwiN using pseudo-likelihood methods of estimation. The third and final method, by contrast, involves fitting Bayesian event history (frailty) models using Markov chain Monte Carlo (MCMC) methods of estimation. These models are fitted in WinBUGS, a software package designed specifically to make MCMC methods practical for applied statisticians. In WinBUGS, an additive frailty model is adopted and the survival times are assumed to follow a Weibull distribution.
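The two data-reduction ideas described here, discrete-time risk sets and grouping according to covariates, can be sketched in Python. Several details below are assumptions for illustration only: the interval width, the single binary covariate `sex`, the tuple layout and the function names are all hypothetical. Interval k covers (k·width, (k+1)·width]. A censored individual is counted as at risk for the whole interval in which they are censored, which is only one of several possible conventions.

```python
import math
from collections import defaultdict


def person_period_discrete(subjects, width):
    """One row (area, interval, sex, y) per individual per interval survived.

    An individual with follow-up time t > 0 is at risk in intervals
    0 .. ceil(t / width) - 1; y = 1 only in the last of these intervals,
    and only if the event of interest was observed.
    """
    rows = []
    for pid, area, sex, time, event in subjects:
        last = math.ceil(time / width) - 1
        for k in range(last + 1):
            y = int(event == 1 and k == last)
            rows.append((area, k, sex, y))
    return rows


def group_by_covariates(rows):
    """Collapse rows that share area, interval and covariate values.

    Returns {(area, interval, sex): (n_at_risk, n_events)}.
    """
    cells = defaultdict(lambda: [0, 0])
    for area, k, sex, y in rows:
        cell = cells[(area, k, sex)]
        cell[0] += 1   # one more individual at risk in this cell
        cell[1] += y   # add an event if one occurred
    return {key: tuple(v) for key, v in cells.items()}


# Hypothetical toy data: (pid, area, sex, follow-up time, event indicator)
subjects = [
    (1, 1, 0, 2.0, 1),
    (2, 1, 0, 5.0, 0),
    (3, 1, 0, 3.5, 1),
    (4, 2, 1, 7.0, 1),
]
rows = person_period_discrete(subjects, width=2.0)
cells = group_by_covariates(rows)
print(len(rows), "discrete-time rows ->", len(cells), "grouped cells")
```

In this toy example the ten discrete-time rows collapse to seven cells. Each cell could then be modelled as a binomial count of events out of the number at risk in a multilevel logistic model. With a continuous covariate, almost every cell would contain a single individual, so grouping would give little reduction. A wider interval shrinks the discrete-time dataset further but coarsens the timing of events.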
The methodological findings were that the discrete-time method successfully reduced the size of the continuous-time person-period dataset. However, it was necessary to experiment with the length of the time intervals to find the widest interval that did not influence parameter estimates. The grouping according to covariates method worked best under three conditions: a larger number of individuals, on average, per higher-level unit; few risk factors in the model; and few or none of those risk factors being continuous. The Bayesian method could be favourable because no data expansion is required to fit the Weibull model in WinBUGS and time is treated as a continuous variable. However, models took much longer to run using MCMC methods of estimation than using likelihood methods. This thesis showed that a reparameterised version of the Weibull model, together with a variance expansion technique, could overcome slow convergence by reducing correlation in the Markov chains. This may be a more efficient way to reduce computing time than running further iterations.
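The point that the Weibull model needs no data expansion can be seen by writing out its log-likelihood for right-censored data. Each individual contributes a single term. The Python sketch below uses the standard, not the reparameterised, Weibull form with hazard h(t) = ρλt^(ρ−1) and survivor function S(t) = exp(−λt^ρ). This is the same parameterisation as the WinBUGS `dweib(shape, lambda)` density. The area random effect is placed on the log scale, λ = exp(η). The function name, the parameter values and the toy data are hypothetical. The sketch evaluates the likelihood only and does not implement MCMC estimation, the reparameterisation, or variance expansion.

```python
import math


def weibull_loglik(times, events, eta, shape):
    """Log-likelihood of a Weibull proportional hazards model, right-censored data.

    lam_i = exp(eta_i), where eta_i is the linear predictor (covariates plus
    an area random effect). An observed event contributes
    log h(t) + log S(t) = log(shape) + eta + (shape - 1) * log(t) - lam * t**shape;
    a censored observation contributes only log S(t) = -lam * t**shape.
    Follow-up times are assumed to be strictly positive.
    """
    total = 0.0
    for t, d, e in zip(times, events, eta):
        lam = math.exp(e)
        if d == 1:
            total += math.log(shape) + e + (shape - 1) * math.log(t)
        total += -lam * t ** shape  # log survivor function, for everyone
    return total


# Hypothetical data: (time, event, covariate x, area); one row per individual.
data = [(2.0, 1, 0.0, 1), (5.0, 0, 1.0, 1), (3.5, 1, 0.0, 2), (7.0, 1, 1.0, 2)]
b0, b1, shape = -1.0, 0.5, 1.2          # hypothetical fixed effects and shape
u = {1: 0.1, 2: -0.1}                   # hypothetical area random effects
eta = [b0 + b1 * x + u[a] for (_, _, x, a) in data]
ll = weibull_loglik([r[0] for r in data], [r[1] for r in data], eta, shape)
print(round(ll, 3))
```

The dataset stays at one row per individual regardless of how many distinct event times there are. The computational cost moves instead to the many MCMC iterations needed to explore the posterior.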
Item Type:  Thesis (PhD) 

Qualification Level:  Doctoral 
Keywords:  Survival analysis, multilevel models, large datasets, rare events, continuous-time Poisson model, discrete-time model, Weibull distribution, MLwiN, WinBUGS 
Subjects:  R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine; H Social Sciences > HA Statistics 
Colleges/Schools:  College of Science and Engineering > School of Mathematics and Statistics > Statistics 
Supervisor's Name:  Leyland, Professor Alastair H. 
Date of Award:  2010 
Depositing User:  Catherine H Stewart 
Unique ID:  glathesis:20102007 
Copyright:  Copyright of this thesis is held by the author. 
Date Deposited:  21 Jul 2010 
Last Modified:  10 Dec 2012 13:49 
URI:  http://theses.gla.ac.uk/id/eprint/2007 