Multilevel modelling of event history data: comparing methods appropriate for large datasets

Stewart, Catherine Helen (2010) Multilevel modelling of event history data: comparing methods appropriate for large datasets. PhD thesis, University of Glasgow.

When analysing medical or public health datasets, it is often of interest to measure the time until a particular pre-defined event occurs, such as death from some disease. Because the health status of individuals living within the same area tends to be more similar than that of individuals from different areas, event times of individuals from the same area may be correlated. Multilevel models are therefore needed to account for the clustering of individuals within the same geographical location and, when the outcome is time until some event, these take the form of multilevel event history models.

Although software for fitting multilevel event history models does exist, such as MLwiN, computational requirements limit the use of these models for large datasets. For example, to fit the proportional hazards model (PHM), the most commonly used event history model for modelling the effect of risk factors on event times, MLwiN fits a Poisson model to a person-period dataset. The person-period dataset is created by rearranging the original dataset so that each individual has a line of data corresponding to every risk set they survive until either censoring or the event of interest occurs. When time is treated as a continuous variable so that each risk set corresponds to a distinct event time, as is the case for the PHM, the person-period dataset can be very large. This presents a problem for those working in public health, as datasets used for measuring and monitoring public health are typically large. Furthermore, individuals may be followed up for a long period of time, which also contributes to a large person-period dataset. A further complication is that interest may be in modelling a rare event, resulting in a high proportion of censored observations. This can also be problematic when estimating multilevel event history models.
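The data expansion described above can be illustrated with a minimal Python sketch. It is not the thesis's or MLwiN's procedure: the function name, the record layout (identifier, time, event indicator) and the toy data are all assumptions made for illustration. Each individual receives one line per risk set they belong to, with a risk set defined at every distinct observed event time.

```python
def expand_person_period(records):
    """Expand (id, time, event) records into a person-period dataset.

    `event` is 1 if the event of interest was observed and 0 if the
    observation was censored. A risk set is defined at each distinct
    observed event time (an assumption matching the continuous-time case).
    """
    event_times = sorted({t for _, t, d in records if d == 1})
    rows = []
    for pid, time, event in records:
        for k, et in enumerate(event_times, start=1):
            if et > time:
                break  # individual is no longer at risk in later risk sets
            # Response is 1 only in the risk set where this individual's event occurs.
            y = 1 if (event == 1 and et == time) else 0
            rows.append({"id": pid, "risk_set": k, "time": et, "y": y})
    return rows


# Hypothetical toy data: five individuals, three distinct event times.
toy = [(1, 2.0, 1), (2, 3.5, 0), (3, 5.0, 1), (4, 5.0, 0), (5, 7.0, 1)]
rows = expand_person_period(toy)
print(len(toy), "individuals ->", len(rows), "person-period lines")
```

Even in this toy case the number of lines grows with both the number of individuals and the number of distinct event times, which is the source of the size problem discussed above.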

Since multilevel event history models are important in public health, the aim of this thesis is to develop these models so they can be fitted to large datasets, considering in particular datasets with long periods of follow-up and rare events. Two datasets are used throughout the thesis to investigate three possible alternatives to fitting the multilevel proportional hazards model in MLwiN in order to overcome the problems discussed. The first is a moderately-sized Scottish dataset, which is the main focus of the thesis and is used as a ‘training dataset’ to explore the limitations of existing software packages for fitting multilevel event history models and to investigate alternative methods. The second dataset, from Sweden, is used to test the effectiveness of each alternative method when fitted to a much larger dataset. The adequacy of the alternative methods is assessed on the following criteria: how effective they are at reducing the size of the person-period dataset, how closely the parameter estimates they produce agree with those from the PHM, and how easy they are to implement.

The first alternative method involves defining discrete-time risk sets and then estimating discrete-time hazard models via multilevel logistic regression models fitted to a person-period dataset. The second alternative method involves aggregating the data of individuals within the same higher-level units who have the same values for the covariates in a particular model. Aggregating the data like this means that one line of data is used to represent all such individuals since these individuals are at risk of experiencing the event of interest at the same time. This method is termed ‘grouping according to covariates’. Both continuous-time and discrete-time event history models can be fitted to the aggregated person-period dataset. The ‘grouping according to covariates’ method and the first method, which involves defining discrete-time risk sets, are both implemented in MLwiN and pseudo-likelihood methods of estimation are used. The third and final method to be considered, however, involves fitting Bayesian event history (frailty) models and using Markov chain Monte Carlo (MCMC) methods of estimation. These models are fitted in WinBUGS, a software package specially designed to make practical MCMC methods available to applied statisticians. In WinBUGS, an additive frailty model is adopted and a Weibull distribution is assumed for the survivor function.
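As a rough illustration of the first two methods, the following Python sketch first builds a discrete-time person-period dataset using fixed-width intervals and then applies ‘grouping according to covariates’ by collapsing lines that share the same higher-level unit, covariate values and interval. Everything here is hypothetical: the function names, the record layout, the interval width and the toy data are assumptions, and the sketch only prepares the data; it does not fit any model or reproduce the MLwiN implementation.

```python
import math
from collections import defaultdict


def discrete_person_period(records, width):
    """One line per individual per discrete interval survived.

    `records` holds (area, covariates, time, event) tuples; interval k
    covers (k-1)*width < t <= k*width (an assumed convention).
    """
    rows = []
    for area, covs, time, event in records:
        last = max(1, math.ceil(time / width))
        for k in range(1, last + 1):
            # The event indicator is 1 only in the interval where the event occurs.
            y = 1 if (event == 1 and k == last) else 0
            rows.append((area, covs, k, y))
    return rows


def group_by_covariates(rows):
    """Collapse lines sharing area, covariates and interval into one line
    holding (number of events, number at risk)."""
    cells = defaultdict(lambda: [0, 0])
    for area, covs, k, y in rows:
        cell = cells[(area, covs, k)]
        cell[0] += y   # events in this cell
        cell[1] += 1   # individuals at risk in this cell
    return {key: tuple(counts) for key, counts in cells.items()}


# Hypothetical toy data: area, (sex, smoking status), follow-up time, event indicator.
toy = [
    ("A", ("male", "smoker"), 2.5, 1),
    ("A", ("male", "smoker"), 4.0, 0),
    ("A", ("female", "non-smoker"), 1.0, 1),
    ("B", ("male", "smoker"), 3.2, 0),
]
rows = discrete_person_period(toy, width=2.0)
grouped = group_by_covariates(rows)
print(len(rows), "person-period lines ->", len(grouped), "grouped lines")
```

The grouped lines carry an event count and a number at risk, so the reduction is greatest when many individuals in the same higher-level unit share covariate values; a continuous covariate would leave most cells containing a single individual.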

Methodological findings were that the discrete-time method led to a successful reduction in the size of the continuous-time person-period dataset; however, it was necessary to experiment with the length of the time intervals in order to find the widest interval that did not influence parameter estimates. The grouping according to covariates method worked best when there was, on average, a large number of individuals per higher-level unit, there were few risk factors in the model, and few or none of the risk factors were continuous. The Bayesian method could be favourable as no data expansion is required to fit the Weibull model in WinBUGS and time is treated as a continuous variable. However, models took much longer to run using MCMC methods of estimation than using likelihood methods. This thesis showed that it was possible to use a re-parameterised version of the Weibull model, as well as a variance expansion technique, to overcome slow convergence by reducing correlation in the Markov chains. This may be a more efficient way to reduce computing time than running further iterations.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Survival analysis, multilevel models, large datasets, rare events, continuous-time Poisson model, discrete-time model, Weibull distribution, MLwiN, WinBUGS
Subjects: R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine
H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Leyland, Professor Alastair H.
Date of Award: 2010
Depositing User: Catherine H Stewart
Unique ID: glathesis:2010-2007
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 21 Jul 2010
Last Modified: 10 Dec 2012 13:49
