Watt, Douglas Campbell
Categorising variables in medical contexts.
PhD thesis, University of Glasgow.
Many medical studies involve modelling the relationship between an outcome
variable and a series of one or more continuous/interval scaled discrete explanatory
variables. It is common practice in many of these studies for some, or indeed all, of
the continuous/interval scaled discrete explanatory factors to be incorporated into the
analysis in a categorised or grouped form.
One of the main reasons for adopting this methodology is that it will simplify
the interpretation of results for clinicians and hopefully patients. It is often easier to
interpret conclusions based on an explanatory variable with two or three levels (i.e.
categorisations) than from a continuous/interval scaled discrete explanatory variable. The
main drawback with this technique is in identifying the categorisation points. Often
preconceived and/or historical grounds are the determining factor used to decide the
location of these categorisation points. However, this may not give rise to sensible or
justifiable locations for such points for a given application.
This thesis will consider the analysis of data from various types of medical
study and, by applying non-parametric statistical methodology, provide an alternative,
more logical rationale for identifying categorisation points. The analysis will
concentrate on data from three specific types of medical study: a cohort study with a
binary outcome, a matched case/control study and a survival analysis.
In a cohort study with a binary response the standard methodology of logistic
regression will be applied and extended using a non-parametric logistic approach to
identify potential categorisation points. As a further extension consideration will be
given to the more formal methodology of examining the first derivative of the
resultant non-parametric logistic regression to provide the location of categorisation
points.
In matched case/control studies the standard technique used for analysis is
conditional logistic regression. The theory and application of this model will be
discussed before considering two new, alternative, non-parametric approaches to
analysing matched case/control studies with an interval scaled discrete explanatory
variable. The proposed non-parametric approaches will be tested to investigate their
usefulness in identification of categorisations for the explanatory variable. Possible
extensions to these approaches to incorporate a single continuous explanatory variable
will be discussed. In order to compare the two non-parametric approaches a
simulation study will be carried out to investigate the power of these approaches.
Finally, consideration will be given to the analysis of survival data. Initially,
the standard methodologies of the Kaplan and Meier estimator in the absence of
explanatory variables and Cox's Proportional Hazards model to incorporate
explanatory variables will be discussed. A more detailed examination of three
alternative methods for analysing survival data in the presence of a single continuous
explanatory variable will be carried out. Each of the methods will be applied in turn to a
survival analysis problem to investigate if any categorisations can be identified for a
single continuous explanatory variable. Further simulations will be undertaken to
compare the three methods across a variety of scenarios.
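As an illustration of the standard starting point named above, the Kaplan and Meier estimator in the absence of explanatory variables can be sketched in a few lines. This is a minimal sketch rather than the thesis's own implementation: the function name `kaplan_meier` and the toy data are hypothetical, and it assumes the usual convention that a subject censored at time t is still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove both events and censored subjects after processing time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from the thesis.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1]
curve = kaplan_meier(times, events)
```

Comparing curves of this kind across candidate groupings of a continuous explanatory variable is one informal way to see whether a proposed categorisation separates survival experience.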