Dynamic DNA and human disease: mathematical modelling and statistical inference for myotonic dystrophy type 1 and Huntington disease

Higham, Catherine F. (2013) Dynamic DNA and human disease: mathematical modelling and statistical inference for myotonic dystrophy type 1 and Huntington disease. PhD thesis, University of Glasgow.

Full text available as: PDF (2013HighamPhD.pdf, 2MB)
Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b2979514


Several human genetic diseases, including myotonic dystrophy type 1 (DM1) and Huntington disease (HD), are associated with inheriting an abnormally large, unstable DNA simple sequence tandem repeat. These sequences mutate many times during the lifetime of those affected, changing the number of repeats with a bias towards expansion. Higher repeat numbers are associated with earlier onset and greater disease severity. Somatic instability compromises attempts to measure intergenerational repeat dynamics and to infer genotype-phenotype relationships. Modelling how repeat length progresses over the lifetime of individuals could improve prognostic information and deepen our understanding of the underlying biological process.

Dr Fernando Morales, Dr Anneli Cooper and others from the Monckton lab have characterised more than 25,000 de novo somatic mutations from a large cohort of DM1 patients using single-molecule polymerase chain reaction (SM-PCR). This rich dataset enables us, for the first time, to fully quantify levels of somatic instability across a representative DM1 population. Using linear regression analysis, we establish the relationship between inherited (progenitor) allele length, age at sampling and levels of somatic instability. We show that the estimated progenitor allele length genotype predicts age of onset significantly better than modal repeat length (the current clinical standard), and that this novel genotype is the major modifier of the age of onset phenotype. We further show that somatic variation, adjusted for estimated progenitor allele length and age at sampling, is also a modifier of the age of onset phenotype. Because the cohort includes several families, we can also show that the level of somatic instability is highly heritable, implying a role for individual-specific trans-acting genetic modifiers.
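The regression step described above can be sketched in Python. This is a minimal illustration, not the thesis's analysis. The function names are hypothetical. Using a low quantile of the SM-PCR distribution as a proxy for progenitor allele length is an assumption made here. Any variable transformations used in the actual analysis are not reproduced.

```python
import numpy as np

def estimate_progenitor_length(repeat_lengths, quantile=0.1):
    """Hypothetical proxy for the progenitor allele length: a low quantile
    (default: lower decile) of the SM-PCR repeat-length distribution."""
    return float(np.quantile(np.asarray(repeat_lengths, dtype=float), quantile))

def adjusted_somatic_variation(variation, epal, age):
    """Least-squares fit of variation ~ b0 + b1 * epal + b2 * age.

    Returns the coefficients (b0, b1, b2) and the residuals, i.e. the part
    of somatic variation not explained by progenitor length and age.
    """
    variation = np.asarray(variation, dtype=float)
    design = np.column_stack([
        np.ones_like(variation),        # intercept
        np.asarray(epal, dtype=float),  # estimated progenitor allele length
        np.asarray(age, dtype=float),   # age at sampling
    ])
    coef, *_ = np.linalg.lstsq(design, variation, rcond=None)
    residuals = variation - design @ coef
    return coef, residuals
```

Each input has one value per individual. The returned residuals serve as the "adjusted" somatic variation, which can then be tested as a modifier of age of onset alongside the progenitor length itself.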

We develop new mathematical models, the main focus of this thesis, by modifying a previously proposed stochastic birth process to incorporate possible contraction. Inference and parameter estimation are based on a Bayesian likelihood approach. Model comparison analysis reveals, for the first time, that the expansion bias observed in repeat length distributions is likely to be the cumulative effect of many expansion and contraction events. We predict that mutation events can occur as frequently as every other day. This frequency is consistent with regular cellular activities such as DNA repair and transcription, but not with DNA replication.
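A birth process with contraction can be illustrated by simulation. The sketch below uses the Gillespie algorithm. It rests on assumptions made here for illustration, not taken from the thesis:

* each event changes the length by one repeat;
* expansions occur at rate `lam * y` and contractions at rate `mu * y` for current length `y`;
* rate units are arbitrary.

An expansion bias arises whenever `lam > mu`, even though individual molecules both gain and lose repeats.

```python
import random

def simulate_repeat_length(y0, lam, mu, t_end, rng=None):
    """Gillespie simulation of a linear birth-death process on repeat length.

    Illustrative assumptions: unit steps, expansion rate lam * y,
    contraction rate mu * y, with lam and mu non-negative.
    """
    rng = rng or random.Random()
    if lam + mu <= 0:
        return y0  # no mutation possible
    y, t = y0, 0.0
    while y > 0:
        total = (lam + mu) * y                # total event rate at length y
        t += rng.expovariate(total)           # waiting time to next event
        if t > t_end:
            break
        if rng.random() < lam / (lam + mu):   # choose expansion vs contraction
            y += 1
        else:
            y -= 1
    return y
```

Simulating many molecules from the same progenitor length gives a repeat-length distribution that can be compared with SM-PCR data. For example, `[simulate_repeat_length(200, 0.02, 0.015, 30.0, random.Random(1)) for _ in range(1000)]` generates such a distribution. Those parameter values are arbitrary placeholders.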

Mutation rates estimated under the models described above are lower than expected among individuals with inherited repeat lengths of less than 100 CTGs. This suggests that these rates may be suppressed at the lower end of the disease-causing range. We propose that a length-specific effect may be operating within this range, and we test this hypothesis by introducing such an effect into the model. To calibrate this extended model, we use two sources of data:

* blood DNA from DM1 individuals with small alleles (inherited repeat lengths of less than 100 CTGs);
* buccal DNA from HD individuals, who almost always have inherited repeat lengths of less than 100 CAGs.

Both datasets comprise single DNA molecules sized using SM-PCR. We find statistical support for a general length-specific effect that suppresses mutation rates among the smaller alleles and gives rise to a distinctive pattern in the repeat length distributions. In a novel application of this new model, fitted to a large cohort of DM1 individuals, we also show that this pattern may help identify individuals whose effective repeat length, with regard to somatic instability, is less than their actual repeat length. A plausible explanation is that the expanded repeat tract is compromised by interruptions or other unusual features. For these individuals, we estimate the effective repeat length of their expanded repeat tracts, contributing to the ongoing discussion about the effect of interruptions on phenotype.
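One way to express a length-specific effect is to let only part of the repeat tract drive mutation. In the sketch below, a hypothetical offset `c` defines an effective length `max(y - c, 0)` that scales both rates. Mutation is therefore suppressed for short alleles and stops entirely once `y <= c`. This functional form, the unit-step events and the rate parameters `lam` and `mu` are all assumptions chosen for illustration; the thesis's own formulation of the effect is not reproduced here.

```python
import random

def effective_length(y, c):
    """Hypothetical length-specific effect: only repeats beyond an offset c
    contribute to the mutation rate, so rates vanish for y <= c."""
    return max(y - c, 0)

def simulate_with_length_effect(y0, lam, mu, c, t_end, rng=None):
    """Gillespie simulation with expansion rate lam * g(y) and contraction
    rate mu * g(y), where g is the hypothetical effective length above."""
    rng = rng or random.Random()
    y, t = y0, 0.0
    while True:
        total = (lam + mu) * effective_length(y, c)
        if total <= 0:
            return y  # below the effective threshold: no further mutation
        t += rng.expovariate(total)
        if t > t_end:
            return y
        # Expansion with probability lam / (lam + mu), else contraction.
        y += 1 if rng.random() < lam / (lam + mu) else -1
```

Under this form, an allele whose effective length is smaller than its actual length mutates more slowly than its size alone would suggest. This mirrors the idea of an interrupted tract behaving like a shorter pure repeat.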

Complex cell compositions hinder the interpretation of somatic instability levels in many of the affected tissues in the triplet repeat diseases. We extend our model to two cell populations whose repeat lengths mutate at different rates (fast and slow). Swami et al. have recently characterised repeat length distributions in end-stage HD brain. Applying our model to each frontal cortex HD dataset, we infer the likely relative weights of these cell populations and their contributions to somatic variation. By comparison with data from laser-captured single cells, we conclude that neuronal repeat lengths most likely mutate at a higher rate than glial repeat lengths. This explains the characteristic skewed distributions observed in mixed-cell brain tissue. We confirm that individual-specific mutation rates in neurons are, in addition to the inherited repeat length, a modifier of age of onset. Our results support a model of disease progression in which individuals with the same inherited repeat length may reach age of onset as much as 30 years earlier because of greater somatic expansions underpinned by higher mutation rates. Therapies aimed at reducing somatic expansions could therefore bring considerable benefits by delaying the age of onset.
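Inferring the relative weight of fast- and slow-mutating cell populations amounts to fitting a two-component mixture. The thesis does this within a Bayesian framework. The sketch below swaps in a simpler maximum-likelihood grid search. It assumes the two component repeat-length distributions are already known, supplied as hypothetical dictionaries mapping repeat length to probability.

```python
import math

def mixture_log_likelihood(observed, pmf_fast, pmf_slow, w):
    """Log-likelihood of observed repeat lengths under the mixture
    w * fast + (1 - w) * slow, where w is the fast-population weight."""
    ll = 0.0
    for y in observed:
        p = w * pmf_fast.get(y, 0.0) + (1 - w) * pmf_slow.get(y, 0.0)
        if p <= 0:
            return -math.inf  # observation impossible under this weight
        ll += math.log(p)
    return ll

def estimate_weight(observed, pmf_fast, pmf_slow, grid_size=101):
    """Grid-search maximum-likelihood estimate of the mixing weight w in [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda w: mixture_log_likelihood(observed, pmf_fast, pmf_slow, w))
```

A Bayesian treatment would instead place a prior on `w` and report its posterior. It would typically also estimate the component distributions' mutation parameters jointly rather than fixing them.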

Clinical diagnosis of DM1 is currently based on a measure of repeat length from blood cells. However, modal length accounts for only 20 to 40% of the variance in age of onset and is therefore not an accurate predictive tool. We show that, in principle, progenitor allele length strengthens the inverse correlation with age of onset compared with the traditional modal length measure. We also use second blood samples, now available from 40 DM1 individuals. Inherited repeat length and the mutation rates underlying repeat length instability in blood, inferred from samples at two time points rather than one, prove to be better predictors of age of onset than the traditional modal length measure. Our results are a step towards better prognostic information for DM1 individuals and their families. They should also lead to better predictions of drug and therapy response, which is emerging as key to successful clinical trials.
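The statement that modal length explains 20 to 40% of the variance in age of onset refers to the proportion of variance explained (R²) by a regression. A minimal sketch of that calculation for a single predictor follows. Comparing predictors then amounts to computing it for each candidate against the same ages of onset. Candidates might be modal length, estimated progenitor allele length or a model-derived quantity. This is an illustrative assumption about how such a comparison can be made, not the thesis's statistical procedure, which may use transformed variables or several predictors at once.

```python
def r_squared(x, y):
    """Proportion of the variance in y explained by a least-squares line on x
    (equivalently, the squared Pearson correlation between x and y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length sequences of at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy ** 2 / (sxx * syy)
```

For instance, `r_squared(modal_length, onset_age)` and `r_squared(progenitor_length, onset_age)` would compare the two genotype measures on one cohort. All variable names here are hypothetical. A transformed predictor, such as a log length, can be passed in unchanged.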

Microsatellites are another type of genomic tandem repeat with high levels of intergenerational and somatic mutation. Their variability between individuals makes microsatellites useful biomarkers, with many applications in forensics and medicine. Beyond their general application to other expanded repeat diseases, the mathematical models developed here could help us better understand instability at other mutational hotspots such as microsatellites.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: myotonic dystrophy type 1, Huntington disease, statistical inference, mathematical modelling.
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics
Q Science > QH Natural history > QH426 Genetics
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Molecular Biosciences
Supervisor's Name: Monckton, Prof. Darren G.
Date of Award: 2013
Depositing User: Mrs Catherine F Higham
Unique ID: glathesis:2013-4228
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 02 May 2013 14:25
Last Modified: 02 May 2016 13:10
URI: https://theses.gla.ac.uk/id/eprint/4228
