The effect on inferences of population size of the sampling scheme for intraspecific DNA sequences

Whoriskey, Suzy (2020) The effect on inferences of population size of the sampling scheme for intraspecific DNA sequences. PhD thesis, University of Glasgow.

Variation in samples of DNA sequences from within one species can be informative about the demographic processes that have affected that species, revealing signals of past migration patterns and population size changes. The demographic models fitted to the data may vary, as may the way the data are used, but one almost ubiquitous assumption is that the samples sequenced in the study are randomly chosen. Yet this is rarely plausible, either because random sampling is practically impossible to perform or because the samples for analysis are consciously selected in some non-random way.

This thesis takes a simulation approach to explore how robust a particular flexible class of models for inferring variable population size, the so-called skyline plot methods, is to non-random sampling. The sampling scheme investigated takes sequences belonging to one subtree (or haplogroup) of the genealogy of a non-recombining locus. Pitfalls of analyses that ignore the sampling scheme are reported, and a recommendation for interpreting such analyses is made.

This work uses the Bayesian skyline plot model to infer population sizes. In simulation settings, this model proves accurate in estimating population size as a function of time from random samples. When a non-random sample defined by a haplogroup is analysed, the model infers the shape of the population curve well but fails to capture its magnitude, whether compared with the population curve inferred from a random sample or with the true population curve. Functional data analysis techniques were used to explore the relationship between the population curves inferred from random and non-random samples. After establishing that there is indeed a strong relationship between the two, the goal was to develop a straightforward post hoc correction to the population curve inferred from the non-random sample, one that is easy to apply and lets practitioners allow for the violations of model assumptions caused by non-random sampling, thereby obtaining a more reliable estimate of population size. The correction uses information on the prevalence of the mutation defining the non-random subtree. The approach is illustrated by applying it to samples of human mitochondrial DNA sequences.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Population size, mitochondrial DNA, Bayesian Skyline Plot, mutation, Coalescent process, haplogroup, non-random sampling, phylogenetic analysis, bias, population genetics.
Subjects: Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics
Supervisor's Name: Macaulay, Dr. Vincent and Gupta, Dr. Mayetri
Date of Award: 2020
Depositing User: Dr Suzy Whoriskey
Unique ID: glathesis:2020-81328
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 05 May 2020 10:43
Last Modified: 05 May 2020 14:36
