Methods for demographic inference from single-nucleotide polymorphism data

Mair, Colette (2012) Methods for demographic inference from single-nucleotide polymorphism data. PhD thesis, University of Glasgow.

Full text available as PDF (10MB).

Abstract

The distribution of the current human population is the result of many complex historical and prehistoric demographic events that have shaped variation in the human genome. Genomic differences between individuals from different geographical regions can potentially reveal something of these processes. The greatest differences lie between, and within, African populations, and most research suggests that modern humans originated in Africa. However, competing models have been proposed to describe the evolutionary processes that led to humans inhabiting most of the world. This thesis develops a hypothesis test that is shown to be powerful at distinguishing between two such models. The first ("migration") model assumes that the population of interest is divided into subpopulations that exchange migrants at a constant rate arbitrarily far back in the past, whilst the second ("isolation") model assumes that an ancestral population iteratively splits into subpopulations that then evolve independently. Although both models are simplistic, they capture key aspects of the opposing theories of the history of modern humans. Given single-nucleotide polymorphism (SNP) data from two subpopulations, the method described here tests a global null hypothesis that the data come from an isolation model. The test takes a parametric bootstrap approach: it repeatedly simulates data under the null hypothesis and computes a set of summary statistics shown to distinguish between the two models. Each summary statistic forms the basis of an individual hypothesis test in which its observed value is compared with the simulated values. The global null hypothesis is retained only if none of the individual tests rejects it, and a correction for multiple comparisons controls the type I error rate of this compound test. Extensions to the test are given that adapt it to SNP ascertainment and allow it to better handle large genomic data sets.
The methods are first validated by simulation, where the 'true' model is known, and then illustrated on data from the HapMap project using two Kenyan populations and the Japanese and Yoruba populations.
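The compound test summarised in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. `simulate_null` is a hypothetical stand-in for simulating SNP data under the fitted isolation model and computing the summary statistics. Each statistic gets a two-sided Monte Carlo p-value. Bonferroni is assumed as the multiple-comparison correction, since the abstract does not name the correction actually used.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Two-sided Monte Carlo p-value of `observed` against null draws.

    The +1 terms count the observed value as one of the draws, which keeps
    the p-value strictly positive.
    """
    b = len(simulated)
    lower = (1 + sum(s <= observed for s in simulated)) / (b + 1)
    upper = (1 + sum(s >= observed for s in simulated)) / (b + 1)
    return min(1.0, 2 * min(lower, upper))


def global_bootstrap_test(
    observed: Sequence[float],
    simulate_null: Callable[[random.Random], List[float]],
    n_sims: int = 999,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict:
    """Reject the global null if any per-statistic test rejects.

    `simulate_null` (hypothetical) returns one value per summary statistic
    for a single data set simulated under the null (isolation) model.
    """
    rng = random.Random(seed)
    sims = [simulate_null(rng) for _ in range(n_sims)]
    k = len(observed)
    p_values = [
        monte_carlo_p_value(observed[j], [s[j] for s in sims]) for j in range(k)
    ]
    # Assumed Bonferroni correction: test each statistic at level alpha / k,
    # so the chance that any test rejects a true global null is at most
    # roughly alpha.
    threshold = alpha / k
    return {
        "p_values": p_values,
        "threshold": threshold,
        "reject_global_null": any(p < threshold for p in p_values),
    }


def toy_null(rng: random.Random) -> List[float]:
    # Hypothetical placeholder: two standard-normal "summary statistics".
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


if __name__ == "__main__":
    print(global_bootstrap_test([0.1, 8.0], toy_null))
```

In this toy run the second observed statistic lies far outside its simulated null distribution. Its p-value therefore falls below the corrected threshold, and the global null is rejected even though the first statistic looks unremarkable. A real application would replace `toy_null` with simulation of SNP data under the isolation model, with parameters fitted to the observed data.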

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Funder's Name: UNSPECIFIED
Supervisor's Name: Macaulay, Dr. Vincent
Date of Award: 2012
Depositing User: Miss Colette Mair
Unique ID: glathesis:2012-3781
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 19 Dec 2012 11:56
Last Modified: 19 Dec 2012 11:59
URI: http://theses.gla.ac.uk/id/eprint/3781
