Statistical issues in modelling the ancestry from Y-chromosome and surname data

Sharif, Maarya (2012) Statistical issues in modelling the ancestry from Y-chromosome and surname data. PhD thesis, University of Glasgow.



A considerable industry has grown up around genealogical inference from genetic testing, supplementing more traditional genealogical techniques but with very limited quantification of uncertainty. In many societies Y-chromosomes are co-inherited with surnames, both being passed down from father to son. This thesis seeks to explore what this correlation can say about ancestry. In particular, it is concerned with estimation of the time to the most recent common paternal ancestor (TMRCA) for pairs of males who are not known to be directly related but share the same surname, based on the repeat number at short tandem repeat (STR) markers on their Y-chromosomes.
We develop a model of TMRCA estimation based on the difference in repeat numbers in pairs of male haplotypes, using a Bayesian framework and Markov chain Monte Carlo techniques such as the adaptive Metropolis-Hastings algorithm. The model incorporates the process of STR discovery and the calibration of mutation rates, which can differ across STRs. In simulation studies, we find that the estimates of TMRCA are rather robust to the ascertainment process and the way in which it is modelled. However, they are affected by the site-specific mutation rates at the typed STRs. Indeed, sequencing the fastest-mutating STRs yields a lower error in the estimated TMRCA than sequencing randomly chosen STRs. In the British context, we extend our model to include additional information such as the haplogroup status (as determined from single nucleotide polymorphisms, SNPs) of the pair of males, as well as the frequency and origin of the surname. In general, the effect of this is to reduce estimates of the TMRCA for pairs of males with an older TMRCA, typically outwith the period of surname establishment (about 500-700 years ago). In the genealogical context, incorporating surname frequency (within the prior distribution) results in lower estimates of TMRCA for pairs of males who appear to have diverged from a common male ancestor since the period of surname establishment. In addition, we include uncertainty in the years-per-generation conversion factor in our model.
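
As an illustration of the kind of inference described above, the following is a minimal sketch and not the model developed in the thesis. It assumes a symmetric single-step mutation model, under which the repeat-number difference at one STR between two males separated by t generations from their common ancestor is the difference of two independent Poisson counts with mean mu * t each. It also assumes an exponential prior on t and uses a plain (non-adaptive) random-walk Metropolis sampler. The function names, the prior mean, the mutation rates and the example data are all hypothetical, and STR ascertainment, haplogroup and surname information are ignored.

```python
import math
import random


def skellam_log_pmf(d, lam, terms=40):
    """Log-probability that two independent Poisson(lam) counts differ by d.

    The infinite sum is truncated at `terms`, which assumes lam is small
    (a handful of expected mutations per lineage at most).
    """
    k = abs(d)
    log_lam = math.log(lam)
    total = 0.0
    for j in range(terms):
        # log of Pois(j + k; lam) * Pois(j; lam)
        total += math.exp((2 * j + k) * log_lam - 2 * lam
                          - math.lgamma(j + k + 1) - math.lgamma(j + 1))
    return math.log(total) if total > 0.0 else -math.inf


def log_posterior(t, diffs, rates, prior_mean):
    """Unnormalised log-posterior of the TMRCA t (in generations)."""
    if t <= 0:
        return -math.inf
    log_prior = -t / prior_mean  # exponential prior (an assumption)
    log_lik = sum(skellam_log_pmf(d, mu * t) for d, mu in zip(diffs, rates))
    return log_prior + log_lik


def sample_tmrca(diffs, rates, prior_mean=100.0, n_iter=5000, step=20.0, seed=1):
    """Random-walk Metropolis draws of t; the first quarter is discarded as burn-in."""
    rng = random.Random(seed)
    t = prior_mean
    current = log_posterior(t, diffs, rates, prior_mean)
    samples = []
    for _ in range(n_iter):
        proposal = t + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, diffs, rates, prior_mean)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < candidate - current:
            t, current = proposal, candidate
        samples.append(t)
    return samples[n_iter // 4:]


if __name__ == "__main__":
    # Hypothetical repeat-number differences at 10 STRs and per-generation rates.
    diffs = [0, 1, 0, 0, -1, 0, 2, 0, 0, 0]
    rates = [0.002] * 10
    draws = sorted(sample_tmrca(diffs, rates))
    print("posterior median TMRCA (generations):", draws[len(draws) // 2])
```

Multiplying the sampled generations by a years-per-generation factor, itself drawn from a distribution rather than fixed, would be one way to carry that uncertainty through to an estimate in years.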

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Y-chromosome, surname, most recent common ancestor, haplotype, haplogroup, British, genealogy, short tandem repeat, generation.
Subjects: Q Science > QH Natural history > QH426 Genetics
H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Macaulay, Dr. Vincent A.
Date of Award: 2012
Depositing User: Ms Maarya Sharif
Unique ID: glathesis:2012-3407
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 08 Jun 2012
Last Modified: 10 Dec 2012 14:06
