Genome Rearrangement Problems

Christie, David Alan (1998) Genome Rearrangement Problems. PhD thesis, University of Glasgow.

Full text available as:
[thumbnail of 10992307.pdf] PDF
Download (84MB)

Abstract

Various global rearrangements of permutations, such as reversals and transpositions, have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. Related to sorting by reversals is the problem of establishing the reversal diameter. The reversal diameter of Sn (the symmetric group on n elements) is the maximum number of reversals required to sort a permutation of length n. Of course, diameter problems can be posed for other global rearrangements. These various problems are of interest because the permutations can be used to represent sequences of genes in chromosomes, and the global rearrangements then represent evolutionary events. As a result, we call these problems genome rearrangement problems. Genome rearrangement problems seem to be unlike previously studied algorithmic problems on sequences, so new methods have had to be developed to deal with them. These methods predominantly employ graphs to model permutation structure. However, even using these methods, often a genome rearrangement problem has no obvious polynomial-time algorithm, and in some cases can be shown to be NP-hard. For example, the problem of sorting by reversals is NP-hard, whereas the computational complexity of sorting by transpositions is open. For problems like these, it is natural to seek polynomial-time approximation algorithms that achieve an approximation guarantee. In this thesis, we study several genome rearrangement problems as interesting and challenging algorithmic problems in their own right, including some problems for which the global rearrangement has no immediate biological equivalent. For example, we define a block-interchange to be a rearrangement that swaps any two substrings of the permutation. We examine, in particular, how the graph theoretic models relate to the genome rearrangement problems that we study. The major new results contained in this thesis are as follows: We present a 3/2-approximation algorithm for sorting by reversals. This is the best known approximation algorithm for the problem, and improves upon the 7/4 approximation bound of the previous best algorithm. We give a polynomial-time algorithm for a significant special case of sorting by reversals, thereby disproving a conjecture of Kececioglu and Sankoff, who had suggested that this special case was likely to be NP-hard. We analyse the structure of the so-called cpcle graph of a permutation in the context of sorting by transpositions, and thereby gain a deeper insight into this problem. Among the consequences are; a tighter lower bound for the problem, a simpler 3/2-aproximation algorithm than had previously been described, and algorithms that, in empirical tests, almost always find the exact transposition distance of random permutations. We introduce a natural generalisation of sorting by transpositions called sorting by block-interchanges, and present a polynomial-time algorithm for this problem. We initiate the study of analogous problems on strings over a fixed length alphabet. We establish upper and lower bounds and diameter results for the problems over a binary alphabet. We also prove that the problems analogous to sorting by reversals and sorting by block-interchanges are NP-hard. (Abstract shortened by ProQuest.).

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Bioinformatics, Computer science
Date of Award: 1998
Depositing User: Enlighten Team
Unique ID: glathesis:1998-74685
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 27 Sep 2019 17:10
Last Modified: 27 Sep 2019 17:10
URI: https://theses.gla.ac.uk/id/eprint/74685

Actions (login required)

View Item View Item

Downloads

Downloads per month over past year