A disk-resident suffix tree index and generic framework for managing tunable indexes

Japp, Robert Philip (2004) A disk-resident suffix tree index and generic framework for managing tunable indexes. PhD thesis, University of Glasgow.

Abstract

This thesis introduces two related technologies. The first is a disk-resident index for biological sequence data, and the second is a framework and toolkit for the management of operational parameters for applications of which this index is typical. The Top-Compressed Suffix Tree is a novel data structure that can be used to provide a scalable, disk-resident index for large sequences. This data structure is based on the suffix tree, but has been designed to overcome the problems associated with using such structures on secondary memory. Top-Compressed Suffix Trees can be constructed incrementally, allowing indexes to be created that are larger than the amount of available main memory. Correspondingly, querying such an index only requires part of the data structure to be resident in main memory, thus allowing support for on-demand faulting and eviction of index sections during search. Such an index may be of great benefit to scientists requiring efficient access to vast repositories of genomic data. The Generic Index Development and Operation Framework (GIDOF) is a framework and toolkit that supports various tasks relating to the management of operational parameters. The performance of an index's implementation is typically influenced by several operational parameters, which must be tuned carefully if optimum performance is to be obtained. Indexes implemented using GIDOF can be structured in such a way that values of selected operational parameters can be adjusted, resulting in an index implementation that can be tuned to suit a given workload or system environment. This thesis presents a detailed description of the design of both the Top-Compressed Suffix Tree and the algorithms that operate over it. Extensive performance measurements are then presented and discussed, covering such aspects of index performance as construction time, average query performance and the size of the completed index. An overview of the GIDOF parameter model and toolkit is then given, together with examples of how this framework can be used to manage tunable indexes such as the Top-Compressed Suffix Tree.
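
The abstract's idea of on-demand faulting and eviction of index sections, governed by a tunable operational parameter, can be illustrated with a minimal sketch. This is not the thesis's design or the GIDOF API: the class `SectionCache`, the `capacity` parameter, the least-recently-used eviction policy and the in-memory `disk` dictionary standing in for secondary storage are all assumptions made for illustration only.

```python
from collections import OrderedDict
from typing import Callable


class SectionCache:
    """Hypothetical cache that keeps at most `capacity` index sections resident."""

    def __init__(self, load_section: Callable[[int], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._load = load_section          # reads one section from secondary storage
        self.capacity = capacity           # assumed tunable operational parameter
        self._resident: "OrderedDict[int, bytes]" = OrderedDict()
        self.faults = 0                    # sections loaded on demand
        self.evictions = 0                 # sections dropped to make room

    def get(self, section_id: int) -> bytes:
        if section_id in self._resident:
            # Hit: mark the section as most recently used.
            self._resident.move_to_end(section_id)
            return self._resident[section_id]
        # Miss: fault the section in from storage.
        self.faults += 1
        data = self._load(section_id)
        self._resident[section_id] = data
        if len(self._resident) > self.capacity:
            # Evict the least recently used section (assumed policy).
            self._resident.popitem(last=False)
            self.evictions += 1
        return data


# In-memory stand-in for a disk-resident index split into four sections.
disk = {i: f"section-{i}".encode() for i in range(4)}

cache = SectionCache(disk.__getitem__, capacity=2)
for sid in [0, 1, 0, 2, 1]:
    cache.get(sid)

print(cache.faults, cache.evictions)
```

Raising or lowering `capacity` trades main-memory use against the number of faults for a given access pattern, which is the kind of workload-dependent trade-off that makes such a parameter worth tuning.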

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Adviser: Richard Cooper
Keywords: Computer science, Bioinformatics
Date of Award: 2004
Depositing User: Enlighten Team
Unique ID: glathesis:2004-74057
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 23 Sep 2019 15:33
Last Modified: 23 Sep 2019 15:33
URI: http://theses.gla.ac.uk/id/eprint/74057
