UVM Theses and Dissertations
Format:
Print
Author:
Singh, Divya R.
Dept./Program:
Computer Science
Year:
2005
Degree:
MS
Abstract:
This thesis presents an application of a generalized suffix tree, extended with pattern frequencies, that performs accurate biological sequence analysis in less computation time than existing tools in this area. The application uses the frequency of prefixes shared by two or more sequences in the generalized suffix tree to identify, with good accuracy, the sequences in a database that are highly similar to a given query sequence. The speedup is achieved by reducing the database to the few sequences found to be closest to the query sequence, which results in faster computation. The approach can also be viewed as an extension of exact pattern matching, in which the cumulative results of matched patterns indicate the closest sequences. The specific strategy is to pick matched patterns of the query sequence and identify the sequences in the database that share a large number of these matched patterns. Experiments conducted in this study demonstrate that the application outperforms BLAST in computation time while preserving the accuracy of the alignments.
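
The filtering strategy described in the abstract (count the matched patterns each database sequence shares with the query, then keep only the closest few) can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps the generalized suffix tree for a plain dictionary of fixed-length substrings (k-mers), which is an assumption made to keep the example short. The function names `build_kmer_index` and `rank_candidates`, the parameters `k` and `top_n`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map each length-k substring to the ids of the sequences containing it.

    Assumption: a k-mer dictionary stands in for the generalized suffix tree.
    """
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def rank_candidates(query, index, k, top_n):
    """Rank database sequences by how many distinct query k-mers they share."""
    hits = defaultdict(int)
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        # .get avoids inserting empty entries into the index for unseen k-mers.
        for seq_id in index.get(kmer, ()):
            hits[seq_id] += 1
    # Highest shared-pattern count first; ties broken by sequence id.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy database (hypothetical data, for illustration only).
database = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTGGGGCC",
    "seq3": "ACGTACCTGA",
}
index = build_kmer_index(database, k=4)
candidates = rank_candidates("ACGTACGTGA", index, k=4, top_n=2)
print(candidates)
```

In a pipeline of this shape, only the sequences in `candidates` would be passed on to a full alignment step, which is where the reduction in computation time would come from. A suffix tree would additionally allow matched patterns of varying length, which a fixed `k` does not capture.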