UVM Theses and Dissertations
Format:
Print
Author:
Gao, Like
Dept./Program:
Computer Science
Year:
2005
Degree:
PhD
Abstract:
In many applications, continuous queries on data streams play an important role. Query optimization in these applications aims to process such queries with satisfactory response time, answer accuracy, and query throughput. This optimization becomes more important and more challenging when the applications involve fast data streams, have limited system resources, and require continuous query evaluation. This dissertation studies the query optimization problem for applications in which data streams take the form of append-only time series (i.e., "streaming time series") and the queries are "continuous similarity-based queries" that involve nearest/near neighbor searches or other similarity-based operations. Little work has been done on optimization techniques for such queries. This dissertation proposes a general query optimization framework in which four optimization modules work together. Specifically, when precise evaluation is required, two modules speed up continuous query evaluation: one efficiently processes nearest/near neighbor searches, and the other generates query evaluation plans that reduce execution cost. When approximate evaluation is allowed, a module based on quality-driven evaluation strategies dynamically adjusts the query evaluation. In addition, a feature selection module that pre-processes data streams helps to alleviate the curse of dimensionality: the performance of similarity-based search algorithms often degrades quickly as data dimensionality increases. This dissertation concentrates on developing the necessary new algorithms for the four optimization modules and on evaluating these algorithms with extensive experiments. The experimental results demonstrate the effectiveness of these algorithms.