UVM Theses and Dissertations
Format:
Print
Author:
Chen, Gong
Dept./Program:
Computer Science
Year:
2005
Degree:
MS
Abstract:
Extensive work has addressed mining frequent items or itemsets in a single data stream, but little effort has been made to explore sequential patterns among literals in different data streams. In this thesis, we define the challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm, MILE (MIning from muLtiple strEams), to manage the mining process. The proposed algorithm recursively uses the knowledge of existing patterns to speed up the mining of new patterns. We also apply PrefixSpan, a state-of-the-art sequential pattern mining algorithm designed for transaction databases, to solve our problem. Extensive empirical results show that MILE is significantly faster than PrefixSpan. One unique feature of our algorithm is that when prior knowledge of the data distribution in the data streams is available, it can be incorporated into the mining process to further improve the performance of MILE. Because MILE consumes more memory than PrefixSpan, we also propose a solution that balances memory usage and time efficiency in memory-limited environments.