UVM Theses and Dissertations
Format:
Print
Author:
He, Yu
Dept./Program:
Computer Science
Year:
2006
Degree:
MS
Abstract:
This thesis studies the problem of mining frequent patterns with wildcards. Existing algorithms for mining frequent patterns with gaps (or wildcards) find patterns that satisfy user-specified gap constraints. However, specifying such constraints is often nontrivial for users, and changing any gap value usually requires rerunning the whole algorithm. It is therefore desirable to develop a solution that finds frequent patterns "automatically" and "efficiently", helping users discover interesting patterns and gain insight into the gap constraints. In this thesis, we study a frequent pattern mining problem that meets this need. Given a sequence T and a support threshold min_sup, our aim is to find all patterns with wildcards whose support is no less than min_sup. We have discovered and proved an ordering property for the proposed problem, and we have designed an algorithm called PMW2 (P̲attern M̲ining W̲ith W̲ildcards) that uses this property to solve the problem. Within the PMW2 framework, we design several heuristic methods to estimate the maximum support of a candidate pattern. Experiments show that our one-way scan heuristic performs best on average, approximating the maximum support of a given pattern with about 90 percent accuracy.
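To make the problem statement concrete, the sketch below enumerates patterns with wildcards in a short sequence by brute force. It is not PMW2 and does not use the ordering property or the heuristics from the thesis. The abstract does not define support, so the sketch assumes the simplest reading: the number of start positions in T at which the pattern matches. The wildcard symbol `?`, the `max_len` bound, and the function names are hypothetical choices made for illustration.

```python
from collections import Counter

WILDCARD = "?"  # hypothetical symbol; assumed not to occur in the sequence itself


def support(seq, pattern):
    """Assumed definition: number of start positions where pattern matches seq."""
    last_start = len(seq) - len(pattern)
    return sum(
        all(p == WILDCARD or p == seq[s + i] for i, p in enumerate(pattern))
        for s in range(last_start + 1)
    )


def frequent_patterns(seq, min_sup, max_len):
    """Brute force: count every pattern of length <= max_len that begins and
    ends with a real symbol, then keep those with support >= min_sup."""
    counts = Counter()
    for length in range(1, max_len + 1):
        interior = max(length - 2, 0)  # positions that may become wildcards
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            # Each bit of mask decides whether one interior symbol is wildcarded.
            for mask in range(1 << interior):
                chars = list(window)
                for k in range(interior):
                    if (mask >> k) & 1:
                        chars[k + 1] = WILDCARD
                counts["".join(chars)] += 1
    return {p: c for p, c in counts.items() if c >= min_sup}


T = "abcabdabc"
result = frequent_patterns(T, min_sup=2, max_len=3)
```

Under this assumed definition, `support(T, "a?c")` counts the two occurrences of `abc`, and `result` contains every pattern of length at most 3 that matches at two or more positions. The enumeration grows exponentially with `max_len`, which is the kind of cost a pruning property is meant to avoid.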