UVM Theses and Dissertations
Format:
Print
Author:
Chen, Qijun
Dept./Program:
Computer Science
Year:
2004
Degree:
M.S.
Abstract:
Inductive learning is a typical learning task in machine learning: given a data set, it aims to discover patterns in the data and form concepts that describe it. Research in inductive learning has been sustained for decades; however, much of the existing work focuses on relatively small amounts of data, an approach that is infeasible in large, realistic settings. With the rapid advancement of information technology, scalability has become a necessity for learning algorithms that must handle large, real-world data repositories. This thesis designs scalable inductive learning algorithms, where scalability is defined as the ability to process large data sets or to handle data sets distributed across different sites. In our work, scalability is achieved through a data reduction technique, which partitions a large data set into subsets, applies the learning algorithm to each subset sequentially or concurrently, and then integrates the learned results.
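The partition-learn-integrate flow described above can be sketched as follows. This is a minimal illustration, assuming a 1-nearest-neighbour base learner as a stand-in for the thesis's rule learner and simple majority voting as the integration step; the actual learners and integration strategies are those evaluated in the thesis.

```python
from collections import Counter

def partition(data, k):
    """Data reduction step: split a large data set into k subsets."""
    return [data[i::k] for i in range(k)]

def train_1nn(subset):
    """Base learner for one subset (a 1-nearest-neighbour stand-in
    for the thesis's rule learner)."""
    def classify(features):
        _, label = min(
            subset,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], features)),
        )
        return label
    return classify

def integrate_by_voting(learners):
    """Integrate the per-subset classifiers by simple majority voting."""
    def classify(features):
        votes = Counter(learner(features) for learner in learners)
        return votes.most_common(1)[0][0]
    return classify

# Toy data: one feature, label = 1 when the feature exceeds 5.
data = [((float(x),), int(x > 5)) for x in range(10)]
subsets = partition(data, k=3)                # reduce the data
learners = [train_1nn(s) for s in subsets]    # learn on each subset
classify = integrate_by_voting(learners)      # integrate the results
print(classify((7.0,)))  # 1
print(classify((2.0,)))  # 0
```

In the actual schemes the per-subset learning step can run concurrently, which is what makes the training-time savings over a single global classifier possible.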
Five strategies for achieving scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) have been identified, and a corresponding scalable scheme has been designed and developed for each. A substantial number of experiments have been performed to evaluate these schemes. The experimental results demonstrate that, through data reduction, some of our schemes can effectively generate accurate classifiers from the inaccurate classifiers learned on individual data subsets. Furthermore, our schemes require significantly less training time than generating a single global classifier. Among the five strategies, Iteration and Data Dependent Rule Selection are the most effective with respect to both the classification accuracy of the generated classifiers and the variety of data sets they can handle. Combined with a Voting strategy, these two strategies yield schemes that consistently outperform Voting alone.