UVM Theses and Dissertations
Format:
Print
Author:
Lu, Zhenyu
Dept./Program:
Computer Science
Year:
2011
Degree:
PhD
Abstract:
Classification techniques build predictive models from data described by a set of features (attributes) and associated labels (a discrete set of possible classes). One popular approach to classification is ensemble methods, which, instead of relying on a single classification model such as a Decision Tree (DT), combine a set of models for prediction. Ensemble methods have been successfully applied to many classification tasks, as well as to other tasks such as relevance ranking and recommendation systems. An open question in ensemble methods is how to choose one model type (homogeneous ensemble), or a set of model types (heterogeneous ensemble), to construct ensembles.
This dissertation addresses four fundamental questions for heterogeneous ensembles: 1) whether we need heterogeneous ensembles: we demonstrate that heterogeneous ensembles can outperform homogeneous ensembles of any of the involved classification models alone; 2) how to construct appropriate heterogeneous ensembles: we introduce an algorithm called Adaptive Heterogeneous Ensembles (AHE) to automatically discover appropriate combinations of classification model types; 3) why heterogeneous ensembles work: through empirical analysis we demonstrate that heterogeneous ensembles outperform homogeneous ensembles because different classification model types complement each other; and 4) when heterogeneous ensembles work: we find that the advantage of heterogeneous ensembles over other methods increases when the target data have more class labels. The efficacy of AHE is experimentally validated in the context of active learning. Extensive experiments on 18 UCI data sets show that AHE outperforms its homogeneous variants, as well as bagging, boosting, and the random subspace method (RSM) with random sampling.
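To make the core idea concrete, here is a minimal sketch of a heterogeneous ensemble that combines predictions from different model types by majority vote. The three stub classifiers (a stump-like rule, a linear rule, and a nearest-centroid rule) are hypothetical stand-ins, not part of the AHE algorithm described in the dissertation; in practice each would be a trained model of a different type (DT, SVM, kNN, etc.).

```python
from collections import Counter

# Hypothetical stand-in classifiers of different "types" (assumptions for
# illustration only): each maps a 2-D feature vector to a class label.
def stump(x):
    # Decision-stump-like rule: threshold on the first feature.
    return 1 if x[0] > 0.5 else 0

def linear(x):
    # Linear-rule-like classifier: threshold on a weighted sum.
    return 1 if x[0] + x[1] > 1.0 else 0

def nearest(x):
    # Nearest-centroid-like rule: closer to centroid (1, 1) than to (0, 0)?
    d1 = (x[0] - 1) ** 2 + (x[1] - 1) ** 2
    d0 = x[0] ** 2 + x[1] ** 2
    return 1 if d1 < d0 else 0

def heterogeneous_vote(models, x):
    """Majority vote over a set of (possibly different-type) classifiers."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

models = [stump, linear, nearest]
print(heterogeneous_vote(models, (0.9, 0.8)))  # -> 1
print(heterogeneous_vote(models, (0.2, 0.1)))  # -> 0
```

Because the voters make errors in different regions of the feature space, the combined prediction can be more accurate than any single model, which is the complementarity argument the abstract makes for heterogeneous over homogeneous ensembles.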