UVM Theses and Dissertations

Format: Print
Author: Zhang, Yan
Dept./Program: Computer Science
Year: 2008
Degree: PhD
Abstract:
Learning from noisy data sources is a practical and important issue in data mining research. Because errors continually make discriminant analysis harder, identifying and removing suspicious data items, commonly referred to as data cleansing, is regarded as one of the most effective data preprocessing techniques. Despite the effectiveness of these traditional approaches, many problems remain unsolved. In this thesis, we study noise-tolerant data mining, which addresses the problem of learning from noisy information sources. We construct a noise Modeling, Diagnosis, and Utilization (MDU) framework, which bridges the gap between data preprocessing methods and the actual data mining step. This framework provides a conceptual workflow for a typical data mining task and outlines the importance of noise-tolerant data mining.
Based on the proposed mining framework, we present an effective strategy that combines noise handling and classifier ensembling to build robust learning algorithms on noisy data. Our study concludes that systems that consider both the accuracy of and the diversity among base learners will eventually lead to a classifier superior to other alternatives. Based on this strategy, we design two algorithms, Aggressive Classifier Ensembling (ACE) and Corrective Classification (C2). The essential idea is to appropriately account for diversity and accuracy among base learners for effective classifier ensembling. Our experimental results show that the error detection and correction module is effective in ACE and C2, that C2 consistently outperforms its two degenerate forms, ACE and Bagging, and that C2 is more stable than Boosting and DECORATE.
From the perspective of systematic noise modeling, we propose an association-based learning method to trace and analyze erroneous data items. We assume that the occurrence of noise in the data is associated with certain events in other attribute values, and we use a set of associative corruption (AC) rules to model this type of noise. We apply association rule mining to extract noise patterns and design an effective method to handle noisy data.
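To make the idea of an associative corruption (AC) rule concrete, the following is a minimal Python sketch. The abstract does not give the exact form of an AC rule, so everything here is an assumption for illustration: the `ACRule` class, its fields, the `corrupt` and `estimate_prob` helpers, and the toy `sensor`/`reading` attributes are all hypothetical. The sketch assumes a rule says "when certain attribute values occur, a target value is replaced by another with some probability." It is not the thesis's mining procedure, which works from noisy data via association rule mining; here the corruption rate is simply measured against a known clean copy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ACRule:
    """Hypothetical associative corruption rule (form assumed for illustration).

    When every (attribute, value) pair in `condition` matches a record, the
    value `old` of `target` is replaced by `new` with probability `prob`.
    """
    condition: tuple  # e.g. (("sensor", "B"),)
    target: str
    old: str
    new: str
    prob: float

    def matches(self, record):
        # The rule is eligible only if the condition holds and the target
        # attribute currently has the value the rule corrupts.
        return (all(record.get(a) == v for a, v in self.condition)
                and record.get(self.target) == self.old)


def corrupt(records, rule, rng):
    """Return a noisy copy of `records`; the input list is not modified."""
    noisy = []
    for rec in records:
        rec = dict(rec)
        if rule.matches(rec) and rng.random() < rule.prob:
            rec[rule.target] = rule.new
        noisy.append(rec)
    return noisy


def estimate_prob(clean, noisy, rule):
    """Estimate the rule's corruption rate from paired clean/noisy records."""
    eligible = [(c, n) for c, n in zip(clean, noisy) if rule.matches(c)]
    if not eligible:
        return 0.0
    changed = sum(n[rule.target] == rule.new for _, n in eligible)
    return changed / len(eligible)


# Toy data: readings from sensor "B" are flipped from "high" to "low"
# about 30% of the time; sensor "A" is never corrupted.
rule = ACRule(condition=(("sensor", "B"),), target="reading",
              old="high", new="low", prob=0.3)
clean = [{"sensor": s, "reading": "high"} for s in "AB" * 500]
noisy = corrupt(clean, rule, random.Random(0))
estimated = estimate_prob(clean, noisy, rule)
```

In this toy setting the noise is tied to another attribute's value (`sensor == "B"`) rather than spread uniformly, which is the kind of dependency the abstract's AC rules are meant to capture.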