UVM Theses and Dissertations

Format: Print
Author: Al-Kateb, Mohammed
Dept./Program: Computer Science
Year: 2011
Degree: PhD
Abstract:
A large class of real-world applications deals with continuous, unbounded, and high-volume data streams. Methods for data management over data streams are thus important for this class of applications. This journal-format dissertation addresses four research problems on data management over data streams. It is organized in two parts, each covering two research problems.
In the first part of the dissertation, we study two research problems regarding reservoir sampling - a well-known technique for maintaining a fixed-size random sample over a stream of data. The first study pertains to situations in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling as the data characteristics or the application behavior change. We conduct a theoretical study on the effects of adjusting the size of a reservoir while sampling is in progress, and present a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted. The second study applies to applications in which an input stream may be naturally heterogeneous, i.e., composed of sub-streams whose statistical properties may vary considerably. We deal with this heterogeneity problem by presenting an adaptive stratified reservoir sampling algorithm, which utilizes our adaptive-size reservoir algorithm, and we demonstrate through experiments the superior sample quality and the adaptivity of the algorithm.
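For readers unfamiliar with the technique, the following is a minimal sketch of classical fixed-size reservoir sampling (often called Algorithm R). It is not the adaptive-size or stratified algorithm developed in the dissertation; the function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Classical fixed-size reservoir sampling (Algorithm R). The names used
    here are hypothetical and do not come from the dissertation.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1);
            # if kept, it replaces a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), 10, random.Random(42))
    print(len(sample))
```

The reservoir size `k` is fixed for the whole run in this sketch, which is exactly the restriction the first study sets out to relax.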
In the second part of the dissertation, we study two research problems on supporting temporal coalescing - a key operation enabling the evaluation of interval predicates and functions on temporal tuples - over data streams. The first study focuses on the coalescing operator applied to the processing of temporal queries over windowed data streams. We distinguish between eager coalescing and lazy coalescing, the two known coalescing schemes. With these two schemes, we first present algorithms for updating a window extent for both tuple-based and time-based windows, and then address the problem of optimally selecting between eager and lazy coalescing for multiple concurrent queries. Through an extensive performance study, the two schemes are compared and the optimal selection is demonstrated. The second study addresses the problem of load shedding from a data stream in the presence of temporal coalescing, with a focus on the case in which memory is insufficient to keep all tuples in the window specified in a continuous temporal selection query. We propose a novel accuracy metric and a new load shedding algorithm that are suitable for this class of queries. In the performance study, we show that the proposed algorithm far outperforms the conventional random load shedding algorithm with regard to the achieved accuracy.
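As a rough illustration of what temporal coalescing does, the sketch below merges value-equivalent tuples whose validity intervals overlap or are adjacent. It is a simple batch version, assuming half-open intervals `[start, end)` and a hypothetical `(value, start, end)` tuple layout; it does not model the eager and lazy streaming schemes or the load shedding algorithm described in the abstract.

```python
from collections import defaultdict


def coalesce(tuples):
    """Merge value-equivalent tuples with overlapping or adjacent intervals.

    `tuples` is an iterable of (value, start, end) with half-open
    intervals [start, end). This layout is an assumption made for
    illustration only.
    """
    by_value = defaultdict(list)
    for value, start, end in tuples:
        by_value[value].append((start, end))

    result = []
    for value, intervals in by_value.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                # Gap found: emit the finished interval and start a new one.
                result.append((value, cur_start, cur_end))
                cur_start, cur_end = start, end
        result.append((value, cur_start, cur_end))
    return sorted(result)


if __name__ == "__main__":
    rows = [("a", 1, 3), ("a", 3, 5), ("b", 2, 4), ("a", 7, 9)]
    print(coalesce(rows))
    # prints [('a', 1, 5), ('a', 7, 9), ('b', 2, 4)]
```

Because the coalesced result depends on every tuple that shares a value, dropping tuples under memory pressure can split or shorten intervals, which is why load shedding interacts with coalescing accuracy.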