UVM Theses and Dissertations
Format:
Print
Author:
Fuchs, Dennis E.
Dept./Program:
Computer Science
Year:
2004
Degree:
M.S.
Abstract:
This thesis introduces STHoles+, a new multidimensional histogram technique that extends the state-of-the-art STHoles technique. STHoles uses query feedback (i.e., the actual number of tuples returned by a query) and a flexible bucket layout to improve histogram accuracy by not spending its limited memory on modeling infrequently queried regions of the data space (formed by the query attributes). STHoles+ improves on STHoles by storing bucket location information more compactly, which increases selectivity estimation accuracy. Specifically, STHoles+ quantizes each coordinate of a bucket relative to the coordinates of the smallest enclosing bucket. It then stores the quantized coordinates as minimum-length binary numbers, which are more compact than full-precision floating-point numbers. This quantization lets STHoles+ trade some precision in bucket locations for more efficient memory usage: more buckets fit in the same amount of memory, which makes the histogram more accurate. Experimental results show that STHoles+ outperforms STHoles across various data distributions and query distributions, and under varying available memory size, quantization resolution, and dimensionality of the data space.
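The relative quantization described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the uniform grid, the round-to-nearest rule, and the names `quantize`, `dequantize`, `quantize_box`, and `bits_per_box` are all assumptions made for illustration. Each bucket is modeled as a list of `(low, high)` pairs, one per dimension, and each child coordinate is mapped to an integer that fits in `bits` bits, measured against the extent of the enclosing bucket.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to a grid index in [0, 2**bits - 1].

    Assumption: a uniform grid with round-to-nearest; the thesis may
    use a different rounding rule.
    """
    levels = (1 << bits) - 1
    if hi == lo:
        return 0  # degenerate enclosing extent: only one position exists
    frac = (value - lo) / (hi - lo)
    return min(levels, max(0, round(frac * levels)))


def dequantize(index, lo, hi, bits):
    """Recover an approximate coordinate from its grid index."""
    levels = (1 << bits) - 1
    return lo + (hi - lo) * index / levels


def quantize_box(child, parent, bits):
    """Quantize every coordinate of child relative to parent.

    Both boxes are lists of (low, high) pairs, one pair per dimension.
    """
    return [
        (quantize(c_lo, p_lo, p_hi, bits), quantize(c_hi, p_lo, p_hi, bits))
        for (c_lo, c_hi), (p_lo, p_hi) in zip(child, parent)
    ]


def bits_per_box(dims, bits):
    """Storage for one quantized box: two coordinates per dimension."""
    return 2 * dims * bits


# Hypothetical two-dimensional example with 4 bits per coordinate.
parent = [(0.0, 100.0), (0.0, 50.0)]
child = [(25.0, 75.0), (10.0, 20.0)]
codes = quantize_box(child, parent, bits=4)
print(codes)               # [(4, 11), (3, 6)]
print(bits_per_box(2, 4))  # 16
```

In this example the child box costs 16 bits instead of four full-precision floating-point values, and each recovered coordinate differs from the original by at most half a grid step of the enclosing bucket's extent. That is the precision-for-memory trade the abstract describes; how the saved memory is turned into additional buckets is outside the scope of this sketch.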