ItCompress: an iterative semantic compression algorithm

Bibliographic Details
Published in: 20th International Conference on Data Engineering (ICDE 2004), pp. 646-657
Main Authors: Jagadish, H.V.; Ng, R.T.; Ooi, Beng Chin; Tung, A.K.H.
Format: Conference Proceeding
Language: English
Published: Los Alamitos, CA: IEEE Computer Society, 2004
ISBN: 9780769520650, 0769520650
ISSN: 1063-6382
Description
Summary: Real datasets are often large enough to necessitate data compression. Traditional 'syntactic' data compression methods treat the table as a large byte string and operate at the byte level. The tradeoff in such cases is usually between the ease of retrieval (the ease with which one can retrieve a single tuple or attribute value without decompressing a much larger unit) and the effectiveness of the compression. In this regard, semantic compression has generated considerable interest and motivated several recent works. We propose a semantic compression algorithm called ItCompress (ITerative Compression), which achieves good compression while permitting access even at the attribute level without requiring the decompression of a larger unit. ItCompress iteratively improves the compression ratio of the compressed output during each scan of the table. The amount of compression can be tuned by the number of iterations. Moreover, the initial iterations already provide significant compression, which makes it a cost-effective compression technique. Extensive experiments were conducted, and the results indicate the superiority of ItCompress over previously known techniques such as 'SPARTAN' and 'fascicles'.
DOI: 10.1109/ICDE.2004.1320034
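
Illustration: The summary states the idea of the algorithm but not its details. The Python sketch below is only a rough illustration under stated assumptions, not the authors' implementation. It assumes each row is encoded against one of `k` representative rows as a representative index, a per-attribute match bitmap, and the list of non-matching values. It also assumes that each scan reassigns rows and then rebuilds each representative from the most frequent values of its rows. The names `compress`, `get_value`, `k`, and `iterations` are hypothetical.

```python
from collections import Counter


def _best_rep(row, reps):
    """Index of the representative that agrees with `row` on the most attributes."""
    return max(range(len(reps)),
               key=lambda i: sum(v == r for v, r in zip(row, reps[i])))


def _refit(rows, assignment, reps):
    """Rebuild each representative from the most frequent value per attribute."""
    new_reps = []
    for i, old in enumerate(reps):
        members = [row for row, a in zip(rows, assignment) if a == i]
        if not members:
            new_reps.append(old)  # no rows assigned: keep the old representative
            continue
        new_reps.append(tuple(Counter(col).most_common(1)[0][0]
                              for col in zip(*members)))
    return new_reps


def compress(rows, k, iterations):
    """Assumed scheme: more iterations give representatives that match more values."""
    reps = [tuple(r) for r in rows[:k]]  # naive initial choice (an assumption)
    for _ in range(iterations):
        assignment = [_best_rep(row, reps) for row in rows]
        reps = _refit(rows, assignment, reps)
    encoded = []
    for row in rows:
        idx = _best_rep(row, reps)
        bitmap = [v == r for v, r in zip(row, reps[idx])]
        outliers = [v for v, ok in zip(row, bitmap) if not ok]
        encoded.append((idx, bitmap, outliers))
    return reps, encoded


def get_value(reps, encoded, row_i, col_j):
    """Read one attribute without decoding any other row."""
    idx, bitmap, outliers = encoded[row_i]
    if bitmap[col_j]:
        return reps[idx][col_j]
    # Position among the stored outliers = number of mismatches before col_j.
    return outliers[sum(1 for ok in bitmap[:col_j] if not ok)]
```

In this sketch, `get_value` touches only the encoded entry of the requested row and its representative, which is one way to picture the attribute-level access mentioned in the summary. The `iterations` argument plays the role of the tuning knob: each extra scan can only keep or reduce the number of stored outlier values in this exact-match setting.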