MASC: A bitmap index encoding algorithm for fast data retrieval

Bibliographic Details
Published in: IEEE International Conference on Communications (2003) pp. 1 - 6
Main Authors: Wen, Yuhao; Wang, Han; Chen, Zhen; Cao, Junwei; Peng, Guodong; Huang, Wen-Liang; Hu, Ziwei; Zhou, Jing; Guo, Jinghong
Format: Conference Proceeding, Journal Article
Language: English
Published: IEEE 01.05.2016
ISSN:1938-1883
Description
Summary: Fast retrieval of archived traffic data is essential for network security and forensic analysis. A bitmap index is a data structure that enables fast search over large data collections, but its space consumption is a persistent problem. WAH, PLWAH and COMPAX have been proposed to compress bitmap indexes and reduce storage. This paper proposes a new bitmap index encoding scheme, named MASC, that further improves the compression ratio without impairing query performance. Instead of being limited to the fixed length (31 bits) used in PLWAH and COMPAX, the stride in MASC can be as long as possible, so that runs of consecutive zero bits and nonzero bits are encoded more compactly. Because the piggyback in PLWAH carries only a single nonzero bit, MASC replaces it with a new structure called a carrier. MASC also generalizes the traditional literal word concept of PLWAH and COMPAX. The validity of the MASC encoding scheme is demonstrated by its application in an Internet traffic archival system. In experiments on a real Internet traffic data set from CAIDA, MASC achieves a better compression ratio than PLWAH and COMPAX2 with no penalty in query performance.
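The summary contrasts MASC with schemes that work in fixed 31-bit units. As a rough illustration of that baseline only, the following Python sketch builds a bitmap index over a column of values and compresses one bitmap with a simplified word-aligned run-length encoding in the spirit of WAH. It is not MASC: the carrier structure and the variable stride are not specified in this record, so they are not modelled here. The function names, the tuple-based word representation, and the sample `tcp`/`udp` column are assumptions made for the example.

```python
from collections import defaultdict

GROUP_BITS = 31  # payload bits per 32-bit word in WAH-style schemes


def build_bitmap_index(values):
    """Map each distinct value to a list of 0/1 bits, one bit per row."""
    index = defaultdict(lambda: [0] * len(values))
    for row, value in enumerate(values):
        index[value][row] = 1
    return dict(index)


def encode_groups(bits):
    """Simplified word-aligned run-length encoding (illustrative only).

    The bitmap is cut into 31-bit groups. A group of identical bits becomes
    a fill word ("fill", bit, group_count), and adjacent fills with the same
    bit are merged. Any mixed group is stored as ("literal", bits).
    """
    # Pad with zeros so the length is a multiple of GROUP_BITS.
    padded = bits + [0] * (-len(bits) % GROUP_BITS)
    words = []
    for start in range(0, len(padded), GROUP_BITS):
        group = padded[start:start + GROUP_BITS]
        if all(b == group[0] for b in group):
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == group[0]:
                # Extend the previous fill instead of emitting a new word.
                words[-1] = ("fill", group[0], last[2] + 1)
            else:
                words.append(("fill", group[0], 1))
        else:
            words.append(("literal", tuple(group)))
    return words


def decode_groups(words, length):
    """Expand encoded words back to the original bit list of given length."""
    bits = []
    for word in words:
        if word[0] == "fill":
            bits.extend([word[1]] * (word[2] * GROUP_BITS))
        else:
            bits.extend(word[1])
    return bits[:length]


if __name__ == "__main__":
    # Hypothetical column: the protocol field of 201 archived packets.
    column = ["tcp"] * 100 + ["udp"] + ["tcp"] * 100
    index = build_bitmap_index(column)
    encoded = encode_groups(index["udp"])
    print(len(index["udp"]), "bits ->", len(encoded), "words")
```

In this sketch a single set bit forces a whole 31-bit literal word between two zero fills, which is the kind of overhead that piggybacking in PLWAH, and the carrier described in the summary, aim to reduce.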
DOI:10.1109/ICC.2016.7510827