Efficient Partitioning Method for Optimizing the Compression on Array Data

Bibliographic Details
Published in:	Journal of computer science and technology Vol. 37; no. 5; pp. 1049 - 1067
Main Authors:	Han, Shuai, Liu, Xian-Min, Li, Jian-Zhong
Format:	Journal Article
Language:	English
Published:	Singapore: Springer Nature Singapore, 01.10.2022; Springer; Springer Nature B.V; Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
Subjects:	Algorithms; Arrays; Artificial Intelligence; Computer Science; Data Structures and Information Theory; First principles; Information Systems Applications (incl. Internet); Management systems; Methods; Optimization; Partitioning; Query processing; Random sampling; Regular Paper; Software Engineering; Theory of Computation; NP-hard; array partitioning; greedy strategy; compression performance
ISSN:	1000-9000, 1860-4749

Description
Summary:	Array partitioning is an important research problem in array management, because the partitioning strategy strongly influences storage, query evaluation, and other components of array management systems. Meanwhile, array data increasingly needs compression because of its growing volume. Observing that array partitioning can significantly affect compression performance, this paper aims to design an efficient partitioning method for array data that optimizes compression performance. As far as we know, little research has addressed this problem. First, the problem of array partitioning for optimizing compression performance (PPCP for short) is proposed. We adopt a popular compression technique that allows queries to be processed on the compressed data without decompression. Second, because PPCP is NP-hard, two essential principles for exploring the partitioning solution are introduced, which explain the core idea of the proposed partitioning algorithms. The first principle shows that compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy that supports the heuristic selection of partitioning positions. Based on the two principles, two greedy array partitioning algorithms are designed, for the independent case and the dependent case respectively. Because the algorithm for the dependent case is expensive, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.
DOI:	10.1007/s11390-022-2371-7
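
Illustration:	The first principle in the summary (splitting an array into parts with different sparsities can improve compression) and the greedy choice of split positions can be illustrated with a small one-dimensional sketch. This is not the paper's algorithm or its compression technique, neither of which is specified in this record. The cost model (`chunk_cost`, which charges a chunk the cheaper of a dense layout and an index-value sparse layout) and the function names `best_split` and `greedy_partition` are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple


def chunk_cost(cells: List[int]) -> int:
    """Toy storage cost of one chunk (an assumed model, not the paper's).

    Dense layout: one unit per cell.
    Sparse layout: two units (index + value) per non-zero cell.
    The chunk is charged whichever layout is cheaper.
    """
    nnz = sum(1 for v in cells if v != 0)
    return min(len(cells), 2 * nnz)


def best_split(cells: List[int]) -> Tuple[Optional[int], int]:
    """Find the single split position that lowers the total cost the most.

    Returns (position, cost); position is None when no split strictly
    improves on storing the chunk unsplit.
    """
    best_pos, best = None, chunk_cost(cells)
    for pos in range(1, len(cells)):
        cost = chunk_cost(cells[:pos]) + chunk_cost(cells[pos:])
        if cost < best:
            best_pos, best = pos, cost
    return best_pos, best


def greedy_partition(cells: List[int], offset: int = 0) -> List[Tuple[int, int]]:
    """Recursively apply the best split until no split reduces the cost.

    Returns half-open (start, end) ranges covering the original array.
    """
    pos, _ = best_split(cells)
    if pos is None:
        return [(offset, offset + len(cells))]
    left = greedy_partition(cells[:pos], offset)
    right = greedy_partition(cells[pos:], offset + pos)
    return left + right


if __name__ == "__main__":
    # An empty half followed by a fully populated half.
    data = [0, 0, 0, 0, 0, 0, 5, 7, 3, 9, 2, 8]
    parts = greedy_partition(data)
    print("unsplit cost:", chunk_cost(data))
    print("partitions:", parts)
    print("split cost:", sum(chunk_cost(data[s:e]) for s, e in parts))
```

Under this toy model, the mixed array is cheaper to store once the empty half and the dense half are separated, because each part can use the layout that suits its own sparsity. A fully dense array gains nothing from splitting, so the greedy procedure leaves it whole.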