A Comparative Analysis of the Lossless Data Compression Methods for Unsparsed Tabular Data

Bibliographic Details
Published in: 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp. 1-6
Main Authors: Erkus, Ekin Can; Bursali, Ahmet
Format: Conference Proceeding
Language: English
Published: IEEE 25.07.2024
Description
Summary: This paper presents a comparative analysis of how unsparsing by data scaling affects lossless data compression methods. Commonly used compression algorithms (gzip, zlib, bzip2, and lzma) are evaluated on unsparsed datasets using compression time, decompression time, and compression ratio, for both the original and the scaled datasets. Five data scaling techniques are investigated for transforming the original data into unsparsed data: min-max, robust, absolute max, standardization, and normalization. Our research reveals relationships between data scaling methodologies and compression performance, with disparities in compression efficiency and computational complexity. Furthermore, we investigate the effects of scaling on compression ratio and provide results that improve the understanding of the factors influencing lossless data compression methods for non-sparse tabular data.
DOI: 10.1109/ICECET61485.2024.10698440
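
Illustration: the kind of measurement described in the summary can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' code or experimental setup. The helper names (min_max_scale, to_bytes, benchmark), the toy table, the serialization as little-endian 64-bit floats, and the definition of compression ratio as original size divided by compressed size are all assumptions made for illustration. Only min-max scaling is shown; the other four scaling techniques would plug in the same way.

```python
import bz2
import gzip
import lzma
import struct
import time
import zlib

# Codec name -> (compress, decompress), all from the Python standard library.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def min_max_scale(column):
    """Map a column of numbers onto [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def scale_table(rows):
    """Apply min-max scaling column by column to a list of equal-length rows."""
    scaled_columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


def to_bytes(rows):
    """Serialize rows as little-endian 64-bit floats (an assumed storage format)."""
    return b"".join(struct.pack(f"<{len(row)}d", *row) for row in rows)


def benchmark(payload):
    """Time each codec on payload and report the ratio original/compressed."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(payload)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != payload:  # lossless round trip is required
            raise ValueError(f"{name} did not round-trip the payload")
        results[name] = {
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
            "ratio": len(payload) / len(packed),
        }
    return results


if __name__ == "__main__":
    # Hypothetical toy table: mostly zeros, with one negative value per column
    # so that min-max scaling maps the zeros to non-zero values.
    table = [[0.0, 0.0, 0.0] for _ in range(2000)]
    for i in range(0, 2000, 50):
        table[i] = [float(i), -float(i) / 3.0, float(i % 7) - 1.0]

    for label, rows in (("original", table), ("min-max scaled", scale_table(table))):
        print(label)
        for name, stats in benchmark(to_bytes(rows)).items():
            print(f"  {name:6s} ratio={stats['ratio']:.2f} "
                  f"compress={stats['compress_s']:.5f}s "
                  f"decompress={stats['decompress_s']:.5f}s")
```

Timings from a single run like this are noisy; a more careful comparison would repeat each measurement and use realistic datasets rather than a synthetic table.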