A Comparative Analysis of the Lossless Data Compression Methods for Unsparsed Tabular Data

Bibliographic details
Published in: 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp. 1-6
Main authors: Erkus, Ekin Can; Bursali, Ahmet
Format: Conference paper
Language: English
Published: IEEE, 25 July 2024
Description
Abstract: This paper presents a comparative analysis of how unsparsing by data scaling affects lossless data compression methods. The widely used compression algorithms gzip, zlib, bzip2, and lzma are evaluated on unsparsed datasets using three performance metrics: compression time, decompression time, and compression ratio. Each metric is measured on both the original and the scaled datasets. Five data scaling techniques are investigated for transforming the original data into unsparsed data: min-max scaling, robust scaling, absolute-max scaling, standardization, and normalization. Our research reveals relationships between data scaling methodologies and compression performance, including disparities in compression efficiency and computational complexity. Furthermore, we investigate the effects of scaling on compression ratio and provide results that improve understanding of the factors that influence lossless data compression methods for non-sparse tabular data.
DOI:10.1109/ICECET61485.2024.10698440
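The evaluation described in the abstract follows a simple pattern: scale a table, serialize it, then compress it with each codec while recording ratio and timings. The abstract does not say how the scaled tables were serialized, how the scalers were implemented, or how compression ratio was defined. The Python sketch below is therefore a hedged illustration under stated assumptions, not the authors' benchmark.

Assumptions:
- Tables are lists of float rows, serialized as CSV text using full-precision `repr`.
- Compression ratio means original size divided by compressed size.
- Robust scaling uses the median and interquartile range.
- "Normalization" means scaling each row to unit L2 norm, as scikit-learn's `Normalizer` does.
- Each codec runs with the default settings of Python's `gzip`, `zlib`, `bz2`, and `lzma` modules.
- All helper names (`min_max`, `benchmark`, `compare`, etc.) are hypothetical.

```python
import bz2
import gzip
import lzma
import statistics
import time
import zlib

def _apply_per_column(table, fn):
    # Transpose rows -> columns, transform each column, transpose back.
    cols = [fn(list(col)) for col in zip(*table)]
    return [list(row) for row in zip(*cols)]

def min_max(col):
    lo, hi = min(col), max(col)
    span = (hi - lo) or 1.0  # constant column: avoid division by zero
    return [(x - lo) / span for x in col]

def robust(col):
    # Median / interquartile range; needs at least two values per column.
    q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
    med = statistics.median(col)
    iqr = (q3 - q1) or 1.0
    return [(x - med) / iqr for x in col]

def max_abs(col):
    peak = max(abs(x) for x in col) or 1.0
    return [x / peak for x in col]

def standardize(col):
    mu = statistics.fmean(col)
    sd = statistics.pstdev(col) or 1.0
    return [(x - mu) / sd for x in col]

def normalize_rows(table):
    # Assumption: "normalization" = scale each row to unit L2 norm.
    out = []
    for row in table:
        norm = sum(x * x for x in row) ** 0.5 or 1.0
        out.append([x / norm for x in row])
    return out

SCALERS = {
    "original": lambda t: t,
    "min-max": lambda t: _apply_per_column(t, min_max),
    "robust": lambda t: _apply_per_column(t, robust),
    "absolute max": lambda t: _apply_per_column(t, max_abs),
    "standardization": lambda t: _apply_per_column(t, standardize),
    "normalization": normalize_rows,
}

CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def to_csv_bytes(table):
    # Assumption: tables are serialized as CSV text with full float precision (repr).
    return "\n".join(",".join(repr(x) for x in row) for row in table).encode()

def benchmark(data):
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name}: round trip is not lossless")
        results[name] = {
            "ratio": len(data) / len(packed),  # assumption: original / compressed
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results

def compare(table):
    # One benchmark per scaling technique, including the unscaled baseline.
    return {name: benchmark(to_csv_bytes(scale(table))) for name, scale in SCALERS.items()}

if __name__ == "__main__":
    import random

    random.seed(0)
    table = [[random.gauss(50, 10), random.uniform(0, 1000), float(random.randint(0, 5))]
             for _ in range(2000)]
    for scaler, per_codec in compare(table).items():
        for codec, m in per_codec.items():
            print(f"{scaler:15s} {codec:5s} ratio={m['ratio']:.2f} "
                  f"comp={m['compress_s'] * 1e3:.1f} ms decomp={m['decompress_s'] * 1e3:.1f} ms")
```

Running the file as a script prints one line per scaler/codec pair for a synthetic table. Those numbers are illustrative only and are not results from the paper. The serialization choice matters: full-precision scaled floats usually produce longer decimal strings than the raw values. A binary or fixed-precision encoding could therefore change the ranking considerably.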