A Comparative Analysis of the Lossless Data Compression Methods for Unsparsed Tabular Data
| Published in: | 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp. 1-6 |
|---|---|
| Main authors: | , |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 25 July 2024 |
| Subjects: | |
| Online access: | Full text |
| Summary: | This paper presents a comparative analysis of how unsparsing by data scaling affects lossless data compression methods. Commonly used data compression algorithms such as gzip, zlib, bzip2, and lzma are evaluated on unsparsed datasets using compression time, decompression time, and compression ratio as performance metrics, for both the original and the scaled datasets. Five data scaling techniques are investigated for transforming the original data into unsparsed data: min-max, robust, absolute max, standardization, and normalization. Our research reveals relationships between data scaling methodologies and compression performance, with disparities in compression efficiency and computational complexity. Furthermore, we investigate the effects of scaling on compression ratio and provide results that improve the understanding of the factors influencing lossless data compression methods for non-sparse tabular data. |
|---|---|
| DOI: | 10.1109/ICECET61485.2024.10698440 |
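
The kind of measurement the summary describes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (`min_max_scale`, `to_bytes`, `benchmark`), the float64 serialization, the synthetic column, and the definition of compression ratio as uncompressed size divided by compressed size are all assumptions made here. Only min-max scaling is shown; the other four scaling techniques would slot in the same way.

```python
import bz2
import gzip
import lzma
import random
import struct
import time
import zlib

# Standard-library codecs for the four algorithms named in the summary.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def min_max_scale(column):
    """Map a column of numbers linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant column: every value maps to 0.0 by convention.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def to_bytes(column):
    """Serialize a column as little-endian float64 values (assumed layout)."""
    return struct.pack(f"<{len(column)}d", *column)


def benchmark(payload):
    """Return {codec: (ratio, compress_seconds, decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(payload)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        assert restored == payload  # lossless round trip
        # Ratio assumed here to be uncompressed size / compressed size.
        results[name] = (len(payload) / len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical column: mostly zeros with occasional integer-valued entries.
    original = [float(rng.randint(1, 1000)) if rng.random() < 0.1 else 0.0
                for _ in range(10_000)]
    scaled = min_max_scale(original)
    for label, column in (("original", original), ("min-max", scaled)):
        for name, (ratio, c_s, d_s) in benchmark(to_bytes(column)).items():
            print(f"{label:9s} {name:6s} ratio={ratio:6.2f} "
                  f"compress={c_s:.4f}s decompress={d_s:.4f}s")
```

Timings from a single run like this are noisy; repeating each measurement and averaging would be needed before drawing any conclusion from them.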