Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is...

Bibliographic details
Published in: Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 1129-1139
Main authors: Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello
Format: Conference paper
Language: English
Published: IEEE, 2017-05-01
ISSN:1530-2075
Abstract Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
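Note: the abstract's central idea (predict each point from already-decompressed neighbors, then quantize the prediction error under a bound) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 2-D grid, an absolute error bound `eb`, and a simple one-layer neighbor formula (up + left - diagonal) standing in for the paper's multilayer formulas, which this record does not reproduce. The names `predict`, `compress`, `decompress`, and `radius` are hypothetical.

```python
def predict(recon, i, j):
    """One-layer prediction from already-reconstructed neighbors (assumed formula)."""
    up = recon[i - 1][j] if i > 0 else 0.0
    left = recon[i][j - 1] if j > 0 else 0.0
    diag = recon[i - 1][j - 1] if i > 0 and j > 0 else 0.0
    return up + left - diag


def compress(data, eb, radius=127):
    """Return (codes, exact) for a 2-D list of floats under absolute bound eb.

    Code 0 marks a point stored exactly; other codes are shifted quantization bins.
    """
    rows, cols = len(data), len(data[0])
    # Predictions use reconstructed values, so the decompressor can repeat them.
    recon = [[0.0] * cols for _ in range(rows)]
    codes, exact = [], []
    for i in range(rows):
        for j in range(cols):
            pred = predict(recon, i, j)
            q = round((data[i][j] - pred) / (2 * eb))  # bin width is 2 * eb
            value = pred + 2 * eb * q
            if abs(q) <= radius and abs(value - data[i][j]) <= eb:
                codes.append(q + radius + 1)
                recon[i][j] = value
            else:
                # Prediction missed the quantization range: keep the exact value.
                codes.append(0)
                exact.append(data[i][j])
                recon[i][j] = data[i][j]
    return codes, exact


def decompress(codes, exact, rows, cols, eb, radius=127):
    """Rebuild the grid by replaying the same predictions in the same order."""
    recon = [[0.0] * cols for _ in range(rows)]
    exact_values = iter(exact)
    k = 0
    for i in range(rows):
        for j in range(cols):
            code = codes[k]
            k += 1
            if code == 0:
                recon[i][j] = next(exact_values)
            else:
                recon[i][j] = predict(recon, i, j) + 2 * eb * (code - radius - 1)
    return recon
```

In a full compressor the integer `codes` would then go through a variable-length (entropy) coder; the better the prediction, the more the codes cluster around the central bin and the smaller the encoded output.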
Author Dingwen Tao
Cappello, Franck
Sheng Di
Zizhong Chen
Author_xml – sequence: 1
  surname: Dingwen Tao
  fullname: Dingwen Tao
  email: dtao001@cs.ucr.edu
  organization: Univ. of California, Riverside, Riverside, CA, USA
– sequence: 2
  surname: Sheng Di
  fullname: Sheng Di
  email: sdi1@anl.gov
  organization: Argonne Nat. Lab., Argonne, IL, USA
– sequence: 3
  surname: Zizhong Chen
  fullname: Zizhong Chen
  email: chen@cs.ucr.edu
  organization: Univ. of California, Riverside, Riverside, CA, USA
– sequence: 4
  givenname: Franck
  surname: Cappello
  fullname: Cappello, Franck
  email: cappello@anl.gov
  organization: Argonne Nat. Lab., Argonne, IL, USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/IPDPS.2017.115
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9781538639146
1538639149
EISSN 1530-2075
EndPage 1139
ExternalDocumentID 7967203
Genre orig-research
GroupedDBID 29O
6IE
6IF
6IH
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 188
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000427044800115&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:19:55 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 11
ParticipantIDs ieee_primary_7967203
PublicationCentury 2000
PublicationDate 2017-05
PublicationDateYYYYMMDD 2017-05-01
PublicationDate_xml – month: 05
  year: 2017
  text: 2017-05
PublicationDecade 2010
PublicationTitle Proceedings - IEEE International Parallel and Distributed Processing Symposium
PublicationTitleAbbrev IPDPS
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0020349
ssib030101683
Score 2.1973698
SourceID ieee
SourceType Publisher
StartPage 1129
SubjectTerms Adaptation models
Compression algorithms
Data models
Encoding
Measurement
Predictive models
Quantization (signal)
Title Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
URI https://ieeexplore.ieee.org/document/7967203
WOSCitedRecordID wos000427044800115&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1