CODC-pyParaQC: A design and implementation of parallel quality control for ocean observation big data


Bibliographic Details
Published in: Proceedings of the ... International Symposium on Parallel and Distributed Processing with Applications (Print), pp. 1863-1870
Main authors: Yuan, Huifeng, Li, Tianyan, Jin, Zhong, Cheng, Lijing, Tan, Zhetao, Zhang, Bin, Wang, Yanjun
Format: Conference proceeding
Language: English
Published: IEEE, 30 October 2024
ISSN: 2158-9208
Online access: Full text
Abstract High-quality ocean observation is essential for research and applications in ocean exploration and climate change. As ocean observation has entered the era of big data in recent years, it has become crucial to process massive volumes of raw observations accurately and efficiently. This paper addresses issues encountered when processing ocean big data with traditional delayed-mode quality control systems, including substantial serial I/O workloads and frequent context switching. It proposes a parallel quality control scheme, CODC-pyParaQC, built by constructing computing process groups. The scheme retains the advantages of the existing delayed-mode quality control system (e.g., CODC-QC) while improving the efficiency of the quality control procedure, demonstrating the feasibility of large-scale parallel computation for the quality control scheme, and enabling (near) real-time quality control of massive ocean observation profiles. The results show that the efficiency of single-node quality control improved by about 10 times. By leveraging the computing power of supercomputers and employing multiple process groups for cross-node parallel computation, we developed a fast and efficient (near) real-time quality control procedure. The system processed about 22.5 million (22,548,733) temperature profiles from the World Ocean Database (1940-2023) in about 6.5 hours. Our new quality control scheme provides the computing capability needed to establish a high-quality ocean observation profile database.
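The abstract's figures imply a sustained throughput of roughly 964 profiles per second (22,548,733 profiles in about 6.5 hours). The sketch below illustrates the general idea of splitting profiles across worker groups and checks that arithmetic. It is a minimal illustration only, not the CODC-pyParaQC implementation: the function names, the gross-range check and its thresholds, and the use of Python's `ProcessPoolExecutor` are all assumptions made for this example, and the record does not say which parallel framework the authors used.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_groups(n_profiles: int, n_groups: int) -> list[range]:
    """Assign contiguous, near-equal index ranges to each worker group."""
    base, extra = divmod(n_profiles, n_groups)
    ranges, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


def toy_qc(temperature_c: float) -> bool:
    """Hypothetical gross-range check; NOT one of the CODC-QC tests."""
    return -2.5 <= temperature_c <= 40.0


def qc_group(temps: list[float]) -> list[bool]:
    """Work done by one group: flag every value in its chunk."""
    return [toy_qc(t) for t in temps]


def profiles_per_second(n_profiles: int, hours: float) -> float:
    """Average throughput implied by a profile count and a wall-clock time."""
    return n_profiles / (hours * 3600.0)


if __name__ == "__main__":
    temps = [12.3, 45.0, -1.8, 3.0, 99.9, 20.1, 8.8]  # made-up sample values
    chunks = [[temps[i] for i in r] for r in split_into_groups(len(temps), 3)]
    with ProcessPoolExecutor(max_workers=3) as pool:
        # pool.map preserves chunk order, so flags line up with `temps`
        flags = [f for part in pool.map(qc_group, chunks) for f in part]
    print(flags)
    print(round(profiles_per_second(22_548_733, 6.5)))
```

Contiguous chunks are used here so that each group could read its share of the input in one batch, which is one plausible way to reduce the serial I/O the abstract mentions; the actual partitioning strategy is not described in this record.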
Author Li, Tianyan
Wang, Yanjun
Jin, Zhong
Zhang, Bin
Yuan, Huifeng
Cheng, Lijing
Tan, Zhetao
Author_xml – sequence: 1
  givenname: Huifeng
  surname: Yuan
  fullname: Yuan, Huifeng
  email: hfyuan@cnic.cn
  organization: University of Chinese Academy of Sciences,Computer Internet Information Center, Chinese Academy of Sciences,Beijing,China
– sequence: 2
  givenname: Tianyan
  surname: Li
  fullname: Li, Tianyan
  email: tyli@cnic.cn
  organization: Chinese Academy of Sciences,Computer Internet Information Center,Beijing,China
– sequence: 3
  givenname: Zhong
  surname: Jin
  fullname: Jin, Zhong
  email: zjin@sccas.cn
  organization: Chinese Academy of Sciences,Computer Internet Information Center,Beijing,China
– sequence: 4
  givenname: Lijing
  surname: Cheng
  fullname: Cheng, Lijing
  email: chenglij@mail.iap.ac.cn
  organization: Chinese Academy of Sciences,Institute of Atmospheric Physics,Beijing,China
– sequence: 5
  givenname: Zhetao
  surname: Tan
  fullname: Tan, Zhetao
  email: tanzhetao19@mails.ucas.ac.cn
  organization: Chinese Academy of Sciences,Institute of Atmospheric Physics,Beijing,China
– sequence: 6
  givenname: Bin
  surname: Zhang
  fullname: Zhang, Bin
  email: zhangbin@qdio.ac.cn
  organization: Chinese Academy of Sciences,Institute of Oceanography,Qingdao,China
– sequence: 7
  givenname: Yanjun
  surname: Wang
  fullname: Wang, Yanjun
  email: yjwang@qdio.ac.cn
  organization: Chinese Academy of Sciences,Institute of Oceanography,Qingdao,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISPA63168.2024.00254
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
Oceanography
EISBN 9798331509712
EISSN 2158-9208
EndPage 1870
ExternalDocumentID 10885311
Genre orig-research
IngestDate Wed Aug 27 01:52:33 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_10885311
PublicationCentury 2000
PublicationDate 2024-Oct.-30
PublicationDateYYYYMMDD 2024-10-30
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-30
  day: 30
PublicationDecade 2020
PublicationTitle Proceedings of the ... International Symposium on Parallel and Distributed Processing with Applications (Print)
PublicationTitleAbbrev ISPA
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1863
SubjectTerms Big Data
Climate change
Computational efficiency
Control systems
Database systems
Observers
ocean observation
Ocean temperature
Oceanography
Oceans
parallel computation
Process control
Quality control
Real-time systems
Supercomputers
Title CODC-pyParaQC: A design and implementation of parallel quality control for ocean observation big data
URI https://ieeexplore.ieee.org/document/10885311
hasFullText 1
inHoldings 1
isFullTextHit
isPrint