CODC-pyParaQC: A design and implementation of parallel quality control for ocean observation big data

Bibliographic Details
Published in: Proceedings of the ... International Symposium on Parallel and Distributed Processing with Applications (Print), pp. 1863-1870
Main authors: Yuan, Huifeng; Li, Tianyan; Jin, Zhong; Cheng, Lijing; Tan, Zhetao; Zhang, Bin; Wang, Yanjun
Format: Conference paper
Language: English
Published: IEEE, 30 October 2024
ISSN:2158-9208
Description
Summary: High-quality ocean observation is essential for research and applications in ocean exploration and climate change. As ocean observation has entered the era of big data in recent years, it has become crucial to process massive volumes of raw observations accurately and efficiently. This paper addresses issues encountered when processing ocean big data within traditional delayed-mode quality control systems, including substantial serial I/O workloads and frequent context switching. A parallel quality control scheme named CODC-pyParaQC is proposed, built by constructing computing process groups. It retains the advantages of the existing delayed-mode quality control system (e.g. CODC-QC) while improving the efficiency of the quality control procedure, making large-scale parallel computation of the quality control scheme feasible, and enabling (near) real-time quality control of massive numbers of ocean observation profiles. The results show that the efficiency of single-node quality control is improved by about 10 times. By leveraging the computing power of supercomputers and employing multiple process groups for cross-node parallel computation, the authors developed a fast and efficient (near) real-time quality control procedure. The system processed approximately 22.5 million (22,548,733) temperature profiles from the World Ocean Database (1940-2023) in about 6.5 hours. The new quality control scheme can ensure the computing capability necessary for establishing a high-quality ocean observation profile database.
DOI:10.1109/ISPA63168.2024.00254
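
Note: The summary reports 22,548,733 profiles processed in about 6.5 hours, which works out to roughly 960 profiles per second. The short Python sketch below reproduces that arithmetic and shows one simple way a profile set could be divided evenly among process groups. It is an illustration only, not the CODC-pyParaQC implementation: the function names `partition` and `throughput`, the contiguous-range splitting strategy, and the group count of 64 are all assumptions that do not come from the paper.

```python
def partition(n_items: int, n_groups: int) -> list[range]:
    """Split indices 0..n_items-1 into n_groups contiguous, near-equal ranges.

    Hypothetical helper: the record does not say how profiles are
    actually assigned to process groups.
    """
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    base, extra = divmod(n_items, n_groups)
    ranges = []
    start = 0
    for g in range(n_groups):
        # The first `extra` groups take one additional item each.
        size = base + (1 if g < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def throughput(n_profiles: int, hours: float) -> float:
    """Average number of profiles processed per second."""
    return n_profiles / (hours * 3600.0)


N_PROFILES = 22_548_733   # profile count stated in the summary
N_GROUPS = 64             # assumed group count, for illustration only

chunks = partition(N_PROFILES, N_GROUPS)
rate = throughput(N_PROFILES, 6.5)
print(f"{len(chunks)} groups, largest holds {len(chunks[0])} profiles")
print(f"average throughput: {rate:.0f} profiles/s")
```

Contiguous ranges keep each group's share within one profile of every other group's, which is the simplest form of static load balancing; a real system might instead balance by file size or profile depth.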