CODC-pyParaQC: A design and implementation of parallel quality control for ocean observation big data

Bibliographic Details
Published in: Proceedings of the ... International Symposium on Parallel and Distributed Processing with Applications (Print), pp. 1863-1870
Main Authors: Yuan, Huifeng; Li, Tianyan; Jin, Zhong; Cheng, Lijing; Tan, Zhetao; Zhang, Bin; Wang, Yanjun
Format: Conference Proceeding
Language: English
Published: IEEE 30.10.2024
ISSN: 2158-9208
Description
Summary: High-quality ocean observation is essential for research and applications in ocean exploration and climate change. With the arrival of the big-data era in recent years, it has become crucial to process these massive raw observations accurately and efficiently. This paper addresses issues encountered when processing ocean big data in traditional delayed-mode quality control systems, including substantial serial I/O workloads and frequent context switching. A parallel quality control scheme named CODC-pyParaQC is proposed, built by constructing computing process groups. It retains the advantages of the existing delayed-mode quality control system (e.g. CODC-QC) while improving the efficiency of the quality control procedure, making large-scale parallel computation of the quality control scheme feasible, and realizing (near) real-time quality control of massive ocean observation profiles. The results show that the efficiency of single-node quality control improved by about 10 times. By leveraging the computing power of supercomputers and employing multiple process groups for cross-node parallel computation, we have developed a fast and efficient (near) real-time quality control procedure. This system processed approximately 22,548,733 temperature profiles from the World Ocean Database (1940-2023) in about 6.5 hours. Our new quality control scheme can provide the computing capability necessary for establishing a high-quality ocean observation profile database.
DOI: 10.1109/ISPA63168.2024.00254
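
Illustrative sketch: The summary describes splitting quality control work across computing process groups. The record gives no implementation details, so the following is only a minimal single-node sketch of that general idea, not the CODC-pyParaQC code. The range check, its bounds (`T_MIN`, `T_MAX`), and all function names are hypothetical. Handing each worker one whole chunk, rather than one profile at a time, is one common way to reduce per-task overhead. For scale, the figures quoted in the summary (22,548,733 profiles in about 6.5 hours) work out to roughly 960 profiles per second.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical plausibility bounds for sea temperature in degrees Celsius.
# Illustrative values only, not the thresholds used by CODC-QC.
T_MIN, T_MAX = -2.5, 40.0


def qc_profile(temperatures):
    """Return one flag per level: 0 = pass, 1 = outside the assumed range."""
    return [0 if T_MIN <= t <= T_MAX else 1 for t in temperatures]


def qc_chunk(chunk):
    """Run the check over one group's share of profiles in a single call."""
    return [qc_profile(profile) for profile in chunk]


def split_into_groups(profiles, n_groups):
    """Split profiles into n_groups contiguous chunks of near-equal size."""
    base, extra = divmod(len(profiles), n_groups)
    chunks, start = [], 0
    for g in range(n_groups):
        # The first `extra` groups take one additional profile each.
        size = base + (1 if g < extra else 0)
        chunks.append(profiles[start:start + size])
        start += size
    return chunks


def run_parallel_qc(profiles, n_groups=4):
    """Flag all profiles, one worker process per chunk, preserving order."""
    chunks = split_into_groups(profiles, n_groups)
    with ProcessPoolExecutor(max_workers=n_groups) as pool:
        # map() yields results in the same order as the submitted chunks.
        results = pool.map(qc_chunk, chunks)
        return [flags for chunk_flags in results for flags in chunk_flags]
```

To try it, call `run_parallel_qc` on a list of temperature lists from inside an `if __name__ == "__main__":` guard, which process-based pools require on platforms that spawn workers. A cross-node version on a supercomputer would need a distributed launcher such as MPI, which this sketch does not attempt.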