A high performance query analytical framework for supporting data-intensive climate studies

Bibliographic Details
Published in: Computers, Environment and Urban Systems, Vol. 62, pp. 210-221
Main Authors: Li, Zhenlong; Huang, Qunying; Carbone, Gregory J.; Hu, Fei
Format: Journal Article
Language: English
Published: Oxford: Elsevier Ltd; Elsevier Science Ltd, 1 March 2017
ISSN: 0198-9715, 1873-7587
Description
Summary: Climate observations and model simulations produce vast amounts of data. The unprecedented data volume and the complexity of geospatial statistics and analysis require efficient analysis of big climate data to investigate global problems such as climate change, natural disasters, diseases, and other environmental issues. This paper introduces a high performance query analytical framework that tackles these challenges by leveraging Hive and cloud computing technologies. Within this framework, we propose grid transformation, a new perspective on complex climate analysis that applies a series of atomic transformations to terabytes of climate data using SQL-style queries (HiveQL). Specifically, we introduce four types of grid transformations (temporal, spatial, local, and arithmetic) to support a broad range of climate analyses, from basic spatiotemporal aggregation to more sophisticated anomaly detection. Each query is executed as MapReduce tasks on a highly scalable Hadoop cluster, which serves as the parallel processing engine. Big climate data are stored and managed directly in the Hadoop Distributed File System (HDFS) without any data format conversion. A prototype was developed to evaluate the feasibility and performance of the framework. Experimental results show that complex, data-intensive climate analyses can be conducted using intuitive SQL queries with good flexibility and performance. This research provides a building block and practical insights for establishing a cyberinfrastructure that offers a high performance, collaborative environment for data-intensive geospatial applications in climate science.
• We propose a high performance query analytical framework to support data-intensive climate applications.
• We introduce a new perspective for complex climate data analysis that applies a series of grid transformations to terabytes of climate data.
• Four types of grid transformations (temporal, spatial, local, and arithmetic) are introduced to support a broad range of climate analyses.
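The summary describes complex analyses such as anomaly detection as chains of atomic grid transformations of four types, but this record does not define their exact semantics. The following is a minimal, hypothetical in-memory sketch, not the authors' implementation. It assumes a grid is a Python dict mapping (time step, latitude index, longitude index) to a value, and it assumes the four types behave as follows: temporal aggregates each cell over time buckets, spatial aggregates blocks of cells into coarser cells, local applies a function to each cell independently, and arithmetic combines two grids cell by cell. All function names (`temporal`, `spatial`, `local`, `arithmetic`, `anomaly`) are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical grid model (an assumption, not the paper's schema):
# (time step, latitude index, longitude index) -> value.
Key = Tuple[int, int, int]
Grid = Dict[Key, float]


def temporal(grid: Grid, bucket: Callable[[int], int]) -> Grid:
    """Average each cell over all time steps that map to the same bucket."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for (t, y, x), v in grid.items():
        groups[(bucket(t), y, x)].append(v)
    return {k: mean(vs) for k, vs in groups.items()}


def spatial(grid: Grid, factor: int) -> Grid:
    """Average factor x factor blocks of cells into one coarser cell per time step."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for (t, y, x), v in grid.items():
        groups[(t, y // factor, x // factor)].append(v)
    return {k: mean(vs) for k, vs in groups.items()}


def local(grid: Grid, fn: Callable[[float], float]) -> Grid:
    """Apply fn to every cell value independently."""
    return {k: fn(v) for k, v in grid.items()}


def arithmetic(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Combine two grids cell by cell; only keys present in both are kept."""
    return {k: op(a[k], b[k]) for k in a.keys() & b.keys()}


def anomaly(grid: Grid, step: int) -> Grid:
    """Deviation of one time step from each cell's all-time mean (a chained query)."""
    climatology = temporal(grid, lambda t: 0)  # collapse every time step into bucket 0
    snapshot = {(0, y, x): v for (t, y, x), v in grid.items() if t == step}
    return arithmetic(snapshot, climatology, lambda s, c: s - c)


if __name__ == "__main__":
    # Toy input: 4 daily steps on a 2 x 2 grid; value = t + 10*lat + 100*lon.
    daily = {(t, y, x): float(t + 10 * y + 100 * x)
             for t in range(4) for y in range(2) for x in range(2)}
    monthly = temporal(daily, lambda t: t // 2)  # pretend 2 days = 1 "month"
    print(spatial(monthly, 2))  # {(0, 0, 0): 55.5, (1, 0, 0): 57.5}
    # Flag cells whose anomaly in "month" 1 exceeds an arbitrary threshold of 0.5.
    flags = local(anomaly(monthly, 1), lambda d: 1.0 if abs(d) > 0.5 else 0.0)
    print(sorted(flags.items()))
```

In the framework described above, each step would instead be expressed as a HiveQL query and executed as MapReduce tasks over data in HDFS; the dictionaries here only illustrate how composing small, atomic transformations yields an anomaly analysis.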
DOI: 10.1016/j.compenvurbsys.2016.12.003