A method for assessment of the general circulation model quality using the K-means clustering algorithm: a case study with GETM v2.5

Bibliographic details
Published in:	Geoscientific Model Development, Vol. 15, No. 2, pp. 535-551
Main authors:	Raudsepp, Urmas; Maljutenko, Ilja
Format:	Journal Article
Language:	English
Published:	Katlenburg-Lindau: Copernicus GmbH, 25 January 2022; Copernicus Publications
Subjects:	Accuracy; Algorithms; Analysis; Case studies; Centroids; Cluster analysis; Clustering; Dimensional analysis; Elbow; Errors; General circulation models; Learning algorithms; Machine learning; Methods; Model accuracy; Quality assessment; Quality control; Rivers; Saline water; Salinity; Salinity effects; Simulation; Temperature; Variables; Vector quantization; Gulf of Riga; North Sea; Gulf of Bothnia; Gulf of Finland; Baltic Sea; Gotland (island)
ISSN:	1991-9603, 1991-959X, 1991-962X
Online access:	Full text

Description
Abstract:	A model's ability to reproduce the state of the simulated object, or a particular feature or phenomenon, is always a subject of discussion. Multidimensional model quality assessment is usually customized for the specific focus of a study and often covers only a limited number of locations. In this paper, we propose a method that provides information on the accuracy of the model in general while retaining all dimensional information for posterior analysis of specific tasks. The main goal of the method is to cluster the multivariate model errors. The clustering is done using the K-means algorithm of unsupervised machine learning. In addition, the potential application of K-means clustering of model errors for learning and prediction is shown. The method is tested on 40-year simulation results from a general circulation model of the Baltic Sea. The model results are evaluated against temperature and salinity measurements from more than 1 million casts by forming a two-dimensional error space and performing a clustering procedure in it. The optimal number of clusters, four, was determined using the Elbow cluster selection criterion and an analysis of different numbers of error clusters. In this particular model, the error cluster representing good model quality, with a bias of 0.4 °C (SD = 0.8 °C) for temperature and 0.6 g kg⁻¹ (SD = 0.7 g kg⁻¹) for salinity, made up 57 % of all comparison data pairs. The prediction of centroids from a limited number of randomly selected data showed that the obtained centroids became stable once the learning dataset contained at least 100 000 error pairs.
DOI:	10.5194/gmd-15-535-2022
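
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the error pairs are synthetic, the blob centres and spreads are arbitrary, and the names `kmeans`, `assign`, `errors`, `inertias` and `shares` are hypothetical. It assumes each error is a (temperature, salinity) pair of model-minus-observation differences, runs a plain Lloyd-style K-means on that two-dimensional error space, and records the within-cluster sum of squares for several values of K so that an elbow can be looked for.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep centroid of an empty cluster
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Synthetic (temperature error, salinity error) pairs. ASSUMPTION: these
# values are invented for illustration and are not data from the paper.
data_rng = random.Random(42)
blobs = [((0.4, 0.6), 600), ((3.0, 0.5), 150), ((-2.5, 0.8), 150), ((0.5, 4.0), 100)]
errors = [(data_rng.gauss(mt, 0.7), data_rng.gauss(ms, 0.7))
          for (mt, ms), n in blobs for _ in range(n)]

# Within-cluster sum of squares for K = 1..6, the quantity inspected for an elbow.
inertias = {k: kmeans(errors, k)[2] for k in range(1, 7)}

# Cluster the error space with K = 4 and report each cluster's share of pairs.
centroids, labels, _ = kmeans(errors, 4)
shares = [labels.count(j) / len(labels) for j in range(4)]

for k, value in inertias.items():
    print(f"K={k}: within-cluster sum of squares = {value:.1f}")
for (dt, ds), share in zip(centroids, shares):
    print(f"centroid: dT={dt:+.2f}, dS={ds:+.2f}, share={share:.1%}")
```

Each centroid can be read as the mean temperature and salinity error of one error cluster, and each share as the fraction of comparison pairs falling into it. A single random initialisation is used here for brevity; repeating the run with several seeds and keeping the lowest sum of squares would make the elbow curve less sensitive to local minima.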