A data-driven persistence test for robust (probabilistic) quality control of measured environmental time series: constant value episodes
| Published in: | Atmospheric Measurement Techniques, Vol. 16, No. 12, pp. 3085-3100 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Katlenburg-Lindau, Copernicus GmbH, 21 June 2023, Copernicus Publications |
| Subjects: | |
| ISSN: | 1867-8548, 1867-1381 |
| Summary: | Robust quality control is a prerequisite and an essential component in any data application. That is especially important for time series of environmental observations such as air quality due to their dynamic and irreversible nature. One of the common issues in these data is constant value episodes (CVEs), where a set of consecutive data values remains constant over a given period. Although CVEs are often considered to be an indicator of sensor failure or other measurement errors and are removed during quality control procedures, there are situations when CVEs reflect natural environmental phenomena, and they should not be removed from the data or analysis. Assessing whether the CVEs are erroneous data or valid observations is a challenge. As there are no formal procedures established for this, their classification is based on subjective judgment and is therefore uncertain and irreproducible. This paper presents a novel test procedure, i.e., constant value test, to estimate the probability of CVEs being valid data. The theoretical foundation of this test is based on statistical characteristics and probability theory and takes into account the numerical precision of the data values. The test is a data-driven (parametric) approach, which makes it usable for time series analysis in different environmental research domains, as long as serial dependency is given and the data distribution is not too different from Gaussian. The robustness of the test was demonstrated with sensitivity studies using synthetic data with different distributions. Example applications to measured air temperature and ozone mixing ratio data confirm the versatility of the test. |
|---|---|
| DOI: | 10.5194/amt-16-3085-2023 |
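
The summary defines a constant value episode as a set of consecutive data values that remains constant over a given period. As a rough illustration of that definition only, the following Python sketch locates candidate episodes in a series. It is not the paper's constant value test and does not estimate any probability of validity; the function name `find_constant_value_episodes`, the `min_length` threshold, and the sample temperatures are assumptions made up for this example.

```python
from itertools import groupby


def find_constant_value_episodes(values, min_length=3):
    """Locate runs of identical consecutive values.

    Returns a list of (start_index, length, value) tuples, one per run
    that is at least `min_length` samples long. `min_length` is a
    hypothetical threshold chosen for illustration, not a value taken
    from the paper.
    """
    episodes = []
    index = 0
    # groupby collects adjacent equal values into one group
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if length >= min_length:
            episodes.append((index, length, value))
        index += length
    return episodes


# Made-up air temperature readings (degrees Celsius), one decimal place
series = [20.1, 20.3, 20.3, 20.3, 20.3, 20.6, 21.0, 21.0, 20.8]
print(find_constant_value_episodes(series))  # [(1, 4, 20.3)]
```

Each reported episode would still need the kind of assessment the summary describes, since a run of equal values may be either a sensor fault or a valid observation recorded at limited numerical precision.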